Catalog of best practice rules for performance
Every recommendation, defect, and remark in this open catalog is automatically detected and reported by Codee, which also assists the programmer with source-code rewriting capabilities.
Take a look at the simple code snippets available on GitHub.
Recommendations
PWR001: Declare global variables as function parameters
PWR002: Declare scalar variables in the smallest possible scope
PWR003: Explicitly declare pure functions
PWR004: Declare OpenMP scoping for all variables
PWR005: Disable default OpenMP scoping
PWR006: Avoid privatization of read-only variables
PWR007: Disable implicit declaration of variables
PWR008: Declare the intent for each procedure parameter
PWR009: Use OpenMP teams to offload work to GPU
PWR010: Avoid column-major array access in C/C++
PWR012: Pass only required fields from derived type as parameters
PWR013: Avoid copying unused variables to or from the GPU
PWR014: Out-of-dimension-bounds matrix access
PWR015: Avoid copying unnecessary array elements to or from the GPU
PWR016: Use separate arrays instead of an Array-of-Structs
PWR017: Using countable while loops instead of for loops may inhibit vectorization
PWR018: Call to recursive function within a loop inhibits vectorization
PWR019: Consider interchanging loops to favor vectorization by maximizing inner loop’s trip count
PWR020: Consider loop fission to enable vectorization
PWR021: Consider loop fission with scalar to vector promotion to enable vectorization
PWR022: Move invariant conditional out of the loop to avoid redundant computation and potentially enable vectorization
PWR023: Add ‘restrict’ for pointer function parameters to hint the compiler that vectorization is safe
PWR024: Loop can be rewritten in OpenMP canonical form
PWR025: Consider annotating pure function with OpenMP ‘declare simd’
PWR026: Annotate function for OpenMP Offload
PWR027: Annotate function for OpenACC Offload
PWR028: Remove pointer increment preventing performance optimization
PWR029: Remove integer increment preventing performance optimization
PWR030: Remove pointer assignment preventing performance optimization for perfectly nested loops
PWR031: Replace call to pow by multiplication, division and/or square root
PWR032: Avoid calls to mathematical functions with higher precision than required
PWR033: Move invariant conditional out of the loop to avoid redundant computations
PWR034: Avoid strided array access to improve performance
PWR035: Avoid non-consecutive array access to improve performance
PWR036: Avoid indirect array access to improve performance
PWR037: Potential precision loss in call to mathematical function
PWR038: Apply loop sectioning to improve performance
PWR039: Consider loop interchange to improve the locality of reference and enable vectorization
PWR040: Consider loop tiling to improve the locality of reference
PWR042: Consider loop interchange by promoting the scalar reduction variable to an array
PWR043: Consider loop interchange by replacing the scalar reduction value
PWR044: Avoid unnecessary floating-point data conversions involving constants
PWR045: Replace division with a multiplication with a reciprocal
PWR046: Replace two divisions with a division and a multiplication
PWR048: Replace multiplication/addition combo with an explicit call to fused multiply-add
PWR050: Consider applying multithreading parallelism to forall loop
PWR051: Consider applying multithreading parallelism to scalar reduction loop
PWR052: Consider applying multithreading parallelism to sparse reduction loop
PWR053: Consider applying vectorization to forall loop
PWR054: Consider applying vectorization to scalar reduction loop
PWR055: Consider applying offloading parallelism to forall loop
PWR056: Consider applying offloading parallelism to scalar reduction loop
PWR057: Consider applying offloading parallelism to sparse reduction loop
PWR060: Consider loop fission to separate gather memory access pattern
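To give a flavor of what these recommendations look like in code, here is a minimal sketch in the spirit of PWR023 (adding `restrict` to pointer function parameters). The function name and values are illustrative, not taken from the catalog's own snippets:

```c
#include <stddef.h>

/* Without 'restrict', the compiler must assume 'out' may alias 'in',
 * which can force it to give up on vectorizing the loop below.
 * The C99 'restrict' qualifier promises the pointers do not overlap,
 * hinting the compiler that vectorization is safe (PWR023). */
void scale_noalias(size_t n, double *restrict out,
                   const double *restrict in, double factor)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * factor;   /* no aliasing: safe to vectorize */
}
```

The promise is the programmer's responsibility: calling `scale_noalias` with overlapping arrays would be undefined behavior.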
Defects
PWD002: Unprotected multithreading reduction operation
PWD003: Missing array range in data copy to the GPU
PWD004: Out-of-memory-bounds array access
PWD005: Array range copied to or from the GPU does not cover the used range
PWD006: Missing deep copy of non-contiguous data to the GPU
PWD007: Unprotected multithreading recurrence
PWD008: Unprotected multithreading recurrence due to out-of-dimension-bounds array access
PWD009: Incorrect privatization in parallel region
PWD010: Incorrect sharing in parallel region
PWD011: Missing OpenMP lastprivate clause
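As an example of the kind of defect this family covers, the sketch below shows the usual fix for an unprotected multithreading reduction in the spirit of PWD002. The function is illustrative; without the `reduction` clause, concurrent updates to `sum` would race:

```c
#include <stddef.h>

/* PWD002-style fix: declare the accumulator in an OpenMP reduction
 * clause so each thread keeps a private partial sum that is combined
 * safely when the parallel loop finishes. */
double sum_array(size_t n, const double *x)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (ptrdiff_t i = 0; i < (ptrdiff_t)n; i++)
        sum += x[i];
    return sum;
}
```

Compiled without OpenMP support, the pragma is simply ignored and the loop runs sequentially with the same result.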
Remarks
RMK001: Loop nesting that might benefit from hybrid parallelization using multithreading and SIMD
RMK002: Loop nesting that might benefit from hybrid parallelization using offloading and SIMD
RMK003: Potentially privatizable temporary variable
RMK007: SIMD opportunity within a multi-threaded region
RMK008: SIMD opportunity within an offloaded region
RMK009: Outline loop to increase compiler and tooling code coverage
RMK010: The vectorization cost model states the loop is not a SIMD opportunity due to strided memory accesses in the loop body
RMK012: The vectorization cost model states the loop is not a SIMD opportunity because conditional execution renders vectorization inefficient
RMK013: The vectorization cost model states the loop is not a SIMD opportunity because loops with low trip count unknown at compile time do not benefit from vectorization
RMK014: The vectorization cost model states the loop is not a SIMD opportunity due to unpredictable memory accesses in the loop body
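Several of these remarks hinge on the loop's memory access pattern. The sketch below contrasts unit-stride with stride-k accesses over the same data; the strided variant is the kind of pattern a cost model flags as a poor SIMD opportunity (as in RMK010). Both functions are illustrative and compute the same sum:

```c
#include <stddef.h>

/* Unit-stride accesses: each iteration touches the next element in
 * memory, which vectorizes and uses the cache well. */
double sum_unit_stride(size_t n, const double *x)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* The same elements visited with stride 'k': one pass per residue
 * class, each inner loop jumping k elements at a time.  The result is
 * identical, but the strided access pattern makes vectorization
 * inefficient. */
double sum_strided(size_t n, size_t k, const double *x)
{
    double s = 0.0;
    for (size_t j = 0; j < k; j++)
        for (size_t i = j; i < n; i += k)
            s += x[i];
    return s;
}
```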
Glossary
Locality of Reference
Loop fission
Loop interchange
Loop sectioning
Loop tiling
Loop unswitching
Loop-carried dependencies
Memory access pattern
Multithreading
Offloading
OpenMP Canonical Form
OpenMP
Parallelization
Patterns for performance optimization (page under construction)
Perfect-loop nesting
Pointer aliasing
Row-major and column-major order
Scalar to vector promotion
Strength reduction
Variable scope
Variable scoping in the context of OpenMP
Vectorization
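Several glossary entries (loop interchange, row-major order, locality of reference) come together in the classic example sketched below. It is an illustration of the concepts, not a snippet from the catalog:

```c
#define N 64

/* C stores arrays in row-major order, so b[j][i] in the inner loop
 * walks memory with stride N: poor locality of reference. */
double sum_column_wise(double b[N][N])
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += b[j][i];          /* column-wise: strided accesses */
    return s;
}

/* Loop interchange: iterating b[i][j] in the inner loop makes the
 * accesses consecutive in memory, improving locality and enabling
 * vectorization, while the result is unchanged. */
double sum_row_wise(double b[N][N])
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += b[i][j];          /* row-wise: unit-stride accesses */
    return s;
}
```

In Fortran, which is column-major, the favorable traversal order is the opposite one.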
Discover the performance issues hidden in your source code using Codee