Catalog of best practice rules for performance
All the recommendations, opportunities, defects and remarks in this catalog are automatically detected and reported by Codee; in addition, Codee can apply the opportunities by rewriting the code automatically. Representative rules are illustrated with short code sketches after the Recommendations, Defects and Remarks lists below.
Recommendations
PWR001: Declare global variables as function parameters
PWR002: Declare scalar variables in the smallest possible scope
PWR003: Explicitly declare pure functions
PWR004: Declare OpenMP scoping for all variables
PWR005: Disable default OpenMP scoping
PWR006: Avoid privatization of read-only variables
PWR007: Disable implicit declaration of variables
PWR008: Declare the intent for each procedure parameter
PWR009: Use OpenMP teams to offload work to GPU
PWR010: Avoid column-major array access in C/C++
PWR012: Pass only required fields from derived type as parameters
PWR013: Avoid copying unused variables to or from the GPU
PWR014: Out-of-dimension-bounds matrix access
PWR015: Avoid copying unnecessary array elements to or from the GPU
PWR016: Use separate arrays instead of an Array-of-Structs
PWR017: Using countable while loops instead of for loops may inhibit vectorization
PWR018: Call to recursive function within a loop inhibits vectorization
PWR019: Consider interchanging loops to favor vectorization by maximizing inner loop’s trip count
PWR020: Consider loop fission to enable vectorization
PWR021: Consider loop fission with scalar to vector promotion to enable vectorization
PWR022: Move invariant conditional out of the loop to avoid redundant computation and potentially enable vectorization
PWR023: Add ‘restrict’ for pointer function parameters to hint the compiler that vectorization is safe
PWR024: Loop can be rewritten in OpenMP canonical form
PWR025: Consider annotating pure function with OpenMP ‘declare simd’
PWR026: Annotate function for OpenMP Offload
PWR027: Annotate function for OpenACC Offload
PWR028: Remove pointer increment preventing performance optimization
PWR029: Remove integer increment preventing performance optimization
PWR030: Remove pointer assignment preventing performance optimization for perfectly nested loops
PWR031: Replace call to pow by multiplication, division and/or square root
PWR032: Avoid calls to mathematical functions with higher precision than required
PWR033: Move invariant conditional out of the loop to avoid redundant computations
PWR034: Avoid strided array access to improve performance
PWR035: Avoid non-consecutive array access to improve performance
PWR036: Avoid indirect array access to improve performance
PWR037: Potential precision loss in call to mathematical function
PWR038: Apply loop sectioning to improve performance
PWR039: Consider loop interchange to improve the locality of reference and enable vectorization
PWR040: Consider loop tiling to improve the locality of reference
PWR042: Consider loop interchange by promoting the scalar reduction variable to an array
PWR043: Consider loop interchange by replacing the scalar reduction value
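For instance, PWR010 above flags column-major traversal of C/C++ arrays, which C stores in row-major order. The following minimal sketch contrasts the flagged pattern with the cache-friendly one; the names sum_colmajor, sum_rowmajor and the size N are illustrative and not part of the catalog.

    #include <stddef.h>

    #define N 1024  /* illustrative matrix size */

    /* Column-major traversal in C: consecutive accesses are N doubles
     * apart, wasting cache lines (the pattern PWR010 reports). */
    double sum_colmajor(double a[N][N]) {
        double sum = 0.0;
        for (size_t j = 0; j < N; ++j)
            for (size_t i = 0; i < N; ++i)
                sum += a[i][j];
        return sum;
    }

    /* Row-major traversal: consecutive accesses are contiguous in
     * memory, improving locality of reference and easing vectorization. */
    double sum_rowmajor(double a[N][N]) {
        double sum = 0.0;
        for (size_t i = 0; i < N; ++i)
            for (size_t j = 0; j < N; ++j)
                sum += a[i][j];
        return sum;
    }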
Defects
PWD002: Unprotected multithreading reduction operation
PWD003: Missing array range in data copy to the GPU
PWD004: Out-of-memory-bounds array access
PWD005: Array range copied to or from the GPU does not cover the used range
PWD006: Missing deep copy of non-contiguous data to the GPU
PWD007: Unprotected multithreading recurrence
PWD008: Unprotected multithreading recurrence due to out-of-dimension-bounds array access
PWD009: Incorrect privatization in parallel region
PWD010: Incorrect sharing in parallel region
PWD011: Missing OpenMP lastprivate clause
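As an example of the defect class behind PWD002, the sketch below shows a dot product whose shared accumulator is updated by all threads without protection, followed by a fix using the standard OpenMP reduction clause. The function names dot_racy and dot_fixed are illustrative only.

    #include <stddef.h>

    /* Defective version: every thread updates 'sum' concurrently,
     * so the result is non-deterministic (a data race). */
    double dot_racy(const double *x, const double *y, size_t n) {
        double sum = 0.0;
        #pragma omp parallel for
        for (size_t i = 0; i < n; ++i)
            sum += x[i] * y[i];   /* unprotected shared update */
        return sum;
    }

    /* Fixed version: the reduction clause gives each thread a private
     * partial sum and combines the partial sums safely at the end. */
    double dot_fixed(const double *x, const double *y, size_t n) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (size_t i = 0; i < n; ++i)
            sum += x[i] * y[i];
        return sum;
    }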
Remarks
RMK001: Loop nesting that might benefit from hybrid parallelization using multithreading and SIMD
RMK002: Loop nesting that might benefit from hybrid parallelization using offloading and SIMD
RMK003: Potentially privatizable temporary variable
RMK007: SIMD opportunity within a multi-threaded region
RMK008: SIMD opportunity within an offloaded region
RMK009: Outline loop to increase compiler and tooling code coverage
RMK010: The vectorization cost model states the loop is not a SIMD opportunity due to strided memory accesses in the loop body
RMK012: The vectorization cost model states the loop is not a SIMD opportunity because conditional execution renders vectorization inefficient
RMK013: The vectorization cost model states the loop is not a SIMD opportunity because loops with low trip count unknown at compile time do not benefit from vectorization
RMK014: The vectorization cost model states the loop is not a SIMD opportunity due to unpredictable memory accesses in the loop body
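A sketch of what acting on a remark such as RMK007 (a SIMD opportunity inside a multi-threaded region) might look like: a loop that is already split across threads can additionally be vectorized within each thread by combining the OpenMP for and simd constructs. The function names are illustrative, not part of the catalog.

    /* Multithreaded only: each thread processes its chunk with scalar code. */
    void saxpy_threads(float *restrict y, const float *restrict x,
                       float a, int n) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] += a * x[i];
    }

    /* Hybrid multithreading + SIMD: threads split the iteration space
     * and each thread vectorizes its own chunk. */
    void saxpy_hybrid(float *restrict y, const float *restrict x,
                      float a, int n) {
        #pragma omp parallel for simd
        for (int i = 0; i < n; ++i)
            y[i] += a * x[i];
    }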
Glossary
Locality of Reference
Loop fission
Loop interchange
Loop sectioning
Loop tiling
Loop unswitching
Loop-carried dependencies
Memory access pattern
Multithreading
Offloading
OpenMP Canonical Form
OpenMP
Parallelization
Patterns for performance optimization (page under construction)
Perfect-loop nesting
Pointer aliasing
Row-major and column-major order
Scalar to vector promotion
Strength reduction
Variable scope
Variable scoping in the context of OpenMP
Vectorization