A loop is a SIMD opportunity if it is a good candidate to run faster when vectorized, i.e. when the processor's special vector instructions are used to process the data in the loop.
Compilers automatically vectorize simple loops when vectorization is possible and the cost model predicts a speedup. However, explicit vectorization is needed when the cost model is mistaken or when automatic vectorization fails for various reasons (e.g. pointer aliasing analysis fails).
Codee can detect loops that are SIMD opportunities, which can help developers improve performance in various ways (e.g. automatic compiler vectorization, explicit vectorization, SIMD intrinsics, etc.). In addition, Codee can automatically rewrite them, either by applying portable OpenMP SIMD pragmas or compiler-specific pragmas.
For a loop to be a SIMD opportunity, the following conditions need to be met¹:
- The loop should be countable, i.e. the number of iterations of the loop should be known before entering the loop
- The loop should only have one entrance and one exit (no jumping into the loop and no goto statements inside the loop)
- The loop should not have loop-carried dependencies
- The loop should not call external functions
- Only the innermost loops are candidates for vectorization
In addition, the usefulness of vectorization depends on additional constraints:
- Loop trip count – loops with short trip counts generally don’t profit from vectorization. The same concern applies to loops whose trip count is unknown at compile time.
- Control flow in the loop body – control flow in the loop body decreases the usefulness of vectorization
- Memory access pattern – inefficient memory access pattern renders vectorization less efficient
The main differences between a loop that is a SIMD opportunity and a loop that is a multithreading opportunity are:
- In the case of loop nests, multithreading is typically applied to the outermost loop. In contrast, vectorization is applied to the innermost loop in the loop nest.
- A good memory access pattern is a prerequisite for vectorization to pay off. In contrast, a good memory access pattern improves the performance of a multithreading opportunity but is not a prerequisite, i.e. with multithreading the code will typically become faster regardless of the memory access pattern.
- Loops that are good multithreading opportunities can have arbitrarily complex control flow. This is not the case with vectorization.
- Multithreading opportunities can in principle invoke functions, whereas vectorized loops cannot (although inlining can enable vectorization by removing the function call).
1 These are classical properties, but occasionally loops can be vectorized even if not all classical properties hold. For example, although rare, it is possible to vectorize an outer loop as well.