A loop is a multithreading opportunity if it is a good candidate to run faster when its workload is distributed across several CPUs.
Codee can detect loops that are multithreading opportunities, which helps developers improve performance in various ways (e.g. with OpenMP pragmas, the pthreads API, or std::thread in C++). In addition, Codee can automatically rewrite such loops using portable OpenMP pragmas.
When execution reaches a piece of code prepared for multithreading, the program switches to multithreaded mode: it spawns several threads and distributes the workload among them, with the goal of finishing the work faster.
For a loop to be a multithreading opportunity, the following constraints need to be met:
- The loop should be countable, i.e. the number of iterations should be known before entering the loop
- The loop should have only one entrance and one exit (no jumping into the loop and no `goto` statements inside the loop)
- The loop should not have loop-carried dependencies
In addition, the usefulness of multithreading depends on additional constraints:
- Workload size – spawning threads and synchronizing them are sources of overhead, so multithreading small workloads rarely pays off. In fact, multithreading a small workload can cause a large slowdown.
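OpenMP can express this trade-off directly with the `if` clause, which keeps the loop serial when the workload is too small to amortize the threading overhead. In this sketch the threshold of 10000 iterations is an illustrative guess, not a tuned value:

```c
#include <assert.h>

/* Dot product that only goes parallel for large n; for small n
 * the if() clause makes the parallel region run on one thread,
 * avoiding the thread-spawn and synchronization overhead. */
double dot(const double *x, const double *y, int n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) if(n > 10000)
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```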
The main differences between a loop that is a multithreading opportunity and a loop that is a SIMD opportunity are:
- Typically, multithreading is applied to the outermost loop in the loop nest. In contrast, vectorization is applied to the innermost loop in the loop nest.
- A good memory access pattern is a prerequisite for vectorization to be beneficial. In contrast, a good memory access pattern improves the performance of a multithreaded loop but is not a prerequisite: multithreading can speed up the code regardless of the memory access pattern.
- Loops that are good multithreading opportunities can have arbitrarily complex control flow. This is not the case with vectorization.
- Loops that are multithreading opportunities can, in principle, call functions, whereas vectorized loops cannot (although inlining can help vectorization by removing the function call).
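The first difference above, the typical division of labor in a loop nest, can be sketched with two OpenMP pragmas: threads split the outermost loop while the innermost loop is left to the SIMD units. The fixed row width of 4 is just for illustration.

```c
#include <assert.h>

/* Scale every element of a rows x 4 matrix: the outer loop is
 * multithreaded (rows distributed across threads) and the inner
 * loop is marked for vectorization within each thread. */
void scale_rows(double m[][4], int rows, double factor) {
    #pragma omp parallel for      /* multithreading: outer loop */
    for (int i = 0; i < rows; i++) {
        #pragma omp simd          /* vectorization: inner loop */
        for (int j = 0; j < 4; j++)
            m[i][j] *= factor;
    }
}
```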
Some compilers support automatic parallelization using multithreading, but it is typically not recommended and is disabled by default. Compilers do not have enough information at compile time to distinguish hot loops from cold ones, and multithreading a cold loop can actually result in a large slowdown.
Building performance into the code from day one with Codee