Use hybrid multi-threading and SIMD parallelization to maximize the parallel performance of nested loops.
Certain nested loops may benefit from a hybrid parallelization approach in which the outer loop is parallelized with multi-threading and the inner loop is vectorized. This makes fuller use of modern hardware: multi-threading spreads the work across as many CPU cores as possible, while vectorization uses the SIMD hardware available in each core.
Parallelize the outer loop using multi-threading and the inner loop using vectorization. With OpenMP, you can do so by invoking:
pwdirectives --omp multi+simd <foo.c:5> -o foo-hybrid.c
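For illustration, hybrid OpenMP code of this kind might look like the sketch below. It is a hand-written example, not the actual output of pwdirectives; the function name `scale_add_rows` and its parameters are hypothetical. It assumes the outer iterations are independent of each other and that the inner loop walks contiguous memory.

```c
/* Hypothetical kernel: out[i][j] = a[i][j] + alpha * b[i][j]
 * for a rows x cols matrix stored in row-major order.
 * Assumes out does not partially overlap a or b. */
void scale_add_rows(int rows, int cols,
                    const double *a, const double *b,
                    double alpha, double *out)
{
    /* Outer loop: rows are distributed across threads. */
    #pragma omp parallel for
    for (int i = 0; i < rows; i++) {
        /* Inner loop: the contiguous elements of one row are
         * processed with the SIMD units of the executing core. */
        #pragma omp simd
        for (int j = 0; j < cols; j++) {
            out[i * cols + j] = a[i * cols + j] + alpha * b[i * cols + j];
        }
    }
}
```

The pragmas only take effect when the compiler's OpenMP support is enabled (for example, `-fopenmp` with GCC or Clang); without it they are ignored and the loops run serially with the same result.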