There is a SIMD vectorization opportunity within an offloaded region.
SIMD vectorization operates at the lowest level of hardware parallelism, within a single core or compute unit, and can usually be combined with coarser-grained forms of parallelism such as multithreading. In this case, it could further increase the performance of computations offloaded to accelerator devices such as GPUs.
Add vectorization directives (for example, the OpenMP simd construct) to instruct the compiler to vectorize the loop.
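As an illustration, the following is a minimal sketch assuming OpenMP 4.5 or later with device offloading support. The function name `saxpy_offload` is hypothetical, and the sketch is not the exact fix Codee suggests. The loop is already offloaded with `target teams distribute parallel for`, and appending `simd` to the combined construct asks the compiler to also vectorize the iterations assigned to each thread.

```c
#include <stddef.h>

/* Hypothetical example: y = a*x + y computed on an accelerator.
 * "target" offloads the region to the device, "teams distribute" and
 * "parallel for" split the iterations across teams and threads, and the
 * added "simd" clause requests SIMD vectorization of each thread's chunk.
 * Without OpenMP support the pragma is ignored and the loop runs serially. */
void saxpy_offload(size_t n, float a, const float *x, float *y) {
  #pragma omp target teams distribute parallel for simd \
      map(to: x[0:n]) map(tofrom: y[0:n])
  for (size_t i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i];
  }
}
```

If no offload device is available, OpenMP runtimes typically fall back to executing the target region on the host, so the result is the same either way. Only the parallel and vector execution strategy changes.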