Best practices for performance

PWR026: Annotate function for OpenMP offload

Issue

The compiler can generate an OpenMP offload version of this function.

Relevance

When a loop is offloaded to the GPU, the compiler generates GPU instructions, which are distinct from those of the CPU. Offloaded sections are therefore translated into small GPU programs embedded in the main CPU program. The runtime is in charge of executing those programs on the GPU, as well as of moving data between CPU and GPU memory as needed.

If an offloaded loop invokes functions, GPU versions of those functions must also be created. For the compiler to create a GPU version of a function, the function must be marked with the OpenMP declare target directive. When this is not done, the CPU version will be called instead, with a corresponding performance loss: computation moves from the GPU to the CPU to execute the function and then back to the GPU once it returns.

Actions

Annotate the function with #pragma omp declare target, placing its definition between #pragma omp declare target and #pragma omp end declare target.

Reference

declare target Directive, OpenMP API Specification: Version 5.0 (November 2018)