Issue #
Copying data to the GPU from an array whose size is not known to the compiler requires explicitly specifying the range of array elements to be copied.
Relevance #
When offloading to the GPU, copying array data from host memory to GPU memory requires knowing which range of the array must be copied. If the compiler knows the array size (e.g., for C static arrays or Fortran assumed-shape arrays) and the whole array must be copied, specifying the data range is optional in both the OpenMP and OpenACC standards. However, for arrays whose size is not known to the compiler, specifying the array range is compulsory. Some compilers do not enforce this, which leads to undefined behavior. For instance, for C dynamic arrays, the scalar value of the pointer might be copied instead of any pointed-to data; for Fortran assumed-size arrays, an invalid memory access might occur, or data from the wrong memory locations might be copied.
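To make the distinction concrete, here is a minimal sketch in C with OpenMP. The helper names `sum_static` and `sum_dynamic` are hypothetical and do not come from the original examples. The first function maps a fixed-size array by name, which is enough because its size is visible to the compiler; the second receives only a pointer, so the range has to be spelled out as an array section.

```c
#define N 8

/* Hypothetical helper: sums a fixed-size local array on the target device. */
int sum_static(void) {
  int data[N];
  for (int i = 0; i < N; i++) {
    data[i] = i + 1;
  }
  int total = 0;
  /* data has a compile-time size, so naming it maps all N elements. */
  #pragma omp target map(to: data) map(tofrom: total)
  #pragma omp parallel for reduction(+: total)
  for (int i = 0; i < N; i++) {
    total += data[i];
  }
  return total;
}

/* Hypothetical helper: sums an array that is only reachable through a pointer. */
int sum_dynamic(const int *data, int size) {
  int total = 0;
  /* Only a pointer is visible here, so the range [0, size) must be explicit. */
  #pragma omp target map(to: data[0:size]) map(tofrom: total)
  #pragma omp parallel for reduction(+: total)
  for (int i = 0; i < size; i++) {
    total += data[i];
  }
  return total;
}
```

Built without OpenMP support, the pragmas are ignored and both functions simply run on the host, which makes the sketch easy to try before offloading.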
Actions #
Specify the array range to be copied to device memory.
Code example #
In the following OpenMP code, a pointer is copied to the offloading target device instead of the dynamic array data it points to.
void foo(int* a, int* b, int* sum, int size) {
  #pragma omp target map(to: a, b) map(from: sum)
  #pragma omp parallel for
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}
In this case, it suffices to specify the array bounds in the OpenMP map clauses:
void foo(int* a, int* b, int* sum, int size) {
  #pragma omp target map(to: a[0:size], b[0:size]) map(from: sum[0:size])
  #pragma omp parallel for
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}
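An array section does not have to cover the whole array. In OpenMP the section syntax is `[lower-bound:length]`, so a sub-range can be mapped as well. The following is a sketch only; the function `add_slice` and its `start` and `len` parameters are hypothetical names introduced for illustration.

```c
/* Hypothetical helper: adds only elements [start, start + len) of a and b. */
void add_slice(const int *a, const int *b, int *sum, int start, int len) {
  /* The second number in each section is a length, not an end index. */
  #pragma omp target map(to: a[start:len], b[start:len]) map(from: sum[start:len])
  #pragma omp parallel for
  for (int i = start; i < start + len; i++) {
    sum[i] = a[i] + b[i];
  }
}
```

Elements of `sum` outside the mapped range are not copied back, so they keep whatever values they held on the host before the call.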
The same problem arises in the analogous OpenACC code, where only the pointers are named in the data clauses:
void foo(int* a, int* b, int* sum, int size) {
  #pragma acc data copyin(a, b) copyout(sum)
  #pragma acc parallel loop
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}
And again, specifying the array bounds fixes the problem:
void foo(int* a, int* b, int* sum, int size) {
  #pragma acc data copyin(a[0:size], b[0:size]) copyout(sum[0:size])
  #pragma acc parallel loop
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}
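As a usage sketch, the range passed in the data clauses should match what the caller actually allocated. The example below is one possible arrangement, not the original authors' code: `foo_combined` is a hypothetical variant that places the data clauses directly on the combined `parallel loop` construct, and `check_foo_combined` is a hypothetical host-side driver that allocates the arrays and verifies the result.

```c
#include <stdlib.h>

/* Hypothetical variant of foo: data clauses sit on the compute construct. */
void foo_combined(const int *a, const int *b, int *sum, int size) {
  #pragma acc parallel loop copyin(a[0:size], b[0:size]) copyout(sum[0:size])
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}

/* Hypothetical driver: returns 1 if every element is correct, 0 otherwise. */
int check_foo_combined(int size) {
  int *a = malloc((size_t)size * sizeof *a);
  int *b = malloc((size_t)size * sizeof *b);
  int *sum = malloc((size_t)size * sizeof *sum);
  int ok = (a != NULL && b != NULL && sum != NULL);

  if (ok) {
    for (int i = 0; i < size; i++) {
      a[i] = i;
      b[i] = 2 * i;
    }
    /* size is exactly the number of elements allocated above. */
    foo_combined(a, b, sum, size);
    for (int i = 0; i < size; i++) {
      if (sum[i] != 3 * i) {
        ok = 0;
      }
    }
  }

  free(a);
  free(b);
  free(sum);
  return ok;
}
```

Passing a `size` larger than the allocation would make the runtime copy memory past the end of the arrays, so the caller remains responsible for keeping the two in agreement.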
You can find the PWD003 example and more on GitHub.
Related resources #
- PWD003 examples at GitHub
- OpenMP 4.5 Complete Specifications, November 2015 [last checked July 2019]
- The OpenACC Application Programming Interface, Version 2.6, November 2017 [last checked July 2019]
References #
- OpenMP 4.5 Complete Specifications (see page 44, Section 2.4 Array Sections) [last checked July 2019]
- The OpenACC Application Programming Interface, Version 2.6 (see page 33, Section 2.7.1. Data Specification in Data Clauses) [last checked July 2019]