Problem running OpenMP offload application on AMD platform with CRAY compilers


I am working on the OPS-DSL library, which auto-generates an OpenMP offload version of a structured-grid application written with the OPS-DSL APIs.

In the backend, the following routines handle the offload-related operations:

ops_device_malloc - allocate host memory and its associated copy in device memory

ops_device_free - free both the host and associated device memory

ops_device_memcpy_h2d - host-to-device copy; the source is a separate host pointer (not the pointer allocated in ops_device_malloc)

ops_device_memcpy_d2h - device-to-host copy; the destination is likewise a separate host pointer

ops_device_memcpy_d2d - device to device copy

ops_device_memset - initialize device memory
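
A minimal sketch of wrappers with the semantics listed above, assuming a mapping-based design in which the host pointer doubles as the key for its device copy. The `sketch_*` names and bodies are hypothetical illustrations, not the OPS source. The point it tries to show is that every `target enter data` is paired with a `target exit data` on the same array section before the host block is freed. The `#ifdef _OPENMP` fallback is only there so the sketch also builds without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical names (sketch_*); this is NOT the OPS implementation.

// Allocate a host block and create its device counterpart (no transfer).
char *sketch_device_malloc(std::size_t bytes) {
  assert(bytes > 0);
  char *data_d = static_cast<char *>(std::malloc(bytes));
  if (data_d == nullptr) return nullptr;
#pragma omp target enter data map(alloc : data_d[0 : bytes])
  return data_d;
}

// Remove the mapping first, using the same section that was entered,
// and only then free the host block.
void sketch_device_free(char *data_d, std::size_t bytes) {
  if (data_d == nullptr) return;
#pragma omp target exit data map(delete : data_d[0 : bytes])
  std::free(data_d);
}

// Host-to-device copy from an unrelated host buffer `src`.
int sketch_memcpy_h2d(char *data_d, const char *src, std::size_t bytes) {
#ifdef _OPENMP
  int rc = 0;
  char *dev_ptr = data_d;
  // Inside this region dev_ptr holds the device address of the mapped block.
#pragma omp target data use_device_ptr(dev_ptr)
  {
    // const_cast keeps this compatible with older omp.h headers that
    // declare the source parameter as a non-const pointer.
    rc = omp_target_memcpy(dev_ptr, const_cast<char *>(src), bytes, 0, 0,
                           omp_get_default_device(), omp_get_initial_device());
  }
  return rc;
#else
  std::memcpy(data_d, src, bytes);
  return 0;
#endif
}

// Device-to-host copy into an unrelated host buffer `dst`.
int sketch_memcpy_d2h(char *dst, char *data_d, std::size_t bytes) {
#ifdef _OPENMP
  int rc = 0;
  char *dev_ptr = data_d;
#pragma omp target data use_device_ptr(dev_ptr)
  {
    rc = omp_target_memcpy(dst, dev_ptr, bytes, 0, 0,
                           omp_get_initial_device(), omp_get_default_device());
  }
  return rc;
#else
  std::memcpy(dst, data_d, bytes);
  return 0;
#endif
}
```

In this sketch the byte count passed to `sketch_device_free` has to match the one used at allocation, so a caller would need to keep track of it.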

For one of the applications, I tried this on an NVIDIA GPU with the nvhpc compilers. The generated code produces output that matches the CUDA version.

But when I run this application on an AMD GPU with the Cray compilers, I get either the following runtime error or NaN in the results:

ACC: libcrayacc/acc_present.c:679 CRAY_ACC_ERROR - Host region (400dede8080 to 400dee94880) overlaps present region (400dede8dc0 to 400dee88dc0 index 622) but is not contained for 'data_d[0:bytes]' from ops_ompoffload_common_omp4.cpp:69

Do the #pragma directives used for OpenMP offload look correct?
Could you please point out any mistakes in them?

A very easy spot check would be to use LLVM/Clang to offload to AMD (or NVIDIA), and see what happens.

The error suggests you are trying to map overlapping regions of the same object, e.g.

char *Data = malloc(100);
#pragma omp target enter data map(to:Data[0:75])
#pragma omp target enter data map(to:Data[25:75])

which is not allowed, because the second section (bytes 25 to 99) only partially overlaps the first (bytes 0 to 74). LLVM should also diagnose such problems, and it provides ways to dump the mappings, which can help you debug.
See bit 1, 2, and 4 of
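
To make the "overlaps but is not contained" condition concrete, here is a small host-only sketch that classifies a requested range against one already-present range. It is a simplified model of the rule, not the Cray or LLVM runtime code, and the names `Overlap` and `classify` are made up for illustration. Plugging in the addresses from the reported error shows that the requested host region starts before and ends after the present region, so it can be neither reused nor mapped fresh.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the present-table check; not actual runtime code.
enum class Overlap { Disjoint, Contained, Partial };

// Classify the requested half-open range [req_begin, req_end) against a
// range [cur_begin, cur_end) that is already present on the device.
Overlap classify(std::uint64_t req_begin, std::uint64_t req_end,
                 std::uint64_t cur_begin, std::uint64_t cur_end) {
  assert(req_begin < req_end && cur_begin < cur_end);
  if (req_end <= cur_begin || cur_end <= req_begin)
    return Overlap::Disjoint;   // no shared bytes: a new mapping is fine
  if (cur_begin <= req_begin && req_end <= cur_end)
    return Overlap::Contained;  // fully inside: the existing mapping is reused
  return Overlap::Partial;      // shares some bytes but sticks out: error
}

// Addresses copied from the error message in the question.
inline Overlap classify_reported_error() {
  return classify(0x400dede8080ULL, 0x400dee94880ULL,   // host region
                  0x400dede8dc0ULL, 0x400dee88dc0ULL);  // present region
}
```

One situation that could produce such a pair of ranges (an assumption, not something the error message confirms) is a host buffer that was freed without removing its device mapping, after which a later allocation landed on an overlapping address range.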