Compiling OpenMP for GPUs when there are function calls in the parallel regions


Using the flow described here:, I can compile and run OpenMP code on GPUs when the parallel region is self-contained (i.e., does not include calls to functions).

When the parallel region includes a call to a function (e.g., foo()), I get this error.

nvlink error : Undefined reference to ‘foo’ in ‘/tmp/test.o-e8741d.cubin’

“foo” is indeed declared and defined in the same file, before the main function, but the clang driver does not include it in the final PTX file (test.s.tgt-nvptx64sm_30-nvidia-linux).

In CUDA terminology: are “device functions” not yet supported in OpenMP?


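Most probably you forgot to enclose the definition of ‘foo()’ in a ‘#pragma omp declare target’ region. Without it, the function is compiled only for the host, so the device code has nothing to link against. You can do it like this: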

#pragma omp declare target
void foo() {
  /* function body */
}
#pragma omp end declare target
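
For a fuller picture, here is a minimal sketch of how the pieces might fit together. It is only an illustration: the integer signature of foo, the helper name fill_with_foo, and the map clause are assumptions of mine, not taken from your test.c. Without offloading support (or with the pragmas ignored), the loop simply runs on the host.

```c
#include <stdio.h>

/* Everything between these two pragmas is compiled for the device
   as well as for the host. */
#pragma omp declare target
int foo(int x) {
    /* Illustrative body (assumption): any device-safe code can go here. */
    return 2 * x + 1;
}
#pragma omp end declare target

/* Hypothetical helper: stores foo(i) into out[i] for i in [0, n),
   calling foo from inside a target region. */
void fill_with_foo(int *out, int n) {
    /* map(from: ...) copies the results back to the host after the region. */
    #pragma omp target teams distribute parallel for map(from: out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = foo(i);
    }
}
```

Calling fill_with_foo(out, 8) from main and printing the array should give the same values whether the region runs on the GPU or falls back to the host, which is a quick way to check that foo really made it into the device image.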