OpenMP target offload in a subprocess

Hi all,

We are trying to use OpenMP target offload in a subprocess and get the following error:

Target CUDA RTL → Start initializing CUDA
Target CUDA RTL → Init requires flags to 1
Target CUDA RTL → Getting device 0
Target CUDA RTL → Error returned from cuDeviceGet
Target CUDA RTL → CUDA error is: initialization error
Libomptarget error: Device 0 is not ready.
Libomptarget error: run with env LIBOMPTARGET_INFO>1 to dump host-target pointer maps
Libomptarget error: Build with debug information to provide more information
Libomptarget fatal error 1: failure of target construct while offloading is mandatory

We are targeting an NVIDIA V100 GPU using an LLVM/Clang-12 build from December. We assume the error occurs because the CUDA context is started before any OpenMP API calls or directives have been made. How can we use OpenMP offload in the subprocess only? How can we use OpenMP offload in both the original process and the subprocess? We have attached a toy program that fails with the message above when USE_CHILD_PROCESS is defined. The key code is shown below:

#include <stdio.h>
#include <unistd.h>

int main()
{
    int mypid = fork();
    if( 0 == mypid ) {
#ifdef USE_CHILD_PROCESS
        printf("Calling from child process\n");
        cfunc(); // contains OpenMP target offload
#endif
    } else {
#ifndef USE_CHILD_PROCESS
        printf("Calling from parent process\n");
        cfunc(); // contains OpenMP target offload
#endif
    }
    return 0;
}

Thanks,
Chris

main.c (971 Bytes)

This looks like a CUDA issue rather than an OpenMP one. A quick search suggests that MPS (the CUDA Multi-Process Service) may help, but I have no experience with MPS.
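A possible workaround to try, independent of MPS: if the failure comes from CUDA driver state that was initialized before fork() and then inherited by the child (an assumption, not verified against this Clang-12 build), then replacing the child's process image with exec should let the offload runtime initialize from scratch. The sketch below shows only the generic fork + exec + wait pattern; run_in_fresh_process is a made-up helper name, and the idea of moving the target region into its own executable is likewise an assumption.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of OpenMP or CUDA): run the program
 * named by argv[0], looked up on PATH, in a freshly exec'd child.
 * Returns the child's exit status, or -1 on failure. */
int run_in_fresh_process(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: discard the inherited process image so that any
         * offload runtime is loaded and initialized anew. */
        execvp(argv[0], argv);
        perror("execvp");   /* reached only if exec failed */
        _exit(127);
    }
    /* Parent: wait for the child and report how it exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With cfunc() built into a separate executable (say ./offload_worker, a placeholder name), the parent would call run_in_fresh_process((char *[]){"./offload_worker", NULL}). Whether the parent can also offload in the same run would still need to be tested on the actual system.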

Ye