[...]
Libomptarget error: Failed to synchronize device.
Libomptarget error: Call to targetDataEnd failed, abort target.
Libomptarget error: Failed to process data after launching the kernel.
Libomptarget error: run with env LIBOMPTARGET_INFO>1 to dump
host-target pointer maps
Libomptarget fatal error 1: failure of target construct while
offloading is mandatory
After this point, the process becomes unresponsive and no longer reacts
to signals from the user. Is this due to a new feature of LLVM?
I don't think so.
The only thing that comes to mind is that we switched to `abort` instead of `exit` after the fatal error message.
Though I'm not sure why that would cause the program to hang, unless SIGABRT is somehow caught.
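Purely for illustration (a standalone toy program of mine, not anything libomptarget or CUDA installs): if SIGABRT is caught and the handler never returns, abort() will not terminate the process, which would look exactly like a hang after the fatal error message.

#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// Handler that never returns: per the C standard, abort() only fails to
// terminate the program if SIGABRT is caught and the handler does not return.
extern "C" void on_abort(int) {
  const char msg[] = "SIGABRT caught, process now just sits here\n";
  write(STDERR_FILENO, msg, sizeof(msg) - 1);
  for (;;)
    pause();
}

int main() {
  std::signal(SIGABRT, on_abort);
  std::fprintf(stderr, "calling abort()\n");
  std::abort(); // hangs in the handler instead of terminating
}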
I think I was running my offloading app with a CUDA Toolkit that I had
loaded via Spack, but the app itself was built with Clang (plus the CUDA
Toolkit the local admins provide via modules).
Still, should the effect really be this drastic? I mean, completely
locking up a ThunderX2 node?
With trunk Clang running against CUDA Toolkit 10.1.105 on JURECA at
JSC, I started seeing a hang:
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory
(devices were found)
Libomptarget --> Entering data begin region for device -1 with 1 mappings
Libomptarget --> Use default device id 0
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target CUDA RTL --> Init requires flags to 1
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> The primary context is inactive, set its flags to
CU_CTX_SCHED_BLOCKING_SYNC
Getting back to the prompt takes a long time, or I have to hit Ctrl+C
or Ctrl+Z repeatedly.
Target CUDA RTL --> Init requires flags to 1
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> The primary context is inactive, set its flags to
CU_CTX_SCHED_BLOCKING_SYNC
[New Thread 0x2aaaae5e3700 (LWP 4154)]
^C
Thread 1 "nest" received signal SIGINT, Interrupt.
0x00002aaaad2e5a1c in cuVDPAUCtxCreate ()
from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
Are you able to run a very simple OpenMP code, e.g. just an empty “omp target” region?
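For reference, a minimal smoke test along those lines could look like the following (my own sketch, not an official test; the offload triple in the build line is an assumption for an NVIDIA system):

// empty_target.cpp -- build e.g. with:
//   clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda empty_target.cpp
#include <cstdio>
#include <omp.h>

int main() {
  int ran_on_host = 1;

  // Essentially empty target region; omp_is_initial_device() reports
  // whether it actually executed on the device or fell back to the host.
  #pragma omp target map(from: ran_on_host)
  { ran_on_host = omp_is_initial_device(); }

  std::printf("devices visible: %d, region ran on the %s\n",
              omp_get_num_devices(),
              ran_on_host ? "host (fallback!)" : "device");
  return 0;
}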
My current feeling is that the cuda.h used when you built the libomptarget plugins may not be consistent with /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1.
I’m not aware of a way to find out which header was used.
I think it is worth trying to use the same CUDA toolkit for building clang+libomptarget and your app.
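If it helps, here is a tiny check I would try (my own sketch, nothing from libomptarget): compile it against the same cuda.h you used when building the plugins, link with -lcuda, and compare the toolkit header version with what the loaded libcuda.so.1 reports.

// cuda_version_check.cpp -- build e.g. with:
//   clang++ cuda_version_check.cpp -I<toolkit>/include -lcuda
#include <cuda.h>
#include <cstdio>

int main() {
  int driver_version = 0;
  if (cuDriverGetVersion(&driver_version) != CUDA_SUCCESS) {
    std::fprintf(stderr, "could not query the CUDA driver version\n");
    return 1;
  }
  // CUDA_VERSION comes from the cuda.h seen at compile time,
  // driver_version from the libcuda.so.1 picked up at run time.
  std::printf("cuda.h says %d, driver says %d\n", CUDA_VERSION, driver_version);
  return 0;
}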