I have templated C++ code that can use OpenMP, CUDA, or both. The case where both the CUDA and OpenMP code paths are simultaneously active is of primary interest. Because of the templates, the compiler needs to be able to handle both CUDA and OpenMP offload in the same translation unit. Is this possible with clang?
I have a small example here. It is O(100) lines of code and can be built with GNU make:
make -f Makefile.clang17.nv. I've tested with clang 17 built from git (less than a month old), but when I pass
-x cuda --cuda-gpu-arch=sm_75 -fopenmp --offload-arch=sm_75 to clang, the CUDA parts work but OpenMP offloading is not done. The error message is:
error: No offloading entry generated while offloading is mandatory.
I'll need to check again, but I believe we explicitly do not allow both languages to be active at the same time. I know for a fact we do this for HIP, and I think CUDA is the same. I'm not entirely sure what steps would be required to get this to work, but right now the only solution is to split the CUDA and OpenMP code into different TUs and link them together. Currently that's only possible using the "new" offloading driver, which is still opt-in for CUDA.
clang++ -x cuda -c input.cpp --offload-arch=sm_70 --offload-new-driver
clang++ -fopenmp -c openmp.cpp --offload-arch=sm_70
clang++ input.o openmp.o -fopenmp --offload-arch=sm_70 -lcudart // --offload-new-driver is implied by -fopenmp, otherwise use --offload-link
What is meant by opt-in? I was able to refactor the simple example and get it to work, but I had to explicitly instantiate the templates, which is no fun. I'm not sure about doing this in the real code, which is more complex. I hope CUDA/HIP + OpenMP offload in the same TU will be supported in the future.
Sorry, I totally forgot to include the flag that makes it opt-in in the example: --offload-new-driver. That flag is currently required to do CUDA RDC-mode linking in Clang.
OK, it seems to work without the flag. It may be that git clang already uses the new driver by default.
Clang doesn't use it by default; it will work without the flag as long as you don't rely on RDC mode for CUDA, i.e. you're not calling any CUDA code from OpenMP or vice versa. I'll update the example to be correct for that behavior.