When I use CMake to build a shared library that contains both CUDA code (compiled with clang++) and OpenMP offload code, I get unexplained runtime crashes, e.g. the kernel launch failure above. I've also seen a crash in the first call to the CUDA runtime, before any kernel launch.
This only happens with a shared library. When I use a static library, the code runs correctly. I can also compile the same code into two libraries, link both into my executable, and run without issue. I believe the code itself is correct and that the problem is created by linking the CUDA and OpenMP offload code together into the same shared library.
I am using CMake and telling it to use clang as both CMAKE_CXX_COMPILER and CMAKE_CUDA_COMPILER. I have also set CMAKE_CUDA_SEPARABLE_COMPILATION to ON (for relocatable device code). CMake handles CUDA with clang seamlessly, but it does not know anything about OpenMP offload, so I have to pass those flags manually via target_compile_options and target_link_options. Perhaps the flags are not correct?
The problematic library and executable are configured as:
# library with both OpenMP and CUDA
set_source_files_properties(../cu_impl.cpp PROPERTIES LANGUAGE CUDA)
add_library(cump_impl ../mp_impl.cpp ../cu_impl.cpp)
target_compile_options(cump_impl PUBLIC -fopenmp --offload-arch=sm_75 -fopenmp-offload-mandatory --offload-new-driver)
set_target_properties(cump_impl PROPERTIES CUDA_ARCHITECTURES "75")
# executable that uses the library
add_executable(cump_both_2 ../main.cpp)
target_link_libraries(cump_both_2 cump_impl omp omptarget)
target_compile_definitions(cump_both_2 PUBLIC -DCUMP_USE_OPENMP -DCUMP_USE_CUDA)
target_compile_options(cump_both_2 PUBLIC -fopenmp --offload-arch=sm_75 -fopenmp-offload-mandatory --offload-new-driver)
target_link_options(cump_both_2 PUBLIC -L/home/bloring/work/llvm/llvm-install/lib/ -fopenmp --offload-arch=sm_75 -fopenmp-offload-mandatory --offload-new-driver)
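For comparison, the two-library variant that runs correctly could be sketched roughly as follows. This is only an illustration of the split, not the exact configuration from the reproducer: the target names mp_impl, cu_impl and cump_both_split are hypothetical, and it assumes the OpenMP offload flags are needed only on the OpenMP library and the executable. CMAKE_CUDA_SEPARABLE_COMPILATION is assumed to be set globally, as above.

```cmake
# Hypothetical split of cump_impl into two libraries (target names are illustrative).

# OpenMP offload only; flags are passed manually because CMake does not know about them
add_library(mp_impl ../mp_impl.cpp)
target_compile_options(mp_impl PUBLIC -fopenmp --offload-arch=sm_75 -fopenmp-offload-mandatory --offload-new-driver)

# CUDA only; CMake handles the CUDA flags for clang itself
set_source_files_properties(../cu_impl.cpp PROPERTIES LANGUAGE CUDA)
add_library(cu_impl ../cu_impl.cpp)
set_target_properties(cu_impl PROPERTIES CUDA_ARCHITECTURES "75")

# executable that links both libraries (same link flags as cump_both_2 above)
add_executable(cump_both_split ../main.cpp)
target_link_libraries(cump_both_split mp_impl cu_impl omp omptarget)
target_compile_definitions(cump_both_split PUBLIC -DCUMP_USE_OPENMP -DCUMP_USE_CUDA)
target_link_options(cump_both_split PUBLIC -L/home/bloring/work/llvm/llvm-install/lib/ -fopenmp --offload-arch=sm_75 -fopenmp-offload-mandatory --offload-new-driver)
```

The only intended difference from the failing configuration is that the CUDA and OpenMP offload objects end up in separate libraries instead of one.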
There is a reproducer here:
This can be compiled and run with:
mkdir build
cd build
cmake -DBUILD_TESTING=ON -DBUILD_SHARED_LIBS=ON -DCMAKE_CXX_COMPILER=`which clang++` -DCMAKE_CUDA_COMPILER=`which clang++` -DCMAKE_BUILD_TYPE=Debug ../
make -j8
ctest
The last of the 4 tests fails; it is the one that uses the library containing both CUDA and OpenMP offload code. If I configure with -DBUILD_SHARED_LIBS=OFF instead, all tests run correctly.
I'm using clang 17 built from git in early June, and CMake 3.26.2 from Fedora 37.