Dear clang developers,
thanks to the recent work by Alexey Bataev, Jonas Hahnfeld, and others, the trunk version of clang includes support for compiling device code into relocatable object files [1].
These object files can be linked with nvlink (once per GPU architecture), combined with fatbin, embedded in a host object file, and linked with the other host code by the host linker.
Usually nvcc can take care of this part, but it refuses to do so when the host compiler is one it does not support (e.g. gcc 8, clang 6).
It would be great if support for this “device link” step could be added to the clang driver.
I am interested in working on it myself, but I would need some guidance on how to start.
In the meantime, to show each step and validate that the different approaches are equivalent, I have adapted the original example by NVIDIA and put it on GitHub at https://github.com/fwyzard/cuda-linking/ :
- clone the repository

    git clone git@github.com:fwyzard/cuda-linking.git
    cd cuda-linking

- build and link with nvcc

    make clean nvcc
    ./app

- build with nvcc, link explicitly with nvlink/fatbin

    make clean nvlink
    ./app

- build with clang, link explicitly with nvlink/fatbin

    make clean clang
    ./app
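
For reference, the explicit nvlink/fatbin link described above corresponds roughly to the sequence sketched below. This is a sketch under stated assumptions, not a copy of the Makefile in the repository: the tool flags and the location of link.stub are modeled on what "nvcc --dryrun" prints and should be checked against your CUDA version, the names app, a.o and b.o, the sm_35 architecture and the /usr/local/cuda path are placeholders, and the run and device_link helpers are hypothetical. It handles a single GPU architecture, and it only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of a manual "device link". Assumptions to verify with `nvcc --dryrun`:
# the tool flags, the link.stub location, the CUDA path and the architecture.
CUDA=${CUDA:-/usr/local/cuda}
ARCH=${ARCH:-sm_35}

# Print a command; execute it only when RUN=1 (safe on machines without CUDA).
# The printed form loses shell quoting, so it is for reading, not pasting.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# device_link OUT OBJ...: device-link OBJ... and link the executable OUT.
device_link() {
    out=$1; shift
    # 1. link the relocatable device code for one GPU architecture
    run nvlink --arch="$ARCH" --register-link-binaries="${out}_dlink.reg.c" \
        -m64 "$@" -o "${out}_dlink.$ARCH.cubin"
    # 2. wrap the cubin into a fat binary, emitted as a C source fragment
    run fatbinary --create="${out}_dlink.fatbin" -64 -link \
        --image="profile=$ARCH,file=${out}_dlink.$ARCH.cubin" \
        --embedded-fatbin="${out}_dlink.fatbin.c"
    # 3. embed the fat binary in a host object by compiling nvcc's link stub
    run g++ -c -x c++ "-DFATBINFILE=\"${out}_dlink.fatbin.c\"" \
        "-DREGISTERLINKBINARYFILE=\"${out}_dlink.reg.c\"" \
        -I. -I"$CUDA/include" -m64 -o "${out}_dlink.o" "$CUDA/bin/crt/link.stub"
    # 4. ordinary host link against the CUDA runtime libraries
    run g++ -m64 -o "$out" "${out}_dlink.o" "$@" \
        -L"$CUDA/lib64" -lcudadevrt -lcudart_static -lrt -lpthread -ldl
}

device_link app a.o b.o
```

Steps 1 to 3 are the part that nvcc normally hides behind its device-link mode, and they are the ones I would like the clang driver to perform.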
Best regards,
.Andrea
[1] Separate Compilation and Linking of CUDA C++ Device Code, https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/