Dear clang developers,
Thanks to the recent work by Alexey Bataev, Jonas Hahnfeld, and others, the trunk version of clang includes support for compiling device code into relocatable object files.
These object files can be linked with nvlink (once per GPU architecture), combined with fatbin, embedded in a host object file, and linked with the other host code by the host linker.
Usually nvcc takes care of this part, but it refuses to do so with host compilers it does not support (such as gcc 8 and clang 6).
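For reference, the nvcc-driven flow can be sketched as the following dry-run script. It only prints the commands and does not run them. The file names (a.cu, b.cu, dlink.o, app), the sm_35 target, the g++ host compiler and the /usr/local/cuda install path are assumptions made for illustration; they are not taken from the repository.

```shell
#!/bin/sh
# Dry-run sketch: prints the nvcc-driven build steps without executing them.
# All names below are illustrative assumptions, not taken from the repository.
ARCH="sm_35"                 # assumed GPU architecture
CUDA_HOME="/usr/local/cuda"  # assumed CUDA install location
SOURCES="a.cu b.cu"          # hypothetical source files

device_link_plan() {
    objects=""
    for src in $SOURCES; do
        obj="${src%.cu}.o"
        # 1. Compile each source into a relocatable object (-dc is --device-c).
        echo "nvcc -arch=$ARCH -dc $src -o $obj"
        objects="$objects $obj"
    done
    # 2. Device link step (-dlink is --device-link): nvcc drives nvlink and
    #    fatbinary and wraps the result in a host object file.
    echo "nvcc -arch=$ARCH -dlink$objects -o dlink.o"
    # 3. Ordinary host link of all objects plus the device-link object.
    echo "g++$objects dlink.o -L$CUDA_HOME/lib64 -lcudart -o app"
}

device_link_plan
```

Running a real `nvcc -dlink` command with `--dryrun` added prints the underlying nvlink and fatbinary invocations, which is the part the clang driver would have to reproduce.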
It would be great if support for this “device link” step could be added to the clang driver.
I am interested in working on it myself, but I would need some guidance on how to start.
In the meantime, to show each step and validate that different approaches are equivalent, I have adapted the original example by NVIDIA and set up an example on GitHub at https://github.com/fwyzard/cuda-linking/ :
git clone git@github.com:fwyzard/cuda-linking.git
make clean nvcc
make clean nvlink
make clean clang
Separate Compilation and Linking of CUDA C++ Device Code, https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/