linking relocatable device object code with clang

Andrea_Bocci · June 26, 2018, 10:39pm

Dear clang developers,
thanks to the recent work by Alexey Bataev, Jonas Hahnfeld, and others, the trunk version of clang includes support for compiling device code into relocatable object files [1].

These object files can be linked with nvlink (once per GPU architecture), combined into a fat binary with fatbinary, embedded in a host object file, and linked with the rest of the host code by the host linker.
nvcc can usually take care of this part, but it refuses to do so with unsupported host compilers (e.g. gcc 8 or clang 6).
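For reference, the manual steps listed above can be sketched as a Python dry run that only prints the commands. This is a sketch under assumptions, not the exact recipe used in the repository. The nvlink and fatbinary flags and the use of the toolkit's `crt/link.stub` are modeled on what `nvcc --dryrun -rdc=true` printed around CUDA 9 and may differ for other toolkit versions. The CUDA path `/usr/local/cuda`, the `dlink.*` file names and the function `device_link_commands` are hypothetical.

```python
import shlex


def device_link_commands(objects, archs, cuda="/usr/local/cuda",
                         host_cxx="g++", output="app"):
    """Return argv lists for the device-link steps; nothing is executed.

    ASSUMPTION: flags mimic `nvcc --dryrun` output from the CUDA 9 era;
    check them against the output of your own toolkit before use.
    objects: relocatable objects that contain device code
    archs:   GPU architectures, e.g. ["sm_35", "sm_50"]
    """
    reg = "dlink.reg.c"          # hypothetical name: registration code from nvlink
    fatbin_c = "dlink.fatbin.c"  # hypothetical name: fat binary as a C array
    cmds = []
    images = []
    # 1. nvlink resolves device symbols across objects, once per architecture.
    for arch in archs:
        cubin = f"dlink.{arch}.cubin"
        images.append(f"--image=profile={arch},file={cubin}")
        cmds.append(["nvlink", f"--arch={arch}",
                     f"--register-link-binaries={reg}",
                     f"-L{cuda}/lib64", *objects, "-lcudadevrt", "-o", cubin])
    # 2. fatbinary bundles the per-architecture cubins into one fat binary,
    #    emitted as C source so that it can be embedded in a host object.
    cmds.append(["fatbinary", "--create=dlink.fatbin", "-64", "-link",
                 *images, f"--embedded-fatbin={fatbin_c}"])
    # 3. The host compiler embeds the fat binary in a host object, using the
    #    link.stub shipped with the toolkit (ASSUMPTION: path as in CUDA 9).
    cmds.append([host_cxx, "-c", "-x", "c++",
                 f'-DFATBINFILE="{fatbin_c}"',
                 f'-DREGISTERLINKBINARYFILE="{reg}"',
                 f"-I{cuda}/include", f"{cuda}/bin/crt/link.stub",
                 "-o", "dlink.o"])
    # 4. The host linker combines everything with the CUDA runtime libraries.
    cmds.append([host_cxx, "dlink.o", *objects, f"-L{cuda}/lib64",
                 "-lcudadevrt", "-lcudart", "-o", output])
    return cmds


if __name__ == "__main__":
    for cmd in device_link_commands(["a.o", "b.o"], ["sm_35", "sm_50"]):
        print(shlex.join(cmd))
```

Running it prints one nvlink command per architecture, followed by the fatbinary, stub and final link commands. Comparing these against the `nvcc --dryrun` output of your own toolkit is the safest way to adjust the flags.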

It would be great if support for this “device link” step could be added to the clang driver.
I am interested in working on it myself, but I would need some guidance on how to start.

In the meantime, to show each step and to validate that the different approaches are equivalent, I have adapted the original example by NVIDIA and set it up on GitHub (https://github.com/fwyzard/cuda-linking/). It can be used as follows:

clone the repository

git clone git@github.com:fwyzard/cuda-linking.git
cd cuda-linking

build and link with nvcc

make clean nvcc
./app

build with nvcc, link explicitly with nvlink/fatbinary

make clean nvlink
./app

build with clang, link explicitly with nvlink/fatbinary

make clean clang
./app
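To check that the three builds really are equivalent, the make/run pairs above can be looped over and their outputs compared. This is a minimal Python sketch. It assumes it is run from the `cuda-linking` checkout, that the Makefile targets are `nvcc`, `nvlink` and `clang` as listed, and that `./app` prints deterministic output (no timings). `run_all` and `all_equivalent` are hypothetical helper names.

```python
import subprocess

# Makefile targets taken from the steps above.
TARGETS = ["nvcc", "nvlink", "clang"]


def run_all(targets=TARGETS, run=subprocess.run):
    """Build ./app with each target and return {target: stdout of ./app}."""
    outputs = {}
    for target in targets:
        # Equivalent to typing `make clean <target>` in the checkout.
        run(["make", "clean", target], check=True)
        result = run(["./app"], check=True, capture_output=True, text=True)
        outputs[target] = result.stdout
    return outputs


def all_equivalent(outputs):
    """True if every build produced exactly the same output."""
    return len(set(outputs.values())) <= 1


if __name__ == "__main__":
    outs = run_all()
    for target, out in outs.items():
        print(f"{target}: {out.strip()}")
    print("equivalent" if all_equivalent(outs) else "outputs differ")
```

The `run` parameter exists only so that the loop can be exercised without a CUDA toolchain; by default it calls `subprocess.run`.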

Best regards,
.Andrea

[1] Separate Compilation and Linking of CUDA C++ Device Code, https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/

hahnjo · June 28, 2018, 2:38pm

You might want to start by looking around the Driver code to see how things are done, especially how external tools are invoked.
There is D47394, "[OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain", which does somewhat related things for OpenMP. Maybe you can reuse some parts once the change lands?

Cheers,
Jonas
