Separate compilation of CUDA code?


I wonder whether the current version of LLVM supports separate compilation and linking of device code; i.e., is there a flag analogous to nvcc’s --relocatable-device-code flag? If not, are there any plans to support this?
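For reference, the nvcc flow being asked about looks roughly like this (a sketch with made-up file names; requires the CUDA toolkit):

```shell
# Compile each translation unit with relocatable device code enabled,
# so device functions may be defined in one file and called from another.
nvcc -rdc=true -c a.cu -o a.o
nvcc -rdc=true -c b.cu -o b.o

# nvcc performs the device-link step and the host link together here.
nvcc -rdc=true a.o b.o -o app
```

Without -rdc=true (the short form of --relocatable-device-code=true), each device translation unit must be self-contained.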

Yuanfeng Peng

Hi Yuanfeng,

I asked about this a few days ago. [0]
As far as I know, there is no such flag available, and I did not find any information on whether this feature will be supported. However, I would be very interested in any updates on it.

Kind Regards


Hi Lorenz,

Thanks for letting me know! It seems that relocatable device code for CUDA isn’t currently being worked on by anyone in the LLVM developer community, so I ended up doing a little hacking with nvcc to achieve my goal. I found the approach on GitHub: . Basically, I used this method to get the unoptimized NVVM IR from nvcc (more precisely, from cicc), then invoked my own LLVM pass on it to do the transformation I needed, and fed the transformed IR back to nvcc’s libnvvm backend. This way, nvcc still takes care of the compilation and linking of the device code.
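A rough sketch of that kind of pipeline (file and pass names here are hypothetical; the exact cicc invocation can be seen with nvcc's --dryrun/--keep options):

```shell
# Show the commands nvcc would run internally, including the cicc step
# that lowers CUDA source to NVVM IR.
nvcc --dryrun -c kernel.cu 2>&1 | grep cicc

# Keep nvcc's intermediate files so the IR stage can be intercepted.
nvcc --keep -c kernel.cu

# Run a custom LLVM pass over the extracted IR.
# "MyPass.so" and "-my-transform" are placeholders for your own pass.
opt -load ./MyPass.so -my-transform -S kernel.ll -o kernel.transformed.ll
```

The transformed IR is then substituted back into the interrupted nvcc pipeline so that libnvvm and the rest of the toolchain proceed as usual.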

I’m not sure whether this can be useful for your problem as well, but I think I should let you know. Hope it helps!



Thanks for the link to this hack!
I have thought about doing something similar, but since relocatable device code is only nice to have for me, I don’t want to put too much work into it.

Maybe it’s possible to generate IR with clang, hand it over to the libnvvm backend, and get a fatbinary which can then be included in the host code compiled by clang. I think this would require the least amount of hacking to be able to use the rdc flag.
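A sketch of what that flow might look like, using clang's device-only compilation mode. This substitutes llc's NVPTX backend for libnvvm as the IR-to-PTX step, and the fatbinary invocation details are an assumption; file names and the sm_50 target are placeholders:

```shell
# Emit LLVM IR for the device side only, using clang's CUDA support.
clang++ --cuda-device-only -S -emit-llvm kernel.cu -o kernel.ll

# Lower the device IR to PTX (using the NVPTX backend here in place of libnvvm).
llc -march=nvptx64 -mcpu=sm_50 kernel.ll -o kernel.ptx

# Package the PTX into a fatbinary that the host compilation can embed.
fatbinary --create=kernel.fatbin --image=profile=sm_50,file=kernel.ptx
```

The remaining (and hardest) part would be getting the resulting fatbinary registered with the host-side CUDA runtime the way clang's normal mixed-mode compilation does.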


As I understand it, this is not exactly true. Relocatable device code is necessary for OpenMP offloading support, and it essentially works in that context; we just don’t currently have an option to enable it in CUDA mode. Hopefully, some of the folks from IBM who have been working on this can comment. -Hal