I’m using Clang on CUDA files (kernels) to create a syntax tree, which I use to generate C# wrappers. The kernel is entirely device code and needs no compilation or linking.
Everything works fine so far, except for one thing I don’t understand. Clang needs __clang_cuda_runtime_wrapper.h, which is included automatically when clang detects a .cu file. Do I really still need it? As far as I can see, if I use -nocudainc no syntax tree is generated for my CUDA kernel, although clang does not complain.
The question is whether, if I include cuda_runtime.h instead, everything I need on my side would be available, like blockDim, threadIdx, texture fetches, etc.
I admit it is a bit unclear to me why I need both the toolkit and the reference to __clang_cuda_runtime_wrapper.h. To put it differently: I understand why I need one of them, but I don’t understand why I need both. Maybe someone can enlighten me.
In general, CUDA sources make a lot of assumptions about what’s available at the beginning of CUDA compilation and that is determined largely by whatever NVCC happened to do.
In addition to that, clang needs to include some headers in order to:
a) provide builtin variables like threadIdx, blockDim, etc.
b) provide various compiler builtins/wrappers for them, that are needed to compile CUDA headers.
c) provide additional device overloads for some standard functions like printf.
If your code does not need anything on this list, it’s perfectly fine to compile with -nocudainc and probably -nocudalib.
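As a minimal sketch of such header-free code (the file name, function names, and exact command line are illustrative assumptions, not something from this thread): as far as I know, with -nocudainc nothing defines the CUDA keywords such as `__global__` either, since those normally come from the CUDA headers that the wrapper pulls in, so the example spells them out as attributes itself.

```cuda
// Hypothetical file: scale.cu
// One possible way to dump the AST without a CUDA toolkit (assumed invocation):
//   clang++ -x cuda --cuda-device-only -nocudainc -nocudalib \
//           -fsyntax-only -Xclang -ast-dump scale.cu

// With -nocudainc no header defines the CUDA keywords, so map them to the
// underlying clang attributes by hand.
#define __global__ __attribute__((global))
#define __device__ __attribute__((device))

// Plain device helper: no builtin variables, no math functions, no printf.
__device__ float scale(float v, float factor) { return v * factor; }

// Kernel that only touches element 0, so it needs neither threadIdx nor blockDim.
__global__ void scale_first(float *data, float factor) {
  data[0] = scale(data[0], factor);
}
```

Anything beyond this, such as threadIdx, texture fetches, or device-side printf, falls under items a) to c) above and would need the corresponding headers again.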
@Artem-B. Thanks for your hints. I just want to add a few comments.
a): “provide builtin variables like threadIdx, blockDim, etc.”
As far as I have seen and tried, CUDA itself also defines threadIdx, etc. in device_launch_parameters.h.
b) provide various compiler builtins/wrappers for them, that are needed to compile CUDA headers.
Also here, CUDA defines tex1D, etc.
c) provide additional device overloads for some standard functions like printf.
OK, that’s one thing I didn’t try
As you said, what I need is for the functions and built-in types to be defined, and since I do not get any code out of this compilation process, that should be enough. But when I pass -nocudainc and do not include the header files, I get no syntax tree and also no errors.
So I’m curious whether there are some checks that require certain things to be present.
Those do not do anything useful for clang. Builtin variables are special. NVCC has hardcoded support for them, but clang implements them in a header. Here’s where the magic lives in clang tree:
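For illustration only (this is not the actual clang header, and every `my_` name below is hypothetical), the general idea can be sketched as a device-side object whose members are properties backed by NVPTX special-register builtins. The sketch assumes a device-only compile with -nocudainc, which is why it defines the CUDA keywords itself.

```cuda
// Assumed invocation: clang++ -x cuda --cuda-device-only -nocudainc -nocudalib \
//                     -fsyntax-only my_builtin_vars.cu
#define __device__ __attribute__((device))
#define __global__ __attribute__((global))

// Hypothetical stand-in for the type behind threadIdx.
struct my_thread_idx_t {
  // Reading .x is rewritten into a call to fetch_x().
  __declspec(property(get = fetch_x)) unsigned int x;
  static inline __attribute__((always_inline)) __device__ unsigned int fetch_x(void) {
    // Clang builtin that reads the thread-index special register on NVPTX.
    return __nvvm_read_ptx_sreg_tid_x();
  }
};

// Declared only, never defined: every use goes through the property getter.
extern const __device__ __attribute__((weak)) my_thread_idx_t my_threadIdx;

// Each thread stores its own index into the output array.
__global__ void write_tid(unsigned int *out) {
  out[my_threadIdx.x] = my_threadIdx.x;
}
```

In such a scheme a plain struct definition like the one in device_launch_parameters.h has no getter behind it, which would be consistent with the point that CUDA’s own declarations are not enough for clang.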
Again, that’s something hardcoded in NVCC, but requires headers in clang:
Most of the standard math functions will also need clang headers:
In addition to everything mentioned above, if you do want to include CUDA’s own headers, for whatever reason, you will need the workarounds clang has implemented to make them compilable. Clang uses a very different compilation model, and headers written with NVCC in mind cannot be compiled by clang without the rather gross preprocessor abuse in the already mentioned __clang_cuda_runtime_wrapper.h.