Compiling CUDA code fails

Do you know why this doesn’t cause an issue with nvcc? Is there a better way to fix this?

NVCC uses a very different compilation strategy. It physically separates host and device code before compiling each side, which allows it to compile host-side code that uses types the GPU does not support.

Clang, on the other hand, sees both host and device code simultaneously, which lets it handle C++ compilation better than NVCC. For example, NVCC has to jump through some hoops when a template is instantiated by code on the other side of the compilation.

The downside of seeing both sides is that the code must be ‘reasonably’ valid for both the host and the GPU as a complete translation unit (unless you rely on preprocessing with the __CUDA_ARCH__ macro).

We work around some of the issues in this category with delayed diagnostics: the diagnostic is recorded, but only reported if the code that triggered it actually gets emitted (e.g. host code is never emitted during GPU-side compilation, so that compilation succeeds). Types are trickier, as using a type does not necessarily generate any code, so it’s hard to tie the error to anything specific.

Depending on where __float128 pops up in the headers, you may be able to do something like this:

#if defined(__CUDA_ARCH__)
#define __STRICT_ANSI__ 1
#endif
#include <header that may use float128.h>

It may not work if that header is pulled in via the CUDA runtime wrapper header that clang itself passes with -include. It may also lead to the compiler seeing different code/types during host and device compilation, which can cause further trouble if it affects data exchanged between the host and the GPU.
