Compiling CUDA code fails

Is there any plan to add support for __float128 when targeting NVPTX with Clang in the future?

Not to my knowledge. There’s no FP128 support on existing NVIDIA GPUs, so it would be of limited practical use on the GPU. Even if we were to emulate it via float/double it would be prohibitively slow (even double is rather slow on most GPU variants).
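To give a rough sense of the cost, here is a minimal host-side C++ sketch of the usual "double-double" trick for getting extra precision out of fp64. This is only an illustration, not something Clang or CUDA provides: `DoubleDouble`, `two_sum` and `dd_add` are hypothetical names, and double-double is not IEEE binary128 (it gives roughly 106 significand bits and keeps fp64's exponent range). The point is that a single extended-precision add already expands into about eleven fp64 operations here, and multiply and divide are more expensive still.

```cpp
// Hypothetical double-double value: represents hi + lo, with |lo| much smaller than |hi|.
struct DoubleDouble {
    double hi;
    double lo;
};

// Knuth's TwoSum: s is the rounded sum, e is the rounding error,
// so that a + b == s + e exactly (barring overflow). Costs 6 fp64 ops.
inline DoubleDouble two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return {s, e};
}

// Simplified double-double addition (not fully accurate in all cases).
// Must be compiled without -ffast-math, or the compiler may cancel the error terms.
inline DoubleDouble dd_add(DoubleDouble x, DoubleDouble y) {
    DoubleDouble s = two_sum(x.hi, y.hi);   // exact sum of the high parts
    double lo = s.lo + (x.lo + y.lo);       // accumulate all low-order terms
    double hi = s.hi + lo;                  // renormalize so hi carries the leading bits
    lo = lo - (hi - s.hi);
    return {hi, lo};
}
```

For example, `dd_add({1.0, 0.0}, {1e-20, 0.0})` keeps the `1e-20` term in `lo`, whereas a plain fp64 `1.0 + 1e-20` rounds it away.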

We may be able to add storage-only support for it, but I’m not sure if we can easily add fp128 emulation using fp64/fp32.
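As an illustration of what "storage-only" could look like from user code, here is a minimal C++ sketch under stated assumptions. `Fp128Storage`, `store_bits` and `load_bits` are hypothetical names, not an existing Clang or CUDA API. The idea is that the 128-bit value is treated as an opaque 16-byte payload that can be copied and kept in memory, while all arithmetic stays on the side that actually supports the type.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical storage-only container: 128 bits of raw payload, no arithmetic operators.
struct alignas(16) Fp128Storage {
    std::uint64_t words[2];
};

// Copy the raw bits of any 16-byte trivially copyable value into the container.
template <typename T>
Fp128Storage store_bits(const T& value) {
    static_assert(sizeof(T) == sizeof(Fp128Storage), "expects a 16-byte type");
    static_assert(std::is_trivially_copyable<T>::value, "expects a trivially copyable type");
    Fp128Storage out;
    std::memcpy(&out, &value, sizeof(out));
    return out;
}

// Copy the raw bits back out into a value of type T.
template <typename T>
T load_bits(const Fp128Storage& in) {
    static_assert(sizeof(T) == sizeof(Fp128Storage), "expects a 16-byte type");
    static_assert(std::is_trivially_copyable<T>::value, "expects a trivially copyable type");
    T value;
    std::memcpy(&value, &in, sizeof(value));
    return value;
}
```

On a host compiler that provides `__float128`, `T` could be `__float128` (an assumption about the host toolchain, not about NVPTX); a device kernel would then only move `Fp128Storage` values around without ever computing on them.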