LLVM@14.0.0 does not handle CUDA@11.5.0 headers well: variadic functions and missing definitions

I am compiling a CUDA program with LLVM@14.0.0, and the CUDA backend is CUDA@11.5.0. I want to use the new feature for controlling the L2 cache via annotated pointers, but clang does not seem to support it well.

In file included from L2_stream_binding.cu:5:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/annotated_ptr:58:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/barrier:10:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/barrier:17:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:41:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/cstddef:35:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/type_traits:24:
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits:520:12: error: CUDA device code does not support variadic functions
false_type __sfinae_test_impl(...);
           ^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits:1059:69: error: CUDA device code does not support variadic functions
    template <class _Tp> _LIBCUDACXX_INLINE_VISIBILITY static __two __test(...);
                                                                    ^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits:1171:5: error: CUDA device code does not support variadic functions
    __any(...);
    ^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits:1848:16: error: CUDA device code does not support variadic functions
   static void __test(...);
               ^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits:2272:18: error: CUDA device code does not support variadic functions
    static __two __test (...);
                 ^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/detail/libcxx/include/type_traits:4332:16: error: CUDA device code does not support variadic functions
  static __nat __try_call(...);
               ^
In file included from L2_stream_binding.cu:5:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/annotated_ptr:58:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/barrier:10:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/barrier:17:
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:56:22: error: no member named 'thread_scope' in namespace 'cuda::std::__detail'
using std::__detail::thread_scope;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:57:22: error: no member named 'thread_scope_system' in namespace 'cuda::std::__detail'
using std::__detail::thread_scope_system;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:58:22: error: no member named 'thread_scope_device' in namespace 'cuda::std::__detail'
using std::__detail::thread_scope_device;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:59:22: error: no member named 'thread_scope_block' in namespace 'cuda::std::__detail'
using std::__detail::thread_scope_block;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:60:22: error: no member named 'thread_scope_thread' in namespace 'cuda::std::__detail'
using std::__detail::thread_scope_thread;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:63:22: error: no member named '__thread_scope_block_tag' in namespace 'cuda::std::__detail'
using std::__detail::__thread_scope_block_tag;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:64:22: error: no member named '__thread_scope_device_tag' in namespace 'cuda::std::__detail'
using std::__detail::__thread_scope_device_tag;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:65:22: error: no member named '__thread_scope_system_tag' in namespace 'cuda::std::__detail'
using std::__detail::__thread_scope_system_tag;
      ~~~~~~~~~~~~~~~^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:66:7: error: no member named '__atomic_signal_fence_cuda' in namespace 'cuda::std::__detail'; did you mean '::__atomic_signal_fence'?
using std::__detail::__atomic_signal_fence_cuda;
      ^~~~~~~~~~~~~~~
/data/home/mzw/spack/opt/spack/linux-centos7-x86_64_v3/gcc-4.8.5/gcc-7.5.0-575nvplvtbr5fqwbmtzwhz537i3k3cty/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/atomic_base.h:106:5: note: '::__atomic_signal_fence' declared here
  { __atomic_signal_fence(__m); }
    ^
In file included from L2_stream_binding.cu:5:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/annotated_ptr:58:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/barrier:10:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/barrier:17:
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:67:7: error: no member named '__atomic_thread_fence_cuda' in namespace 'cuda::std::__detail'; did you mean '::__atomic_thread_fence'?
using std::__detail::__atomic_thread_fence_cuda;
      ^~~~~~~~~~~~~~~
/data/home/mzw/spack/opt/spack/linux-centos7-x86_64_v3/gcc-4.8.5/gcc-7.5.0-575nvplvtbr5fqwbmtzwhz537i3k3cty/lib/gcc/x86_64-pc-linux-gnu/7.5.0/../../../../include/c++/7.5.0/bits/atomic_base.h:102:5: note: '::__atomic_thread_fence' declared here
  { __atomic_thread_fence(__m); }
    ^
In file included from L2_stream_binding.cu:5:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/annotated_ptr:58:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/barrier:10:
In file included from /data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/barrier:17:
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:81:22: error: unknown type name 'thread_scope'
template <class _Tp, thread_scope _Sco = thread_scope::thread_scope_system>
                     ^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:81:42: error: use of undeclared identifier 'thread_scope'
template <class _Tp, thread_scope _Sco = thread_scope::thread_scope_system>
                                         ^
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:101:16: error: no member named '__cxx_atomic_fetch_max' in namespace 'cuda::std::__detail'; did you mean '__c11_atomic_fetch_max'?
        return std::__detail::__cxx_atomic_fetch_max(&this->__a_, __op, __m);
               ^~~~~~~~~~~~~~~
/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/bin/../targets/x86_64-linux/include/cuda/std/atomic:101:31: note: '__c11_atomic_fetch_max' declared here
        return std::__detail::__cxx_atomic_fetch_max(&this->__a_, __op, __m);
                              ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]

With nvcc, everything compiles fine, so I guess the problem lies in Clang. How can I work around it? Thank you!

With libcu++ you are in uncharted territory. I’m aware that clang can compile some headers from there, but not others. Making it work is on my TODO list, but it’s way too far down from the top for me to predict when/if it will get done.

In this particular case, try compiling with -Xclang -fcuda-allow-variadic-functions. This may get you past this particular error. E.g. Compiler Explorer
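For context, the declarations clang rejects in type_traits follow the usual C++ detection idiom: an overload taking (...) is declared but never defined or called, and only its return type is inspected inside decltype. Below is a minimal host-only sketch of that idiom. The names has_size_member and detail::test are hypothetical and do not come from libcu++.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

namespace detail {
// Preferred overload: participates only if T has a callable .size().
template <class T>
auto test(int) -> decltype(std::declval<T&>().size(), std::true_type{});

// Fallback overload: the C-style variadic parameter ranks last in overload
// resolution. It is declared but never defined and never called.
template <class T>
std::false_type test(...);
}  // namespace detail

// The trait inherits from whichever return type overload resolution picks.
template <class T>
struct has_size_member : decltype(detail::test<T>(0)) {};

static_assert(has_size_member<std::vector<int>>::value, "vector has size()");
static_assert(!has_size_member<int>::value, "int has no size()");
```

Since such functions only appear in unevaluated contexts, no code should ever be generated for them, which is presumably why relaxing the check on the declaration is enough to get past this class of error.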

Thanks for your immediate reply, I truly appreciate it! Your proposed solution did take care of the “variadic” errors, but errors such as error: no member named 'thread_scope' in namespace 'cuda::std::__detail' remain.
I understand that it takes time to fully support the changes introduced by CUDA. I am eager to solve this because my research project is to optimize CUDA programs via LLVM by utilizing annotated pointers. What should I do to get it to work? I am new to this kind of problem and very open to cooperating with you or other developers to fix it. Thank you!

my research project is to optimize CUDA programs via LLVM by utilizing annotated pointers.

It would help if you were a bit more specific about the details of what you’re trying to do and how exactly it does not work. Providing a reproducer on cuda.godbolt.org is extremely helpful if you want someone’s help with the compiler.

This is not sufficient for me to give you any specific suggestion. Details are important: the command line options, which CUDA version you’re using, which clang version produced the error, whether you used the headers from CUDA itself or the upstream sources, etc.

In general, poking at the upstream sources of NVIDIA’s libcudacxx it appears that it relies on some compiler builtins that clang has not implemented yet. E.g. __nv_associate_access_property. Those would have to be implemented first. Until then you may need to provide your own implementation as a function with inline asm that does whatever the builtin is supposed to do. Unfortunately NVIDIA does not document it, so I currently have no clue what exactly this builtin is supposed to do. You would need to experiment and check what NVCC produces for that function. It’s possible that __nv_associate_access_property is what they use to give NVCC hints about the intended use of the pointers. If that’s the case, it may be possible to provide no-op implementations early on so you can get the code to compile.
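A no-op stand-in along those lines might look like the sketch below. Both the name associate_access_property_noop and its signature (a pointer plus a 64-bit property descriptor, returning a pointer) are assumptions made for illustration. The builtin is undocumented, so the real contract has to be checked against what NVCC generates.

```cpp
#include <cstdint>

// Expand to CUDA qualifiers only under a CUDA compiler, so this sketch
// also builds as plain host C++.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// HYPOTHETICAL no-op stand-in for the undocumented builtin.
// ASSUMPTION: the builtin takes a pointer and a property descriptor and
// returns a pointer to the same memory. This version ignores the property
// and returns the pointer unchanged.
SKETCH_HOST_DEVICE constexpr void* associate_access_property_noop(
    const void* ptr, std::uint64_t /*property*/) {
    return const_cast<void*>(ptr);
}

static_assert(associate_access_property_noop(nullptr, 0) == nullptr,
              "the pointer passes through unchanged");
```

A stand-in like this only helps the code compile. It drops whatever cache hint the property was meant to carry, so it says nothing about the L2 behaviour NVCC would produce.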

Once the builtins are available, you will need to deal with portability issues in libcudacxx. I’m willing to bet that being compilable with clang was not high on the authors’ list of priorities. Someone would need to find and fix the issues in a way that works for both clang and NVCC and upstream the fixes to libcudacxx.

Once that is done, then you would be able to see what clang ends up generating for the code using the libcudacxx headers.

Thanks for your considerate suggestions. I am sorry that I didn’t make it clear.

It would help if you were a bit more specific about the details of what you’re trying to do and how exactly it does not work. Providing a reproducer on cuda.godbolt.org is extremely helpful if you want someone’s help with the compiler.

I cannot find a compiler on that website that matches the one I use locally. My setup is as follows.

  1. LLVM@14.0.0, installed through spack with spack install llvm@14.0.0 +cuda cuda_arch=80 targets=nvptx,x86 ^cuda@11.5.0 ^hwloc +cuda; the version of clang++ is 14.0.0.

  2. The CUDA backend is version 11.5.0, and I use the headers from CUDA itself.

  3. The compiling command is clang++ -std=c++17 -O3 --cuda-gpu-arch=sm_80 --cuda-path=/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht -I/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/include -L/data/home/mzw/spack/opt/spack/linux-centos7-zen/gcc-7.5.0/cuda-11.5.0-m5vnfghinm5zygqtqh2s6uxvnbvo4wht/lib64 L2_stream_binding.cu -lcudart_static -ldl -lrt -pthread -Xclang -fcuda-allow-variadic-functions -o L2_stream_binding

Here is a sample that produces a similar error; note that the compiler version is not the same.

In general, poking at the upstream sources of NVIDIA’s libcudacxx it appears that it relies on some compiler builtins that clang has not implemented yet. E.g. __nv_associate_access_property.

I also noticed that clang++ does not support this builtin yet. It seems to me that most of the problems are related to namespace resolution. Given that every std:: is replaced by cuda::std::, as libcu++ recommends, I wonder whether this might not be that hard to solve?

Thanks for making the process clear!

You may want to start by looking at https://github.com/NVIDIA/libcudacxx/blob/main/include/nv/detail/__target_macros which may need to be adjusted to provide correct definitions for clang.
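When reading those macros, it helps to keep in mind which identification macros each toolchain defines. According to the clang CUDA documentation, both NVCC and clang define __CUDACC__ during CUDA compilation, NVCC can be detected through __NVCC__, and clang compiling CUDA can be detected through __clang__ together with __CUDA__. The sketch below classifies the current compilation on that basis. The toolchain enum and the two function names are hypothetical helpers, not part of libcudacxx.

```cpp
// Which front end is compiling this translation unit?
enum class toolchain { nvcc, clang_cuda, host_only };

constexpr toolchain detect_toolchain() {
#if defined(__NVCC__)
    // Checked first: NVCC may use clang as its host compiler.
    return toolchain::nvcc;
#elif defined(__clang__) && defined(__CUDA__)
    return toolchain::clang_cuda;
#else
    return toolchain::host_only;
#endif
}

// __CUDA_ARCH__ is defined only during the device-side compilation pass.
constexpr bool is_device_pass() {
#if defined(__CUDA_ARCH__)
    return true;
#else
    return false;
#endif
}
```

A header that guards a definition behind an NVCC-only check would skip it under clang, which is one possible (unconfirmed) way for names such as thread_scope to go missing.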

Then I’d try compiling your sample with nvcc --keep, which saves the preprocessed output produced by NVCC. You can then compare it with the preprocessed output produced by clang and look for differences that would explain the missing thread_scope enum.

Good luck.

Thanks for your suggestion, I really appreciate it! I will work on it.
P.S.: CUDA 11.7 claims host compiler support for clang 13. I am not sure whether that would solve my problem.

It will not. NVCC’s use of clang as a C++ compiler for the host compilation is irrelevant in this case.

Indeed, you are right.