What is the expected level of support for CUDA 11.4 in Clang? We want to update from 10.2 to 11.4, and came across the following issue when compiling:
Just by including this header, and compiling with `--cuda-gpu-arch=sm_61`, we get the following error building with Clang:
/usr/local/cuda-11.4/targets/x86_64-linux/include/cooperative_groups/details/partitioning.h:85:32: error: use of undeclared identifier '__match_any_sync'
unsigned int subMask = __match_any_sync(thisMask, pred);
The error goes away if we use sm_70 or later, but we need sm_61 in our application.
Cooperative groups have been working just fine since they were introduced in CUDA 9, so I wonder why this would break now. NVCC has no problems with the above code. Clang can also build it just fine on CUDA 10.2.
Digging a bit deeper, I can narrow it down to this:
__device__ void foo()
The above code compiles fine with NVCC + sm_61. However, if I remove the template, then NVCC complains that __match_any_sync is not defined. Clang complains regardless.
Does this make any sense? Is NVCC doing some magic that allows it to ignore undeclared functions if they are used in a non-instantiated template function?
PS: from what I understand, __match_any_sync is indeed only available on SM 7.0 or later. I still wonder how NVCC can handle this, is it a bug or a feature? Should Clang be updated to match NVCC?
PS2: The problem goes away in CUDA 11.5, even though it’s not officially supported in Clang. I get a warning but the code compiles. I can still see the __match_any_sync function in the CUDA header so I don’t understand what’s happening. Is Clang really compiling, or what does it do when using a “too new” CUDA version?