Hi,
First of all, I’m not sure whether I should be posting this here or on cfe-dev, but here it goes.
In order to instrument CUDA kernels, I first generate the device IR with:
clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc
I also have a library that contains the instrumentation stubs. I generate IR for it in the same way and link it with the device IR programmatically using Linker::linkModules(…).
Then, after some analysis, I use llc to generate PTX:
llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx
This works fine, but the problem is that the instrumentation code uses __shfl_sync(), and ptxas gives me the following error:
ptxas device.ptx, line 1033; error : Feature ‘shfl.sync’ requires PTX ISA .version 6.0 or later
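The version ptxas refers to here is the one declared by the .version directive at the top of the PTX file. A minimal shell sketch for checking it follows. The sample header is written to a temporary file purely for illustration (in practice the file would be the llc output), and the variable names are hypothetical.

```shell
# Sketch: read the PTX ISA version declared in a PTX file's header.
# Assumption: a sample header is written to a temp file for illustration;
# in practice ptx_file would point at the llc output (device.ptx).
ptx_file="$(mktemp)"
printf '%s\n' '.version 4.1' '.target sm_52, debug' '.address_size 64' > "$ptx_file"

# The .version directive has the form ".version major.minor".
ptx_version="$(awk '$1 == ".version" { print $2; exit }' "$ptx_file")"
echo "PTX ISA version: $ptx_version"

# Per the ptxas error, shfl.sync needs PTX ISA 6.0 or later,
# so comparing the major version number is enough here.
major="${ptx_version%%.*}"
if [ "$major" -ge 6 ]; then
    meets_ptx60=yes
else
    meets_ptx60=no
fi
echo "Meets the PTX ISA 6.0 requirement: $meets_ptx60"

rm -f "$ptx_file"
```

This only inspects the header that the file declares; it does not change what llc emits.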
Now, according to NVIDIA's documentation, __shfl_sync is supported for compute capability >= 3, and my GTX 950 has compute capability 5.2. Also, PTX ISA 6.0 does support sm_52.
However, llc generates:
.version 4.1
.target sm_52, debug
.address_size 64
Any ideas why this is happening? Or am I doing something wrong?
PS. I’m using CUDA 10, driver 440.
~George