CUDA __shfl_sync problem

Hi,

First of all, I’m not sure whether I should be posting this here or on cfe-dev, but here goes.

To instrument CUDA kernels, I first generate device IR with:

clang++ -x cuda --cuda-device-only -emit-llvm --cuda-gpu-arch=sm_52 -o device.bc

I also have a library containing the instrumentation stubs. I generate IR for it the same way and link it with the device IR programmatically using Linker::linkModules(…).

Then, after some analysis, I use llc to get PTX:

llc device.bc --march=nvptx64 --mcpu=sm_52 --filetype=asm -o device.ptx

This works fine, but the instrumentation code uses __shfl_sync(), and ptxas gives me the following error:

ptxas device.ptx, line 1033; error : Feature ‘shfl.sync’ requires PTX ISA .version 6.0 or later

Now, according to , __shfl_sync is supported on compute capability >= 3.0, and according to my GTX950 has compute capability 5.2. Also, according to PTX ISA 6.0 does support sm_52. However, llc generates:

.version 4.1
.target sm_52, debug
.address_size 64

Any ideas why this is happening? Or am I doing something wrong?

PS. I’m using CUDA 10, driver 440.

~George
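Because the ptxas error says shfl.sync needs PTX ISA 6.0 or later, it can help to check which version the generated file actually declares before changing any flags. Below is a minimal POSIX shell sketch. The function name ptx_version_at_least is hypothetical, and it assumes the first `.version X.Y` directive in the file is the one that matters.

```shell
#!/bin/sh
# ptx_version_at_least FILE MAJOR MINOR
# Returns 0 if FILE's first ".version X.Y" directive is >= MAJOR.MINOR,
# 1 if it is older, and 2 if no ".version" directive is found.
ptx_version_at_least() {
  file=$1
  want_major=$2
  want_minor=$3
  # Print the operand of the first ".version" directive, then stop reading.
  ver=$(awk '$1 == ".version" { print $2; exit }' "$file")
  if [ -z "$ver" ]; then
    echo "no .version directive in $file" >&2
    return 2
  fi
  major=${ver%%.*}   # text before the first dot, e.g. "4" from "4.1"
  minor=${ver#*.}    # text after the first dot, e.g. "1" from "4.1"
  if [ "$major" -gt "$want_major" ] ||
     { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }; then
    return 0
  fi
  echo "$file declares PTX ISA $ver; need $want_major.$want_minor or later" >&2
  return 1
}
```

For example, `ptx_version_at_least device.ptx 6 0 || echo "device.ptx is too old for shfl.sync"` would report the problem for the `.version 4.1` header shown above.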

I'm not an expert, but it looks like llc defaults to the minimum PTX version that supports the requested compute capability. You might be able to select PTX 6.0 explicitly, though.

~ Johannes

Do you mean in llc? I'm afraid I don't see such an option.

~George

Have you tried `-target-feature +ptx60`?

I couldn't find `-target-feature`; I'm on LLVM 10. Has the interface changed, maybe?

Fortunately, `-mattr=+ptx60` did the trick.
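For reference, here is a sketch of the post-clang steps from this thread with that fix applied. It rests on a few assumptions. llvm-link stands in for the programmatic Linker::linkModules call. stubs.bc is a hypothetical name for the instrumentation library's bitcode. The analysis step is omitted. Finally, `-arch` is passed to ptxas so it assembles for the same GPU. Setting DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch: link instrumentation stubs, emit PTX 6.0 with llc, assemble with ptxas.
# ASSUMPTION: llvm-link replaces the programmatic Linker::linkModules step, and
# stubs.bc is a hypothetical file name for the instrumentation library's IR.
# With DRY_RUN=1 the commands are printed but not executed.

run() {
  echo "+ $*"
  if [ -z "$DRY_RUN" ]; then
    "$@"
  fi
}

# build_ptx DEVICE_BC STUBS_BC [ARCH]
build_ptx() {
  device_bc=$1
  stubs_bc=$2
  arch=${3:-sm_52}
  # Link device IR and instrumentation IR into one module.
  run llvm-link "$device_bc" "$stubs_bc" -o linked.bc &&
  # -mattr=+ptx60 raises the emitted ".version" to 6.0, which shfl.sync requires.
  run llc linked.bc -march=nvptx64 -mcpu="$arch" -mattr=+ptx60 \
      -filetype=asm -o device.ptx &&
  run ptxas -arch="$arch" device.ptx -o device.cubin
}
```

Calling `build_ptx device.bc stubs.bc sm_52` runs the three steps in order and stops at the first failure. `DRY_RUN=1` is handy for checking the exact flags first.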

George

I guess `-target-feature` is a cc1 option, so `-Xclang -target-feature` might work (each cc1 argument needs its own `-Xclang`, i.e. `-Xclang -target-feature -Xclang +ptx60`).
However, given that you solved the problem, I'd call that a win :wink:

~ Johannes