Missing TargetPrefix for NVVM intrinsics

Justins:

I noticed that the intrinsics in IntrinsicsNVVM don't specify a
TargetPrefix. This seems like a simple omission, so I was going to
wrap them in a `let TargetPrefix = "nvvm"` block, but this doesn't
quite work.

There seem to be three prefixes that are used in this file. About 900
are int_nvvm_*, 30 are int_ptx_*, and 1 is int_cuda_*. It isn't clear to
me whether this inconsistency is intentional or warranted. Should these
all be named int_nvvm_*? Is there a good reason to differentiate
int_ptx_*? Why does __syncthreads map to int_cuda_syncthreads, rather
than int_nvvm_syncthreads?
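
For anyone who wants to reproduce a tally like the one above, here is a
rough sketch. It assumes each intrinsic is written as a plain
`def int_<prefix>_...` record, so it will undercount anything generated
through a multiclass or foreach. The helper name `count_prefixes` and
the sample records (with simplified signatures) are made up for
illustration and are not copied from the real file.

```python
import re
from collections import Counter

# Matches records such as "def int_nvvm_barrier0 :" and captures the first
# name component after "int_" (nvvm, ptx, cuda, ...).
DEF_RE = re.compile(r"^\s*def\s+int_([A-Za-z0-9]+)", re.MULTILINE)


def count_prefixes(td_text: str) -> Counter:
    """Count intrinsic definitions per prefix in TableGen source text."""
    return Counter(DEF_RE.findall(td_text))


# Illustrative input only: signatures are simplified placeholders.
sample = """
def int_nvvm_barrier0 : Intrinsic<[], [], []>;
def int_ptx_read_tid_x : Intrinsic<[llvm_i32_ty], [], [IntrNoMem]>;
def int_cuda_syncthreads : Intrinsic<[], [], []>;
"""

for prefix, count in sorted(count_prefixes(sample).items()):
    print(f"int_{prefix}_*: {count}")
```

To run it on a real tree, read the contents of the .td file into a
string and pass that to `count_prefixes` instead of `sample`.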

I'm probably going to go ahead and add the TargetPrefix to the nvvm
intrinsics, but I'm not familiar enough with NVPTX to know what to do
with the others.

Thanks,
-- Justin

It's all historical, unfortunately. Before NVIDIA open-sourced NVPTX, we had the PTX target, and the NVPTX target continued to accept the old PTX intrinsics to ease compatibility. That was around LLVM 3.2, so we can likely just get rid of the PTX intrinsics now.

The llvm.cuda intrinsic is an unfortunate naming quirk used internally. Its job should be handled by llvm.nvvm.barrier0, so it should be okay to remove that intrinsic.

@jlebar, any issues on your end with removing all of the non-nvvm intrinsics?

I don't think so, if the functionality is all there from the nvvm
intrinsics. Some of it may involve clang changes, which I'm not
exactly volunteering to do, but they should be simple.

Okay, so if I've understood you both correctly, the attached six patches
should do the trick, for both LLVM and clang. First, they rename the
shfl intrinsics from __builtin_ptx_shfl to __nvvm_shfl. Second,
they replace uses of cuda.syncthreads with nvvm.barrier0. Third, all of
the ptx intrinsics and builtins are removed. Finally, I wrap the nvvm
intrinsics with a TargetPrefix.
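
Since the third step removes intrinsics outright, any out-of-tree
textual IR that still names them will need updating. As a rough sketch
(not part of the patches), a scan like the following could flag such
uses. It assumes the legacy names all appear as `@llvm.ptx.*` or
`@llvm.cuda.*`; the function name `find_legacy_intrinsics` and the
sample IR are hypothetical.

```python
import re

# Legacy intrinsic references look like "@llvm.ptx.<name>" or
# "@llvm.cuda.<name>" in textual IR; "@llvm.nvvm.*" is left alone.
LEGACY_RE = re.compile(r"@llvm\.(?:ptx|cuda)\.[A-Za-z0-9_.]+")


def find_legacy_intrinsics(ir_text: str) -> list:
    """Return the sorted, de-duplicated legacy intrinsic names in ir_text."""
    return sorted(set(LEGACY_RE.findall(ir_text)))


# Illustrative IR only.
ir = """
declare void @llvm.cuda.syncthreads()
declare i32 @llvm.ptx.read.tid.x()
declare void @llvm.nvvm.barrier0()
"""

for name in find_legacy_intrinsics(ir):
    print("legacy intrinsic still referenced:", name)
```

Of the names flagged, only cuda.syncthreads has a replacement spelled
out in this thread (nvvm.barrier0); the rest would need to be mapped to
their nvvm equivalents by hand.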

Let me know if you want me to break any of these out into specific
review threads on -commits.

llvm-0001-NVPTX-Make-the-llvm.nvvm.shfl-intrinsics-and-builtin.patch (4.8 KB)

clang-0001-NVPTX-Rename-__builtin_ptx_shfl-__nvvm_shfl.patch (2.69 KB)

llvm-0002-NVPTX-Replace-uses-of-cuda.syncthreads-with-nvvm.bar.patch (10.3 KB)

llvm-0003-NVPTX-Remove-the-legacy-ptx-intrinsics.patch (26.5 KB)

clang-0002-NVPTX-Remove-the-legacy-ptx-builtins.patch (15.2 KB)

llvm-0004-IR-Set-a-TargetPrefix-for-nvvm-intrinsics.patch (1.04 KB)