Confused by inconsistencies in GPU magic constants

I looked deeper into memory spaces and I must say I am confused.

I see:

and

I don’t know if there are others.

Can someone please explain the discrepancies here?

Could we please have either “one common thing” (i.e. GPU dialect) or “no common thing” (nothing in GPU dialect)?

I am not sure what innovation at this layer brings to the table, but I suspect some things are tied to the actual LLVM backend.

What’s the best way forward here?

@ThomasRaoux @ftynse @antiagainst @krzysz00 @Hardcode84

The concept of global/shared (workgroup)/private memory is common across all GPU runtimes; the only differences are the actual memory space numbers. We can probably just have three separate attributes (global/shared/private) in the GPU dialect without any numbers. Lowerings would then just convert this attribute to the memory space needed by the specific GPU runtime.
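For concreteness, here is a rough sketch of what that could look like at the GPU dialect level, assuming an enum attribute spelled `#gpu.address_space<...>` (the exact spelling is bikeshed material):

```mlir
// Sketch: symbolic, target-agnostic address spaces on memref types,
// with no backend-specific numbers anywhere.
%g = memref.alloc() : memref<256xf32, #gpu.address_space<global>>
%w = memref.alloc() : memref<32xf32, #gpu.address_space<workgroup>>
%p = memref.alloca() : memref<4xf32, #gpu.address_space<private>>
```

The lowerings to NVVM/ROCDL/SPIR-V would then replace the enum with whatever number (or storage class) the target expects.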


Basically, what @Hardcode84 said. The idea of marking memory global/workgroup/private is common across GPUs, but what that lowers to is a function of the backend. For example, on Nvidia cards, private memory lives in LLVM address space 0, but on AMD ones, it has LLVM address space 5. And then SPIR-V has an entirely separate system of pointer types that we also want to be able to lower to.

This is imprecise: the memory is allocated in address space 5; the alloca just returns an address-space-0 pointer. As a really old hack, the IR just tolerates alloca returning address space 0 and the backend inserts some casts later.
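To make that concrete, here is a hedged sketch in the MLIR LLVM dialect (using opaque-pointer syntax) of the two conventions being discussed:

```mlir
// AMDGPU-style: the allocation lives in the private address space (5) and is
// explicitly cast when a flat/generic pointer is needed.
%c1 = llvm.mlir.constant(1 : i64) : i64
%priv = llvm.alloca %c1 x f32 : (i64) -> !llvm.ptr<5>
%flat = llvm.addrspacecast %priv : !llvm.ptr<5> to !llvm.ptr

// NVPTX-style old hack: the alloca is written as if it returned an
// address-space-0 pointer, and the backend patches in the casts later.
%gen = llvm.alloca %c1 x f32 : (i64) -> !llvm.ptr
```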

I was under the impression - going off how MLIR’s gpu backends were coded historically - that the NVPTX backend actually wanted alloca()s to be in address space 0 unlike ours.

In any case, even if AMD and Nvidia both use address spaces 1, 3, and 5 with the same semantics, part of the motivation for this attribute was to handle GPU backends that use a different address space mapping.

I’ve never seen this as an intrinsically good thing. I think it’s obviously a >10-year-old hack to avoid changing the IR to support non-0 allocas.


We should have proper abstract attributes in the GPU dialects, but delaying this to the translation to LLVM may not be convenient: I don’t know if we have information about the target backend when we handle an attribute. Even if we do, it would couple the translation of a generic attribute to knowledge of all the possible backends.

Instead it seems better to me to handle this during the lowering out of the GPU dialect to NVVM/ROCDL/…
However I would still want abstract named attributes for these dialects as well: using numbers is just an unfortunate artifact from LLVM (and even in LLVM I wish there had been a bit more design work around these!).

In Rust this is called newtyping: you just wrap a struct around an integer. Instead of `unsigned AS` you have `AddrSpace AS`.

We are going to undo the hack @arsenm mentioned. We should annotate AS early for NVIDIA too. See also [RFC] Cleaning up the NVIDIA (and potentially AMD) GPU backend

You seem to be confused by the fact that enum attributes have numbers associated with enum values. Those numbers have no meaning. Everything else is pretty consistent IMO.

It looks almost like what we already have: the GPU dialect has the target-agnostic attribute with names for global, workgroup and private (top snippet). This is an enum attribute, so we have to specify some values for each entry, but they have no meaning otherwise. NVVM and ROCDL have specific numeric address spaces consistent with what LLVM IR expects. The attribute conversion happens in GPU-to-NVVM/ROCDL, e.g., here: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp#L251-L269.
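Paraphrasing the effect of that conversion (a sketch, not the literal code behind the link): the type converter swaps the symbolic enum for the integer the backend expects, e.g.

```mlir
// Before -convert-gpu-to-nvvm / -convert-gpu-to-rocdl: symbolic space.
memref<16xf32, #gpu.address_space<workgroup>>
// After: the backend's number (both happen to use 3 for workgroup memory,
// while e.g. private differs between the two targets).
memref<16xf32, 3>
```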

We could have named address space attributes at NVVM and ROCDL levels, but that would just push the conversion to numbers from an MLIR conversion to MLIR-to-LLVM translation, or the preceding “legalization” pass. Not sure the complexity is worth it.

Yes, this is exactly my point above. I don’t see much reason to use “magic numbers” when we can avoid it, and I don’t see any downside to using named enums.

We’d need to support non-numeric address spaces in the LLVM dialect types, which we currently do not because it would be inconsistent with LLVM IR. And we’d need to make the translation aware of the targeted GPU somehow, presumably in an opaque way that needs design and adds complexity in (1) the configuration of the translation and (2) the possibility of abusing translation to do type/attribute conversion that should really be done as conversions proper. We could do the address space change in the magic legalize-for-export pass that runs automatically before translation, but that doesn’t remove the need for it to become somehow configurable.

If the attributes are per-dialects (nvvm, rocdl, …) then this seems like it should already be handled trivially? Each dialect matches one LLVM target and controls the translation.
(It’s not as easy for the LLVM->MLIR translation of course…)


In the latter case, the llvm::Module should come with a target triple. It should not be that hard to disambiguate between the different vendors.


That would not suffice for SPIR-V, as a SPIR-V module can be consumed by different execution environments using different client APIs (and different representations for address spaces in LLVM). We implemented a solution for this in our LLVM project fork (upstreaming pending) using an option in the -convert-spirv-to-llvm pass (see).

We would be happy to upstream if this is also relevant to others and we agree it is the way to go.

Is the issue here that a “CrossWorkgroup” SPIR-V address space corresponds to different integers in LLVM depending on the target?

Yep. The whole SPIR-V storage class to LLVM address space integer mapping depends on the client API (which defines how the execution environment, e.g., OpenCL or Vulkan, consumes SPIR-V): both on its storage class to address space mapping and on how the execution environment represents such address spaces in LLVM. I see it as something like:

Storage class (SPIR-V) -> Address space (execution environment) -> LLVM

e.g.

Generic Storage Class -> Generic address space (OpenCL) -> 4 (LLVM)
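At the type level, and coming back to the CrossWorkgroup example above, that chain would look roughly like this (a hedged sketch using the OpenCL/SPIR numbering; other client APIs may choose different integers):

```mlir
// SPIR-V dialect: symbolic storage class on the pointer type.
!spirv.ptr<f32, CrossWorkgroup>
// After -convert-spirv-to-llvm with an OpenCL mapping:
// CrossWorkgroup -> global memory -> address space 1.
!llvm.ptr<1>
```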

Coming back to this: I’ve pushed a patch with the solution my team and I came up with to handle this in our SYCL-MLIR project (built on DPC++). Everyone’s welcome to take a look and discuss.

Note: We only provided a mapping to OpenCL, as that’s mostly what we care about.


On the other hand, SYCL-MLIR is a branch of DPC++, which targets CUDA & ROCm too.
So, I guess a larger “we” should care. :slight_smile:

SYCL-MLIR only works for SPIR-V targets for now. That’s what I meant. Of course we’d like to support the other targets DPC++ supports :smile: