The concept of global/shared(workgroup)/private memory is common across all GPU runtimes; the only differences are the actual memory-space numbers. We could probably just have three separate attributes (global/shared/private) in the GPU dialect without any numbers. Lowerings would then convert this attribute to the memory-space number needed by the specific GPU runtime.
Basically, what @Hardcode84 said. The idea of marking memory global/workgroup/private is common across GPUs, but what that lowers to is a function of the backend. For example, on Nvidia cards, private memory lives in LLVM address space 0, but on AMD ones, it has LLVM address space 5. And then SPIR-V has an entirely separate system of pointer types that we also want to be able to lower to.
I was under the impression - going off how MLIR’s gpu backends were coded historically - that the NVPTX backend actually wanted alloca()s to be in address space 0 unlike ours.
In any case, even if AMD and Nvidia both use address spaces 1, 3, and 5 with the same semantics, part of the motivation for this attribute was to handle GPU backends that use a different address space mapping.
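To make that concrete, here is a minimal sketch (Python, purely illustrative; the numbers follow the posts above, i.e. private is 0 on Nvidia and 5 on AMD, with 1/3 for global/workgroup, while the function and map names are my own) of what a backend-specific lowering of the abstract attributes could look like:

```python
# Illustrative sketch: abstract GPU memory-space attributes mapped to
# numeric LLVM address spaces per backend. A real implementation would
# live in the GPU-to-NVVM / GPU-to-ROCDL conversion passes, not in a
# standalone table like this.

ADDRESS_SPACE_MAP = {
    "nvvm":  {"global": 1, "workgroup": 3, "private": 0},
    "rocdl": {"global": 1, "workgroup": 3, "private": 5},
}

def lower_memory_space(backend: str, abstract_space: str) -> int:
    """Convert an abstract memory-space attribute to a backend-specific
    numeric address space, failing loudly on unknown combinations."""
    try:
        return ADDRESS_SPACE_MAP[backend][abstract_space]
    except KeyError:
        raise ValueError(f"no mapping for {abstract_space!r} on {backend!r}")
```

The point of the table shape is that the GPU dialect never sees the numbers; only the per-backend lowering does, e.g. `lower_memory_space("rocdl", "private")` yields 5 while the same query for `"nvvm"` yields 0.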
We should have proper abstract attributes in the GPU dialect, but delaying this to the translation to LLVM may not be convenient: I don’t know if we have information about the target backend when we handle an attribute. Even if we do, it would couple the translation of a generic attribute with knowledge of all the possible backends.
Instead it seems better to me to handle this during the lowering out of the GPU dialect to NVVM/ROCDL/…
However I would still want abstract named attributes for these dialects as well: using numbers is just an unfortunate artifact from LLVM (and even in LLVM I wish there had been a bit more design work around these!).
We could have named address-space attributes at the NVVM and ROCDL levels, but that would just push the conversion to numbers from an MLIR conversion to the MLIR-to-LLVM translation, or to the preceding “legalization” pass. Not sure the complexity is worth it.
We’d need to support non-numeric address spaces in the LLVM dialect types, which we currently do not because it would be inconsistent with LLVM IR. And we’d need to make the translation aware of the targeted GPU somehow, presumably in an opaque way that needs design and adds complexity in (1) configuration of the translation and (2) the possibility of abusing the translation to do type/attribute conversion that should really be done as conversions proper. We could do the address-space change in the magic legalize-for-export pass that runs automatically before translation, but that doesn’t remove the need for it to become somehow configurable.
If the attributes are per-dialect (nvvm, rocdl, …), then this seems like it should already be handled trivially? Each dialect matches one LLVM target and controls the translation.
(It’s not as easy for the LLVM->MLIR translation of course…)
That would not suffice for SPIR-V, as a SPIR-V module can be consumed by different execution environments using different client APIs (and different representations for address spaces in LLVM). We implemented a solution for this in our LLVM project fork (upstreaming pending) using an option in the -convert-spirv-to-llvm pass (see).
We would be happy to upstream if this is also relevant to others and we agree it is the way to go.
Yep, so the whole SPIR-V-storage-class-to-LLVM-address-space mapping will depend on the client API (which defines how an execution environment, e.g., OpenCL or Vulkan, consumes SPIR-V): first on that client API’s storage-class-to-address-space mapping, and then on how the execution environment represents those address spaces in LLVM. I see it as something like:
Storage class (SPIR-V) -> Address space (execution environment) -> LLVM
Generic Storage Class -> Generic address space (OpenCL) -> 4 (LLVM)
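A sketch of that two-step mapping in the same illustrative style (only the Generic → generic (OpenCL) → 4 row is taken from the example above; the remaining OpenCL rows are my assumption of the usual SPIR numbering, and a Vulkan environment would fill in a different table entirely):

```python
# Illustrative two-step mapping: SPIR-V storage class -> client-API
# address space -> LLVM address space number. Only Generic -> 4 comes
# from the example above; the other OpenCL rows are assumed and purely
# for illustration.

STORAGE_CLASS_TO_API_SPACE = {
    "opencl": {
        "Function":       "private",
        "CrossWorkgroup": "global",
        "Workgroup":      "local",
        "Generic":        "generic",
    },
    # "vulkan": {...}  # a different client API would supply its own table
}

API_SPACE_TO_LLVM = {
    "opencl": {"private": 0, "global": 1, "local": 3, "generic": 4},
}

def storage_class_to_llvm(client_api: str, storage_class: str) -> int:
    """Resolve a SPIR-V storage class to an LLVM address space number
    via the client API's own address-space vocabulary."""
    api_space = STORAGE_CLASS_TO_API_SPACE[client_api][storage_class]
    return API_SPACE_TO_LLVM[client_api][api_space]
```

Keeping the two steps separate is exactly what makes an option like the one in -convert-spirv-to-llvm workable: swapping the client API swaps both tables without touching the SPIR-V input.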