Confused by inconsistencies in GPU magic constants

I looked deeper into memory spaces and I must say I am confused.

I see:

and

I don’t know if there are others.

Can someone please explain the discrepancies here?

Could we please have either “one common thing” (i.e. GPU dialect) or “no common thing” (nothing in GPU dialect)?

I am not sure what innovation at this layer brings to the table, but I suspect some things are tied to the actual LLVM backend.

What’s the best way forward here?

@ThomasRaoux @ftynse @antiagainst @krzysz00 @Hardcode84

The concept of global/shared (workgroup)/private memory is common across all GPU runtimes; the only differences are the actual memory space numbers. We can probably just have three separate attributes (global/shared/private) in the GPU dialect without any numbers. Lowerings would then just convert this attribute to the memory space needed by the specific GPU runtime.
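For concreteness, here is a rough sketch of what that could look like at the GPU dialect level, assuming an enum attribute spelled `#gpu.address_space<...>` (the exact spelling is bikeshed material):

```mlir
// Sketch: symbolic, target-agnostic address spaces on memref types,
// with no backend-specific numbers anywhere.
%g = memref.alloc() : memref<256xf32, #gpu.address_space<global>>
%w = memref.alloc() : memref<32xf32, #gpu.address_space<workgroup>>
%p = memref.alloca() : memref<4xf32, #gpu.address_space<private>>
```

The lowerings to NVVM/ROCDL/SPIR-V would then replace the enum with whatever number (or storage class) the target expects.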


Basically, what @Hardcode84 said. The idea of marking memory global/workgroup/private is common across GPUs, but what that lowers to is a function of the backend. For example, on Nvidia cards, private memory lives in LLVM address space 0, but on AMD ones, it has LLVM address space 5. And then SPIR-V has an entirely separate system of pointer types that we also want to be able to lower to.

This is imprecise: the memory is allocated in address space 5; the alloca just returns an address-space-0 pointer. As a really old hack, the IR just tolerates alloca returning address space 0 and the backend inserts some casts later.
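To make that concrete, here is a hedged sketch in the MLIR LLVM dialect (using opaque-pointer syntax) of the two conventions being discussed:

```mlir
// AMDGPU-style: the allocation lives in the private address space (5) and is
// explicitly cast when a flat/generic pointer is needed.
%c1 = llvm.mlir.constant(1 : i64) : i64
%priv = llvm.alloca %c1 x f32 : (i64) -> !llvm.ptr<5>
%flat = llvm.addrspacecast %priv : !llvm.ptr<5> to !llvm.ptr

// NVPTX-style old hack: the alloca is written as if it returned an
// address-space-0 pointer, and the backend patches in the casts later.
%gen = llvm.alloca %c1 x f32 : (i64) -> !llvm.ptr
```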

I was under the impression - going off how MLIR’s gpu backends were coded historically - that the NVPTX backend actually wanted alloca()s to be in address space 0 unlike ours.

In any case, even if AMD and Nvidia both use address spaces 1, 3, and 5 with the same semantics, part of the motivation for this attribute was to handle GPU backends that use a different address space mapping.

I’ve never seen this as an intrinsically good thing. I think it’s obviously a >10-year-old hack to avoid changing the IR to support non-0 allocas.


We should have proper abstract attributes in the GPU dialects, but delaying this to the translation to LLVM may not be convenient: I don’t know if we have information about the target backend when we handle an attribute. Even if we do, it would couple the translation of a generic attribute to knowledge of all the possible backends.

Instead it seems better to me to handle this during the lowering out of the GPU dialect to NVVM/ROCDL/…
However I would still want abstract named attributes for these dialects as well: using numbers is just an unfortunate artifact from LLVM (and even in LLVM I wish there had been a bit more design work around these!).

In Rust this is called newtyping: you just wrap a struct around an integer. Instead of `unsigned AS` you have `AddrSpace AS`.

We are going to undo the hack @arsenm mentioned. We should annotate AS early for NVIDIA too. See also [RFC] Cleaning up the NVIDIA (and potentially AMD) GPU backend

You seem to be confused by the fact that enum attributes have numbers associated with enum values. Those numbers have no meaning. Everything else is pretty consistent IMO.

It looks almost like what we already have: the GPU dialect has the target-agnostic attribute with names for global, workgroup and private (top snippet). This is an enum attribute, so we have to specify some values for each entry, but they have no meaning otherwise. NVVM and ROCDL have specific numeric address spaces consistent with what LLVM IR expects. The attribute conversion happens in GPU-to-NVVM/ROCDL, e.g., here: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp#L251-L269.
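Paraphrasing the effect of that conversion (a sketch, not the literal code behind the link): the type converter swaps the symbolic enum for the integer the backend expects, e.g.

```mlir
// Before -convert-gpu-to-nvvm / -convert-gpu-to-rocdl: symbolic space.
memref<16xf32, #gpu.address_space<workgroup>>
// After: the backend's number (both happen to use 3 for workgroup memory,
// while e.g. private differs between the two targets).
memref<16xf32, 3>
```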

We could have named address space attributes at NVVM and ROCDL levels, but that would just push the conversion to numbers from an MLIR conversion to MLIR-to-LLVM translation, or the preceding “legalization” pass. Not sure the complexity is worth it.

Yes, this is exactly my point above. I don’t see much reason to use “magic numbers” when we can avoid it, and I don’t see any downside to using named enums.

We’d need to support non-numeric address spaces in the LLVM dialect types, which we currently do not because it would be inconsistent with LLVM IR. And we’d need to make the translation aware of the targeted GPU somehow, presumably in an opaque way that needs design and adds complexity in (1) the configuration of the translation and (2) the possibility of abusing translation to do type/attribute conversion that should really be done as conversions proper. We could do the address space change in the magic legalize-for-export pass that runs automatically before translation, but that doesn’t remove the need for it to become somehow configurable.

If the attributes are per-dialects (nvvm, rocdl, …) then this seems like it should already be handled trivially? Each dialect matches one LLVM target and controls the translation.
(It’s not as easy for the LLVM->MLIR translation of course…)


In the latter case, the llvm::Module should come with a target triple. It should not be that hard to disambiguate between the different vendors.


That would not suffice for SPIR-V, as a SPIR-V module can be consumed by different execution environments using different client APIs (and different representations for address spaces in LLVM). We implemented a solution for this in our LLVM project fork (upstreaming pending) using an option in the -convert-spirv-to-llvm pass (see).

We would be happy to upstream if this is also relevant to others and we agree it is the way to go.

Is the issue here that a “CrossWorkgroup” SPIR-V address space corresponds to different integers in LLVM depending on the target?

Yep. The whole SPIR-V storage class to LLVM address space integer mapping depends on the client API (which defines how the execution environment, e.g., OpenCL or Vulkan, consumes SPIR-V): both on its storage class to address space mapping and on how the execution environment represents such address spaces in LLVM. I see it as something like:

Storage class (SPIR-V) -> Address space (execution environment) -> LLVM

e.g.

Generic Storage Class -> Generic address space (OpenCL) -> 4 (LLVM)
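At the type level, and coming back to the CrossWorkgroup example above, that chain would look roughly like this (a hedged sketch using the OpenCL/SPIR numbering; other client APIs may choose different integers):

```mlir
// SPIR-V dialect: symbolic storage class on the pointer type.
!spirv.ptr<f32, CrossWorkgroup>
// After -convert-spirv-to-llvm with an OpenCL mapping:
// CrossWorkgroup -> global memory -> address space 1.
!llvm.ptr<1>
```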

Coming back to this: I’ve pushed a patch with the solution my team and I came up with to handle this in our SYCL-MLIR project (built on DPC++). Everyone’s welcome to take a look and discuss.

Note: We only provided a mapping to OpenCL, as that’s mostly what we care about.


On the other hand, SYCL-MLIR is a branch of DPC++, which targets CUDA & ROCm too.
So, I guess a larger “we” should care. :slight_smile:

SYCL-MLIR only works for SPIR-V targets for now. That’s what I meant. Of course we’d like to support the other targets DPC++ supports :smile: