GPU Workgroup/shared memory address space is hard-coded

Hello,
I’ve been working on a project whose details I unfortunately can’t discuss, but I’ve run into an issue integrating a new hardware architecture with MLIR.
In GPUBase.td, the workgroup address space is hard-coded to the constant 3. This architecture doesn’t use that value. I’ve written some code that remaps the address space to the correct value. However, this doesn’t seem like a robust approach. I think it would be better to define the workgroup address space value in an architecture-specific place and propagate it from there.
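For context, the hard-coded value is exposed roughly like this (paraphrasing from memory, not quoting the sources verbatim):

// Paraphrased, not verbatim: the workgroup (shared) memory address space is
// a fixed integer that the rest of the lowering stack picks up from here.
static unsigned getWorkgroupAddressSpace() { return 3; }
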
Thoughts?

I’ve written some code that remaps the address space to the correct value.

But that does sound like the right approach. The GPU dialect needs to choose some type to represent workgroup memory. It’s a dialect-level property and is needed in, e.g., the verifiers of the gpu ops. Whether it used the memref address space or created an entirely new type for this, you would need to convert that type to your desired type. Whether or not your code is robust probably depends on how you went about that conversion. Are you using the type converter infrastructure?
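For concreteness, here is a minimal sketch of what I mean, assuming the workgroup space shows up as the integer memory space 3 on memrefs; the target value and helper name below are made up:

#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Hypothetical value for the new architecture; not an upstream constant.
constexpr unsigned kTargetWorkgroupAddrSpace = 7;

// Remap memrefs whose memory space is the GPU workgroup value (3) to the
// target's value; everything else is left untouched.
static void populateWorkgroupAddrSpaceRemap(TypeConverter &converter) {
  converter.addConversion([](MemRefType type) -> Type {
    auto space = dyn_cast_or_null<IntegerAttr>(type.getMemorySpace());
    if (!space || space.getInt() != 3)
      return type;
    Builder b(type.getContext());
    return MemRefType::get(type.getShape(), type.getElementType(),
                           type.getLayout(),
                           b.getI64IntegerAttr(kTargetWorkgroupAddrSpace));
  });
}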

The memory space was extended so that it no longer has to be an integer, I believe. Ideally we’d use a gpu enum attribute (or something even more specific?) for this purpose instead of an integer. That would avoid confusion with other pre-existing uses of magic numbers (it is also more robust/readable).
The lowering would then decide what to do with the specific memory space attribute.
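A minimal sketch of that split, assuming a symbolic enum roughly like the gpu::AddressSpaceAttr from the patch linked further down in this thread; the numeric values here are only illustrative and each lowering would pick its own:

#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "llvm/Support/ErrorHandling.h"

using namespace mlir;

// The dialect only carries the symbolic space; the lowering decides the number.
static unsigned mapToTargetAddressSpace(gpu::AddressSpace space) {
  switch (space) {
  case gpu::AddressSpace::Global:
    return 1;
  case gpu::AddressSpace::Workgroup:
    return 3; // NVVM/ROCDL-style shared memory; another target may differ
  case gpu::AddressSpace::Private:
    return 5;
  }
  llvm_unreachable("unhandled gpu address space");
}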

I’m using the type converter after conversion to the LLVM dialect, plus some other code, because the type converter doesn’t cover changing llvm.mlir.global or llvm.func with external linkage.
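The type-converter half looks roughly like this (a sketch; it assumes opaque LLVM pointers, and kTargetWorkgroupAddrSpace is a placeholder for my architecture’s value):

#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Placeholder value; purely hypothetical.
constexpr unsigned kTargetWorkgroupAddrSpace = 7;

// After lowering to the LLVM dialect, workgroup memory is just pointers in
// address space 3, so the remap becomes a pointer-type conversion. With typed
// pointers the builder also takes the pointee type.
static void populateLLVMPointerAddrSpaceRemap(TypeConverter &converter) {
  converter.addConversion([](LLVM::LLVMPointerType type) -> Type {
    if (type.getAddressSpace() != 3)
      return type;
    return LLVM::LLVMPointerType::get(type.getContext(),
                                      kTargetWorkgroupAddrSpace);
  });
}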

Yeah, +1 for using symbolic attributes like GlobalMemory/SharedMemory/etc. instead of hard-coded magic integers. When converting to each target, we can then map those to concrete architecture-specific numeric values.

We made the memref memory space accept any attribute, but lots of places still assume numeric integers. (And the long-deprecated getMemorySpaceAsInt is still used a lot.) It would be nice to clean up those uses too.
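For illustration, a sketch of what the attribute-friendly version of such code could look like (the helper name is mine, not an upstream API):

#include <cstdint>
#include <optional>

#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// Treat the memory space as an arbitrary attribute and only extract a number
// when it actually is one, rather than calling getMemorySpaceAsInt().
static std::optional<uint64_t> getNumericMemorySpace(MemRefType type) {
  Attribute space = type.getMemorySpace();
  if (!space)
    return 0; // no attribute means the default address space
  if (auto intSpace = dyn_cast<IntegerAttr>(space))
    return intSpace.getValue().getZExtValue();
  return std::nullopt; // symbolic space (e.g. an enum attribute); caller decides
}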

Not much to add here other than that it would indeed be great if this were cleaned up. The address space handling in the gpu dialect is as old as MLIR and has never been updated to use newer capabilities.

+1 from me for the following enum, more or less

enum GpuAddrSpace {
  {Default,Unknown,Any,...} = 0,
  Global = 1,
  GlobalDataStore = 2,
  LocalDataStore  = 3,
  ...
  Register = 5,
  ...
}

(I put ... there because I can’t quite tell what we use address spaces 4 and 6 for in the AMDGPU backend, but I’m sure we can slide them in once we find good names for them.)

This probably ties into the alloca rewrite pass that needs to happen At Some Point™.

@hyperspace-otter Is there a specific reason you had to wait until after the LLVM conversion to rewrite the address space? That seems a bit harder for the reasons you noted: the thing you need to correct is now more distributed.

Have you seen the MapMemRefStorageClassPass in the “mlir/Conversion/MemRefToSPIRV” folder? It seems to be an existing solution to exactly the problem you’re describing.

That being said, depending on when you decide to apply the rewrite, there may still be operations like bufferization.alloc_tensor that would require specific logic to handle. That’s the only one I can think of at the moment, though.

Edit: I think I see the problem now. If you are using gpu.func, then there is common lowering code that directly uses getWorkgroupAddressSpace to create LLVM global ops. So another step needs to be inserted to map getWorkgroupAddressSpace to a different value, and we would be forced to add it if we change getWorkgroupAddressSpace to be, e.g., an enum attribute.

I created a patch for the address space enum update: ⚙ D140644 [mlir][gpu] Migrate hard-coded address space integers to an enum attribute (gpu::AddressSpaceAttr)

I wasn’t sure what passes might depend on the standard values, so, to be on the safe side, I swap them at the last minute. The LLVM GlobalOps require special handling because their address space isn’t stored in a type. The GenericTypeConversionPattern didn’t seem to work on the signatures of FuncOps either.
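The global handling ends up being roughly this shape (a sketch; accessor names are from memory, and the target value is a placeholder for my architecture’s number):

#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/IR/BuiltinOps.h"

using namespace mlir;

// Same hypothetical target value as in the earlier sketches.
constexpr unsigned kTargetWorkgroupAddrSpace = 7;

// llvm.mlir.global carries its address space as an attribute rather than in a
// type, so the type converter never sees it; walk the module after conversion
// and patch the attribute directly.
static void remapGlobalAddressSpaces(ModuleOp module) {
  module.walk([](LLVM::GlobalOp global) {
    if (global.getAddrSpace() == 3)
      global.setAddrSpace(kTargetWorkgroupAddrSpace);
  });
}
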
I’ll take a look at that SPIRV pass later.