[RFC] `address` dialect

TL;DR

This RFC proposes the address dialect as an intermediate layer between high-level memory representations like memref and low-level dialects like LLVM.

An early implementation of this proposal can be found in this repository. It contains the core operations of the dialect, some canonicalizations, a pass for applying the bare pointer convention, and a conversion pass to LLVM.

Why?

Currently, there is no mechanism to represent addresses without using the LLVM or SPIR-V dialects; thus, high-level dialects are likely forced to use memrefs even in inconvenient places, like GPU kernel parameters.

See also these discussions:

Proposal

The initial proposal consists of seven operations and the address type (ideally, this type would be a builtin type). These operations are:

One immediate consequence of introducing the address dialect is that the bare pointer convention can now be applied as a pass in high-level dialects.

 func.func @bar(%arg0: memref<i32>) -> memref<2xi32, strided<[1]>> {
   %0 = call @foo(%arg0) : (memref<i32>) -> memref<2xi32, strided<[1]>>
   return %0 : memref<2xi32, strided<[1]>>
 }
 // Gets transformed to:
 func.func @bar(%arg0: !addr.address) -> !addr.address {
   %0 = addr.to_memref %arg0  : memref<i32>
   %1 = addr.from_memref [%0  : memref<i32>]
   %2 = call @foo(%1) : (!addr.address) -> !addr.address
   %3 = addr.to_memref %2  : memref<2xi32, strided<[1]>>
   %4 = addr.from_memref [%3  : memref<2xi32, strided<[1]>>]
   return %4 : !addr.address
 }
 // After canonicalization:
 func.func @bar(%arg0: !addr.address) -> !addr.address {
   %0 = call @foo(%arg0) : (!addr.address) -> !addr.address
   return %0 : !addr.address
 }
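The canonicalization in the last step can be pictured as a single folding rule: an addr.from_memref whose operand is produced by addr.to_memref is replaced by the original address. The C++ below is a minimal sketch on a hypothetical toy IR (Value, toMemref, fromMemref and foldFromMemref are illustrative names, not the MLIR C++ API or the prototype's code), and it skips the type and address-space checks a real pattern would need.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical toy IR, not the MLIR C++ API: a value is either a block
// argument or the result of an addr.to_memref / addr.from_memref op.
struct Value {
  enum class Kind { Arg, ToMemref, FromMemref } kind;
  std::string name;                // only meaningful for Kind::Arg
  std::shared_ptr<Value> operand;  // input of the conversion op, if any
};

using ValuePtr = std::shared_ptr<Value>;

ValuePtr arg(const std::string &name) {
  return std::make_shared<Value>(Value{Value::Kind::Arg, name, nullptr});
}

ValuePtr toMemref(ValuePtr addr) {
  return std::make_shared<Value>(Value{Value::Kind::ToMemref, "", std::move(addr)});
}

ValuePtr fromMemref(ValuePtr memref) {
  return std::make_shared<Value>(Value{Value::Kind::FromMemref, "", std::move(memref)});
}

// Folding rule: addr.from_memref(addr.to_memref(%a)) -> %a.
// Any other value is returned unchanged.
ValuePtr foldFromMemref(const ValuePtr &v) {
  if (v->kind == Value::Kind::FromMemref && v->operand &&
      v->operand->kind == Value::Kind::ToMemref)
    return v->operand->operand;
  return v;
}
```

The reverse direction, folding addr.to_memref(addr.from_memref(%m)) back to %m, is not folded in this sketch, since in general the address alone does not carry the sizes and strides of a dynamically shaped memref.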

Most of these operations have trivial lowerings to LLVM (see AddrToLLVM.cpp); type_offset, for example, is lowered as a GEP op on a null pointer.
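For reference, the null-pointer GEP trick computes the byte stride of a type: in LLVM, ptrtoint of getelementptr T, ptr null, i64 1 yields the distance between consecutive T elements, padding included. Below is a C++ sketch of the same quantity, assuming type_offset means exactly this stride (the helper name typeOffset is hypothetical). It measures a real two-element array, because pointer arithmetic on a null pointer is undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of what a type-offset query computes, assuming it is the byte
// distance between element 0 and element 1 of an array of T.
// LLVM computes this as `ptrtoint (getelementptr T, ptr null, i64 1)`;
// doing the same on a null pointer in C++ would be undefined behaviour,
// so we measure it on a real two-element array instead.
template <typename T>
std::ptrdiff_t typeOffset() {
  T elems[2]{};
  const auto *base = reinterpret_cast<const unsigned char *>(&elems[0]);
  const auto *next = reinterpret_cast<const unsigned char *>(&elems[1]);
  return next - base;
}

struct Padded {
  std::int8_t tag;   // 1 byte
  std::int32_t val;  // usually 4-byte aligned, so padding typically follows `tag`
};
```

The stride always equals sizeof(T), which is why a single GEP on null is enough to materialize it without knowing the target's layout rules at the address level.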

Edit:
Type syntax:

address ::= `address`  (`<` address-space^ `>`)?
address-space ::= attribute-value
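To make the grammar concrete, here is a hypothetical C++ helper (parseAddressSpace is not part of the prototype) that splits the textual type body into its optional address space. It only checks the outer brackets and keeps the attribute text verbatim, since parsing an arbitrary attribute-value is MLIR's job.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the proposal's code) following
//   address ::= `address` (`<` address-space `>`)?
// Returns the raw address-space text, or an empty string when the space is
// omitted. std::nullopt signals a malformed type. The attribute value itself
// is not parsed; nested `<...>` inside it are kept verbatim.
std::optional<std::string> parseAddressSpace(std::string_view text) {
  constexpr std::string_view keyword = "address";
  if (text.substr(0, keyword.size()) != keyword)
    return std::nullopt;
  text.remove_prefix(keyword.size());
  if (text.empty())
    return std::string{};  // `address` with no address space
  if (text.size() < 3 || text.front() != '<' || text.back() != '>')
    return std::nullopt;   // must be `<` non-empty-attribute `>`
  return std::string(text.substr(1, text.size() - 2));
}
```

For example, "address<#gpu.address_space<workgroup>>" yields "#gpu.address_space<workgroup>", while plain "address" yields an empty address space.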

Future work & directions

  • Remove the bare-ptr-convention option from all passes.
  • Create pointer aliasing analysis passes.
  • Extend the dialect with more operations so that memref gets lowered to address instead of LLVM or SPIR-V, adding another level of possible optimizations.

Should an address carry around a memory space?

That is, for example, should we distinguish address<#gpu.address_space<workgroup>> from address<#gpu.address_space<private>> from address, aka address<null>?

I’d argue for yes, otherwise the lowering to LLVM may not quite work.

My bad, I should’ve included an example with an address space, or at least mentioned it, because the proposal already supports address spaces. Here’s the syntax for the type:

address ::= `address`  (`<` address-space^ `>`)?
address-space ::= attribute-value

Here’s the same example using a different address space:

func.func @bar(%arg : memref<i32, #gpu.address_space<workgroup>>) -> memref<2xi32, strided<[1], offset: 0>, #gpu.address_space<workgroup>> {
  %memref = func.call @foo(%arg) : (memref<i32, #gpu.address_space<workgroup>>) -> memref<2xi32, strided<[1], offset: 0>, #gpu.address_space<workgroup>>
  return %memref : memref<2xi32, strided<[1], offset: 0>, #gpu.address_space<workgroup>>
}
// Gets transformed to:
func.func @bar(%arg0: !addr.address<#gpu.address_space<workgroup>>) -> !addr.address<#gpu.address_space<workgroup>> {
  %0 = addr.to_memref %arg0  : memref<i32, #gpu.address_space<workgroup>>
  %1 = addr.from_memref [%0  : memref<i32, #gpu.address_space<workgroup>>]
  %2 = call @foo(%1) : (!addr.address<#gpu.address_space<workgroup>>) -> !addr.address<#gpu.address_space<workgroup>>
  %3 = addr.to_memref %2  : memref<2xi32, strided<[1]>, #gpu.address_space<workgroup>>
  %4 = addr.from_memref [%3  : memref<2xi32, strided<[1]>, #gpu.address_space<workgroup>>]
  return %4 : !addr.address<#gpu.address_space<workgroup>>
}

Overall, this is a dialect I’d be pretty happy to see, and it fits my needs as a GPU kernel developer.

What does the address type provide that differentiates it from !llvm.ptr?

I don’t understand the meaning of %addr1 = address.constant 1 : !address<3 : i32>; is this like void *addr1 = (void *)1; in pseudo-C (minus the address space)?
If so: same as above, what does it provide over the LLVM dialect?

(I haven’t looked at the rest of the dialect, but I’d likely have the same kind of questions.)

+1

This is really needed as an intermediate step between memref and low-level dialects like llvm/SPIR-V. We were trying to do address calculations directly on the memref dialect (https://github.com/llvm/llvm-project/blob/main/mlir/lib/Dialect/GPU/Transforms/DecomposeMemrefs.cpp) for our SPIR-V pipeline, but it’s just awkward.

Because llvm is not the only target. For example, our internal SPIR-V pipeline goes memref -> (address calculations) -> SPIR-V, and involving the llvm dialect there would just be the wrong layering.

Sorry, but I don’t get it: if there is zero semantic difference, then I don’t quite see why this can’t be used there; that requires more justification, IMO.

Because adding the llvm dialect to a pipeline that has nothing to do with llvm, just for the ptr type, is undesirable.

And, after some thought, there is actually a semantic difference: the llvm ptr only supports integer memory spaces, but for general address calculations we probably want to support attributes of arbitrary types, just like memref.

The first difference is that !llvm.ptr only supports integer attributes as address spaces, and address accepts any attribute for the address space just like memref.
Modifying !llvm.ptr to accept non-integer address spaces is possible, but it would stop modeling LLVM pointers.

Also, semantically, an address might be more than a traditional LLVM pointer; for example, it could be a fat pointer with an underlying struct representation.

Yes, initially I had addr.null instead of addr.constant, but I imagine someone else might have a use for non-null constants.

The mappings to LLVM are all straightforward; however, first, there’s no compatibility layer with memref, as LLVM is usually the final target. Second, the SPIR-V and CPP targets also exist. Having an address dialect enables target-independent address analyses, such as aliasing.

While this proposal looks good to me in many ways, I couldn’t see why someone would want to use a low-level abstraction like an address with non-integer address spaces. That leaves this new dialect’s utility as a unifying abstraction over LLVM and SPIR-V addresses. Target-independent address analysis sounds appealing, but have you considered that such analyses could be implemented using operation interfaces, without creating a new dialect and an unnecessary extra layer just for that purpose?

@Hardcode84 For my understanding, why can’t the SPIR-V pipeline use !llvm.ptr for its addresses?

Here’s one example:
Address spaces might collapse to different constants depending on the target; e.g., #gpu.address_space<private> collapses to 5 for ROCDL but 0 for NVVM. Thus, a higher-level address space can abstract target-specific concepts.
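A minimal sketch of that collapse step as a hypothetical C++ lowering helper. The Private values come from the example above; Global = 1 and Workgroup = 3 follow the standard AMDGPU and NVPTX numbering. In MLIR this logic would live in a type or attribute conversion rather than in a free function.

```cpp
#include <cassert>

// Target-independent address spaces, mirroring #gpu.address_space<...>.
enum class GpuAddressSpace { Global, Workgroup, Private };

// Targets the abstract address space can be lowered to.
enum class Target { ROCDL, NVVM };

// Hypothetical lowering helper: collapses an abstract address space into
// the integer address space the given target expects.
unsigned lowerAddressSpace(GpuAddressSpace space, Target target) {
  switch (space) {
  case GpuAddressSpace::Global:
    return 1;  // same number on AMDGPU and NVPTX
  case GpuAddressSpace::Workgroup:
    return 3;  // LDS on AMDGPU, shared memory on NVPTX
  case GpuAddressSpace::Private:
    return target == Target::ROCDL ? 5u : 0u;  // differs per target
  }
  return 0;  // unreachable for valid enum values
}
```

Keeping the abstract attribute on the address type until this point is what lets the same IR serve both targets.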

Yes, but I think end users who don’t want to use !llvm.ptr will likely have to recreate some of these ops to get the functionality; at least I had to.

Also, IMHO, this alone:

should be a great justification to have it.

Let’s consider the following example:

gpu.launch_func @kernels::@matmul ... arg(
  %A : memref<?x?xf32, #gpu.address_space<global>>, 
  %B : memref<?x?xf32, #gpu.address_space<global>>,
  %C : memref<?x?xf32, #gpu.address_space<global>>
)

Using gpu-to-llvm will transform the kernel signature into 21 arguments (3 × (2 pointers + 1 offset + 2 sizes + 2 strides)), and there aren’t many target-independent options to reduce the argument count before reaching gpu-to-llvm.

However, with address, one can transform the kernel beforehand and end up using only 6 arguments:

%Aptr = addr.from_memref [%A : memref<?x?xf32, #gpu.address_space<global>>]
%Bptr = addr.from_memref [%B : memref<?x?xf32, #gpu.address_space<global>>]
%Cptr = addr.from_memref [%C : memref<?x?xf32, #gpu.address_space<global>>]
gpu.launch_func @kernels::@matmul ... arg(
  %Aptr : !addr.address<#gpu.address_space<global>>,
  %Bptr : !addr.address<#gpu.address_space<global>>,
  %Cptr : !addr.address<#gpu.address_space<global>>,
  %n: i32,
  %p: i32,
  %m: i32
)

And the above code should work for both LLVM and SPIR-V.
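The argument counts above follow from the default memref descriptor layout (allocated pointer, aligned pointer, offset, then one size and one stride per dimension). A quick C++ check of the arithmetic; the helper names are illustrative, and the address version assumes the three memrefs can share the dimension values n, p and m.

```cpp
#include <cassert>
#include <cstddef>

// Scalar kernel arguments a ranked memref expands into under the default
// (non-bare-pointer) convention: allocated pointer + aligned pointer +
// offset + one size and one stride per dimension.
constexpr std::size_t expandedMemrefArgs(std::size_t rank) {
  return 2 + 1 + 2 * rank;
}

// Arguments when each memref is passed as a single address and the shared
// dimension sizes are passed once (assumes the kernel rebuilds the layout).
constexpr std::size_t addressKernelArgs(std::size_t numMemrefs,
                                        std::size_t numDims) {
  return numMemrefs + numDims;
}

static_assert(3 * expandedMemrefArgs(2) == 21, "matmul: 3 x rank-2 memrefs");
static_assert(addressKernelArgs(3, 3) == 6, "matmul: A, B, C plus n, p, m");
```

The gap grows with rank: every extra dimension adds two arguments per memref in the expanded form, but at most one shared size in the address form.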

Why is that?
It seems undesirable to me to add redundant concepts to MLIR: this increases the API surface unnecessarily and further bloats existing pipelines without adding value.

Not necessarily; this has actually been on my TODO list for a while. We just need an attribute interface for the translation, which could also take a TargetTriple during the translation.
(I would even like to make LLVM itself support this; I don’t think it’s out of reach, I just don’t have the bandwidth for it.)

This is the part that isn’t clear to me; do you have an example? How does this play with cast_int and ptradd?
That said, I don’t understand the interactions with memref either: 1) it seems to assume a specific representation of the memref, and 2) it’s not clear to me what you can do in general with the pointer to the data alone, without the memref metadata.
This also seems to overlap with memref.extract_aligned_pointer_as_index, but the RFC doesn’t really explain any of this.

Overall, the problem being solved is not clear to me (other than splitting out part of the LLVM pointer right now). I could imagine a new "pointer-sized" integer type that abstracts the target pointer size and supports arithmetic (to solve Should IndexType be parameterized?, although that thread actually wanted to annotate the pointer itself, which you don’t solve), but that would still just be an integer, and it looks awfully like the LLVM pointer type. You’re also linking to an LLVM codegen thread that I can’t immediately connect to this proposal.

(flyby comment)

This actually comes up semi-frequently. There are various lowering flows when you split programs up where (basically) the description of the memref (metadata) gets separated from the base pointer. Having these only materialize separately during the conversion to LLVM has been a large source of issues and hacks. It’s better than it was due to some work to make memref descriptors more first class. @Hardcode84 did a lot of work in that area, and I expect they are now working to continue making this more of an incremental lowering vs a one shot exit to LLVM.

With that said, I’m not sure I have an opinion on this specific RFC except to say that I support the work generally to make memref lowering compose above the LLVM level. Using types from the llvm dialect for other lowering flows at the same level of abstraction versus needing to create the equivalent concept elsewhere may have merit. It also comes up semi-regularly with !llvm.struct.

There’s a pretty simple solution: let’s make !llvm.ptr a builtin type (ptr) capable of supporting any address space attribute; then there would be no need for the address type. However, we’d still need a dialect to interact with it.

One more reason to not want !llvm.ptr in higher representations is the need to link to LLVMDialect.

A disclaimer: both cast_int and ptradd would need special lowerings, via user-supplied patterns or an interface attached to the address space, to work with fat pointers, and cast_int might not be semantically valid for all address spaces.

One simple example is an always de-allocatable pointer:

struct FatPtr {
  int8_t *allocatedPtr;
  int8_t *ptr;
};

In this case:

cast_int(myFatPtr) == reinterpret_cast<intptr_t>(myFatPtr.ptr)
ptradd(myFatPtr, indx) == FatPtr{myFatPtr.allocatedPtr, myFatPtr.ptr + indx}
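A compilable version of the two rules above, repeating the FatPtr struct so the block stands alone. castInt and ptrAdd are hypothetical stand-ins for what a user-supplied lowering of cast_int and ptradd might emit for this particular fat pointer; ptrAdd treats indx as a byte offset, which matches the int8_t elements.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same layout as the FatPtr struct above.
struct FatPtr {
  std::int8_t *allocatedPtr;  // base used for deallocation, never moved
  std::int8_t *ptr;           // current (possibly offset) address
};

// cast_int: only the current address is exposed as an integer.
std::intptr_t castInt(const FatPtr &p) {
  return reinterpret_cast<std::intptr_t>(p.ptr);
}

// ptradd: advances the current address by `indx` bytes and keeps the
// allocation base, so the result can still be deallocated.
FatPtr ptrAdd(const FatPtr &p, std::ptrdiff_t indx) {
  return FatPtr{p.allocatedPtr, p.ptr + indx};
}
```

Note that castInt drops allocatedPtr, so an integer produced this way cannot be turned back into a deallocatable FatPtr; that is one reason cast_int may not be valid for every address space.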

A more complex example with hardware support is described in: Representing buffer descriptors in the AMDGPU target - call for suggestions

from_memref and to_memref only assume that memrefs have two pointers (which can be the same); the actual representation of the memref isn’t known until it’s lowered.

In the presence of multiple memrefs there might be redundant metadata; see the example at the end of [RFC] `address` dialect - #11 by fabianmc. In that example, the kernel signature went from 21 arguments to 6 by using address, and the memrefs can be reconstructed inside the kernel for computation.

It would even be possible to add an attribute to the function specifying the redundant metadata, and a pass performing the transformation.
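As a sketch of the reconstruction step inside the kernel, here is a hypothetical C++ model of a rank-2 memref descriptor (using the field order of MLIR's LLVM memref descriptor) rebuilt from a bare address plus shared sizes. It assumes a contiguous row-major buffer with zero offset and a single pointer used as both the allocated and aligned pointer; none of these names come from the prototype.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical rank-2 memref descriptor, following the usual MLIR LLVM
// lowering layout (allocated ptr, aligned ptr, offset, sizes, strides).
struct MemRef2D {
  float *allocated;
  float *aligned;
  std::int64_t offset;
  std::array<std::int64_t, 2> sizes;
  std::array<std::int64_t, 2> strides;
};

// Rebuilds a descriptor from a bare address and shared sizes. Assumes a
// contiguous, row-major buffer with zero offset (allocated == aligned).
MemRef2D rebuild(float *base, std::int64_t rows, std::int64_t cols) {
  return MemRef2D{base, base, 0, {rows, cols}, {cols, 1}};
}

// Element access through the descriptor's offset and strides.
float &at(const MemRef2D &m, std::int64_t i, std::int64_t j) {
  return m.aligned[m.offset + i * m.strides[0] + j * m.strides[1]];
}
```

Memrefs with non-identity layouts would need their strides passed explicitly, which is exactly the kind of per-function metadata the attribute mentioned above could describe.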

That op could potentially be removed; also, there’s no guarantee that index can hold the pointer.

Because llvm is a backend dialect, and using it when lowering to a completely unrelated backend is just bad layering. If we really want to unify the pointer type between some middle-level dialect and llvm (which I don’t think we should), this pointer type should live in the middle-level dialect and not in llvm.

@fabianmc
Also, SPIR-V (and probably other backends? cpp?) uses typed pointers, so we may want to make addr.address typed too. We could get away with casting to/from i8* during lowering, but I would prefer not to.

I have also felt the need for an address type, or a dialect that models variables, while working on the OpenMP dialect. The OpenMP standard extensively uses and refers to variables. At the moment, only the LLVM dialect has this. This makes it difficult to test the conversion to LLVM in MLIR proper, since the MLIR that models variables is already in the LLVM dialect.

Besides this, I think it would also be a good reference for front-end developers.

We sort of have this already - there’s a way to register conversions for attributes during type conversion that the GPU backends use for the GPU address spaces … but it’s also something of a hack and having that just be a rewrite over memref + address may well be a better solution.

We can explore it; however, I prefer a typeless version for the same reasons LLVM switched to opaque pointers: a typed pointer doesn’t provide any real information about the underlying data, and it makes optimizations harder.
I do think it should be possible to create an operation to easily cast address to typed pointers in other dialects.

Exactly. The OMP MLIR dialect is too low-level to work fully with memref, as its users are usually language front-ends like flang and it expects pointers (OpenMP_PointerLikeType) for almost everything, but it is too high-level to work only with LLVM in core MLIR.

I don’t quite see how it is a problem?

Is this still “a pointer”, though? Why wouldn’t this be its own thing, like memref?

You only show one way, not the round-trip.

You have to be more specific, because this thread is pretty long and I don’t quite see the immediate connection, in particular some bits like:

We’ve lived for many years with buffer access intrinsics that don’t use pointers at all.

This is my intuition looking at it as well: these don’t need to be modeled as pointers, because they aren’t.

  • Improving the alias information available to the backend
    I’m going to focus on the alias info point since that’s what I know and care more about.

This shows a motivation that seems very specific to the internals of LLVM rather than anything really principled about unifying a pointer representation with consistent semantics.

I don’t know what that means. A dialect is a semantic definition of types and operations: here it seems to me that we have a match for a type and some operations to manipulate this type…

I think there was a time that we kind of implicitly organized things the other way and only used the llvm dialect as the end point of (some) mlir lowering flows which were exiting to “llvm proper.” This may have been more a remnant of the standard dialect days. I don’t think this approach really applies anymore (at least to the same extent), but I don’t think it’s been talked about as a more general vocabulary to draw from. It isn’t surprising to me that there is debate or confusion on this as the thinking of a number of core contributors has evolved on this over the years but I’m not aware of a general discussion on design philosophy.

It might be more productive to ask why not. One reason that I can think of is that it is monolithic, combining a lot of unrelated concepts. If you buy into a little part of it for a domain specific thing that isn’t naturally exiting to it, there is no real way to define the subset that you want to ensure you legalize in and out of (or know that some canonicalization won’t push you into areas that are not defined for your domain) and you will just be picking and choosing specific legality rules vs full conversions, etc. The reason to not use it for everything that it has a thing for derives from the same reasoning on why we have dialects at all!

Also, your argument against redundancy could easily be flipped to ask why not just use the spirv types and ops everywhere… These things occupy different positions relative to each other. In fact, it is common to both convert from and to the other, depending on the use case. It would get really weird to not have the redundancy. I think that limited redundancy of things that are part of different hierarchies or levels of lowering is part and parcel of what mlir is. But with that said, I agree with you and think we should be reining that in when we can.

A dialect can also be a unit of managing scope for legalization flows, and it remains the cleanest to have full conversions that make an entire dialect illegal at certain points. The llvm dialect is monolithic and serves both roles. It isn’t just a semantic definition of types and ops. It occupies a place relative to others.

I’m arguing the above because I think it is a point of design that needs some more flexibility than how it is being presented, but my opinion is actually different:

Bifurcating types is much more costly to the ecosystem than bifurcating ops, and there are often stronger reasons to have scoped sets of ops for specific purposes or domains. Saying that llvm ops can be used at any level of the stack creates a different kind of complexity and ambiguity than saying that the types have broader applicability. I wouldn’t mind seeing some of the battle-tested types in llvm moved up to builtin and made available as part of the fabric. We’re already halfway there with integer, float, vector (implicated in lowering), etc. Why not ptr and struct? These types get a lot more mileage and have more universal applicability than some of the things currently in builtin. I don’t believe the same analysis applies to arbitrary ops, though. We could also say that it is fine to use types from the llvm dialect everywhere, but I think that feels wrong to folks because the same reasoning is very much not true of the ops.