[RFC] contiguous<permutation, offset: O> layout (and changing default memref layout)

I’ve put up [mlir] Add a contiguous<perm, offset> layout, use as identity layout by krzysz00 · Pull Request #131663 · llvm/llvm-project · GitHub for this change, but since it’s a substantial change to a core MLIR primitive, I figure it deserves an RFC thread so people who aren’t watching the PRs know to comment.

To reproduce the description:

This PR introduces a new ContiguousLayoutAttr, which holds a permutation of the dimensions of the memref and an optional offset, and replaces the default memref layout (which was previously the N-D identity map) with contiguous.

In general, the syntax for this attribute is
contiguous<[I0, I1, .. IN], offset: O> where I0 through IN are integers in 0..=N and O is either a static offset or ? for a dynamic one. If the offset is 0, the offset isn’t printed, while if the permutation is [0, 1, ... N], we print it as N+1. That is, the 2-D identity/row-major layout is contiguous<2> and not (d0, d1) -> (d0, d1) like it used to be.

Motivation

In summary, the contiguous<> layout both fills in the “layout hierarchy” (all contiguous layouts are strided, and all strided layouts are affine maps, but you can’t go back down) with a primitive that enables useful optimizations and makes it easier to have relocatable/mergable allocations in MLIR code.

Consider memref<?0 x ?1 x ?2 x T> - a memref with three dynamic dimensions. This memref has a row-major identity layout.

Suppose I want to make this memref “relocatable” - declare that it has an unonwn offset so that I can, for example, have a pass that merges allocations into larger contiguous buffers. With the current layouts in MLIR, I can either use:

  • strided<[?, ?, 1], offset: ?>, which loses the fact that this is a row-major memref. We don’t know the relationship of those two ?s to each other.
  • (d0, d1, d2)[s0] -> (d0, d1, d2 + s0), which isn’t a “strided” layout by existing definitions and encounters the fact that meany memref operations don’t handle non-strided or arbitrary affine layouts.

Being able to use contiguous<3, offset: ?> (or, in its long form, contiguous<[0, 1, 2], offset: ?>`) resolves this isue. That is now a strided layout that directly encodes the fact that this is a 3-D row-major memref with some dynamic offset.

As seen in my changes to some passes like gpu-decompose-memrefs or the vector transfer op flattener, knowing that a layout is contiguous - if not necessarily row-major, allows us to use operations like affine.linearize_index for index computations, which fold well with operations like affine.delinearize_index, allowing for eliminating unnecessariy “divide an ID into parts and multiply them together again” computations that often come up in tiling-based code generation that the affine map simplifier has difficulty with or generates inefficiently.

This layout also allows describing permuted layouts, like column-major layouts, without needing code to handle the general complexity of an affine map layout. For exmample,

memref.expand_shape %arg [[0, 1], [2]]
  : memref<?x?xi32, contiguous<[1, 0]>
  into memref<?x?x?xi32, contiguous<[1, 2, 0]>

accurately describes the effects of expand_shape’ing a column-major memref.

Why change the default layout?

Since the built-in layout attributes form a hierarchy of specificy (all contiguous layouts are strided …), there are multiple ways to represent the identity row-major layout. The contiguous layout is the most specific of these, so it makes sense to declare it the canonical form of the identity layout. That is, strided<[?, ?, 1]> is less specific of a layout for memref<?x?x?xi32>. The identity affine_map also has non-canonical forms and is less spcefici: code that can handle te identity AffineMapAttr may not know what to do with other affine maps because of how general they are, but it will be easier to go from the identity ContiguousLayoutAttr to permuted and/or offset attributes.

Therefore, making the contiguous layout the default form of MemRefLayoutAttrInterface makes writing memref-handling code easier going forward.

Concrete impacts of the change

  1. memref<...xT, affine_map<(d0, d1, ..., dN) -> (d0, d1, ... dN)> no longer prints as memref<...xT>.
  2. Similarly, the default memref layout is no longer an AffineMapAttr. This didn’t break any code in-tree, since almost everything had moved to MemRefLayoutAttrInterface::getAffineMap(), but it’s worth calling out.
  3. memref.subview, memref.reinterperet_cast, and so on do not alwasy produce a strided layout: if code needed to create strided<[], offset: O>, it’ll now create contiguous<0, offset: O> and similarly for strided<[1], offset: O>, which is a 1-D contiguous layout. This is facilitated by the new StridedLayout::getCanonical method, which doesn’t always return a strided layout
  4. Some passes have been updated to use affine.linearize_index disjoint when they were flatting a contiguous (subset of) a memref, allowing for more efficient code generatino compared to an affine.apply over the strides.
  5. getStridesAndOfffset() has learned a new trick for affine maps: any “offset permutation” (that is, a permutation where the last result can be dX + E for any E) is now considered strided. This means that you can now getStridesAndOffset a
    memref<MxNxf32, affine_map<(i, j) -> (j, i)>, which would previously fail.
  6. MemRefType::canonicalizeLayout has been updated to canonicalize strided layouts to their contiguous equivalent for static-shaped memrefs.
  7. bufferization.buffer_layout can be any MemRefLayoutAttrInterface, and any identity maps present in such attributes are transparently migrated to their contiguous<> equivalents.
  8. Certain reshape folders will now work with any row-major layout, even if it has an offset.

While this is a breaking change, we expect that it will allow long-term improvments to how MLIR represents memrefs in common situations.

3 Likes

cc @ftynse @MaheshRavishankar @kuhar @stellaraccident @jpienaar @matthias-springer @dcaballe @banach-space @makslevental - and please fell free to ping anyone who I left of the list of folks who might have an opinion (I was going off of Github’s reviewer assignments)

1 Like

Overall the new layout makes sense to me, +1!

This I think is the key observation. The strided layout is not expressive enough for certain transformations, and the affine map layout is too unwieldy (and in some cases ill-defined per extract_strided_metadata) for many existing operations.

I have a couple minor questions and comments about staging these changes.

Is this replaced by the contiguous layout? i.e. memref<...xT, contiguous<n, offset: 0>> prints as memref<...xT> instead? If so this seems like a fairly far reaching change whose implications for downstream projects could be non-trivial. This is a problem because there was a default layout for memrefs to begin with; have we considered making the layout attribute nullable?

This is maybe better as a comment on the above PR, but I would expect to see the introduction of the new layout as a separate change from functional changes. Better staging can make it easier for downstream to adapt if there are use cases we don’t know about.

The current state of the world is that, when you pass in the null layout attribute - that is, MemRefLayoutAttrInterface{} as it is most commonly known - you’ll automatically get (d0, d1, ... d(R-1)) -> (d0, d1, ... d(R-1)) where R is the rank of your memref.

My PR changes that to contiguous<R> - that is, contiguous<[0, 1, ... R-1], offset: 0>, which represents that same identity mapping in a more structured way.

The changes arose from needing to make existing code that operated on strided<> also work on contiguous<> and/or not break existing tests. It’s possible there are some separable ones in there, if I poked at it again, so that’s a fair point.

I’m generally supportive of the change, but I’d structure it differently to make it potentially less controversial, specifically holding off on changing the default because the default is not what the RFC suggests it is.

  • Introduce ContiguousPermutationMemRefLayoutAttr (or some less verbose name), make it implement MemRefLayoutAttrInterface, get it integrated with transforms that care, etc.
  • Get some mileage with this change before any next step.
  • Consider some sort of progressive relaxation: contiguous → strided → affine map as discussed in the RFC. My intuition says we want this to be explicit, but I am sensitive to any good argument here. Specific questions are whether we want type equality between layouts that are mathematically the same, and if that’s even possible to implement (and what would be the repercussions if not). If not, whether we still want to “canonicalize” layouts on construction or not. This is a more general question than just dropping the canonical identity map.
  • Change the default, or not, based on the above consideration. Note that the current default is no layout attribute, which is interpreted as contiguous row-major. We may go into the hardening direction and request there to always be a layout, or not. Furthermore, MemRefLayoutAttrInterface requires any layout to be convertible to an AffineMap, which is largely for the benefit of affine-based (downstream) compilation flows.

Separately from that, I’d recommend carefully considering the utility of having offset in the layout. There have been numerous discussions on its (in)utility in the strided layout, but given how highly load-bearing the type is, we never managed to decide about removing it. Practically, what can be expressed with it that can’t be just a change to the base pointer?

Final question, how does this layout lower to pointer arithmetic?

2 Likes

So, while the user-facing default is “no layout”, the builders for MemRefType turn that into (d0, d1, ..., d(R-1)) -> (d0, d1, ..., d(R-1)) right then and there, so the default - that is, the layout you get if you don’t specify, is actually the identity affine map. It isn’t stored as MemRefLayoutAttrInterface(nullptr) - your ability to pass that in is a user-facing convenience.

That is, if we look at the current state of the main branch, llvm-project/mlir/lib/IR/BuiltinTypes.cpp at 896df5c4bf77a9a8933c0a5cfdcabccad4c50471 · llvm/llvm-project · GitHub is the exact spot where we define the default layout to be an identity affine map with no symbols.


Re type equality for layout equivalence … I don’t think we want that, given how == works on Type (it’s a ponter-equality check on the context). There is the canonicalizeLayout() method that can be used to check for layout equivalence in transforms that only really care about that - I’ve seen it in a few spots, but probably not as widely as it could exist.

I’ll also note that our current usage of the identity affine map actually causes a subtle breakage of parse(print(X)) == X - namely, memref<?xT, (d0)[s0] -> (d0)> prints out as memref<?xT>, which reads as memref<?xT, (d0) -> (d0)>

To clarify the proposal, the main breaking change in MemRefLayoutAttrInterface{} becomes contiguous<[rank]>

Also, to the questions I forgot to get to:

The main reason to have an offset parameter is that there exist backends (SPIR-V) where the notion of “changing the base pointer” is not meaningful. So if you need to carry around some offset from your fixed base pointer (say, because there’s a subview) you have to carry that around separately and add it in.

I think a lot of the controversy about offset was the assumption that it’s 0 by default, so you’d often see rewrites constant-folding away the + offset bit in a way that’d make it impossible to add back in later if you decided that your identity-layout memref actually needed to be moved around.

Currently, it goes via getStridesAndOffset() and so follows the same path as a StridedLayoutAttr, just computing the strides on the fly.

So memref.load %m[%i, %j, %k] : memref<AxBxCxi32, contiguous<3>> is gep %m.base, (%i * B * C + %j * C + %k * 1) while, for example, memref<AxBxCxi32, contiguous<[2, 0, 1], offset: ?>> would give gep %m.base, (%m.offset + %i * 1 + %j * A * B + %k * A)

Ok, I’ve realized there’s a better way to phrase the impacts:

  1. Code that uses MemRefLayoutAttrInterface{} or nullptr, or just not passing an argument to get the identity layout is entirely unaffected by this change unless it was relying on that layout being an AffineMapAttr specifically (that is, layout.getAffineMap() is fine or layout.isIdentity() are fine, they’re on the interface, but if (dyn_cast<AffineMapAttr>(layout()) might have issues … though I couldn’t find one upstream.
  2. If you’re still using the affine-map builders and are explicitly passing in an identity map instead of AffineMap{null}, your memref will no longer print out as memref<…xT>` - you’ll see your layout. (I found no upstream users doing this, as far as I can recall)
  3. Some ops like memref.subviewwere updated to usecontiguous<1, offset: [offset]>instead ofstrided<[1], offset: [offset]>` , so that’ll cause some test churn

My question is about such equivalence being desirable in general. If we decide it is, we can think about technical mechanisms to achieve it, such as mandatory layout canonicalization with strong guarantees at type construction time. If not, it’s a non-problem.

The proposal does not fix this though, does it? Affine map layouts will now print explicitly, but newly introduced identity permutations won’t.

I don’t follow this. Why do we need to have an offset field in the layout for this? It sounds like it’s a matter of lowering, which may always go to the pointer+offset pair when targeting SPIR-V if desired, this doesn’t have to be leaked beyond the lowering as far as I understand.

There seems to be some desire to differentiate between “relocatable” (subview) memrefs and “non-relocatable” memrefs, which isn’t something the type itself supports. FWIW, we have originally had buffers and views into them as distinct types and converged them both into the memref type because there was no obvious benefit to separating the two. If there is a need for that in the SPIR-V case, maybe we should consider having more types or using address spaces, which are also attributes, to indicate “relocatability”. That being said, it’s unclear to me what’s the benefit of having memrefs that appear “non-relocatable”.

1 Like

I think that this equivalence is something operations can allow - that is, if we’d have an example.op ... : memref<5x3xi32, strided<[4, 1]>>, that operation could also allow itself to be declared as example.op ... : memref<5x3xi32, (d0, d1) -> (d0 * 4 + d1)> … or perhaps example.op wants to insist on having a StridedLayoutAttr specifically if that simplifies other analyses.

It does. memref<8xi32, contiguous<1>>, for example, will get printed as "memref<8xi32>", which’ll get parsed as a memref with shape {8}, type i32, and layout MemRefLayoutAttrInterface{}/ContiguousLayoutAttr::get(ctx, 0, {0}); . The layout attribute is only elided if it’s exactly the result of passing in MemRefLayoutAttrInterface{} to the memref type builder.

Our current code doesn’t do this - if your layout is an affine map with pointless symbols, that map is still .isIdentity() and so gets silently converted to a more canonical form.

Relocatability isn’t quite about subviews.

The motivating example for this sort of relocation, in my mind, is transforming

%a = memref.alloc() {alignment = 16} : memref<128xi32, #gpu.address_space<shared>>
%b = memref.alloc() {alignment = 16} : memref<64xi32, #gpu.address_space<shared>>
...

into

%joint = memref.alloc() {alignment = 16} : memref<192xi32, #gpu.address_space<shared>>

or

%joint = gpu.dynamic_shared_memory : memref<?xi32, #gpu.address_space<shared>>

Once we have %joint - and note that, while this simplified example is just concatenating buffers together, one could imagine some more complex algorithm to overlap a %a, %b, and %c that have different lifetimes, - we need to construct values to RAUW %a, and %b.

We can’t create %b with either subview or reinterpret_cast - the type of the %b replacement would be something like memref<64xi32, strided<[1], offset: 128>, which doesn’t match. Not only that, if someone’s already done the subview → reinterpret_cast lowering, it’s rather unclear where you’d aff that offset in.

This all gets much simpler if the types of %a and %b started out as %a : memref<128xi32, strided<[1], offset: ?>> and %b : memref<64xi32, strided<[1], offset: ?>>, where you know that there will exist some offset that you need to apply that hasn’t been determined yet.

But if you always want set offset: ?, you run into the fact that StridedLayoutAttr doesn’t let you represent row-major-ness on dynamically-shaped memrefs, hence wanting contiguous<[0, 1, ... Rank-1], offset: ?> instead.

I think a lot of the previous discussion around offset on memrefs was around whether or not the assumption that memrefs don’t have a built-in offset that they’re carrying around it sensible - or, to your question, whether memrefs appearing non-relocatable was sensible. I think the base case here is that if I have %m : memref<T>, that means that memref.load %m[] is equivalent to load T, %m.base - but if I have a memref<T, contiguous<0, offset: ?>``, then that same memref.loadisload T, (gep T, %m.base, %m.offset)... and that adding that implicit%m.offsetto thememref` case would be a much subtler breaking change.

I personally expect that adding implicit-offset semantics to every single memref is a much riskier change than changing what the null memref layout is, but that’s more of a general intuition than anything I have evidence for.

Sounds reasonable to have it on operations. We likely want some generalized support at the type level to avoid every operation reinventing what it means to be canonical.

Okay, I see what you meant, the part about symbols being removed wasn’t clear initially.

This incorrect roundtripping looks like a bug to me and therefore can and should be fixed regardless of introducing a different layout type.


Thanks for the example!

Let me elaborate a bit on what my current preference for the design is: no offset at the layout level. As you point out, there is indeed an issue in certain transformations like allocation merging where we could get a mismatching type. My point is that having the offset field in the layout attribute is exactly what leads to the problem, so the solution should be to remove it. It’s rather confusing to say memref.alloc returns something with offset: ? where we certainly know that offset == 0 but we can’t canonicalize it away. If we (almost) always want offset: ?, we might as well assume the offset is always there and not carry it around in the type.

The question of lowering to load or gep+load, in my opinion, is a matter of lowering and not the type itself. We can very well have two lowering schemes:

  1. For LLVM-style targets, subviews and co. always update the aligned pointer in the descriptor, instead of the to-be-removed offset field, and loads/stores never need to gep the offset (still need the strides though). Progressive lowering makes it possible to further simplify address arithmetic.
  2. For SPIR-V-style targets, subviews. and co. update the offset field of the corresponding descriptor because that’s how this dialect wants memory to be modeled, and loads/stores will have to always take the descriptor into account. Similarly, progressive lowering should let you constant-fold zero offsets before load.

This separates the concerns cleanly IMO: the memref type itself doesn’t care about offset, we get less type mismatches, and the lowerings do what is needed for the target.

For context, we initially borrowed the notion of offset from pytorch/dlpack tensor descriptors for compatibility reasons, back in the day when we had buffer and view types separate from memref. Note that dlpack offset is in bytes rather than in the number of elements, so we are not actually compatible. Beyond compatibility, the reasoning was that explicit static offsets could help with aliasing analysis. This never materialized and we now have sufficiently powerful dataflow analyses to propagate known offsets easily without mingling with the type.

1 Like

I am not sure why this needs to be considered here. This is a bit of a red-herring. We have this discussion repeatedly cause most people seem to assume that you can always add offset to base pointer on all serialization. This is definitely not the case, and is mostly true for only LLVM based serializations. The mental model to use here IMO is that the <base_ptr, offset> is an opaque object. So any assumption that you can update the base pointer is invalid.

This is similar to what most people seem to bottom out to. While this is workable, an implicit offset in the type is a bit of a footgun. The allocation merging example is one such place where the “canonicalization” of zero offset is a problem. You cannot break the link between the offset and pointer cause it is impossible to know how the offset is used in the program. Breaking that link can create very real observable effects. It might be dense, but mmt4d ukernel called with LHS pointer passed also for RHS and OUT · Issue #12480 · iree-org/iree · GitHub is a real issue uncovered by this behavior downstream. I have long argued for this canonicalization to be removed. Leaving aside the theoretical discussion of what canonicalization should/shouldnt be, to me this is a major footgun that makes this canonicalization not worth it.

In any case, the offset issue is a bit of a tangent. Its unfortunate that the original RFC explicitly called out this as an example to justify the use of contiguous layouts. If we remove that example, there is enough reason to use contiguous layouts. It is a richer representation that strided when dynamic dimensions are present. That is a good enough motivation from my vantage point. I agree though, we probably dont want to change the default soon. We should add support for this interface, modify transformations upstream/give time to folks to adapt downstream if needed, and then change the defaults if everyone agrees that this is viable.

Because we already took a similar decision for the strided layout, we regret it, and it proves hard to revert. Here we have the opportunity to reconsider, let’s take it!

My general point is that the offset is irrelevant for the type itself. It only appears for certain lowerings of that type. If the type is unaware of the offset, the zero-offset canonicalization is a non-issue. To me, you are sort of making an argument to support my point. The issue you referenced would have never happened if we didn’t have offset in the type allowing us to assume it’s statically zero instead of systematically adding what’s in the descriptor and letting lower-level constant propagation get rid of that if it’s constant zero.

(Irrelevant here, but I have never been a strong proponent of extract_strided_metadata/reinterpret_cast, it’s a clutch to work around the absence of 1-N conversion and pointer types so we can simplify address arithmetic. It looks like we may be getting both in-tree soon, so may be a time to remove those operations).

On top of that, I find homogeneity, such as “there is always an offset that must be added when going to target T”, easier to comprehend.

1 Like

Ok, so, just to clarify the proposal - which I think is somewhat independent of the RFC:

Currently, a memref semantically carries around a base pointer, its sizes (so, at the very least, the value of any ? dimension “lives” in the memref and can be retrieved with memref.dim), and any values its layout needs.

Are you proposing that, instead, a memref carries around a base pointer, its sizes, the layout it needs, and an offset? That is, a memref always has a field you can get at with memref.get_offset, memref.add_to_offset, and so on?

I do think that’d be a solid idea but does cause serious issues for things like the subview → reinterpret_cast lowering and feels like a much subtler breaking change. I’m worried about its impacts on downstreams more than I’d be worried about the effects of changing what MemRefLayoutAttr{} is.

1 Like

Also, as to the original topic of the RFC:

Feedback has shown that landing both the existence of ContiguousLayoutAttr and making it a default in one big PR is unnecessarily drastic. Therefore, I’ll break my PR into several changes, so that ContiguousLayoutAttr and the changes needed to make it work with various passes will land separately from snapping the default over.

(That is, thanks for pointing out that I’ve created a mega-PR that doesn’t need to be as intertwined with itself as it is)

1 Like

This is a very nice wording, let’s make sure to add it to the documentation!

I think what I’m proposing is sort of the opposite. Memref semantically doesn’t know anything about offsets, it’s just a base pointer that can be updated. Conceptually. A lowering to targets such as SPIR-V where this is impossible directly mimics that behavior by carrying around the offset, but this behavior is confined to SPIR-V.

So no memref.get_offset operation, and I’d even reconsider having extract_strided_metadata when we can. This leaks specifics of a lowering to a higher-level abstraction, which I don’t really like.

Note that we already have the discrepancy between conceptualization of the memref and its lowering even to LLVM IR because it has two pointers: allocated and aligned.

1 Like

In the interest of making progress, I am supportive of introducing the new layout by itself. The discussion of the offset in the existing layout is orthogonal. The new layout may or may not have the offset. My proposition is to put an “experiment deadline” as either a time frame or some amount of code that has been made aware of the new layout, and commit to reconsidering the presence (or absence) of the offset in the new layout at that point.

It seems that it would be more conservative to start without it: it’s always more difficult to remove it afterwards (proof is the discussion on the existing model!)

To update where my thoughts have gone from here:

  1. Some parts of the above PR (which I don’t intend to merge as is) are independent changes that’re being factored out into smaller commits (ex. generalizing bufferization.tensor_layout)
  2. In order to enable more flexibility in the future, I recently movned getStridesAndOffset() from the memref type to the memref layout - now, anyone can implement LogicalResult getStridesAndOffset(shape, strides&, offset&) on their layouts.
  3. I’m planning to add a getPermutationAndOffset(shape, perm&, offset&) method - by analogy with getStridesAndOffset(shape, strides&, offset&) - to the layout and type. This’ll allow me to write code that would’ve been an dyn_cast<ContiguousLayoutAttr> as checking for getPermutationOrOffset()-ability instead
  4. This (plus some minor changes, like teaching ContiguousLayoutAttr to getCanonical to let it return the N-D identity map attribute when relevant) will let me add a contiguous<perm, offset> attribute without changing the definition of MemRefLayoutAttr{}. That’ll remove most of the breaking changes - and will let the “handle contiguous layouts specially” cases be something that can be implemented more gradually
1 Like