[RFC] Materialize strided MemRef layout as an attribute

I propose to introduce a new StridedLayoutAttr that directly implements the strided memref layout as an attribute. This new attribute will implement MemRefLayoutAttrInterface and is convertible to AffineMap as required by the interface. See [RFC] MemRefType affine maps - list vs single item for context.

This attribute can be used instead of extracting offsets and strides from affine maps during transformation and lowering where only strided forms are supported (the majority of cases). If necessary, we can introduce a pass that converts between AffineMapAttr and StridedLayoutAttr for compatible memref layouts. The inverse conversion shouldn’t be needed as long as all existing code uses the interface-based conversion instead of bluntly casting the layout attribute to AffineMapAttr.

This has the following benefits:

  • simplify transformation applicability logic – no need to analyze affine expressions;
  • improve IR readability (memref<?x?, strided<?x?, offset: ?>> reads better than memref<?x?, affine_map<(d0,d1)[s0,s1,s2] -> s0 + s1*d0 + s2*s1>>);
  • simplify the core parser (we currently support the memref<?x?, strides: [?, ?], offset: ?> form that gets unpacked to the affine_map form upon construction);
  • conceptually simplify the memref type specification by making the strided layout a first-class construct instead of a “view” of the type.

The drawbacks are the usual API/test churn and potential incompatibility with some downstream passes that only expect AffineMapAttr.

Specifically looking for feedback on:

  • should this live in the builtin dialect (allowing for memref<?x?, strided<[], offset>> syntax) or in the memref dialect (memref<?x?, #memref.strided<[], offset>>)?
  • should we “canonicalize” memrefs with compatible AffineMapAttr layout to use StridedLayoutAttr instead?

Any other comments are also welcome.

4 Likes

Additionally, a simple porting of the memref.subview code (result type inference, verifier, and canonicalization patterns), without removing the special logic in the parser, shows a 5% speedup in parsing and a 4% speedup in canonicalization of a 1mLoC file consisting of replicated memref.subview canonicalization tests.

1 Like

How would this work? We don’t have “type canonicalizers”.

At construction time, inside MemRefType::get

2 Likes

+1 from me in concept. I’ve read all of the threads and I don’t know why we overly generalized such a common (the most common) layout to a general purpose affine map. I know that it is “mathematically equivalent” but I don’t care: strided layouts are first class and should be treated as such.

This will create churn but I feel is needed.

I suspect the “right answer” is that this goes in memref but built-in does feel more consistent with current locations of things and will create less weird coupling. Maybe this lives in builtin for as long as the memref type itself does?

This makes sense to me. memref as a type gets used everywhere anyways.

It’s completely the other way round - there was no generalization done, but the strided layout was later introduced as a common special case (subset of affine maps). There are many useful layouts for code generation and performance that can barely be thought of as strided layouts but still be captured as one-to-one affine maps. In other words, affine maps capture a very common and structured subset of one-to-one functions that can be used as layouts.

1 Like

So this is going the opposite direction than LinAlg? E.g., AFAIK LinAlg has been positioning the generic op with specific views and positioning that as the better tradeoff than the this direction of more specific with conversion to more generic.

How is this different than doing a dyn_cast wrt simplification? E.g , why is view approach insufficient?

This could be done as in MHLO where common cases are pretty printed differently. No new attribute is needed for that.

You’ll still need to parse strided form as here. The unpacking is helper function that will look pretty much like the canonicalization that you propose below, so seems like it is the same functions just called in different places.

Strong +1, this will greatly simplify the general comprehension of the IR as the affine map convention here is error-prone. I should have pushed back harder against shoehorning AffineMap in the semantics when I introduced.

Just because one can encode the information with a more general linear N-D → 1-D affine map + a convention (i.e. s0 is the offset, s1 is stride[0] etc…) does not mean one should do that.

The underlying logic should continue to use AffineMap for the foreseeable future but the convention becomes an implementation detail not exposed to the user.

This would largely be NFC apart from updating the test churn and generalizing the paths that have come to rely on AffineMapAttr. Is this accurate?

I would vote yes as I am not aware of any legitimate path that linearizes to a 1-D strided layout and is not a StridedLayoutAttr.

I think you’re completely missing the point @stellaraccident makes here: using affine maps instead of a more idiomatic offset + strides form is the unnecessary generalization that was adopted in an effort to build consensus.

This is what Linalg uses in practice, I view this as a syntactic change that replaces an error-prone spelling by an idiomatic one. Maybe I am missing the intent of your comment though, I am not sure I am parsing it properly.

This would already be an improvement and was the first thing that was implemented when introduced (we didn’t have a way to selectively decide on a printing form).
Would the generic form of the op print properly if the layout actually an AffineMapAttr?

A specific layout attribute also brings an opportunity to extend the layout.
One thing that has been missing is to specify that a stride is a dynamic value multiple of some static constant. This will be needed for proper alignment support in the N-D case and we have punted on it for too long IMO.

I have been punting on adding this because representing such n-D alignment with:
memref<?x?, affine_map<(d0,d1)[s0,s1,s2] -> s0 + 4*s1*d0 + 4*s2*d1 + 8*s3*d2>> breaks the convention (i.e. the stride would not be encoded by s1 but by 4*s1).

Anyway, strong +1.

Does it? Linalg has never supported anything but the strided layout. When it was first introduced, it had its own “view” type with this layout baked in, that was later replaced by memrefs with a mechanism to encode the strides into affine maps. There was even an attempt to bring the strided layout into memref, but it did not succeed.

I did not say it was not sufficient, I said it was not simple enough. We have multiple places that call getStridesAndOffset which traverses the expression tree of the affine map in attempts to recover what could otherwise be stored as a vector. Slapping dyn_cast on top of this just changes the API, the implementation complexity is the same (note that there is no isa support on AffineMap, so this adds more complexity; there doesn’t seem to be a precedent of using a “view” class for attributes upstream, so this adds even more conceptual complexity) and so is the fragility of the approach that easily broken by tiny modifications to the map.

This effectively means the strided layout becomes a first-class thing in the memref type, except with added complexity of it actually being a mere facade for the affine map. It being a first-class thing was rejected when proposed. It also sounds like it contradicts the consensus of using attributes an interfaces reached in [RFC] MemRefType affine maps - list vs single item.

It would use the standard logic for parsing custom attributes and would not require any special logic in the core parser that is aware of this form - llvm-project/TypeParser.cpp at 0a5c344a84a452452185f3a6bd7d7679a42abb42 · llvm/llvm-project · GitHub. There would also be no conversion between the strided form and the affine map, which are not free.

1 Like

We can also replace getStridesAndOffset(memref.getLayout().getAsAffineMap()) with a dyn_cast and direct accessors. This is still arguably NFC except for a more graceful failure more in several cases.

Works for me!

Yes I was asking more about the approach whereby one was able to match the structure. So non-load bearing names as property of it, it’s about computation not name.

No it wouldn’t. So in crash cases it would be raw.

Yes, so this is a caching mechanism of sorts. Passes using affine map operating on this would then incur the cost of converting to affine map vs passes that uses stride. And for LinAlg you only use stride and so it would be cheaper (and as you mention less fragile). In general would we know which is cheaper? I think that would determine if this should be on producer to create the appropriate form, a pass that converts them or in the build method. I could imagine this being close to always cheaper but have no measurements.

DenseIntElementsAttr is such a view IMHO.

Caching and verification. My, perhaps erroneous, understanding is that we have significantly more in-tree usages of the strided form than of the more generic affine form. Specifically, the entire view/subview/reshape part of the memref dialect works exclusively on the strided layout and currently incurs the conversion cost in parsing, verification, and canonicalization. I observed some speedup here. Anything that either comes from Linalg or goes to LLVM uses the strided form and incurs the conversion cost twice: first, it lifts the affine map form into the strided form, then reasons on strides, then converts the strided form back into the affine map form.

Note that purely affine flows that use non-strided maps even today use a specific pass to push their layout maps into attributes of affine operations and turn memrefs into ones with trivial layouts. These flows won’t have any added overhead. The only kind of flow that could have the overhead is the one that needs to go from, e.g., Linalg on strided memrefs to affine loops on affine-layout memrefs. IMO, this only shifts the conversion cost down one step in the pipeline as Linalg on strided memrefs would be already paying the double-conversion cost to do its reasoning.

It is hard to measure precisely though, we don’t have compiler performance benchmarks.

Indeed. Then the “view” aspect is less scary conceptually. I argue that it is still just hiding the complexity instead of removing it. The classof implementation of DenseIntElementsAttr is nowhere close to the complexity of getStridesAndOffset (llvm-project/BuiltinTypes.cpp at 2d20fb00b3c7e455c7ba668ff55efc6515bce383 · llvm/llvm-project · GitHub, note the repeated calls to simplifyAffineExpr, which allocates and traverses the expression tree recursively + creates new affine expressions under lock).

1 Like

Yes, this was the essence of my point. I’m not opposed to more general forms but when you have a layout like strided which is the overwhelmingly common case, it is really useful to represent that itself and be able to also use a more generic mechanism when the need requires. When I look at the memref dialect and all of the lowerings, I see strided everywhere but a very complicated encoding for it. Seems like it was an artifact of a previous era of the system before we were more comfortable with how to leverage MLIR for better factorings.

Also, it sounds like affine maps and how they are interpreted as strided is not a strict subset of strided layouts and would require yet more pattern matching to generalize.

1 Like