[RFC] Move `tensor.pack` and `tensor.unpack` into Linalg

When we originally proposed the packing operations, we moved them from Linalg to Tensor in a pre-RFC discussion, but over time we often wanted to keep the packing semantics through bufferization (to fuse at the micro-kernel level), couldn’t, and had to rely on loop-level fusion instead.

IREE still retains its linalg_ext.pack operation for presumably similar reasons, so I’ve been thinking that perhaps we should move the pack/unpack pair into Linalg.

Recently, there was a PR that exposed yet another problem with packing operations at the tensor level: tensor compilers need to embed attributes in data movement operations that are not related to the actual tensor type, but have no way to carry them other than discardable attributes.

The fix was to propagate discardable attributes, because extending operations (to add non-discardable attributes) is not that easy in MLIR, especially for operations associated with built-in types. There are no good solutions here: either we pollute the tensor dialect with various unrelated attributes, or we propagate discardable attributes that may have become invalid.
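To make the problem concrete, here is a rough sketch (the attribute name is hypothetical, and the function is only a wrapper for illustration): a target-specific layout hint can only ride along as a discardable attribute, which any pass is free to drop, or to keep after it has gone stale:

```mlir
func.func @pack_with_hint(%src: tensor<128x256xf32>,
                          %dest: tensor<16x8x8x32xf32>) -> tensor<16x8x8x32xf32> {
  // "my_target.layout_hint" is a made-up discardable attribute: nothing
  // verifies it, and any pass may drop it or keep it after the pack has
  // been rewritten into something the hint no longer describes.
  %packed = tensor.pack %src inner_dims_pos = [0, 1] inner_tiles = [8, 32]
      into %dest {my_target.layout_hint = "vnni"}
      : tensor<128x256xf32> -> tensor<16x8x8x32xf32>
  return %packed : tensor<16x8x8x32xf32>
}
```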

This gives us a second, more important, reason to move (or copy?) pack and unpack to Linalg: if we restrict their usage to “tensor compiler” logic, then we can imbue them with HW information, because that’s what tensor compilers do.

Due to the “tensor compiler” bias in my work, it’s not clear to me how much tensor.pack is used outside of such use cases. If not at all, then moving is the right thing to do, as we can evolve faster on the linalg side without breaking other, non-linear-algebra cases.

If they are used elsewhere, then there could be a more involved discussion on how to break it.

For example, one where linalg.pack “lowers to” tensor.pack plus surrounding bits (loops, offset calculations, function calls, etc.). Or, if there is only a minor use case, we could argue for duplicating packing into another, smaller dialect (potentially downstream). But we should not get ourselves into this convoluted discussion if there are no such use cases.

My number one concern here is to keep design churn where the people who will pay the price are the ones actually making the decisions (right or wrong in the eyes of other people). By keeping decision makers together with the impact of their decisions, we reduce conflict and avoid entangling unrelated concerns. We may not reach the best design in the meantime, but as long as we’re evolving towards a better design space, this is a sound strategy.

@mehdi_amini @stellaraccident @jpienaar @qed

I assume the plan is to move tensor.unpack as well, right?

Sorry, yes! I have changed the title to match.

1 Like

I’m +1 to making linalg a comprehensive destination that is self-contained for its task of representing these kinds of programs – and this is a key step in that direction.

I think that the “tensor dialect experiment” has basically failed from the perspective of a linalg-based compiler. Representing pack/unpack natively in linalg brings a number of benefits, as called out here. Further, taking a “copy on write” strategy the next time that the pad and concat ops need attention and giving them native linalg representations would also eliminate the needless sharing and over-abstraction. There is a list of others that may also be up for discussion, but I don’t think it is necessary to redesign the world on this kind of thread.

I think that my meta-point would be that directionally, making linalg complete and self contained for what it represents is basically project-charter level. From the perspective of the torch-mlir and stablehlo to linalg lowerings (both of which my team marshalled into existence), the uniformity of representation and “single thing to reason about” aspects of going in this direction are pretty compelling. There are similar benefits from a backend perspective.

I trust the technical judgment of @rengolin and @qed on these topics implicitly because of the long miles they have put in on it. +1 to a concrete proposal that the two of them agree on.

(edit: I treat the two things as separable. Adding these ops to linalg makes a lot of sense. If that would leave them without users in tensor, then it is a move. Otherwise, it is a copy.)

2 Likes

I am +1 to the move. I’ve rarely seen these ops interface with any of the other tensor ops, while they are frequently used in conjunction with Linalg ops. My main reasoning is aligned with the above:

  1. Their lowerings all go through linalg(.transpose) or linalg_ext.(un)pack in IREE. IIUC it’s the padding semantics that made it appear as though they should live in tensor, but I consider pack and unpack as specialized forms of gather (not tensor.gather, but the linalg kind with tensor.extract). The reason to give them their own ops is that they have more convenient decompositions/properties than an unstructured gather (see the sketch after this list).
  2. Many of the interesting transformations that use them are fairly closely tied to linalg (DataLayoutPropagation, LinalgOp packing).
  3. No memref variant
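For readers who have not used these ops directly, here is a minimal sketch of the pair in their current tensor form; the shapes follow the canonical documentation example and the SSA/function names are made up:

```mlir
func.func @pack_unpack(%src: tensor<128x256xf32>) -> tensor<128x256xf32> {
  // Pack a 128x256 matrix into 8x32 tiles: the result has the two outer
  // (tile-count) dimensions followed by the two inner (tile) dimensions.
  %pack_dest = tensor.empty() : tensor<16x8x8x32xf32>
  %packed = tensor.pack %src inner_dims_pos = [0, 1] inner_tiles = [8, 32]
      into %pack_dest : tensor<128x256xf32> -> tensor<16x8x8x32xf32>

  // Unpack restores the original layout.
  %unpack_dest = tensor.empty() : tensor<128x256xf32>
  %unpacked = tensor.unpack %packed inner_dims_pos = [0, 1] inner_tiles = [8, 32]
      into %unpack_dest : tensor<16x8x8x32xf32> -> tensor<128x256xf32>
  return %unpacked : tensor<128x256xf32>
}
```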

Due to the “tensor compiler” bias in my work, it’s not clear to me how much tensor.pack is used outside of such use cases.

This is my understanding as well. I would love to hear if someone working in a different area landed on a use for them though!

This gives us a second, more important, reason to move (or copy?) pack and unpack to Linalg.

I would prefer to move it unless another stakeholder speaks up, in which case we can go for a copy (and also fork all of the associated transforms).

Indeed, this is the key reason why we’re proposing other named ops. Many of them can be represented by linalg.generic, but parsing that in the compiler and extracting their properties from first principles alone is a major pain.
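As a rough illustration of that pain (shapes and SSA names are made up), compare a named linalg.transpose with an equivalent linalg.generic; recovering the “this is a transpose” property from the generic form requires analyzing the indexing maps and the body:

```mlir
func.func @transpose_two_ways(%arg0: tensor<16x64xf32>)
    -> (tensor<64x16xf32>, tensor<64x16xf32>) {
  %init = tensor.empty() : tensor<64x16xf32>

  // Named form: the "transpose with permutation [1, 0]" property is explicit.
  %t0 = linalg.transpose ins(%arg0 : tensor<16x64xf32>)
        outs(%init : tensor<64x16xf32>) permutation = [1, 0]

  // Generic form: the same computation, but the transpose-ness has to be
  // re-derived from the indexing maps and the trivial yield in the body.
  %t1 = linalg.generic {
      indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>,
                       affine_map<(d0, d1) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel"]}
      ins(%arg0 : tensor<16x64xf32>) outs(%init : tensor<64x16xf32>) {
    ^bb0(%in: f32, %out: f32):
      linalg.yield %in : f32
  } -> tensor<64x16xf32>

  return %t0, %t1 : tensor<64x16xf32>, tensor<64x16xf32>
}
```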

These are further steps needed, yes. In linalg, we can not only reason about them in a principled way, we can also reason about them in the same way regardless of whether we operate on tensor or memref (modulo the memory difference), without having to duplicate everything.

This also makes bufferization much simpler.

That’s why the title is “move”. Hopefully people who feel strongly against it will be more enticed to respond than if I had said “copy”. :smile:

Jokes aside, I’m treating this as “move unless strictly required”. And if a copy is required, but very niche, it can still be a “move into two different dialects if the semantics are significantly different”.

Further ops (e.g. pad) could have different usage, so we should discuss them in a separate thread, possibly after this one. Here we focus only on pack/unpack.

I don’t see any usage of these outside IREE and coupled with Linalg, so I don’t really have an opinion about the move. I think the rationale for this proposed change (moving) more closely reflects the counterpoint Uday had made back then.

ATM, these Ops sit at the boundary between Tensor and Linalg, so some clarity here would be welcome.

However, there are constructs that decompose tensor.pack into a combination of tensor.pad (tensor) and linalg.transpose (linalg). It seems that even after a move, these Ops may remain somewhat at the boundary due to patterns like GeneralizeOuterUnitDimsPackOpPattern.
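For reference, a minimal sketch of that decomposition under made-up shapes (the wrapper function, the %empty init, and the exact rewrite are illustrative, not the literal pattern output): a pack whose outer dimensions are all unit becomes a pad, a transpose, and a rank-expanding insert_slice:

```mlir
func.func @pack_decomposition(%src: tensor<5x7xf32>,
                              %dest: tensor<1x1x8x8xf32>) -> tensor<1x1x8x8xf32> {
  %cst = arith.constant 0.0 : f32

  // A pack with all-unit outer dimensions and padding...
  %packed = tensor.pack %src padding_value(%cst : f32)
      inner_dims_pos = [1, 0] inner_tiles = [8, 8]
      into %dest : tensor<5x7xf32> -> tensor<1x1x8x8xf32>

  // ...is roughly equivalent to pad (tensor) + transpose (linalg) +
  // a rank-expanding insert_slice (tensor).
  %padded = tensor.pad %src low[0, 0] high[3, 1] {
  ^bb0(%i: index, %j: index):
    tensor.yield %cst : f32
  } : tensor<5x7xf32> to tensor<8x8xf32>
  %empty = tensor.empty() : tensor<8x8xf32>
  %tr = linalg.transpose ins(%padded : tensor<8x8xf32>)
        outs(%empty : tensor<8x8xf32>) permutation = [1, 0]
  %res = tensor.insert_slice %tr into %dest[0, 0, 0, 0] [1, 1, 8, 8] [1, 1, 1, 1]
      : tensor<8x8xf32> into tensor<1x1x8x8xf32>
  return %res : tensor<1x1x8x8xf32>
}
```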

Overall, I don’t have a strong stance on where these Ops belong, but I’d appreciate avoiding a rushed decision, especially considering ongoing discussions around Linalg.

Additionally, I’m currently updating various patterns to support dynamic tile sizes. I’ve mostly completed tensor.pack and will next focus on tensor.unpack. I’m happy to coordinate with anyone handling this move or copy effort—please feel free to reach out.

Lastly,

I don’t quite follow this point.

This sounds like a much larger scope than just moving ops. Unlike many other dialects, Linalg sort of has a charter: Linalg Dialect Rationale: The Case For Compiler-Friendly Custom Operations - MLIR. It’s rather lengthy, but I included a link to the guiding principle: transformations first. Can someone make a case for moving these ops because they align with that charter?

My main concern here is to avoid reinforcing the somewhat misguided perception of Linalg as a dumpster where we put all ops that don’t belong elsewhere.

cc @nicolasvasilache

1 Like

That’s the key here. One of the big issues is mentioned by @banach-space:

If you see this as “Linalg uses Tensor types and operations”, then it makes more sense for linalg.pack to lower to linalg.transpose + tensor.pad than the current situation.

If linalg were to use another type of tensor (say, sparse), then this would break the pattern. This is one example of the layer violation @stellaraccident mentioned in the restructuring RFC.

This is the opposite of the intention here. I want to move to Linalg what makes sense being in Linalg and nowhere else.

tensor is a dialect that can be used by any domain (tensor compilers, front-ends, even triton uses it), so any domain-specific knowledge on a tensor type (or operation) pollutes the design space.

linalg is a dialect that is used by tensor compilers (ML, HPC, etc.) and, as such, already fits into a domain that tensor does not. Even triton-shared, when it raises triton to linalg, does so with the precise intention of converting it into a tensor compiler problem (tile, fuse, etc.).

Therefore, adding tensor compiler attributes to linalg operations is in line with its own design, in a way that it never will be for tensor (or memref, or vector).

To @ftynse’s point, by these “tensor compiler attributes” I mean precisely how linalg operations are transformed from a tensor world, through a memref world, into a vector world (not necessarily related to those dialects, just the level of abstraction). All in sync with what tensor compilers need to generate efficient code for the target architectures.

2 Likes

Sparse being an encoding on the ranked tensor type, it seems that it should still fit. It just results in making the tensor dense (which is semi-expected when padding). What am I missing?

1 Like

You’d have to add such an operation to sparse_tensor. And memref. And any other type that can be used in linalg.

Let me recenter the discussion a bit here. We are not trying to change the Linalg charter, just decide if moving ops into the dialect makes sense.

I’ll elaborate a bit more on the original design point of “transformations first”. Linalg IR is designed to support and make simple (as in, requiring only trivial analysis) certain transformations: the original version on memrefs to support tiling, the version on tensors to support tiling+fusion beyond simple loop fusion. I think this says sort of the same thing as @rengolin above when he says “domain-specific”, but without litigating what constitutes a domain.

It isn’t really about using tensor types or not. FWIW, the first proposal was using its own pair of types that were later both morphed into memref with strided layout.

As @mehdi_amini noted recently, linalg does have a charter (short version).

The key motive behind linalg is to allow a set of key transformations on reasonable abstractions. Packing, transpose, padding, tiling, filling, fusion are all transformations that should be done at the linear algebra level, so that we can reuse the same abstractions across the board.

While we can create interfaces that other dialects implement to make use of such transformations, this should be a secondary goal of linalg. Concentrating the efforts into a core dialect for tensor transformations gives us the ability to understand the entire flow, from framework to hardware abstractions.

It also gives us the ability to control external processes like bufferization (and the semantics we expect from tensors and memrefs), vectorization (and the semantics we expect from each target architecture), micro-kernel lowering (and the semantics each library provides), etc.

In this view, packing operations are really part of the linear algebra domain and not properties of a container type that, by design, has no strong semantics.

Personally, it has been a long time since I’ve thought of linalg as an island as it is defined in its charter, and at some point, we probably want to update the charter to reflect reality.

The current charter does a reasonable job of defining the core role of the representation and what qualifies as “linalg-ness”, but the parts that describe the design approach and the reality for named ops have drifted quite far from what is written, both in tree and when considering what lowers into it. I don’t want to derail this thread, but we should update that, both with respect to reality/current thought and the role of ODS, YAML, etc. I’d say that part of the charter has a few experimental stances in it that time has not served well.

Whether it is the “linalg dialect” or the project, something needs to call out something like this: “As part of its design goal, lowering into and composing with linalg has a strong requirement for following the IR and interface design of the core abstraction. Enabling this composability such that the linalg project can serve as a comprehensive lowering target for {{list things that exist}} requires an annex of named and structured ops with compatible IR and interface semantics. Presently, this annex of ops is carried in the linalg dialect itself, but in the future, it may be sensible to split the dialect into the core abstraction and the op library annex that is needed to make it a comprehensive target.”

@ftynse once upon a time, I think you and I collaborated on a design doc which proposed basically that split as a direction for a next evolution of linalg which addressed this tension of what is core vs annex – although I think that has been lost to time and company firewalls at this point. Also, it is this kind of “bringing all of the children of linalg home” aspect which is why I consistently argue that the linalg domain needs to be a project that can grow into multiple parts vs a single dialect-nation.

But practically in this case, I don’t think we need to think about such big changes – just keep them in mind on the horizon. Concretely, we are talking about pack/unpack, which have been marooned in the tensor dialect with a representation that does not compose with the whole. Even if we were being conservative and saying that the rest of those data layout ops in tensor should also move, we’re talking about a very small list (certainly in comparison to the list of things that linalg already admits). The thing that they all share in common is that they need to compose with the linalg conventions and they represent the core ops you need to manipulate program-level data layout transformations. These aren’t just some random “ML ops” but are deeply implicated in the transformations that linalg enables, and I think it’s time to bring them home and stop trying to interop with a foreign representation at this level of abstraction.

Absolutely!

+1. That’s what I tried to convey above.

+1 to moving tensor.pack and tensor.unpack to Linalg or, at least, having a copy of them there. These operations are strongly coupled to the actual Linalg computation. When they made their way to tensor (I can’t remember if from linalg itself or from linalg_ext), we had to decompose/special-case them in the Linalg vectorizer, as they were no longer LinalgOps. I think having them back is a step in the right direction.

2 Likes

This aligns with my understanding, thanks for clarifying. It looks like these operations were designed to support the packing transformation, which sounds like a good fit for the idea of transformation-supporting IR. Therefore, I’m supportive of the move.

Yes, I think we should revisit, but agreed that not here.

It just occurred to me that I can’t argue that we should write up charters and define more objective contribution criteria in one thread, and then perpetuate the cliquey “makes sense to me” RFC review behavior in the next.

1 Like

Yeah, I got what you were doing and tried to respond accordingly. The prompt did get me to re-read some of the long-neglected parts of the linalg docs, which was a good refresher. LLVM discussions have taken a lot of the time I have for such things over the last few weeks, or else I’d volunteer to take a pass at revising. But at this point, I know I have my plate full with the other topics and I think a refresh can hold for a while longer. Better not to be running too many threads of such things.

1 Like

Indeed, as this piece of the overarching puzzle is now in use by multiple groups and seems to have many stakeholders, the original “charter” needs a refresh.

The key principles of linalg still very much hold, even 5 years later, as they were devised from first principles, based on analysis and experience of decades-old contributions from which it learned and evolved. In particular, my experience is that the knowledge captured in this simple picture is often overlooked.

Unlike many components in MLIR, I can testify that this body of work has historically suffered from a “too few cooks in the kitchen” syndrome. These days, we seem to observe the reverse effect, and the community needs to adapt to this trend, by e.g. refreshing the charter.

Well, since I have been explicitly summoned I certainly can :slight_smile:.

I believe there are a few different aspects to consider:

  1. the design of linalg carefully considers how static information is encoded in the IR to provide a good tradeoff for the legality/profitability/applicability problem. As you note, this is all about the “transformations-first” principle.
  2. however, the dialect need not be closed under transformations. For instance, linalg initially contained linalg.for, linalg.load, and linalg.store, which evolved and became the basis for the scf dialect and the memref.load and memref.store ops.
  3. more generally, I believe the whole notion of dialects is quite limited. Dialects are a concept that imposes a rigid, static “belonging relationship” on ops, attributes and types. This whole concept fails to account for the key job of a compiler: transforming from one representation to another (usually finer-grained, all the way to bits and bytes). As a consequence, the notion of dialect is just nomenclature and build/linking units. One is left with dialect dependency issues until one carefully considers where to break dependencies at the level of individual transformations.

From the PoV of 1. this is a no-brainer: pack / unpack operations are a representational first-order consequence of performing transformations on a “structured operations” representation, therefore they are welcome to live in linalg.

From the PoV of 2. it is fine that pack / unpack live under tensor for the purpose of transformations. However, this line of thought breaks when one wants to use these operations on buffers.

On the two points above, linalg does not mandate how one should perform transformations; an equally valid approach is to use affine.for with mod indexing and never need pack / unpack ops.
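As a rough sketch of what that alternative looks like (the function, names, and shapes are made up for illustration), the same tiled layout can be expressed purely with floordiv/mod affine indexing on the original buffer, with no dedicated pack op involved:

```mlir
// Hypothetical: write a 128x256 matrix into an 8x32-tiled buffer using
// floordiv/mod affine indexing instead of a pack operation.
func.func @pack_via_indexing(%A: memref<128x256xf32>, %P: memref<16x8x8x32xf32>) {
  affine.for %i = 0 to 128 {
    affine.for %j = 0 to 256 {
      %v = affine.load %A[%i, %j] : memref<128x256xf32>
      affine.store %v, %P[%i floordiv 8, %j floordiv 32, %i mod 8, %j mod 32]
          : memref<16x8x8x32xf32>
    }
  }
  return
}
```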

Anyway, enough about the tradeoff analysis, there are many more points to consider (in particular related to PoV 3.) that I am not going to unpack here (haha :slight_smile:).

Back to this particular RFC of moving pack / unpack to linalg, I am supportive of the move. Thank you for making progress on this @rengolin !