MLIR Tensor Compiler Design Group

Proposal

Following on from the initial proposal, the survey and results, and the final proposal, this is the first step towards implementing design groups in MLIR, producing a technical charter to guide roadmaps and implementation details.

This particular proposal is for the Tensor Compiler design group, as referenced by the final proposal above.

Role

The role of this group is to consolidate a technical charter for the tensor area dialects, interfaces, transforms and general surrounding infrastructure, compatible with the rest of MLIR.

First, it needs to define the scope by agreeing on a short list of the major directions we're heading in, for example:

  • Upstream shared values (“canonical” pipelines and forms, dialect semantics)
  • Downstream usage + dialect extension (MLIR based tensor compilers)
  • CPU/GPU/device code-gen vs. micro-kernels, etc. (tensor/memref/vector transforms and semantics, type system)
  • Common infrastructure (interfaces, matchers, rewrites, attributes)
  • Research directions (dynamic schedules, composable transforms, stable public APIs)

Then we agree on and document a representative roadmap for each direction above, to understand how they overlap with, build upon, or get in the way of each other's contributions. This is not the charter. This is to build a common understanding of what people use MLIR for and how to build common upstream infrastructure to support them.

Second, we take stock of what we have, reevaluate the existing charter documents, make sure we're still in line with the roadmaps above, and work out how to resolve the technical disagreements between them to reach a unified direction with a strong upstream model, making clear what the downstream cost is for each party.

This is the “where we have been” part of the charter. It consolidates state, but also clarifies the rationale behind the dissonant arguments we've been having recently. Hopefully by then we'll start having much more fruitful discussions and effective changes.

Here we work upstream and downstream to implement the vision and continue writing the charter. In time, we should have enough direction to write the “where we are going” part. I don’t think we should do that before we have agreed on where we are.

Third, we identify the critical pieces of infrastructure missing to make MLIR more malleable to distributed usage (not just downstream, but also other upstream projects). For example:

  • A way to compose and extend off-tree dialects without requiring a particular hash of LLVM. This creates an ecosystem outside of the monorepo, helps build momentum before going upstream and reduces the need to go upstream at all for most dialects.
  • Missing coverage in tests, documentation, semantics definitions, type system requirements, etc. that make it easier for dialect designers to know the bounds that they need to adhere to for minimum functionality expectations.
  • A more rigorous definition of canonicalization and transformation requirements, in view of the expected shapes and transformations, in a way that does not force everyone to use a particular form, or at least makes that form generic and powerful enough that it can be widely used.

That’s to help reduce the cost of the charter itself, which I expect to be large, complex and still not completely unified. It’s a meta discussion to refine the charter, but one that can only happen after we know where we are and where we’re going, and we all generally agree on the tasks needed to get there, upstream and downstream.

That’s when we start converging on the actual technical charter that we can use for making principled choices.

People

Looking at the recent merges into linalg, tensor and vector that were not NFC, reverts, typo fixes or “one-offs”, here are the recurring contributors on the “Tensor Compiler” side of the equation:

@banach-space @MaheshRavishankar @javedabsar @rolfmorel @mshahid @Groverkss @matthias-springer @jpienaar @ftynse @kuhar @asiemien @hanchung @qed @krzysz00 @dcaballe @kurapov-peter @Hardcode84

(Note: even though the vector dialect sat somewhat in between the tensor and low-level groups, most of the contributions to it come from the tensor side, so I’m treating that as at least an indicator of tensor compiler contributions.)

If my count is correct, we have 1 Arm, 7 AMD, 1 Qualcomm, 4 Intel, 2 Nvidia, 1 Google, 1 Independent. Not a bad distribution.

Also, we want people that have been involved in design, not just implementation. Looking at the forum posts, @banach-space @MaheshRavishankar @javedabsar @rolfmorel @matthias-springer @jpienaar @Groverkss @kuhar @dcaballe @ftynse @qed and myself are recurring users.

I don’t want to limit or volunteer people, I’m just listing based on upstream involvement that I see (which is biased). Some folks above may not want (or be able) to participate, others may be more suitable for this role.

Somehow, we need to find a good initial balance and start the process. It doesn’t have to be perfect or static; people can come and go, but we need critical mass, or this won’t work. I would try to keep at least 5 people, with the intention to get through the year and consolidate a reasonable draft of the charter.

Happy to take proposals on how to select the team.

Next Steps

Step 1 is selecting how many people, and who, will be part of the design group. I don’t want to set limits here; I think we should all agree on something and move on. The only constraint I’d add is to balance company / group representation as much as possible.

Step 2 is the creation of sub-channels in Discourse and Discord, to minimize disruption into the rest of MLIR, and agreement on a recurring design meeting.

Step 3 is to discuss needs and tasks and collect volunteers for those. The output should be RFCs into the forum, PRs into documents and code that will make our life easier when reaching for the roadmaps and charters. These should be documented in a new section of the MLIR docs, and potentially move or deprecate old documentation, pointing to the new pages.

Step 4 is to perform the roles listed above and start working on the common infrastructure upstream.

This is 100% public work, and the main difference in selecting a few people is that they’ll be responsible for making it happen. Once agreed, the charter becomes the driving force behind the changes, not the people driving it.

Thank You!

Finally, thank you everyone who participated. This was not easy but it was necessary. More importantly, thank you in advance, because the work has just begun.


I’ve been reading the posts, and there’s great feedback from the community, much of which resonates with me as well. Thanks to Renato for organizing this and kicking it off.

That said, the Tensor Compiler group seems to be the only concrete outcome of these discussions. Shouldn’t the initial meetings focus on finding a path for the MLIR Organization & Charter rather than diving straight into Tensor Compiler-related work? Could we discuss this over ODMs first, as some suggested in other threads, rather than assigning this work to the Tensor Compiler group?

I suppose everyone has their own areas of interest within MLIR. I am a downstream user, maintain the upstream NVIDIA-specific dialects, and deeply care about some of the core dialects. That’s okay, and it’s precisely why MLIR is a generic and great project. I believe that’s why pathfinding could be the first discussion.

The first one, not the only one. :slight_smile:

This has been discussed for a year, including a panel, multiple round tables and ad-hoc sessions at the Euro and US LLVM meetings, and many, many threads in the forum.

We have repeated the same arguments multiple times in these venues, and from the past three main threads in the forum, it’s pretty clear there’s strong consensus (not necessarily unanimity) on this being the next step.

The consensus is that core is much more stable, and we’re already progressing well enough there not to need the level of engagement we’re planning for the tensor side.

This new thread is in addition to everything else that is happening, not a replacement. If you already have design discussions or concerns in other threads, please continue as usual.

Thanks Renato, this plan makes sense to me.

Re selecting the right forum:

With no better heuristics, I suggest starting with a union of these two lists, i.e.:

If anyone feels that we are missing anyone here, please comment - I am doing this very mechanically to help progress the discussion :sweat_smile:

Agreed, we should aim for a diverse group.

Indeed. @Folks listed above who are keen and available to participate, could you volunteer yourselves? Let me start - please count me in :slight_smile:

Just one final point …

I think that this is basically coming from the split within Vector, as documented here:

(virtual vector vs. hardware vector?). I don’t want to get ahead of myself, but revisiting this split could be one point for discussion within the newly formed group.

-Andrzej

:person_raising_hand:


:person_raising_hand: #2

:person_raising_hand:

:person_raising_hand:

:person_raising_hand:

One general caveat I’d like to avoid is being seen as overstepping or seceding. Specifically, whatever is done within tensor compiler land must still adhere to and uphold the core principles and values of MLIR and LLVM overall, and either implement the project-wide design decisions or feed back to the whole project the desire to revise those decisions, following due process there. The nature of what is accepted as canonicalization would be one concrete example here; I don’t think it is helpful if we redefine that locally in the tensor compiler to mean something different.

Otherwise, I’m happy to help draft the charter, especially by providing the historic context that I normally provide on RFCs anyway.


With the current list of volunteers, AMD is already well represented (though really it might be useful for upstream to think of everyone as individual contributors). I am happy to help/contribute in some of the areas where I have an interest in getting to a certain design point, and would love to follow along more broadly.

I don’t think that’d be fair. As much as we all want what’s best for upstream, we still have internal requirements dictated by the kinds of projects we work on at our own companies.

While there’s no vote (so no risk of out-voting), there is a notion of consensus, which it would be wrong to assume has been reached if the people agreeing are all from the same company.

To avoid even the idea of such an accusation derailing a technical discussion, we should keep it simple.

Bear in mind these lists were built by one person using biased fuzzy practices. We should use them for what they are. :slight_smile:

I’d start with no more than one person from a group / company and then work up from there.

The first stages are more about understanding and documenting what we have, so there is less risk of bias, but as we start selecting what to keep and what to change, we need to be careful.

Absolutely!

Yes, I mentioned this in the previous post. Others have commented on the dialect split, too. We need to look at the status from a holistic point of view and make sure we don’t pigeonhole anything prematurely.

Ok, I didn’t mean to give the idea of a “vote” or “consensus”. I was just saying that most people who have already volunteered from my part of the world see the same things and are mostly aligned. So, if anything, I am trying to avoid over-populating one particular viewpoint.


Sorry, I understood your point and was trying to expand on it, not counter it.


Looking at the first post here and the MLIR Organization & Charter final proposal thread, I might be missing something around the scoping of dialects here. Quoting the other thread, the tensor related constructs are:

  • Tensor
    • linalg, tensor, TOSA
    • bufferization, ml_program

and

  • Tensor: linalg, tensor, TOSA, bufferization which are directly related to such workloads.

Whereas this thread lists the selection basis as:

Looking at the recent merges into linalg, tensor and vector

I’m curious how and when this change in scope evolved.

See note right below that quote:

Sure, that addresses vector, previously in the low-level group, being considered on the Tensor side:

  • Tensor
    • linalg, tensor, TOSA
    • bufferization, ml_program
  • Low-Level
    • arith, math, index, ptr
    • cf, scf, func, affine, omp
    • memref, vector

It doesn’t cover what’s been removed from the group of tensor dialects here.

Sorry, nothing has been “removed”. I just listed some people active on the core dialects.

With TOSA’s governance being outside of the LLVM community, there’s not a lot of sense in including it in the LLVM governance groups (we can’t do much from here). If this gets confusing, perhaps we can have TOSA in the core group and work on its governance later?

bufferization is mostly @matthias-springer (design and code) and ml_program is mostly dead.

Sure, in terms of design steering at a dialect level, the TOSA specification would offer the governance basis. This applies in general to any such specification-backed dialect.

However, there’s a substantial body of content under Conversion/ that is TosaToXXX, where XXX is a fellow tensor dialect such as linalg or tensor, or even a low-level dialect like scf or arith.

Combined, they constitute ~5500 LoC in the repository, with ongoing active contributions by several parties outside Arm. This infrastructure is business-critical and load-bearing for these parties.

Major activity within the fellow tensor dialects considered here, or within low-level dialects, would result in the need to manage and resource work to update these conversions.

Further, there are active users who differentially lower from higher-level abstractions (e.g. Torch, TFLite) to both tosa and linalg/tensor, or have other such use cases. E.g. @rafaelubal @sayans are actively working in this manner.

Even without needing to have a strong opinion on the confluence of linalg/tensor/vector design, this connectivity means that, for operational and resourcing reasons, there need to be at least stakeholders to reach out to when a redesign necessitates work here.

Similar technical and operational interfacing considerations apply downstream of the linalg/tensor/vector trio.

So I suggest that, where there are inbound and outbound dependencies around the dialects that are the primary technical focus here, there be clearly defined stakeholders who can support and assist with technical risk evaluation, resourcing and other governance considerations, so as to enable this group.


This matches the feedback of the survey that TOSA was mostly used as an ingress dialect, like Torch, ONNX and HLO.

Yes, but the governance of those dialects is not upstream in LLVM either. While they’re stakeholders of the tensor and low-level groups, so are Triton and other front-ends.

During the discussion on the proposal after the survey, @jpienaar raised a very important point: some people work on the dialects from an upstream perspective (writing code and designing semantics to cater to all users), while others have a particular use in mind (particular front-ends and back-ends). The former are what open source projects usually consider to be maintainers, while the latter are more like stakeholders.

Both have their place, and both are important, but when you mix their roles, you get biased maintainership or weak “stakeholding”.

Exactly. If we apply the same proposal downward (making maintainers from stakeholders), then the entire LLVM community needs to have an opinion on how we design the dialects, which goes back to the chaos we had before.

There most definitely are these dependencies. I had long discussions with @stellaraccident throughout this process about the importance of ingress and egress. Without them, there is no linalg.

So it is critical that the stakeholders from those communities become vocal members of the tensor compiler design group. But they don’t need to be maintainers. Here, I’m using the term to mean “the people who will catalogue and work to reach consensus with the rest of the community”, not “the people who will decide on their own what should be done”.

So I very much expect that you and the other TOSA, Torch, Triton, ONNX and HLO folks become active participants in the process, as will Stella, Mahesh and I. But we don’t need to be in the group that will do the consensus work directly.
