[RFC] MLIR Project Charter and Restructuring

Written in collaboration with @stellaraccident @clattner @ftynse @jpienaar @mehdi_amini

Introduction

In recent years we have discussed many changes to the MLIR infrastructure, dialects, conversions, representations, transforms and their future. There is a sense of stagnation in progressing those critical issues, caused by fragmentation that stems from a lack of clarity on what the purpose of MLIR is in general.

Like LLVM, there are a number of external (upstream and downstream) users of various levels of the infrastructure, from just the core logic to all dialects and transforms. Unlike LLVM (few front-ends, single IR, default pipeline, specific targets), there is no well-defined core semantics for the whole compiler, which makes it really hard to even define what MLIR is to begin with.

Multiple (partial) solutions have been floated, and many of them are incompatible with each other. For example, dialect independence (being able to import dialects from other projects) really needs substantial changes to the core infrastructure of what a dialect is and how to connect it to the rest of the ecosystem. Another example is the expected semantics of some dialects, where disagreement leads to new dialects (an example is the Linalg/TOSA/TCP Venn diagram), some of which do not “make the cut” upstream for often unclear reasons. Finally, there’s an idea of “purity of design” that is subjective and not actionable, while in the opposite corner there’s the pressure to “get work done” that runs against it. Neither is a good position to be in, but we have yet to find where we stand in that spectrum.

We need to agree to a common charter, at both high level (what MLIR is and how it fulfils its purpose with lowest global cost) and low level (dialect semantics and conversions, what canonicalization means, type systems, etc). The old code ownership model does not work here because of the number of voices and the disconnected nature of the MLIR infrastructure, which is more a “tool bag” than a tool itself.

We discussed this topic at length during the US LLVM event last week, in meetings, panels and round tables. The long post below is a summary of the discussions and some concrete proposals on the next steps.

Technical Charter

We believe there is consensus that a strong charter and ownership model is beneficial to LLVM as a whole. The recent restructuring of Clang’s ownership model, LLVM’s new code ownership, and the introduction of multiple maintainers and governance proposals are clear signals that the community as a whole puts high value on those. MLIR should be no different.

However, MLIR is more of a loose bag of tools than a compiler or a language front-end with a fixed set of stages in a clear hierarchical shape. That flexibility is what makes MLIR a great tool to work with, but it also makes it harder to separate the interests of the majority while still allowing localised minorities to represent their required semantics in their separate tools and pipelines.

A translation of the Clang/LLVM process to MLIR would require a more nuanced approach, where maintainers of the core infrastructure need to consider the needs of their users and liaise with the different user groups to make technical decisions based on overall lower maintenance cost, and not push for forced designs that only benefit a theoretical implementation, given we don’t have a default pipeline compiler in MLIR like we have with LLVM and several front-ends.

Furthermore, it’s not easy to split MLIR into a series of dialects, because dialects do not exist alone; their value derives from the transforms that can be done to them and the conversions between dialects until final lowering. There are also hidden dependencies in implementation, for example when the Linalg dialect uses the tensor type (which is in the built-in dialect), but requires tensor operations (in the tensor dialect) to pack and reshape its operands. However, Linalg is not the only user of tensor, so changes to either of them will invariably impact other dialects not in that list. The same goes for vector, memref, scf and other base dialects that are depended upon as the results of transformations and conversions.

In the end, there were two main proposals that coalesced into their own pillars:

  1. We continue with the MLIR project as is and solve technical, maintenance and direction issues in-tree as a single project.
  2. We split MLIR into core + bundles so we can discuss the charter and technical details of different areas more independently.

The first proposal is to keep the status quo, but still to progress on the ownership, charter and deeply technical discussions. The proponents argue that the code churn from an actual redistribution would be a high cost for low benefit because we can technically solve those problems irrespective of where the code lives. Others argue that this is not working because there are too many conflicting opinions which makes it hard to reach consensus on direction on a very large project.

The second proposal is to change the status quo and separate the concerns into core and work areas, so that the sub-groups can participate in more constructive discussions with better aligned goals. The proponents argue that the old model isn’t working and that this will also help solve layering issues, especially the implicit dependencies mentioned above, by being clearer about the intentions and code structure. Others argue about the division itself: while most dialects have a clear purpose, others fall into a highly subjective grey area on how they divide and intersect with each other, and on how to control dependencies.

Governance Proposal

Regardless of a potential split, we all seem to agree that we need a governance proposal for MLIR.

Given the highly distributed nature of MLIR, we cannot have single owners of common areas, especially the core infrastructure. It also does not scale to have one owner per dialect, because interest shifts quickly and transforms, conversions and lowering are at the intersection of multiple dialects, sometimes implicitly so, for example, the base types and their separate dialects.

So even if we don’t split the code, we still need to come up with a grouping strategy for ownership and technical charter definitions. If we do split the code, it should be split along those lines anyway.

The main idea for governance is to follow the LLVM/Clang model in general, with the variations below.

  1. The core MLIR infrastructure cannot have a single maintainer, and needs enough people to represent the subgroups we define charters for, be they maintainers or stakeholders.
  2. The subgroups themselves also need to have more than one owner to represent their user groups. We don’t need to name what each one represents, as long as it’s clear that there’s enough representation in the group. If a subgroup is small enough that it can only have a single owner, perhaps we need to rethink whether that subgroup needs to exist or should be incorporated into another.
  3. There must be an intersection of ownership, with at least one owner of each subgroup also in core. Owners can participate in different subgroups, too.
  4. Dialects, conversions and transforms must have owners. Either within a subgroup or individually, or both. These owners are responsible for keeping the code in shape and demonstrating usefulness to others.
  5. Dialects should belong to a subgroup, as a mark of usefulness to others, but it’s not required to do so. Disconnected dialects without owners or those in subgroups that are unused or deprecated may be asked to move out into an incubator or a separate project.
  6. Disagreement between maintainers should be resolved locally within their subgroup, escalating to core if necessary. This is due to the generality of decisions, not higher authority.
  7. Core maintainers cannot dictate what happens in subgroups. Essentially, core is “just another” subgroup that happens to be more general and depended upon by all other subgroups.
  8. Technical arguments are encouraged across subgroups, including core, to make the best overall decisions across the project.
  9. We DO NOT want to create silos; we just need to reduce cross talk to a useful level.

Code Reorg Proposal

If we reach consensus on the need for a split in charter and maintainership, the next step is to decide whether to split the code in the same way. If we do, below is an example of how that could happen.

This is a proposal by some of the people involved in this discussion for a long time and during the meetings last week. It is an idea, not a final decision. There are many issues to work around and we want to know if people are keen on doing it or not, and if yes, then we’ll start a string of surveys to collect information and reach a consensus by the year’s end.

The proposal is to:

  1. Separate the MLIR project into at least three interrelated components:
    1. Core: Builtin dialect and types, common infrastructure, reusable infrastructure (+pdl, transform etc.)
    2. Software: dialects and interfaces for building software solutions (arith/math, cf/scf, vector, llvm, ptr, omp, nv/amd/xe/x86/arm etc.). This becomes a dependency to tensor compilers, language front-ends, super-optimizers, MLGO, etc.
    3. Tensor: dialects and interfaces for building tensor compilers (linalg, TOSA, tensor, memref, etc.). This becomes the building block to tensor compilers such as IREE and the various downstream projects using MLIR for their ML stack.
    4. Other dialects may be loose or bundled up in other groups. Experimental work should have its own ecosystem.
  2. Keep the split in the monorepo, as top-level directories:
    1. Clear dependencies through CMake, include path, etc.
    2. Naming bike-shedding can be done later, but at least one core directory and one or more bundle directories.
    3. Unclear dependencies will become clearer when CMake starts to fail builds due to unknown headers, object files and library paths
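
To illustrate the point about unclear dependencies becoming visible once the layering is explicit, here is a minimal Python sketch. The bundle assignment follows the example split above; the allowed-dependency table and the dialect-level edges are assumptions made up for illustration (only Linalg's use of builtin types and tensor ops comes from the text), and this is not how CMake itself would report such failures.

```python
# Hypothetical bundle layout, following the example split in this RFC.
BUNDLE_OF = {
    "builtin": "core", "pdl": "core", "transform": "core",
    "arith": "software", "scf": "software", "vector": "software", "llvm": "software",
    "linalg": "tensor", "tosa": "tensor", "tensor": "tensor", "memref": "tensor",
}

# Assumed layering: the bundles each bundle is allowed to depend on.
ALLOWED = {
    "core": set(),
    "software": {"core"},
    "tensor": {"core", "software"},
}

# Illustrative dialect-level dependencies (assumptions, not a survey result).
DEPENDS_ON = {
    "linalg": ["builtin", "tensor", "vector"],
    "vector": ["arith", "memref"],  # assumed edge from software into tensor
    "arith": ["builtin"],
}


def layering_violations(depends_on, bundle_of, allowed):
    """Return (dialect, dependency) pairs that cross bundles in a disallowed direction."""
    violations = []
    for dialect, deps in sorted(depends_on.items()):
        src = bundle_of[dialect]
        for dep in deps:
            dst = bundle_of[dep]
            if dst != src and dst not in allowed[src]:
                violations.append((dialect, dep))
    return violations


violations = layering_violations(DEPENDS_ON, BUNDLE_OF, ALLOWED)
for dialect, dep in violations:
    print(f"{dialect} ({BUNDLE_OF[dialect]}) -> {dep} ({BUNDLE_OF[dep]}) breaks the layering")
```

With these assumed tables, the only flagged edge is vector depending on memref, which is the kind of implicit dependency a directory split would force us to either move or declare explicitly.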

There could be other bundles, for example, hardware related (which CIRCT uses) or distribution related (for things like TensorRT) or language bindings, etc. These discussions need to happen only after we agree there’s a split and start breaking down the infrastructure.

We may also need to decide what to do with some dialects that are either disconnected or unfinished or need serious work to become usable. We can reuse/adapt the LLVM component policy to MLIR, create work groups to bring them up to the same level as the others, etc. That’s a matter for a follow-up discussion, not here.

Actions

The proposed next steps are:

  1. Reach a consensus in this RFC on whether:
    1. We need a governance model that is compatible with recent changes in Clang/LLVM
    2. Our governance model may be slightly different (ownership intersection)
    3. We need to logically split the dialect soup into subgroups to make it manageable
    4. We may need to physically split it into multiple directories too
  2. Create a survey on dialects and gather feedback from all in the community
    1. Which dialects are in use and how they connect
    2. Who are the active developers/owners/stakeholders
    3. What is their scope and future, if not widely used/complete
    4. Which category they fall in (core, software, hardware, tensor, distribution, …)
    5. What parts of core are in use, how they connect to dialects, transforms, conversions
  3. Collect the survey’s results and publish a summary (End of Dec 2024)
    1. Have enough information to propose a concrete split and ownership model
    2. Define the technical charter and direction for dialects and MLIR in general
    3. Gather volunteers for the work (PRs) over the following months into the next release
    4. Create an RFC with the proposal and reach agreement on the implementation
  4. Create PRs with the proposed code move and ownership models after LLVM 20 branches off (End of Jan 2025)
    1. Perform the code changes, change builds, validate downstreams
    2. Allow time for downstreams to build the new format and provide feedback, then iterate
    3. Merge PR in time to allow for corrections before LLVM 21 branches off

This will take some time but we’d like to start some surveys as soon as possible to gather more data in an organized fashion to have a stronger proposal by the end of the year.

Disclaimer: This was a long post, written over many days, posted on a Friday night. It may contain typos, edit errors, repetitions, inconsistencies and inaccuracies. Please, give more importance to the intention than the presentation.

CC: @banach-space @River707 @Mogball @javedabsar @MaheshRavishankar @makslevental @kiranchandramohan @AaronBallman @TobiasGrosser @asb

Would it make sense to have a 4th category to group dialects that are related to a standard? Thinking of: OpenMP, OpenACC, SPIR-V, MPI…

“Standard based dialects” serve different purposes, connect at different levels and are depended upon by different dialects. The idea here is to separate in a way that reduces code churn (inside the mlir directory or as top-level directories), and bundling multiple levels in one directory would create cyclic dependencies, not remove them.

thanks for doing this work, Renato and others!!

This is a fantastic proposal, thank you for driving this forward Renato. I think this will be a big step forward to help us define clear charters (and maintainers) for various important parts of MLIR, and will help scale into the future as the ecosystem grows even larger. +1 from me

-Chris

Thanks Renato for this wonderful and comprehensive write-up. There are a number of independent threads in your proposal which, while we agree with them, may need to be separated out so that each gets its own follow-through instead of everything being handled in one thread.

  • “We need to agree to a common charter, at both high level (what MLIR is and how it fulfils its purpose with lowest global cost) and low level (dialect semantics and conversions, what canonicalization means, type systems, etc).”
  • governance, ownership model, code maintainers
  • survey
  • splitting of MLIR

I am in favor of “1. We continue with the MLIR project as is and solve technical, maintenance and direction issues in-tree as a single project.” The other option is far more disruptive.

Anyway, thanks again, Renato. I am happy to help in your effort.

I’ll defer responding much as this is a Request For Comments, and so I want to hear from folks we haven’t heard from already.

Indeed this RFC is about giving context and providing an overview. Instead of your 4 threads, I had seen this as 3:

  1. Governance proposal (towards helping all the parts mentioned above)
  2. Directory reorganizing yes or no.
  3. What the grouping looks like.

Where 3 depends on 1 more than on 2, IMHO. And the directory proposal here is just one possibility. You are correct though that charter and survey are independent too.

Thank you all for putting this together—it’s a much-needed and timely discussion!

I see two main threads in the proposal and that’s how I’ve organised my answer. I will also use the labelling from @jpienaar above.


Governance Proposal

1. Governance proposal (towards helping all the parts mentioned above)

I really like this idea.

I’m not sure what you meant by the “distributed nature of MLIR” as justification, so let me share my view, and you can let me know if that aligns with your thinking.

Dialects, as an example, often feel like independent entities but are in fact tightly coupled—through conversions, shared infrastructure, and transformations. For instance, the Linalg vectorizer connects Linalg and Vector, making it beneficial when Vector maintainers also have a stake in Linalg. Similarly, Vector serves distinct targets—LLVM for CPUs and SPIR-V for GPUs—so representation from each area is crucial.

Agreed. Additionally, we should incorporate criteria and policies for deprecating and removing components, especially dialects. Is a maintainer/owner enough to justify keeping something in-tree, or do we also consider active use and relevance? A component that’s seldom used but has a maintainer might still present a maintenance burden.

It would be good to clarify how disagreements are resolved. On one hand, we encourage escalation to “core maintainers”, but then “core maintainers cannot dictate what happens in subgroups”. Why not grant some special (even if limited) powers to “core maintainers”?

Completely agree. One suggestion here would be to ensure diversity of host organizations among maintainers, as this might help balance perspectives and reduce insular approaches.


Code Reorg Proposal

2. Directory reorganizing yes or no.

In my view, governance is likely to require substantial effort on its own, and I think it can be managed independently of a code reorg. Attempting to address both at once may slow overall progress.

I’m also unsure how a split would help MLIR directly. Improved build times would be a clear benefit, but this would depend heavily on the specifics of the reorganization and might not make a noticeable difference for everyone.

That said, if the community believes this will help productivity and MLIR’s overall health, I’m on board. My goal is to support what’s best for MLIR and to help make any transition as smooth as possible.

3. What the grouping looks like.

My perspective is obviously biased toward my use cases. I think Core makes sense; I’m still trying to better understand Software and Tensor.

Vector and Linalg
For me, Vector is a bridge between Linalg and LLVM, containing high- and low-level parts. It’s essential not to divide Linalg and Vector in a way that makes this bridge harder to maintain. Perhaps splitting Vector into high- and low-level sections could help here, but this idea might need more exploration.

Memref
memref feels like a low level detail more suitable for Software. Presumably this includes bufferization?

Hardware dialects
Regarding hardware dialects, I believe it’s important to establish clear criteria for adding and maintaining them. This feels urgent, as every new dialect raises questions about the bar for inclusion. As discussed at LLVM Dev, some existing hardware dialects are barely used, and setting a clear standard would benefit everyone (i.e. existing maintainers and new contributors).


I hope this is helpful! 🙂 Again, I’m super grateful for and excited about this proposal - thank you all!

-Andrzej

That’s pretty much it.

We have a policy for deprecation in LLVM that can be applied to MLIR, we don’t need a new one. Just having a maintainer isn’t enough.

Because even limited power can be misused, even if unintentionally. Open source projects’ history is full of cases where people really care about a project but cannot see that their developers and users are moving away from “core maintainers”.

I do not doubt their intentions, I doubt their ability to stay impartial and make decisions against their ideals but that are good for the community.

This is a nice idea but imposing any rule here would remove the ability for people to move around, as their ownership could become tied to their organizations.

This is a subset of what either the linalg or the vector dialect does. I can come up with a bunch of subset rules that “infer a strong relationship” between two dialects, but when you bring in all the other use cases, that relationship is diluted.

linalg is a high-level representation; its closest friend is tensor, where most transforms happen. Bufferization as a concept is generic, but the current implementation is very much tensor-centric. vector is a low-level representation that I can skip altogether when lowering linalg to special targets, micro-kernels, etc.

We must make sure they work together for vectorization to work, but that’s not necessarily a property of either dialect.

Agreed. MLIR has been very relaxed in this area and I don’t think it’s a good thing. We have all the big manufacturers active, we should all get together and create a charter for interoperability with vector, llvm, etc.

This is one of the main reasons why we think creating these sub-groups is so important. If the hardware dialect maintainers agree on a common design, they should be able to drive these changes through. But it’s their duty to make sure that works for their producers (ex. tensor compilers, front-ends, etc) and consumers (ex. LLVM, graph compilers, etc).

If you make any sub-group more “powerful” than others (for example, the core group), then they could essentially lock the other groups in place and no innovation can happen. In OSS, this usually means a fork, especially when there’s already a large group aligned. They’ll just start elsewhere.

I believe in eventual consistency. I believe sub-groups will align with each other and make sure their designs and implementation benefit others, not just themselves. I believe if one sub-group starts moving too far off, the others will push them back. That’s the main reason why we need cross-ownership.

Thank you all for bringing this up. It’s certainly a topic that needs in-depth discussion.

I support the idea of establishing a more explicit and structured governance policy, including the creation of subgroups and assigning maintainers to different components. At this point, it’s not clear who to contact when there is a problem with or someone is interested in contributing to a specific part of MLIR. However, I’m still considering the points about the decision-making process and conflict resolution. Some of these points seem to suggest that we want to keep certain people’s opinions away from some subgroups, which I believe would lead to fragmentation and exclusion within the community. I don’t have the context of the F2F discussions from last week but I believe we need roles within the community that ensure the overall health of MLIR. We should make it easier for these roles to exist, not harder.

I also support the idea of reorganizing the code base. Emphasizing which components are stable and which are experimental is overdue. However, I’m against splitting the existing codebase into separate repositories, as this could make integration even more painful and lead to community fragmentation. I agree with Andrzej that a deprecation policy is necessary, but so is a better upstreaming policy. Honestly, I’ve never seen such a strong push for upstreaming new components to an open-source project as I have seen in MLIR in the last year or two. Unfortunately, the lack of clear guidelines in this regard has led to conflicts. We need an upstreaming policy that ensures that most of the decision is mechanical and benefits the entire community moving forward.

Regarding Linalg, it requires its own separate discussion. Having followed and contributed to it since its release, I am concerned that it may never reach stability. We have (ab)used it to prototype fast-evolving trends, resulting in too many back and forth cycles of refinement. As someone still invested in Linalg, I think it might be more practical to work on making it stable to serve its current purpose and consider building something else for the next trend.

That is certainly NOT the intent. We want to reduce cross-chatter, not stop people from discussing important topics.

That is the whole point of this RFC.

This is the result of lack of clarity and direction. That’s why we need a technical charter.

Exactly why we want to separate (at least governance) into sub-groups, but still have people in more than one group. There needs to be cross pollination but strong direction.

This does not mean “only those people discuss”, but “those people direct the sub-community”, and make things happen.

A huge +1 for the governance/maintainers proposal, it would really be helpful to have clarity on who to include in design discussions.

What do you mean by cross-chatter? (I’m not familiar with the term or what it implies)

Is this just about not having a flat space of discussions?
We have subcategories on Discourse for that, for example TOSA or TCP. It would be easy to add a “Linalg” one if the linalg folks wanted to be able to track linalg discussion in a single place, or to allow people to ask Linalg questions in a more dedicated place.

Thanks, Renato, for pulling this out of the numerous forums and side conversations where it originated and giving us a common thread to push on.

I’m very +1 on getting logical sub projects defined, giving them the mission to define a charter and join their key stakeholders with some local governance, goals, and scope. The project will be much better for that, even if that is the only thing that comes out of this specific push right away.

I do also think that some amount of code movement is an inevitable consequence of doing that work. Between the needs to reconcile the core infra with the rest of LLVM and some of the domain specific projects needing to be physically located so that they can grow, some of that movement is overdue imo. But it will be a lot easier to see from the other side of the governance process. While such things should align with a release, I agree with others: the next release is probably faster than this can get figured out and done (perhaps outside of a few deprecation cases that we might find).

I’d like to get moving on the surveys. Some specific things I’m keen to get some direct data on for each dialect:

  • What is your current path for arriving at this dialect (ie. What paths are you using to transform into it)?
  • What are you transforming it to and why?
  • Some indication of importance that this be carried in tree and further, as part of the core project. I think we may want to get at this from a couple of direct questions like (stated as purely hypotheticals for the purpose of arriving at strong expressions of need vs unactionable feedback like “yes, I need that”):
    • What would be the impact to you if this dialect was removed from the core project and/or llvm-project repo?
    • If this dialect did not exist in the llvm-project repository, what other projects would be a suitable host and what cost would be involved in carrying out development there?
  • Which other dialects (in this project or outside of it) does it cluster with in order to be useful to what you are trying to solve?
  • If you could assign names to a logical group that includes this dialect, what would they be?

There are probably people better at putting survey questions together than me. Those are just some of the things I wish I knew the composition of in order to form a better picture in my head.

Thanks! It seems we have strong consensus that we need this, so we can start the other two discussions:

  1. How do we create the sub-groups and their maintainers.
  2. What code movement comes from this.

I’d keep deprecation discussions orthogonal. If we use the existing policies, and have clear data from surveys, we shouldn’t need any major arguments there.

Those are particular to dialects, but yes, we should use that for discussing the sub-groups. However, this is a practical solution to an existing problem, not a direction on how we divide the charter.

So, I’d transpose the order and say: let’s define the charter/ownership bundles and then create sub-spaces to discuss them. I.e., the important part is not where we discuss, but how we govern the discussion towards achievable goals.

The initial dates were not fixed. The key point there is to create the mess right after a release and clean it up with enough time before the next one. Again, we have plenty of precedent for this, so there should be no contention here.

Indeed, we’re at that point already. I’ll start a Google form (unless someone has a better idea). I’ll keep it simple: dialects in use, ingress/egress, bundles.

Brave’s AI was spot on:

In summary, cross chatter encompasses two distinct meanings:

  1. Technical interference in communication channels, caused by unwanted signals or energy transfer from another circuit or channel.
  2. Informal, incidental conversation that diverges from the main topic of discussion.

Possibly find a way to crowd source a word cloud for what group the dialect is part of. Naming and grouping is hard – might as well see if there is some wisdom of the crowds to guide us here.

Do you have a more concrete example than the “dictionary style” definition?
I’m trying to figure out exactly what we’re trying to solve in the context of MLIR right now, that seems quite important because I share the same concern as @dcaballe here:

I know this isn’t your intention, but the cure can sometimes be worse than the disease, with all the best intentions.

Overall my take on all this is that we can likely improve on a lot of things, but I’m on the “conservative” side: I’d like to see a gradual evolution instead of what seems like too much of a “big-bang” change. That is: build a more crisp understanding of particular problems that are important pain points right now, then solve the most important ones, and iterate.
The problem statement is overall still quite fuzzy to me at this point.

I would also add that consistency across the project is something that is critical to me, and building better technical direction for some components and improving all the surrounding documentation is different from isolating components in what seems almost like “separate projects”. If a “subproject” is desirable from a governance point of view, then I would push for incubator-style splitting instead.

This sounds to me like the problem of scale that this proposal is precisely seeking to solve: the project has outgrown the ability or need to do this in lock step by normalizing all use cases and people into one bucket, yourself included. You’re trying to do full LTO on a problem that is perfectly amenable to and would benefit from thin LTO. The approach is sound. The scale is not.

Thanks @rengolin and others for this RFC. Looks good in spirit.

Would this mean that for building a project like Flang we will need to enable three MLIR projects (-DLLVM_ENABLE_PROJECTS="core;software;tensor;flang") instead of the single one (-DLLVM_ENABLE_PROJECTS="mlir;flang") we enable currently?

Can we combine Tensor and Software? While linalg and TOSA are unlikely to be used in a non-ML stack, tensor and memref are likely to be used. In the OpenMP dialect we have some handling for conversion of omp+memref code. Flang also has limited use of tensor/memref.

An alternative model of distribution could be to move the llvm dialect into the LLVM repository, OpenMP dialect into the openmp runtime directory. I would assume there would be objections to this since this mixes MLIR and non-MLIR/runtime code.

FYI: @clementval @jeanPerier @tblah @skatrak @kparzysz for OpenACC/OpenMP/Flang.

Something like that, I imagine. Though, since flang already adds mlir as a dependency if it is not set, I imagine that if flang depends on mlir-tensor and mlir-tensor enables mlir-core, Flang users would see no change in their builds.
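
As a rough sketch of that idea (not how LLVM's CMake actually resolves projects), the transitive enabling could look like the Python below. The project names mirror the ones used in this thread; the dependency table itself, including mlir-tensor pulling in a hypothetical mlir-software, is an assumption for illustration only.

```python
# Hypothetical dependency table; none of these sub-projects exist today.
PROJECT_DEPS = {
    "flang": ["mlir-tensor"],
    "mlir-tensor": ["mlir-software", "mlir-core"],  # assumed layering
    "mlir-software": ["mlir-core"],
    "mlir-core": [],
}


def resolve_projects(requested, deps):
    """Return the requested projects plus everything they need, dependencies first."""
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


# A user who only asks for flang still gets every MLIR bundle it needs.
enabled = resolve_projects(["flang"], PROJECT_DEPS)
print(";".join(enabled))
```

Under these assumptions a user would keep passing only flang, and the build system would expand the list on their behalf, which is the "no change in their builds" outcome described above.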

Yes, that was an example of division, not what must happen. One proposal in the round table was to split core/non-core first, then iterate non-core into further sub-divisions. The concern there is that the code churn issue escalates and drags into multi-year efforts, which is bound to leave scars in code and people.