Written in collaboration with @stellaraccident @clattner @ftynse @jpienaar @mehdi_amini
Introduction
In recent years we have discussed many changes to the MLIR infrastructure, dialects, conversions, representations, transforms and their future. There is a sense of stagnation in progressing on those critical issues, caused by fragmentation that stems from a lack of clarity about the purpose of MLIR in general.
As with LLVM, MLIR has a number of external (upstream and downstream) users of various levels of the infrastructure, from just the core logic to all dialects and transforms. Unlike LLVM (few front-ends, a single IR, a default pipeline, specific targets), MLIR has no well-defined core semantics for the whole compiler, which makes it really hard to even define what MLIR is to begin with.
Multiple (partial) solutions have been floated, many of them incompatible with each other. For example, dialect independence (being able to import dialects from other projects) needs substantial changes to the core infrastructure, in particular to what a dialect is and how it connects to the rest of the ecosystem. Another example is disagreement over the expected semantics of some dialects, which leads to new dialects (for example, the Linalg/TOSA/TCP Venn diagram), some of which do not “make the cut” upstream for often unclear reasons. Finally, there is an idea of “purity of design” that is subjective and not actionable, while in the opposite corner there is the pressure to “get work done”. Neither is a good position to be in, but we have yet to find where we stand on that spectrum.
We need to agree on a common charter, both at a high level (what MLIR is and how it fulfils its purpose at the lowest global cost) and at a low level (dialect semantics and conversions, what canonicalization means, type systems, etc.). The old code ownership model does not work here because of the number of voices and the disconnected nature of the MLIR infrastructure, which is more of a “tool bag” than a tool itself.
We discussed this topic at length during the US LLVM event last week, in meetings, panels and round tables. The long post below summarizes those discussions and proposes some concrete next steps.
Technical Charter
We believe there is consensus that a strong charter and ownership model benefits LLVM as a whole. The recent restructuring of Clang’s ownership model, LLVM’s new code ownership, and the introduction of multiple maintainers and governance proposals are clear signals that the community as a whole places high value on these. MLIR should be no different.
However, MLIR is more of a loose bag of tools than a compiler or a language front-end with a fixed set of stages in a clear hierarchy. This makes it harder to separate out the interests of the majority while still allowing localised minorities to represent their required semantics in their own tools and pipelines, which is exactly what makes MLIR a great tool to work with.
Translating the Clang/LLVM process to MLIR would require a more nuanced approach. Maintainers of the core infrastructure need to consider the needs of their users and liaise with the different user groups, making technical decisions based on lower overall maintenance cost rather than pushing for designs that only benefit a theoretical implementation. This matters because MLIR has no default compiler pipeline like the one LLVM has with its several front-ends.
Furthermore, it’s not easy to split MLIR into a series of dialects, because dialects do not exist alone; their value derives from the transforms that can be applied to them and the conversions between dialects until final lowering. There are also hidden dependencies in the implementation. For example, the Linalg dialect uses the tensor type (which is in the builtin dialect), but requires tensor operations (in the tensor dialect) to pack and reshape its operands. However, Linalg is not the only user of tensor, so changes to either of them will invariably impact other dialects beyond this pair. The same goes for vector, memref, scf and other base dialects that other dialects depend upon as results of transformations and conversions.
In the end, two main proposals coalesced into their own pillars:
- We continue with the MLIR project as is and solve technical, maintenance and direction issues in-tree as a single project.
- We split MLIR into core + bundles so we can discuss the charter and technical details of different areas more independently.
The first proposal is to keep the status quo, but still make progress on ownership, the charter and the deeply technical discussions. The proponents argue that the code churn from an actual redistribution would be a high cost for low benefit, because we can technically solve those problems irrespective of where the code lives. Others argue that this is not working because there are too many conflicting opinions, which makes it hard to reach consensus on direction in a very large project.
The second proposal is to change the status quo and separate the concerns into core and work areas, so that the sub-groups can have more constructive discussions with better-aligned goals. The proponents argue that the old model isn’t working and that a split will also help solve layering issues, especially the implicit dependencies mentioned above, by being clearer about intentions and code structure. Others question the division itself: while most dialects have a clear purpose, others fall into a highly subjective grey area regarding how they divide and intersect with each other, and how to control their dependencies.
Governance Proposal
Regardless of a potential split, we all seem to agree that MLIR needs a governance proposal.
Given the highly distributed nature of MLIR, we cannot have single owners of common areas, especially the core infrastructure. Having one owner per dialect does not scale either, because interest shifts quickly and transforms, conversions and lowerings sit at the intersection of multiple dialects, sometimes implicitly so (for example, the base types and their separate dialects).
So even if we don’t split the code, we still need to come up with a grouping strategy for ownership and technical charter definitions. If we do split the code, it should be split along those lines anyway.
The main idea for governance is to follow the LLVM/Clang model in general, with the variations below.
- The core MLIR infrastructure cannot have a single maintainer; it needs enough people to represent the subgroups we define charters for, be they maintainers or stakeholders.
- The subgroups themselves also need more than one owner to represent their user groups. We don’t need to name what each owner represents, as long as it’s clear that there is enough representation in the group. If a subgroup is small enough that it can only have a single owner, perhaps we need to rethink whether that subgroup needs to exist or should be incorporated into another.
- There must be an intersection of ownership: at least one owner of each subgroup must also be in core. Owners can participate in different subgroups, too.
- Dialects, conversions and transforms must have owners, either within a subgroup, individually, or both. These owners are responsible for keeping the code in shape and demonstrating its usefulness to others.
- Dialects should belong to a subgroup as a mark of usefulness to others, but this is not required. Disconnected dialects without owners, or those in subgroups that are unused or deprecated, may be asked to move out into an incubator or a separate project.
- Disagreements between maintainers should be resolved locally within their subgroup, escalating to core if necessary. Escalation reflects the generality of the decision, not a higher authority.
- Core maintainers cannot dictate what happens in subgroups. Essentially, core is “just another” subgroup that happens to be more general and depended upon by all other subgroups.
- Technical arguments are encouraged across subgroups, including core, to make the best overall decisions across the project.
- We DO NOT want to create silos, we just need to reduce cross talk to a useful level.
Code Reorg Proposal
If we reach consensus on the need for a split in charter and maintainership, the next step is to decide whether to split the code in the same way. If we do, below is an example of how that could happen.
This is a proposal from some of the people who have been involved in this discussion for a long time and in the meetings last week. It is an idea, not a final decision. There are many issues to work around, and we want to know whether people are keen on doing it; if so, we’ll start a series of surveys to collect information and reach consensus by the year’s end.
The proposal is to:
- Separate the MLIR project into at least three interrelated components:
  - Core: Builtin dialect and types, common infrastructure, reusable infrastructure (+pdl, transform, etc.)
  - Software: dialects and interfaces for building software solutions (arith/math, cf/scf, vector, llvm, ptr, omp, nv/amd/xe/x86/arm, etc.). This becomes a dependency of tensor compilers, language front-ends, super-optimizers, MLGO, etc.
  - Tensor: dialects and interfaces for building tensor compilers (linalg, TOSA, tensor, memref, etc.). This becomes the building block for tensor compilers such as IREE and the various downstream projects using MLIR for their ML stack.
  - Other dialects may remain loose or be bundled into other groups. Experimental work should have its own ecosystem.
- Keep the split in the monorepo, as top-level directories:
  - Clear dependencies through CMake, include paths, etc.
  - Naming bike-shedding can be done later, but there should be at least one core directory and one or more bundle directories.
  - Unclear dependencies will become clearer when CMake starts to fail builds due to unknown headers, object files and library paths.
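The layering that top-level directories and CMake targets would enforce can be sketched as a plain dependency check. This is a minimal Python sketch, assuming a strict ordering where core depends on nothing else, software may depend on core, and tensor may depend on both. The bundle names, the dialect-to-bundle table and the `layering_violations` helper are hypothetical illustrations, not part of the proposal or of MLIR's build.

```python
"""Hypothetical layering check for a core/software/tensor split of MLIR.

The bundles, the dialect assignments and the allowed edges are illustrative
assumptions taken loosely from the proposal's examples, not an agreed layout.
"""

# Assumed layering: each bundle may only depend on the bundles in its set.
ALLOWED_DEPS = {
    "core": {"core"},
    "software": {"core", "software"},
    "tensor": {"core", "software", "tensor"},
}

# Hypothetical assignment of a few dialects to bundles.
BUNDLE_OF = {
    "builtin": "core",
    "pdl": "core",
    "transform": "core",
    "arith": "software",
    "scf": "software",
    "vector": "software",
    "llvm": "software",
    "linalg": "tensor",
    "tensor": "tensor",
    "memref": "tensor",
    "tosa": "tensor",
}


def layering_violations(deps):
    """Return sorted (user, used) pairs that cross the layering the wrong way.

    `deps` maps a dialect name to the set of dialects it uses. A dialect that
    is missing from BUNDLE_OF raises KeyError, loosely mirroring a build that
    fails on an unknown header or library.
    """
    violations = []
    for user, used_set in sorted(deps.items()):
        user_bundle = BUNDLE_OF[user]
        for used in sorted(used_set):
            if BUNDLE_OF[used] not in ALLOWED_DEPS[user_bundle]:
                violations.append((user, used))
    return violations


if __name__ == "__main__":
    example = {
        "linalg": {"builtin", "tensor", "arith"},
        "scf": {"builtin", "arith"},
        # Hypothetical edge, added only to show a grey-area dependency.
        "vector": {"builtin", "arith", "memref"},
    }
    print(layering_violations(example))  # [('vector', 'memref')]
```

In the proposed reorganization this check would not live in a script: it would fall out of the build itself, because a library in one bundle would simply not see headers or link targets from a bundle it is not allowed to depend on. Writing the table out explicitly is just one way to spot grey-area dialects before any code moves.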
There could be other bundles, for example hardware-related (which CIRCT uses), distribution-related (for things like TensorRT), language bindings, etc. These discussions should happen only after we agree on a split and start breaking down the infrastructure.
We may also need to decide what to do with some dialects that are disconnected, unfinished, or in need of serious work to become usable. We could reuse/adapt the LLVM component policy for MLIR, create work groups to bring them up to the same level as the others, etc. That’s a matter for a follow-up discussion, not this one.
Actions
The proposed next steps are:
- Reach consensus in this RFC on whether:
  - We need a governance model that is compatible with recent changes in Clang/LLVM
  - Our governance model may be slightly different (ownership intersection)
  - We need to logically split the dialect soup into subgroups to make it manageable
  - We may need to physically split it into multiple directories too
- Create a survey on dialects and gather feedback from everyone in the community:
  - Which dialects are in use and how they connect
  - Who the active developers/owners/stakeholders are
  - What their scope and future are, if not widely used/complete
  - Which category they fall in (core, software, hardware, tensor, distribution, …)
  - What parts of core are in use, and how they connect to dialects, transforms and conversions
- Collect the survey’s results and publish a summary (end of Dec 2024)
  - Have enough information to propose a concrete split and ownership model
  - Define the technical charter and direction for dialects and MLIR in general
- Gather volunteers for the work (PRs) over the following months, into the next release
  - Create an RFC with the proposal and reach agreement on the implementation
  - Create PRs with the proposed code move and ownership models after LLVM 20 branches off (end of Jan 2025)
  - Perform the code changes, change the builds, validate downstreams
  - Allow time for downstreams to build with the new layout and provide feedback; iterate
  - Merge the PRs in time to allow for corrections before LLVM 21 branches off
This will take some time, but we’d like to start the surveys as soon as possible to gather more data in an organized fashion and have a stronger proposal by the end of the year.
Disclaimer: This was a long post, written over many days and posted on a Friday night. It may contain typos, edit errors, repetitions, inconsistencies and inaccuracies. Please give more weight to the intention than to the presentation.
CC: @banach-space @River707 @Mogball @javedabsar @MaheshRavishankar @makslevental @kiranchandramohan @AaronBallman @TobiasGrosser @asb