MLIR Organization & Charter

Proponents

@rengolin @stellaraccident @ftynse @jpienaar @banach-space @nicolasvasilache @clattner @MaheshRavishankar

Introduction

In the past few years, MLIR has grown immensely and become widely adopted across the industry. All major hardware vendors use it in production, and it’s a popular choice amongst AI accelerators and software start-ups. It provides many tools for building tensor compilers for ML and HPC (like XLA and IREE), hardware design (like CIRCT), language front-ends (like Flang and Clang), and much more. There are also many downstream uses in various organizations in private or less advertised settings.

However, MLIR has become a victim of its own success. Its flexibility in creating new dialects, together with the speed and variety of directions in which multiple projects needed to go, made it practically impossible to focus on a core pipeline and upstream compiler, which has been the main reason for LLVM’s success. As a result, for the last few years, the MLIR ecosystem has suffered from a lack of clarity and direction.

In this document, and its associated RFC and Survey, we propose a new governance model for MLIR that focuses on the direction of a few core parts of MLIR, while remaining flexible to unrelated projects and experimentation. The core idea is not to stifle innovation, but to bring together the industry (corporate and academic) stakeholders behind the core parts under the same charter, to encourage better collaboration and co-evolution.

LLVM found solutions to many of these problems that ended up working well for the type of community and manner of evolution it sought, and this has resulted in most of the industry relying on it for compiler infrastructure. We are working to put MLIR into a similar evolutionary groove.

Background

Before we get into details, we want to state a few assumptions that drive this proposal. These are the guiding principles we use to design the governance model, code ownership, charter definition and infrastructure decisions. These assumptions were reinforced by the results of the survey and recent upstream discussions. They build upon the existing governance model, creating a structure that should complement it rather than replace it, similar to the Clang ownership proposal.

The Need for a Technical Charter

Currently, too many discussions in the forum rely on personal views and interpretations of compiler design. This not only hinders consensus, but often verges on ad hominem. If we capture the design consensus in a technical charter, designed and decided by community maintainers and driven by actual implementations and real-world usage, we can discuss each point against the charter rather than against people’s interpretations. By encouraging charters that cover both a technical guiding light and the expectations for how to achieve it, we believe the technical result will be better than it is now, and we will address some of the recurring feedback asking for more visibility into the current state and next steps for key parts of the infrastructure.

Multi Governance

Open source projects and people with strong opinions always find each other. This is a good thing when there’s strong direction and progress, but a really bad thing when there’s a clash of equally valid technical arguments. While the technical charter solves most of the personal-view problems, if we end up with a single maintainer, it would be hard to make sure the charter reflects the main contributors and stakeholders.

MLIR Core and Areas

MLIR has a core builtin dialect and infrastructure, which are used by all dialects, transforms, passes, etc. This core infrastructure needs to cater to the rest of the code, including downstream users, and it’s very important for it to be stable and reliable. We should not have some corner of the design space changing core infrastructure without the knowledge and agreement of the other areas that rely on the same code. The core serves, and needs to support, multiple usage patterns and user journeys. We believe this part of the project is mature and would benefit from explicitly being treated as such when considering its evolution.

The other areas identified are users of MLIR core: they raise needs to the core while also serving downstream users. Cross-talk can create an unproductive coupling by forcing agreement between non-overlapping parts of the code. This slows overall progress by holding back areas that could otherwise advance independently. The goal is to identify areas of collaboration with concerted groups of interested folks willing to drive and maintain the work in the service of the area’s users. It is our observation, both direct and informed by feedback, that by and large, these other areas are much less mature than the core, and that they need an evolution and project management approach that sets them up for reaching a more mature state.

Main Goals

Building on the understanding gained from the Survey, we wish to separate the ownership model into larger areas, in addition to the existing code/dialect ownership, in order to design and drive the technical charter that the code will follow for each individual area and for MLIR as a whole.

Our main goals are:

  1. Establish a technical governance model that represents the actual stakeholders of each part of the code, avoiding cross-talk (see above). Like the current trend in the rest of LLVM, this must be a multi-area (overlapping) multi-governance model, with a clear leadership model geared towards impact (those who work on and are affected the most) in each area. This RFC focuses on that goal.
  2. Reassess, encode and possibly redesign the technical charter of MLIR. After the governance model is in place, we can start updating the design/rationale documents to make sure we cover all the most used dialects in the various areas and that their objectives are consistent. This will happen after this RFC is agreed upon and acted on.
  3. Set MLIR up for its next five years of growth. Most projects are not successful, and as a community, we are lucky to have had the opportunity to grow MLIR from a few “wouldn’t it be nice” statements to a toolset that has become synonymous with general purpose compiler infrastructure. As with all human endeavors, though, what worked for a small set of aligned parties sitting next to each other rarely scales on its own to decade-level longevity with a much larger and more fragmented set of stakeholders. We believe it is time to plan for broader stewardship of the success we have been granted.

Definitions and Governance Proposal

Defining “Usage Areas”

As revealed by the recent survey, the key areas in MLIR today are:

  1. Core: Builtin dialect and core APIs.
  2. Tensor Compiler and Kernel generation: with paths from ML and HPC frameworks, through tensor level and below (e.g., Triton reaches further down), exiting as LLVM dialect + intrinsics, SPIR-V or EmitC.
  3. Language Front-end and Design: Clang, Flang, Julia, and other front-ends lower to their own dialects (outside of the MLIR tree) and then use the low-level dialects in upstream MLIR to lower to LLVM. DSLs can lower straight to MLIR and then to LLVM.
  4. Hardware Design: CIRCT and similar projects, using MLIR for hardware design and simulation, with most of the work done outside of the monorepo.

MLIR hosts many niche components, and we propose bundling those with core governance and design decisions until they prove large enough to warrant their own area upstream (or are reorganized into an existing one). Work-in-progress dialects and code that live upstream may have a lower barrier to change (in-tree evolution), but still need to make sure they do not inadvertently change core code that affects other areas.

Note that this is not a complete list of all MLIR usage domains. However, in the survey, other areas were almost always selected together with those above. Of the only three answers (out of 88) that were not, two named areas in the HPC/tensor domain, and one named something unrelated.

Defining “Dialect Groups”

Despite covering very different topics, the areas above reuse most of the upstream dialects. As shown by the survey, the most common dialects across the three areas above are: cf, scf, func, llvm, arith, math, memref, affine, index.

In addition to those, the most common dialects reused in tensor compilers and kernels are: linalg, tensor, vector. Hardware design, front-ends and language design do not have any substantial usage beyond the core ones above.

Note that the tensor, memref and vector dialects here define operations, not the types, which remain in the builtin dialect. These dialects are very old and predate the ability to define types and attributes outside the builtin dialect. Different design decisions would likely have been made if the core infrastructure they build on had been more mature at the time of their inception.

Some of those operations are low-level (and used by all), while others are specific to linear algebra workloads and used mainly by tensor compiler / kernel generator projects. A key near-term goal of that sub-group will be to separate these concerns and clean up the dependencies.

A reasonable separation into dialect groups that would allow the areas to work together is:

  • Low-level: the common dialects above, with maintainers from the three areas.
  • Tensor: linalg, tensor, TOSA and bufferization, which are directly related to such workloads.
  • Core: Builtin dialect, shared infrastructure, and all other dialects that do not belong in the areas above.

The language and hardware design areas already have their own groups outside of MLIR and do not need groups here for now. A discussion about bringing them into the monorepo, or extracting tensor into a separate repository, is not part of this proposal, but it is a natural follow-up step.

Dialects should be allowed to move areas, be split or joined, brought in or extracted out, as long as the technical charter allows (i.e., not a fundamental piece of the MLIR ecosystem), the proponents have all alternatives covered, and the maintainers of the affected areas agree.

Note that most dialects implement various interfaces, and many transforms operate on those interfaces. While they’re currently bundled together, the design of each interface should be guided by the dialects that use it and by those dialects’ group maintainers, which may differ even for interfaces declared in the same header. Future NFC code movement can make those distinctions clearer.

Changing Technical Governance

Having defined the project areas and dialect groups above, we need to agree on a technical governance model and put in place the necessary tools to create a technical charter for those groups. The technical charters should be written per dialect group, with the usage areas in view. Therefore, each dialect group needs multiple maintainers drawn from the different areas that use it.

This governance is for the technical charter of each group, which will guide the development of individual dialects. Specific dialect ownership isn’t part of this proposal and will continue as is. Once we agree on governance, it’ll be up to each group’s maintainers to decide on dialect ownership, direction and future.

The governance model does not need to be created from scratch and should follow the Clang model proposed and accepted last year.

The key points we propose for MLIR are the following:

  1. Multiple maintainers: We do not want a single maintainer for any high-level part of the compiler, for reasons of availability, reduction of bias and inclusion, and to make sure we actually design for all. We also don’t want too many maintainers, or it would be no different from what we have today.
  2. Technical charter: High-level (technical charter) maintainers and dialect maintainers need not overlap. The former should focus on the key drivers behind the largest impact, while the latter focus on day-to-day implementation details, following the technical charter.
  3. Active maintainers: We want maintainership to rotate with involvement. We should not have inactive maintainers, and we should invite those who participate actively and have a vested interest in becoming maintainers themselves.
  4. Overlapping ownership: We want to cross the boundaries of ownership, especially between the core and non-core groups.
  5. No veto power: We want maintainers to be responsible for discussing, agreeing on and enforcing the technical charter according to the community’s use of their area, not their personal views. We want the community to be able to challenge such a charter and have the chance to change it, by presenting strong enough arguments to the pool of maintainers, not to individual ones. Maintainers should exercise humility, especially when it comes to leveraging the wisdom and perspectives of those who came before or who have superior knowledge and insights about a topic.

We don’t believe there’s a lot of contention on the points above, at least not on their core principles. Implementation may vary, but the idea is to evolve faster than we have been lately by avoiding battles of personal opinion and moving to technical discussions against an agreed charter.

It is an art to define how many people we have and how we rotate ownership. A reasonable initial proposal is to have 3-5 owners per group, with some overlap between the shared groups (core, low-level). We would eventually integrate this with the overall LLVM ownership model.

The key takeaway here is that having active maintainers drive scope-limited areas according to a charter will address several points of recurring feedback:

  • Inability for production users to determine whether/when to invest: Maintainers provide proactive visibility into where an area can be expected to be over some period of time, and they are expected to be honest about the level of maturity and churn to expect over time.
  • Spending too much time debating simple problems: Maintainers are go-to people for determining solutions to questions of execution and sequencing, taking pressure off of the RFC process and open ended discussions for operational matters that are expected to resolve without debate.
  • Difficulty for newcomers to navigate how to make contributions to an area: Make it clear who the go-to people are to ask for advice and feedback about how to scope and make contributions.

Follow Up

Consolidating a Technical Charter

Once we agree on the governance model and select the initial maintainers, these groups can start writing the charters of their areas. The proposal is to reuse most of MLIR’s existing charter and evolve from there, with the key difference that the groups will be able to be more specific in their areas, and perhaps even make different design decisions in the non-overlapping areas of the code, as long as that does not require incompatible changes to core or other areas. For areas that need significant investment to achieve maturity, we expect the charter to include a roadmap component describing how evolution is expected to proceed over an achievable timeframe.

Updating Infrastructure to Match

There was enough contention on the RFC thread about actually splitting the code that we will not propose this as a solution in this first iteration. But there was also enough discussion about build-time dependencies between the parts of MLIR that we still need to make sure the code is independent and that areas can be built without each other.

This means we need to create separate libraries for each group and make sure they can be built and linked independently, but also together as one big library, without symbol clashes. This is mostly build system maintenance, but it may require some header movement and will need new integration tests to be checked on every build.

Next Steps

Technical Governance

Dialect Groups

Action: Define which dialects will be part of the group’s charter.

Ultimately, these groups were defined by the breakdown of dialects revealed in the survey with regard to their usage in related projects (tensor, languages, hardware), but constrained by how they’re used today.

A draft proposal:

  • Core
    • llvm, complex, dlti, ub, acc, emitc
    • transform, pdl, shape
    • polynomial, async, mesh, mpi
    • sparse_tensor, quant, vcix
    • gpu, nv, rocdl, spirv
    • arm_*, x86, amx
  • Tensor
    • linalg, tensor, TOSA
    • bufferization, ml_program
  • Low-Level
    • arith, math, index, ptr
    • cf, scf, func, affine, omp
    • memref, vector

Note that this isn’t necessarily the best grouping for the dialects, but it’s a start. Finding the best grouping is a discussion beyond the scope of this proposal, whose aim is to set a starting point, not a final goal.

Near-term changes will involve handling the linear algebra portions of memref/vector, creating a sub-charter for the target dialects (CPU, GPU, C, SPIR-V), and handling unused dialects.

High-Level Maintainers

Action: Gather stakeholders with a long history and commitment to the MLIR project that have a vested interest in MLIR being successful beyond prototypes and private projects. Select maintainers for the three areas (core, tensor, low-level).

The main criterion here is to represent a group that has concrete roadmaps for implementing upstream technologies and can define, articulate and defend MLIR’s core principles on design decisions and when resolving contentious issues in a way that is acceptable to the community and its values.

The main responsibilities of the high-level maintainers are:

  • Set the direction for their groups and agree on a high-level roadmap to follow that direction.
  • Discuss, form consensus and (re)write the technical charter in line with that direction, outlining the technical challenges to overcome from the current state and previous direction.
  • In technical discussions, defend the charter, not their personal opinions.
  • Challenge and expect to be challenged on changing the charter, but accept when maintainer consensus is against them.

These people will be responsible for guiding the rewriting of the technical charter for their groups, designing interfaces with other groups and deciding on the future of the project around their areas. This is not about code style or which attributes to add, but about how dialects fit together, what common infrastructure is necessary and how other projects (especially LLVM-hosted ones) tie into the MLIR story.

They will not be writing it alone, though: they will guide the discussions and review the PRs that change the documents, submitted by the whole community. They will set the vision and charter of the whole project (and its parts), in unison with the dialect directions and the projects that use them.

Dialect/Code Maintainers

Action: Validate and persist the existing dialect and code maintainers into the new ownership model.

These are the people currently working on the dialects and parts of the core code, and they should be making decisions based on the general charter. If a dialect cannot work with the high-level charter defined above, then changing the dialect or the high-level charter are equally possible outcomes.

After we agree on the governance model, we need to go through the list of current dialect owners, make sure they’re still active and that each dialect is used by a sizable portion of the community, and avoid keeping incomplete dialects upstream without a clear roadmap.

Escalation Procedure

In other areas of LLVM, the escalation procedure is to involve top-level maintainers. In the same way, dialect / code maintainers can escalate concerns that did not reach local consensus to the high-level group maintainers where their dialects reside.

However, due to the non-hierarchical nature of the MLIR groups defined above, a lack of consensus in one group should not require an appeal to a single top-level maintainer for the whole project. This would violate the basic principle of multi-governance stated above.

We propose to involve all other high-level maintainers from the other groups, who can choose to participate or not. This still limits the number of people that need to be involved to just those who have already committed to maintainership, while allowing any group (including core) to ask for help beyond their own peers.

As a last resort, when we still can’t make decisions after involving all MLIR maintainers, we can rely on the area teams and the governance model for conflict resolution.

@mehdi_amini @River707 @Mogball @Groverkss @matthias-springer @qed @dcaballe @kuhar @bcardosolopes @jeanPerier @javedabsar


Thank you for driving this forward everyone, I’m excited for MLIR’s approach to improve here!

-Chris


This looks like an excellent direction. I saw the polynomial dialect grouped in core. I suspect I am the only contributor with an interest (it’s been mostly crickets since my initial work), and if there are no objections I’d be open to migrating it out of MLIR to my downstream project. Perhaps eventually if there’s a nice mechanism to do so, it could exist as a standalone repository for those interested to include in their own projects as desired.

Thank you for this! I’m not intimately familiar with MLIR, but this sounds like a very sensible way to organize and resolve decisions.

Thank you for the hard work, Renato. I’m particularly happy to see the groupings and an approach to maintainers starting to emerge for the mature parts of the project! This will stand to help a lot of people use and contribute to these libraries with confidence.

I have followed the progress of this proposal and recognize the significant amount of work involved. Thank you to everyone who contributed.

Currently, I am a downstream user and less involved in the contentious areas of MLIR. Nonetheless, I deeply care about the core infrastructure, including low-level dialects, and their interaction. I also remember (sometimes fondly) many debates over aspects of MLIR and its dialects; I am well aware of the challenges.

I believe charters are key. Refining or defining technical charters for various components within the MLIR project will help streamline decision processes. More guidance for decision-making will have a substantial impact. At the same time, it will be challenging to develop these guiding principles, as it involves identifying and resolving current and potential future conflicts. I see this as the main task ahead.

The governance proposal, which builds on established LLVM processes, appears reasonable. If its structure assists in defining charters, it will be beneficial. However, if we fail to articulate charters, I expect little will be gained.

The main criterion here is to represent a group that has concrete roadmaps for implementing upstream technologies and can define, articulate and defend MLIR’s core principles on design decisions and when resolving contentious issues in a way that is acceptable to the community and its values.

For areas under active development, it is reasonable to prioritize the input of contributors. For the core infrastructure and mature dialects, it is important to also consider the perspectives of downstream users who rely on their stability. Admittedly, that is a somewhat selfish position.


Thanks everyone for the support, I think we have strong agreement on the core points of the proposal and we can always hash out the side points in due time.

Now, as next steps, I propose we start forming the high-level maintainer groups. Remember that dialect design and low-level technical decisions do not change with this group; the group is for the charter and general direction. We’re not (yet) selecting dialect and code maintainers, who remain unchanged.

We need to focus on two key points for nominations:

  1. Those who want to participate in high-level discussions, and
  2. Those who have a vested interest in the future of MLIR upstream and its reach downstream.

I volunteer to lead the assembling of the Tensor Compiler group, searching for active contributors that want to look beyond their contributions and the effect of our collective work into the other groups. If people are happy with it, I can start a new thread to collect other volunteers to form the group.

I don’t want to volunteer other people for the other groups, so I’ll leave it open. We can also start with the tensor group and see what works (dog-food), learn and adapt to the next one.

+1 - You put a lot of miles in on the governance proposal, and I personally trust you to take the next step with forming the TC group. Hopefully there’s not too much controversy and we can succeed with this kind of optimistic lock and you running the process. If not, we can always fall back to the area team once it forms up.


Thank you so much for putting this together, @rengolin. This was heroic, and this is an important step forward for the project.

I am happy to continue helping with MLIR core and some of the dialects. IMO the current grouping of dialects along usage domains is kind of wacky, and I would personally like to see a grouping that better reflects the taxonomy of the dialects, but I understand we will iterate on this.


Thanks for iterating on this; like others, I recognize the effort needed for this. I’m just back from vacation and catching up with all this, and it takes time to page everything in.

MLIR’s growth and widespread use make it important to ensure well-grounded support. Also, we have operated since the beginning with a small team of people working closely together and understanding a lot of implicit conventions; this does not scale, and making more of it explicit is important. At the same time, we should recognize the success of the existing ethos and associated dynamic in the project, and make sure we consolidate and build upon them instead of starting from a blank page and disrupting everything. I assume this is the intent behind the initiative and we should be able to converge in this direction…

In this spirit, there are quite a few things that remain unclear to me at the moment. I would start with a question that is likely more important than any mechanism put in place here: “what is the role/charter/goals of MLIR as a project?”. I suspect many conflicts can arise from different assumptions behind this question, which is why it seems key to me to start here.
From there we will be able to more precisely define the relationship between these “sub groups” and the project itself, and for example position how the “tensor compiler” grouping belongs to the project.

At the moment, there is some fuzziness in my mind between what’s written and how I (or anyone) can interpret it. For example, you wrote:

the groups will be able to be more specific in their areas, and perhaps even take different design decisions on the non-overlapping areas of the code

This is pretty open right now, we should clarify the scope of the expected divergence intended here, and the amount of autonomy a subgroup has with respect to the project (and this loops back to my earlier question about “belonging to the MLIR project”).
To provide a very concrete example, we have a Developer Guide which codified the existing practices in the project (we also have more existing practices which are not all documented by the way), I believe that belonging to the MLIR project implies following the common practices, and I wouldn’t expect a sub-group to decide to replace FileCheck testing with gtest, or even just not having unit test coverage for example! (I worked with people in the past who believed that code reviews and writing tests was reducing their velocity).

Stemming from this comes some of the “value system” used to resolve conflicts: in order to build a true open and healthy community project, the MLIR (and I believe LLVM) approach has been to try to reason from first principles and drive the consensus based on rational arguments. This is in general anchored into the system invariants: MLIR is a complex system which only works well and consistently because of this.
Working from discussions and arguments grounded in principles and system invariants also greatly reduces anchoring to personal preference, and reduces the weight of opinions presented without substantiation. Another strong principle of MLIR has been to preserve the interests of all users (including unknown ones) by avoiding shortcuts based on one or two current downstream needs. Sticking to principled design arguments in discussions has ensured that we have a system that continues to be fit for many different users and downstream needs and interests. Without the ability to discuss changes on their technical merits, the project ends up driven by personality-based arguments, or hits conflicts because of immediate interests serving only specific downstream projects.

It is important for MLIR to cater to its existing users; however, an important balance we have navigated (and successfully so, I believe) has been to also keep overarching goals and long-term support in mind: this implies a mindset where we think about our mission of serving the general compiler industry, including future users and use cases. Catering to future users requires continuing to actively evolve the project without overly anchoring it to incumbent downstream projects.
This is also one of the reasons why LLVM does not have stable APIs, and a key aspect of keeping the project evolving.

We should ensure that all this is reinforced and put first and foremost alongside the consensus seeking approach we’ve been following building the project, in order to avoid a situation where the governance mechanisms become levers for decisions to be rushed through bypassing a phase of principled design arguments.

All of these aspects seem pretty important to be able to continue to consider all of this “part of MLIR”, instead of components that diverge from the MLIR project. At some point there may be a case for some of these components to instead just live downstream (or in an incubator project, like CIRCT or torch-mlir or mlir-tcp…).

Beyond the high-level points above, looking at the current split proposal, the “Tensor” group seems reasonable to me.
I don’t quite understand the split between core and low-level, though, and we should iterate on refining it (or possibly start by splitting out only Tensor as a first step to unblock progress!).
In particular, LLVM (which is the main “low-level” aspect to me) remains in core, while I see things like ptr, cf, scf (and arith to some extent) as more fundamentally “core” than a lot of the other things left there.


I resonate a lot with Mehdi’s sentiment here, and he did a much better job putting it into words than I have.

MLIR is great and successful because we invest a lot in designing each component to be great and make sense on its own, rather than funneling them into a handful of concrete use cases. This is why downstream users, even in diverse domains, can pick up MLIR and hack together a functioning compiler with ease. Focusing on the components means downstream users have the luxury of not having to push for their own use cases, and can trust that upstream will be managed mostly properly. I hope that focusing on making each component great is something we will hammer out in the technical charters.

I don’t have any issues with upstream components diverging or being redundant with each other. I think there should be room to experiment, given that there isn’t one correct way to “build an MLIR compiler”.

Another lens to put on this: cluster by maturity. The heuristic I use for this involves working backwards from a straw man of what the roadmap looks like. If the roadmap is dominated by things that are needed to consider the component “complete” then that is a pretty good indication that it is closer to the periphery. Even if we collapsed core and low level together, I think we really still want to clearly mark and advertise what pieces are mature or immature. This comes back to a core point of recurring feedback about needing more visibility into where pieces are on the maturity curve. Further, you end up governing mature components differently from immature ones.

My primary goal with engaging in the governance proposal is to make sure that the mature, lowest level parts of the project are clearly identified and held to a governance and organization approach that sets them up for long term success.

I think that the reason it is easy to see the tensor domain as a good first thing to consider separately is that if we were to look at the roadmap of what it would take to complete to a high level of maturity, there are a lot of fundamental design and implementation issues involved. This is pretty different than what the core components look like, and you run such projects differently (even up to what Jeff is saying in terms of supporting/encouraging different approaches, etc).

I’m personally less concerned that we parse the precise difference between core and low level, so much as I am that we acknowledge what level of design and implementation maturity the different parts have achieved. And I care about that because that is how you ensure that the more mature things stay that way and the less mature things progress towards that state (or move out, get deprecated, etc as appropriate), while letting everyone who engages with the project know how to judge their investments. Currently it requires a high level of sophistication to engage with the project and try to discern this.

My two cents.

Thanks Mehdi, your feedback is invaluable. I have some small disagreements below, but most of them are theoretical. In practice I think we’re on the same page.

Exactly. This is very much the intent. Slow convergence, not disruption.

I don’t think you can answer that question in a way that resonates with the whole community. There is more than one answer there, so we take lessons from LLVM and Clang (which also have the same problem). This is what we’re trying to do.

But Clang and LLVM took different paths in that soul searching. MLIR is more complex and disconnected than either of them, so many of us feel we need different paths for different areas. The tensor compiler is the area with the most tension and is looking for its own path. But we don’t want to break the rest of MLIR, hence the separate working group.

That fuzziness is intentional. We will never have a clear picture, both because it changes with time and with who you ask. When you accept it will always be fuzzy, it frees you to make small progress instead of waiting for clarity before changing anything.

As we seem to agree above: Slow convergence, not disruption.

Absolutely! This would be terribly disruptive, and I’d never support such proposals. Code review, test coverage, design discussions, infrastructure are fundamental pieces of the software development puzzle.

As a great mentor of mine used to say: Less haste, more speed.

Here, I disagree. “Value systems” change with people, place, culture. In the past, LLVM has had attempts at pushing a particular value system based on the dominant company at the time. It backfires and creates conflict, people leave, the project suffers.

Also, “first principles” is a reasonable approach in research, but a terrible path in production environments. From the beginning, LLVM has focused on “what works first”, and that’s the source of so many failed long-term refactorings (pass manager, anyone?).

MLIR may have started from first principles, but once it merged with LLVM, and became a key piece of the upstream story, it must adapt to the realities of the LLVM project: a lot of different people use it for a lot of different things. If we start rejecting those points of view, LLVM loses its value.

Again, this may have been the original design, but we can’t maintain an upstream infrastructure for unknown users and unknown requirements. As the great D. Knuth once said: premature optimization is the root of all evil.

We now have production systems with real restrictions and use cases that are being held back by unknown requirements from unknown users. Any product manager that sees this will eventually conclude a fork is the only solution.

Open Source history is filled with inflexible positions leading to forks (EGCS, anyone?), which are much more disruptive than the change that was proposed in the first place.

The feedback I get from almost all downstream users I interact with is exactly the opposite. It’s precisely that “first principles” argumentation that pushes them away.

I wholeheartedly agree with the sentiment. But consensus is rarely unanimous.

More important than some notion of total agreement is the sentiment that the project health is more important than our biases, and that sometimes we let go of something we believe in because the majority of the rest disagrees with us.

When you disagree with some change that happened, you may see it as “rushed”, while the time it took me to convince the rest and get it through may seem like it was “blocked”. Same fact, different points of view, very different conclusions.

+1000

I’m happy to hear this. And I want to make clear this won’t be a split, a change in quality, or less focus on any users. @Mogball’s point on downstream experimentation is very important. That’s where it should happen, but we need upstream to allow that kind of experimentation, and right now we don’t have that infra fine-tuned: extending existing dialects or types has a high maintenance cost, often more so than duplicating entire dialects downstream.

It’s not clear what’s the best split, we should definitely iterate. I wanted to be least disruptive and only propose a tensor group, but at the same time protect the “core” core from the changes that I see coming into arith and scf (due to languages, hardware and some new tensor requirements).

My aim is to make the boundaries intentionally fuzzy (e.g. vector between tensor and low-level, cf between core and low-level, etc.) and let the state coalesce at its own pace, annealing style.

But in practice, I agree with most of what you said: no disruption, evolution, iterate on boundaries, mostly about technical charter and decision making and less about software practices that make LLVM a great project to work on.

We had a similar discussion in our group. We feel the same way, that maturity is a key dimension in that choice. But how you use it to bundle is not clear.

Putting all mature dialects and infrastructure into a single bucket works because there won’t be a lot of changes there, but it’s a lot harder to bundle (or even spread) the immature ones into groups.

If we want the people “who care about a particular area” to bring their views and ideas, we need to bundle them by “usage” or “domain” instead, and pay the price of immaturity of some of those dialects.

+1 to many aspects of Mehdi’s post. Let me emphasise a few points.

I fully agree, process alone will not solve this. We as a community need to form an opinion on the goals and agenda here.

Long-term support and overarching goals also help current downstream users that are less involved upstream but rely on a stable infrastructure. If MLIR’s goal were to be a stable infrastructure to build on, this would be a crucial aspect.

In this spirit, it would be more useful to slice components by their maturity instead of using core as a grab-all for what is left. I understand that slicing out tensor has a higher priority for some but I personally would care more about identifying the solid core.

FWIW, I agree we can start defining the role/charter/agenda for the tensor component (and thank you Renato for offering to put in the effort). However, I would be concerned if the tensor component deviates from current practices until we have an understanding of what the ground rules for the MLIR project are and where we want to take things as a whole.

Downstream users have the responsibility to argue why their proposals are good for upstream, not just good for them. This is especially important for more mature components with a larger, if silent, user base. The way to do this is to start from first principles, understanding the role of the component being changed.

This isn’t just good OSS management, but good technology management in general. Any infra owner in a large company with many users has faced numerous “Please add X for our Y use case” requests. How much one is willing to buckle to user requests is typically a function of said company’s engineering culture. The only thing that changes with OSS is you can’t enumerate all the users.

Typically in these conflicting scenarios, one requires technical leadership to come in and say “We can’t give you X, but we can provide Z and you just need to change a few things in your code”. It’s reasonable to move both sides of the equation here – upstream components and downstream projects – to achieve the desired goal. What I do agree is lacking in upstream MLIR is such technical leadership. Usually, proposals that are shot down are simply left like that without much meaningful follow-up technical discussions. I miss the old ODMs! I support the appointment of more dedicated maintainers if that will help here.

Contributions to LLVM proper are viewed through the same lens: why is this good for LLVM? Arguing for changes based on “I need it downstream” never ends well. In fact, I think MLIR is more permissive than LLVM in this regard. Well-motivated changes to MLIR are accepted even if the only uses upstream are a few tests. LLVM very much prefers that everything lives upstream.

I don’t think there is anything wrong with forking. MLIR is practically designed for that: hardly anyone forks the core infrastructure, while new dialects and passes are extremely cheap. I am also not opposed to forking/cloning dialects upstream, e.g. making an arith2 dialect that is stricter about undefined behaviour, although that naturally raises the question of whether modifying arith is the better way forward.

In my experience, the greatest difficulties with working upstream have nothing to do with upstream but with how the downstream company views and rewards OSS work. I think this extends to the wider community as well: the de facto maintainers are less interested in organizing and having the hard design discussions to move components forward for everyone, because it consumes time and brain cycles. Downstream users are more interested in getting what they need and moving on. Between forking and trying to hijack upstream, I think forking should be the way to go.

I think my reply was misinterpreted beyond recognition of my original intention. Not just you, many people told me in private they were confused by my post. (so, clearly my fault).

Let me be clear: I agree with Mehdi (and you). I am not saying we shouldn’t look at core principles, or that we shouldn’t try to find common values. I am saying different people have different values and different principles.

We may want to be discussing first principles, but often we’re just discussing our own principles. It’s hard to know the difference when we disagree on a technical level.

That’s why I said I believe it’s more important to know when you’re outnumbered and concede, even if you don’t agree with the final result.

Rereading my post, I think I know what went wrong. It can be read as if I’m defending a downstream dominance over upstream stability, because it seems I disagree with the “first principle” argument and make a comment on forks and downstream dependence. Let me assure you, this is not the case. I was trying to warn about those effects, not defend them.

If the past 16 years of my work in LLVM defending the upstream story isn’t enough, I’ll try to provide a few quick points that I hope will help clarify my intentions. (my brain works better with bullet points, sorry).

  1. Open source projects are only successful when they have a strong upstream story. There’s enough evidence out there (and here) to back that up, so I won’t push further.
  2. But they’re also only as strong as their communities, and these are mostly made up of corporations (even GNU and Linux). So it’s a fine balance to be reached.
  3. Successful open source projects have strong technical charters and LLVM is no different. But MLIR has increasingly divergent technical directions, and that’s what started the whole reorg.
  4. My only goal is to unite those points of view, not to push one over the other. This is why I’m trying to change the language that we use to communicate across the divide. Maybe that’s ill advised, and I apologise.

The bad feedback I get is not about the technical arguments or decisions, on which there’s a lot more agreement than disagreement, but about how the arguments are being portrayed. If we can move from individually biased, loaded terms like “first principles” and “core values” to collective terms like “technical charter” and “forming consensus”, we’ll begin to ease the argumentation stress and go back to talking technical.

In summary, my view is that “technical charters” are built from “first principles”, but from ones that come out of a collective agreement that has reached consensus, instead of from a single person speaking in a single forum post.

I hope this helps clarify my position, and I sincerely apologise for the unintentional curveball. My brain works in mysterious ways… :cry:

I agree that it is much more productive to argue why a change makes sense in the context of a larger project and its direction. I felt like this was the case when I did the majority of my work on LLVM ‘middle-end’, with the idea that we do what’s best for making opt a better optimizer first and clang a better compiler second, with downstream compilers being down the list of priorities (but definitely not disregarded!).

Without a single MLIR optimizer, let alone a compiler, this is often much harder because we have to argue why a change makes sense in almost complete isolation. I’d expect that agreeing on charters for various subcomponents will help guide these discussions, and I think we’ve already seen that work out in some of the attempts to agree on the general direction for dialects, e.g., poison semantics for arith or stacked vector representations.

+100

I think this falls into the category of the best deals being the deals you can get done. There were many variants of splits discussed and proposed over the last months, and while I wasn’t a party to all discussions, I saw many of them.

I’m hopefully not overstepping by saying that I think Renato’s proposed breakdown represents more a starting point than an ending point: it was pretty clear that there was a big cluster in the tensor domain and a big cluster in core. And it was also a general recurring theme that there are a lot of miscellaneous parts of various levels of maturity that could not be conclusively decided in smaller groups.

There were various ways to think about categorizing the middle. What got written down basically reflected the sentiment that biasing toward putting the “extra stuff” in core to start with would at least ensure that it was subject to a higher standard, where people would be motivated to organize it further. This seemed like it might decrease the chance of having orphans that we just lose track of due to lack of critical mass.

What I’m suggesting is more of a mechanism for doing the necessary work of doing additional cleanup and categorization on core: start with the lens of maturity and let that guide.

I think there are a few steps of dividing and conquering involved and the proposal is a starting point of that which also lays down some principles. I agree with you that I consider core to be the higher priority to ensure is on a good path. But it also has both more varied components and stakeholders – needing us to do some work to agree on the goals and organization first.

It’s good that we have people who care about both clusters as it means we can proceed in parallel.

I don’t foresee any near term big break with tradition here. The main thing I think the tensor group needs in the short term in order to better organize its world is a charter and discussion of it as a cohesive entity. That will allow the folks involved with that to start to address some long-standing code organization issues and separation of concerns. There is a lot of deferred maintenance of these dialects that would be aided by having a viewpoint of the whole vs a bunch of disconnected pieces.

In the longer term, if it is to be a viable project, the tensor domain needs a more robust testing and integration approach. But that’s a topic for quite a bit later: I don’t expect any desired changes there are in any danger of preceding the rest of the discussion you are looking to have first. I certainly don’t have people waiting for the ink to dry on an agreement looking to tackle something like this. It is more that I’ve been feeling the pain of this for years and know it is something we would want to invest in at some point.