Sigh… Well, at the risk of sounding like the wacko in the corner, I’ll throw this out there… There’s a somewhat more radical solution to these issues, which would be to just check it all into the same repo. If the Clang/LLVM community is not accepting of end-to-end ML flows, then we simply create a new monorepo where the stuff can live together. I feel sure that this is the place where everyone is going to throw up their hands and explain why that possibly couldn’t work, but if we get to the point where these projects are regularly building together, will it seem so radical?
Yes, that’s been our experience trying to fit those in with TOSA. Ultimately we have the support infrastructure placed sidecar on Developer Resources - ML Platform and that has been reasonably tractable. For real e2e production-level development, a combination of compiler and hardware dev as we’ve done around TOSA, all of these are load-bearing and interdependent early on.
Further, to gratuitously summarize a very long history of productive conversation with @stellaraccident in particular, but also with @jpienaar and others, we benefited a lot from taking time to figure out TOSA and quantitatively consider things that fit and didn’t quite fit. It wasn’t so much the input in the form of suggestions for more ops, as much as advice suggesting what had been previously tried and didn’t quite work, that really benefited our decision-making on op constructs here.
For example, we considered and then decided against an explicit broadcast op. Tl;dr - there were multiple potential ways to do it, all with tradeoffs, and fundamentally a poorly designed construct becomes a lifetime of technical debt when backward compatibility requirements are applied in future.
I don’t have a strong signal on the ability to incrementally build out an IR at this abstraction. For TOSA, the broad contours of ‘what we want’ were generally clear early. It took conscious effort to resist the temptation to add more than scoped. It worked out better to stay focused on the op/spec use case while keeping an active line of comms on compiler IR level input that was reasonable - e.g. some very productive ongoing conversations with @jpienaar on eliminating shape resolution constraints that were fundamentally unnecessary.
I agree with this on both points. On the first point, the list of desired attributes for such a project is not new, and many folks have embarked on this. I don’t think that starting this in tree with a bunch of different folks pushing and pulling in different directions is likely to lead to a different result from previous projects. As with the discussion of TOSA upthread, I think you’ll find that many stakeholders will only care about the “acceleratable by their HW” subset of ML, and that subset is different across various HW architectures. It will be difficult to get consensus here.
I also agree with the second point. MLIR already has a lot of “advanced research quality” code in the main MLIR tree, and it is difficult to know the capabilities and limitations of that code. Much of this is ML focused (e.g. linalg) but other parts are more generic (affine etc). Splitting these out from MLIR core seems like it would help differentiate the pretty-battle-tested stuff like SCF from the more researchy and evolving pieces.
I think such a move would also help assuage the viewpoint/fear that “MLIR” is moving and breaking all the time. It is true that MLIR core does change, but most of the thrash is in more derived dialects. Splitting these into conceptually different things would help folks better understand how much stability and what sorts of breakages are to be expected over time.
+1. The LLVM Incubator process is specifically designed to support “exciting and in-development” projects that want collaboration across many organizations but want to be within the LLVM umbrella. It seems like a perfect fit for this sort of project.
Yeah, I don’t think it’s wacko at all. The e2e pieces are critical. They’ll be utilized by different use cases in a la carte ways focused upon constructively building out production e2e stacks. This is an ongoing effort at Arm as well - there are multiple, quite dissimilar accelerator IPs in the picture with TOSA as an accelerator profile, but their e2e codegen constructs are not similar, because of substantially different microarchitectures and use cases.
However one thing that is urgent is the need for these constructs to be developed rapidly. If another monorepo enables that, then it would work just because it is aligned to production schedules. That is better than seeing the ecosystem exist in a permanently splintered state. The stable and broadly applicable pieces could also ‘graduate’ to the LLVM monorepo over time without much disruption.
Can we please stop with these loose words and characterizations? I find that I can categorize many things as “research” that I have a bias against, and I have tried to edit that from my vocabulary because it only creates unnecessary division and works against the goal of getting to higher ground. If we ever want to talk about it at an ODM, we have a design doc for how to evolve/disaggregate/advance this stuff specifically because we value having things put together well in the upstream repo and would like to be good citizens. We probably agree on most of it, but it is hard to align folks to want to work on it when facing off with an us/them categorization as the basis of the discussion. Aside from not meeting our bar for inclusiveness in the community, it is also just not pragmatic: it is unlikely to produce the desired result.
I do thank you for the feedback over the years – every bit of it has been taken and actioned, including really getting a handle on the experimental velocity, making sure to develop consensus on pieces outside first, and sending a lot of engineers down months of work to address points of concern and layering. And meanwhile, most of this code is running our production infra.
Regarding what is good vs isn’t, clearly you haven’t been fixing some of the egregious bugs in the algorithms for the more “battle tested parts” (I have found a lot more bugs and poorly conceived things in scf vs affine, for example)
Sorry to be direct – every time you approach this topic this way, it causes people to entrench and I can watch it become harder to meet the quality goals that I know you and I share. It is possible to disaggregate and deprivilege much of this earlier work, and that is a goal. It just takes time and investment (and trust).
Fair enough. I’m obviously not aiming to get folks to entrench further; I wasn’t aware that there was a lot of debate about this. My understanding was just that no one was interested in stepping up to do it.
I can see how words like “researchy” may have a negative connotation to folks, but I don’t intend that. What do you suggest? Maybe “in active development” or “core design still evolving” would be better?
Beyond terminology, do you disagree that there is a difference in design stability between core MLIR and many of the dialects in tree?
I do actually: the core IR directory’s design varies at a very high rate. Subjectively, much higher than I would usually ascribe to a library in a mature state. But I tolerate, defend and advocate for it because we are early and we are in this together.
On the dialects that I directly have a stake in, I have been actively damping the design stability of linalg for quite some time, insisting on extracting principles and interfaces – and most of the pushback to recent additions has been from the core devs on that who identify that there is a better layering. In fact, this whole RFC (and the opportunity to direct it away from the core) surfaced because of pushback on that from multiple core linalg devs (from multiple companies), which basically amounted to: we’re actively taking things out of linalg and creating discrete layering and adding new, unrelated things goes against that. Agree with them or not, but both bufferization as an interface based, independent construct and the TilingInterface represented focused work to further extract and make common useful facets of the monolith. If we can disaggregate the “frontend parts” and just leave generic and some of the core transformations, we’ll be pretty far from where it started and can talk about what to do with the rest of it, having gotten it down to just the essence. I know there are other dialects that need attention, but I also know that a lot of ire is directed at linalg specifically, and that is a part I try to take responsibility for.
There are certainly other parts of the early work that are still getting attention. Just last week, I deleted the quant ops that had ended up only being used by TFLite, and that inadvertently cost one week of integrate heroics to untangle in Google’s open source products. We wouldn’t be engaged and doing that stuff if we didn’t share the values. It’s just that this stuff is very actively used and takes time. There are still too many things in builtin, and some of them make me cringe that they are there – but we are down to things that are heavily, heavily used and design stability cuts both ways: even if improvements, beyond a certain threshold, all change is bad. We’re trying to stay under the threshold.
I don’t want to argue about this stuff - largely because I know we mostly agree on much of it, but we aren’t just going to indiscriminately dismantle the things in the core repo that are being heavily used: we’re going to redirect new things to different venues when it makes sense, distill the existing core concepts out and try to refactor this stuff incrementally over time. It took years to put it in this state and it’ll take time to get to a different one. I would personally prefer that that does eventually end up as not just one super sized monorepo (honestly, I think in retrospect, it was a mistake to put most/all of MLIR in the monorepo, but that is my personal bias), but if we do that wrong by half measures, it is just going to make the situation worse… So step by step. And we need engineers engaged in that mission for that duration. Words matter a lot to that end.
Apologies if this has been considered & discarded already, but there might be an intermediate step that has the integration advantages of a monorepo without many of the costs of assuming that development model: have a shallow megarepo for e2e testing that pulls in everything (but the testing infra) as submodules.
Updating the submodule commit of a given component (e.g. in a PR to that e2e-repo) would then nicely show the impact on everything else, but not every commit of every component would need to be globally e2e-capable, and different components could choose their own update cadence within the e2e repo.
IMO, having an end-to-end ML compilation pipeline somewhere within LLVM and the right communication around it is important in the long term. I speak about MLIR quite a bit and, while I do insist on it being an infrastructure, the most common questions I get are “how do I try it?” and “is it faster than X on Y platform?” There is no easy answer to these, at least not at the same level as download/compile clang and run it on your code. This creates an adoption barrier and an impression that all of MLIR isn’t good enough. Torch-MLIR helped a bit with that, but it is still an incubator project with some associated immaturity.
We need a tested and supported “happy path”. Whether it exists because we add more dialects to the MLIR main directory or because we create a separate “mlcompiler” is secondary. If the proposed dialect goes into an incubator project, it would be nice to understand what the strategy and criteria for discussing its graduation are.
Two cents on the “researchy” qualification: I would really appreciate if we didn’t establish the association that research = low quality. At least if we want people to feel confident enough to keep proposing new things.
I believe we need better separation (remember [RFC] Restructuring of the MLIR repo ? I sent patches but never finished it to the point of landing the changes…) but I also strongly believe that:
For example: I want us to get to have an end-to-end TOSA compiler in-tree for CPU/GPU, with integration tests, etc.
This does not have to be the only way someone compiles TOSA (or other), but I don’t see why the fact that some people may want to do it differently or don’t want to work upstream should prevent an interested set of people from collaborating in-tree.
This is also my motivation for the repo restructuring: protect the core of the project while preserving the ability to collaborate on building one or multiple end-to-end stories. The modularity of MLIR should also make it possible to do this and to reuse pieces / dialects / components in various end-to-end schemes.
And the removal of OpaqueAttr from MLIR Core (builtin dialect) is even more impactful than that: we’re still working on the fixes to adjust to that right now! That’s telling of MLIR Core “stability” somehow as well (and there are a few breaking things that may happen in Core still).
I do agree with you that there needs to be a pragmatic split between what’s used by most versus the few, but I can also see the point on what (I think) @clattner meant about “research”: Some parts move a lot faster than others, and the reasons are not necessarily obvious from outside Google.
I only single out Google because it’s the one that uses and works on MLIR the most (for obvious reasons), and while the needs and uses of MLIR in the multiple parts of Google infrastructure are visible (perhaps obvious) to other Google employees, they are much less so from outside.
Also, I want to make sure you know I don’t mean this as a “Google” thing. It’s a “whoever works on it the most” thing. In the past, with LLVM IR, this used to be the case from Apple, but MLIR was born inside Google and is still mostly developed there.
So, while moving the parts, splitting and joining dialects can make a lot of sense, pragmatically, to you and other core developers, it may not from outside, and may even break a lot of stuff for research or companies trying to make use of MLIR.
For that reason, I do agree with your statement that separating the core from the highly movable parts makes a lot of sense. How that’s best split, I don’t know, but it will move the cost of supporting certain dialects to those that really need it. But this doesn’t make the problem (moving too fast for in-company reasons) go away; it only reduces the number of external users affected per change.
And, as an outsider, I’d just like to remind folks that many of the arguments I see on this forum don’t make a lot of sense because I don’t know the context of all frameworks you guys use. Discussing them is important, but we need to make sure we don’t weigh them higher than upstream stability and usability, because that’s what makes LLVM successful.
I have seen many arguments in the MLIR section here that argue for putting all stuff inside just because it’s easier to work that way for the core group, but that’s what makes LLVM less attractive to the rest of the developers, and I really don’t think we should do that.
I don’t know whether, if we manage to accommodate all the political, social and technical discussions on high-level spec/dialect evolution in a multi-stakeholder container, it could also help to achieve a clearer, shared evolution path for the downstream and fast-changing components in MLIR.
Probably I don’t have a deep understanding of all the aspects but it is just my 2¢ contribution to the thread.
As you can see from our discussion in the thread you referenced, we (the XLA CPU/GPU team at Google) are also looking into defining an additional layer on top of linalg for similar reasons, I believe.
One additional requirement we are looking for is that we would like the operations to support destination passing style, as that works better with the bufferization approach in MLIR Core. Is that a concern you have, as well? This is one of the reasons for us why TOSA/StableHLO level dialects are not a good fit at that layer.
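For readers less familiar with the term, here is a minimal Python sketch of what destination passing style means in general. It is not MLIR code, and the names `add_alloc`, `add_into` and `dest` are made up for illustration: the point is only that the caller hands the operation the buffer that will receive the result, so a later bufferization step has an obvious place to write in place.

```python
def add_alloc(lhs, rhs):
    """Value-returning style: every call allocates a fresh result buffer."""
    return [a + b for a, b in zip(lhs, rhs)]


def add_into(lhs, rhs, out):
    """Destination-passing style: the caller supplies the result buffer."""
    if not (len(lhs) == len(rhs) == len(out)):
        raise ValueError("operand and destination lengths must match")
    for i, (a, b) in enumerate(zip(lhs, rhs)):
        out[i] = a + b  # write in place instead of allocating
    return out  # the result is the destination itself


# One destination buffer is reused across two chained operations.
dest = [0.0, 0.0, 0.0]
result = add_into([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], dest)
add_into(dest, [1.0, 1.0, 1.0], dest)  # elementwise, so aliasing input and output is safe here

fresh = add_alloc([1.0, 2.0], [3.0, 4.0])  # contrast: a new list on every call
```

In the value-returning version the allocation decision is baked into the op; in the destination-passing version it is visible to, and controllable by, whoever builds the pipeline.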
Wrt. transformations, we are also interested in keeping it an open op-set and would longer term focus more on providing the right interfaces so that we do not have to worry about the specific dialect an operation comes from. One motivation for this “open ecosystem” strategy is that it allows different down stream users to be opinionated about operations where they need to be (e.g. having operations that map really well to specific hardware that are not generic enough for upstream/a shared repository).
Would such a “share transformations/interfaces but allow divergence on opsets” approach address your governance concerns?
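As a rough illustration of the “shared interfaces, open op-set” idea, the following Python sketch writes one transformation against an interface rather than against concrete ops. Everything here is hypothetical (the `Tileable` protocol, `UpstreamDouble`, `VendorSaturatingAdd`, `run_tiled`); it only loosely mirrors how an interface-driven transformation could treat upstream and vendor-specific ops uniformly, and is not the actual MLIR TilingInterface API.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Tileable(Protocol):
    """Hypothetical interface: all a tiling transformation needs to know."""

    def iteration_size(self) -> int: ...

    def compute_tile(self, offset: int, size: int) -> List[int]: ...


class UpstreamDouble:
    """Stand-in for a generic op living in a shared repository."""

    def __init__(self, data):
        self.data = list(data)

    def iteration_size(self):
        return len(self.data)

    def compute_tile(self, offset, size):
        return [2 * x for x in self.data[offset:offset + size]]


class VendorSaturatingAdd:
    """Stand-in for an opinionated, hardware-specific downstream op."""

    def __init__(self, lhs, rhs, limit):
        self.lhs, self.rhs, self.limit = list(lhs), list(rhs), limit

    def iteration_size(self):
        return len(self.lhs)

    def compute_tile(self, offset, size):
        pairs = zip(self.lhs[offset:offset + size], self.rhs[offset:offset + size])
        return [min(a + b, self.limit) for a, b in pairs]


def run_tiled(op, tile_size):
    """Shared transformation: depends only on the interface, not on the op's origin."""
    if not isinstance(op, Tileable):
        raise TypeError("op does not implement the Tileable interface")
    out = []
    for offset in range(0, op.iteration_size(), tile_size):
        size = min(tile_size, op.iteration_size() - offset)
        out.extend(op.compute_tile(offset, size))
    return out


upstream_result = run_tiled(UpstreamDouble([1, 2, 3, 4, 5]), tile_size=2)
vendor_result = run_tiled(VendorSaturatingAdd([1, 5, 9], [1, 5, 9], limit=10), tile_size=2)
```

The downstream op never has to be known to, or agreed on by, the authors of `run_tiled`; only the interface has to be shared.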
First, let me be a bit negative… (it will get better, I promise)
This doesn’t scale, nor does it align with the ethos of upstream open source projects. If TOSA is a dialect that is mandated by a standard body, and only that standard body is allowed to change its spec, then the implementation in LLVM is just a reflection of that.
The original proposal seems to be that we, the LLVM community, should drive this “platform agnostic high level tensor dialect” and as such, is in stark contradiction with having to go through a separate committee.
As a non-Google person, I have no idea what that would entail and cannot assess if this would satisfy my own needs of a similar dialect. This is marginally better than needing a separate committee (because you want to move governance to the community), but it would also come with a lot of baggage that Google won’t have to deal with (because it already fits your own internal projects).
This would be a lot of effort from very distant users, and quite likely won’t converge to a solution that everyone wants, but if it converges, it probably will get to the least cost to interested parties (not necessarily least cost overall, weighed by number of users).
Now, let me be a bit more positive, and more pragmatic…
I think the number of frameworks and solutions at such a high level for ML is a direct response to the “stuck in a rut” paper, where everyone wants (needs) to find their own solutions. My experience is that most of those solutions are similar (at least in intention), especially at such a high level.
So, taking inspiration (or even code dump) from existing large scale solutions is probably the quickest way to converge. Whether it’s TCP, TOSA, MHLO will be a matter of how close each is to an agreed “goal”.
But we should follow some simple guidelines, to avoid creating yet another “standard”:
The dialect should be generic, like linalg and not tied to a particular front-end or back-end
The governance should be within the LLVM community, and any spec written and kept inside the LLVM repo
Other (existing) implementations could (should?) in the future reimplement their specs using this dialect
This last point is an important design decision, to make sure we can support the operations and shapes we want (need) to. But it also shouldn’t be mandatory, in order to avoid the kind of bloat that got into HLO. An intersection of the existing needs, so to speak, would be preferable.
+1. My vote is to look at this as an IR, not an industry-wide standard. We’d have a “spec” but only in the sense that LLVM IR has a spec.
+0.75. Mostly agree, except I’d assume that a large scale code dump is off the table (i.e. code will have to be reviewed & contributed incrementally), especially if we are able to do this work in-tree.
+1, this is why I suggested developing in-tree as well. IMO an end to end happy path, even if imperfect initially, will be a nucleation point for collaboration and simplify adoption.
I don’t have a strong opinion on this yet (maybe @raghavanr does). Do you happen to have a link to the entry point in MLIR Core’s bufferization machinery? We do care about bufferization and being able to reuse functionality in MLIR Core will definitely be a plus.
This kind of design will help us in general (we have internal dialects that will never be open sourced), but our broader goal is to work with the community and develop large parts of our end to end flow upstream, as @raghavanr said in the initial email.
But yes, I absolutely agree with you, it cannot be a dump in the strictest sense. It’d have to be a long discussion and slow merge anyway.
My personal preference would be to steal from those projects so that the migration becomes basically a move of namespaces for most ops of most projects, instead of just copying the actual implementation and then having left overs all over the place.
I agree with a caveat. It is very valuable to have an implementation for an end 2 end compiler upstream, so that we have a shared implementation we can point to. Concepts can be discussed less abstractly that way. It will also help with new projects joining, as it would be a good way to showcase the general design philosophy. However, we need to make sure that it is not the only viable implementation, so that we can disagree on details where we have to.
@matthias-springer has been driving a lot of the implementation. It is what is referred to as one-shot/comprehensive here. We have pretty much converged on this approach for code generation of numerical kernels across various projects at Google. One reason why this was possible is that the approach is very extensible via interfaces, so we did not need to agree on all the details.
I also suspect that having some internal only portion is a common requirement. Maybe others can chime in to substantiate this?
Probably not encouraging, but you may be assuming too much about the visibility from the other side. Google’s development process is pretty chaotic in general and does not suffer from an overly inferable plan, especially in this area. Especially with MLIR’s roots as an experiment that actively encouraged that chaos (in terms of “try a lot of things so long as it’s a dialect”), I do think we are still collectively struggling with how to move to the next phase. The core is getting closer to the ideal, imo, but still has a ways to go. Speaking for myself, I do put a lot of effort and leverage into making sure that the modern day developments we do upstream are discussed and moving us more towards the ideal as well. Feedback welcome on specifics on that.
I will admit to there sometimes being a certain reticence to talk more about the details of some of the plans to distill and repoint. As an example, we did write a design doc for how to better align linalg with upstream principles, and we have been directing things that fall out of that light cone to experimental repos or incremental development/proofs in the client projects before, sometimes permanently, other times to gain proof/traction first. But we didn’t talk about it at an ODM, largely because I didn’t have the energy to deal with the kneejerk words and judgments that often get cast to some of these battleground topics (and people). That’s an example of what I meant by the words and attitudes causing entrenchment of the very things we would all like to see improved.
Relatedly, I have noticed a bias in this community specifically to use the word “research” as a pejorative, and (in my judgment) as a way to categorize things that folks don’t agree with or would like to have seen done differently (or by different people). I have been smarting at that recently and trying to redirect to get at the specifics (so we can address them). In an unlucky coincidence, I had literally just done a re-read of Modular’s website for the first time in several months and reacted quite negatively to the us/them treatment that it presented this all in. When Chris showed up using some of the same words, I latched on to that and brought it into this discussion. Thanks for speaking up and trying to explain the viewpoint.
This would be very nice to have. Any way to use MLIR out-of-the-box to codegen for some architectures would be great. Personally, I think this is probably something we should have out of the core MLIR tree, since we could have multiple such codegen paths.
I’m in a similar position to @sanjoyd on this. I believe we haven’t gotten to that yet. If this helps with bufferization in MLIR, then we could definitely benefit from it.