We are proposing to add a new ML dialect, TCP, in MLIR. TCP’s mission is to be a mid-level dialect in MLIR that enables transformations that complement those in existing dialects, by supporting ops at a high level of abstraction.
There have been extensive discussions about the technical and organizational aspects of this dialect in this thread in the MLIR forum. There is also significant interest from several different stakeholders to add this new dialect in MLIR.
A few different alternatives were considered to bootstrap this work, including a) in MLIR tree, b) in TorchMLIR repo, c) as an LLVM incubator, d) in a feature branch in one of the existing repositories. Since this is only a proposal and we do not have anything tangible at this point, an incubator seems like the best option to bootstrap this work. For a more elaborate reasoning behind these different alternatives, see these comments: 1, 2, 3, 4. Given these considerations, we would like to request to incubate this dialect under LLVM.
TCP will be developed by the community including us at Cruise following the LLVM developer policies. We’ll aim to develop and review it incrementally in the incubator repository to minimize the need to gate graduation on a bulk design / code review.
There is broad agreement in the community that TCP should be in the MLIR tree soon. We believe that we are well aligned to work towards that goal.
Raghavan (on behalf of the ML Compiler Team at Cruise)
+1 from me. I like that the backing proposal outlines some milestones and checkpoints where we can evaluate and make more decisions. So far, I think we’ve mostly incubated more open ended things with a less clear path to graduation. I think it will be a good experiment to see if projects like this can use the incubator as a viable means to implement a new category of functionality in a community-centered way prior to decisions about when/how it lands in the monorepo. Aside from getting the work done, it will perhaps be informative for ways to do this kind of thing in the future.
Probably it would be nice to have these estimated evaluation/graduation points in the incubator proposal process, so that we make it clearer that we have to do an evaluation with an outcome at some points in the timeline (e.g. graduation stages?).
But I don’t think that the current incubator policy has something like this.
Makes sense, I think we should continually revise processes and policies as we get experience with them.
In the case of graduation, I think it can make sense for a project to define goals and set objectives in order to graduate, but the ultimate decision is owned by the LLVM community based on what the specific proposal is.
As one example of this, the CIRCT community recently proposed graduating into the monorepo and was (correctly IMO) rebuffed with something along the lines of “please propose graduating specific parts of CIRCT, not the whole thing”. This is a very rational thing given that CIRCT contains code of different levels of maturity and battle-proven-ness etc.
There is no “greenlight” step but a judgment call with respect to consensus. In a non controversial case like this (so far), I would draw the line at “has everyone in the community had a chance to comment”. Between this post and the mention in the weekly newsletter (last 24h), I think we can reasonably say this week that the condition is met.
Since the proposer had already raised a thread with llvm-admin before being redirected to make a community wide RFC, if it were me, I would follow up on that in the next day or two and point to this RFC and request to proceed. Creating an incubator is not a permanent thing and should strong opposition emerge (which seems unlikely at this point), it can always be reversed. There is also usually some latency on the llvm-admin side.
This seems far too aggressive to me: I would consider 1 week to be the absolute minimum for any consultation of the community (whether it is an RFC or any kind of polling), likely more for important decisions. Not everyone is full-time working on LLVM, people have other priorities, vacations, etc.
On the proposal:
I still have the same concerns I raised in the original RFC between the stated proposal here and what is suitable for an incubator.
That is: an incubator seems a bit large for a proposal that claims to be limited to an MLIR dialect, and the “soon” target proposed for integration in MLIR is a bit blurry and not clearly aligned with how I see the scope of an incubator.
I feel that the line set here for inclusion in MLIR is unclear and that without a clear agreement on the graduation criteria we’re just going to end up with more misalignment and/or some “moving goalposts” all around.
This also conflicts with the MLIR policy on adding new dialects, and that concerns me even more.
Having recorded my grief here, I don’t want to hinder progress more, so I’ll just (unhappily) yield here.
I request though that you wait at a minimum a week before considering such a proposal to have reached the entire community and to have given most people at least a chance to chime in.
Not to nit pick, but there were quite a few opinions (many of them strong) that were against that proposal. Iirc, this was your viewpoint, and the matter seems in no way concluded. You obviously have a lot of history and insight here, and you may be right, but I’m not sure that the analysis you are applying is understood broadly.
I think we need to (apart from this specific discussion) directly have a real discussion about contribution model, process and organization for the mlir derived projects in the umbrella. Even if that just clarifies the policy complaint that was brought up on the parent thread to this one and makes guidance more concrete and non contradictory, it seems important to do.
I’ll second this. Maybe we should have a separate discussion on the contribution model for MLIR using this proposal as an example, but without blocking the progress of the proposal itself. Specifically, the original discussion had concerns about this proposal extending the project charter, but we don’t really have one.
On a more procedural matter, I feel that our current model is more suited for individual contributors and doesn’t serve well open groups of contributors who would like to publicly iterate on a larger component. If this is something of interest, we can fork the discussion into a separate thread.
Strong +1. Especially when we get contributors forming on a new idea across company lines, the LLVM Foundation is often the only place that such collaboration can happen. Supporting it better seems like it would be a really good thing, but I don’t have any notions of how in my mind.
SGTM in general. Graduation criteria are a good topic to discuss, and Mehdi and Alex have very valid points there. Whether it is “BERT via TOSA to TCP to LinAlg works end to end” or what not is TBD (but showing the fit and utility as described in the dialect contribution doc along with some amount of usability sounds appealing [utility showing the delta of adoption may be difficult initially, and perhaps measurement is not quantified enough; up for debate potentially in a different thread or ODM]).
My mental model here is similar to LLVM back ends: they start as a collaboration between folks (within one company or outside) and get to a working stage with benefit to a section of the community, they are proposed as an experimental backend, get submitted for review piece by piece (where review is actually a further round of review), and after some period potentially migrate to a default backend. That also influences my thinking above, as can be seen. Now, whether that is too heavy for a dialect is an open question and probably a function of the dialect and its requirements (testing, infra etc). But we do have folks proposing contributions that they have been working on for months to years and where they’ve built end to end systems and can show the impact of their design decisions and be evaluated on those, so I don’t see it as exceptional in general. Also for many of the levels there are multiple answers, and being able to contrast and take the best of each is good rather than locking in.
But to Alex’s point I see the incubator here as a place where a group of folks can openly collaborate as step 1 of the above. If the repo is created after a week, and graduation criteria can be discussed in parallel while folks are actively developing/reviewing/testing, then the overhead and startup latency don’t seem too high (e.g., folks are already contributing to the design doc, code can be contributed within a week, graduation criteria are not blocking technical work). I also feel there are benefits to starting in an incubator and being able to run. The cost seems low, and ambiguity for downstream is also removed. Visibility wise, weekly updates and perhaps a snippet at the open design meeting could help too, to ensure that folks who want to stay informed but can’t add one more project to their list are.
I would welcome a broader discussion about collaboration models and how we view the monorepo. I am extremely concerned with the amount of stuff that is put into the MLIR repository (and thus the LLVM monorepo) that is not up to the standards of the rest of the LLVM project. MLIR has a lot of older code that was grandfathered in for various reasons, but when it joined the LLVM project it didn’t fully adopt the LLVM contribution model (which is one of the requirements for joining the project).
Mistakes will always be made, but there is clear lack of alignment in the community and that is something we need to resolve. To be clear, I’m saying that MLIR shouldn’t be special here: we should either change the MLIR policies to align with LLVM or we should change the LLVM policies to allow more early and experimental stuff in tree like MLIR. We shouldn’t have two different policies.
This is a growing concern of mine, perhaps we can have a discussion/bof/etc about this at the LLVM Dev Mtg this fall.
I am planning to be there and would welcome a discussion.
To be inclusive, we likely also need to explicitly plan to also discuss in other venues, take notes, etc: I’ve heard from many folks who were planning to travel to the Dev Mtg but are facing travel restrictions, etc. I think we should leverage the dev meeting for the opportunity it provides but plan for an extended conversation as well.
My personal opinion is that the mistakes may be more fundamental than a handful of misplaced coding experiments. As was often cited at the inception of the project: “MLIR is a social experiment”. It created a big playground and an open ended set of tools with a very low bar to differentiate while also making technical decisions which biased it towards creating monoliths. And then that was put in a monorepo which is supposed to be biased towards stable, long term projects, while creating a fairly steep gradient for the advantage conferred if in tree. Add in a few other social forces that piled up along the way, which further eroded the development of a vibrant out of tree ecosystem which could effectively collaborate and interop… and an outside observer would not be remiss to conclude that we created a perfect storm leading us to just this discussion.
Let’s figure out how to call the “social experiment” phase done and set this thing on a stable, long term path. And let’s also figure out how to sort the investments that have been made so far. There are some orphans in tree for sure, but behind most of those directories are contributors who I have deeply enjoyed getting to know and work with – and am grateful for the participation and care that they have put in to our little project and community. In large part, we’re having this discussion because the experiment succeeded, possibly more than anticipated. And that is a fundamentally better place to be than the alternative: it didn’t have to end up that way. I’m very interested in fixing some of the project formation bugs which got us here and doing that in a way that respects the investments made while setting us up for the future.
(Chris: I think I’m agreeing with you but I also want us to accept some responsibility for the initial conditions we set up, and I want to be explicit about how much I value the work that the community has put in to getting things to this point. I may be overcorrecting a bit, but these kinds of conversations can degenerate quickly. I want to be clear that I consider this a systems problem to fix over an appropriate period of time, not something personal/targeted/judgment-of-specific-work/etc – because it can be so easily taken that way, and once those feelings engage, we lose our objectivity and ability to act with purpose)
This is something I’ve heard from you on multiple occasions: Perhaps you could go the next step and help make this actionable? If there are things in MLIR that really don’t follow the policy, then can you bring them up (specifically) with the goal of making the code better and without making personal judgments, rather than leaving it as a vague critique?
FWIW, I looked through all of the developer policies and couldn’t find anything about ‘code standards’ as a requirement. There is significant mention of ‘active communities’ and ‘nightly builds that don’t break’ and ‘promptly fixing bugs’ and ‘aligned with the needs of a compiler’… In particular “stability” is discussed primarily in terms of “not breaking the tests/nightly builds”, and not in terms of “stuff never changes”. Maybe I’m missing something, or you perhaps have additional standards in mind and this is what you want to discuss?
I personally think that there *either* needs to be space for experimentation (such as within the scope of peripheral code, and not compiled by default features or experimental backends/dialects) OR there needs to be an effective process by which experimentation can happen outside and later be merged as larger projects. Without one of these options, then only incremental changes can/will happen. Clearly we need to balance stability and change, in order to enable real world ‘products’ while allowing continued evolution and growth.