This sounds great to me, Sanjoy. Thank you for driving this!
This sounds like a good plan to me. Thank you for driving it, and looking forward to working together on it!
Thank you @sanjoyd and @raghavanr for driving this, and thanks to this community for helping us converge so rapidly.
Going by these guidelines, we (Cruise) have sent a request to llvm-admin@lists.llvm.org for setting up the llvm/mlir-tcp repo.
When approved, the llvm-admin group can grant the new project:
- A new repository in the LLVM Github Organization - but not the LLVM monorepo.
- New mailing list, discourse forum, and/or discord chat hosted with other LLVM forums.
- Other infrastructure integration can be discussed on a case-by-case basis.
I will also follow up on Discord to request that the comms channels be set up for #mlir-tcp.
The guidelines mention that the project: Must be proposed through the LLVM RFC process, and have its addition approved by the LLVM community.
I don't think this has been done: this RFC was about adding a dialect to MLIR along the lines of the MLIR policy here: Developer Guide - MLIR
I don't believe this RFC is representative of what the LLVM community at large expects for adding an incubator project.
So: please start another thread in LLVM focused on adding an incubator project.
Not trying to argue but just clarify: this is an RFC that meets the Developer Guide's definition of such, and the conclusion/consensus was that an incubator project is called for. These guidelines were written prior to the consolidation onto Discourse and I don't think any thought has been put into what the appropriate category is for requesting such things. We should probably update the policy regardless to clarify:
- The category to post in (replacing the "mailing list" terminology)
- Whether an LLVM RFC which concludes that an incubator should be created is sufficient or whether a top-level RFC to create the incubator is called for
Sorry, but I disagree with your "clarification": this clearly does not count as "approved by the LLVM community" to me, as this likely didn't even reach the LLVM community in the first place. Of course we should update the policy, but in the meantime I don't understand how one can claim to consult the "LLVM community" that way: at best we may say that, pending a category, there is no way to perform such an RFC, but it does not mean this meets the requirements…
Moreover neither the title of the RFC nor the body calls for an incubator at all.
I'm happy, though, to poll the community outside of MLIR to see who, outside of the MLIR people, is aware that an incubator project is proposed…
In case it isn't clear: I am opposed to considering a new "dialect" as a "project", and I don't consider the scope presented here to be suitable for an incubator.
So here you have it, since it didn't seem clear to you: there is no consensus.
I only saw Chris arguing this direction; the conclusion looks to me like it comes only from this. And I'm happy to write a rebuttal there, because I find the arguments dubious right now.
On the other hand, many other contributors and/or maintainers of MLIR subsystems (@nicolasvasilache, @sanjoyd, and myself, for example) are considering this as a dialect under the MLIR guidelines.
Also it was spelled out before by @ftynse that
If the proposed dialect goes into an incubator project, it would be nice to understand what is the strategy and criteria for discussing its graduation.
This didn't happen, and this is naturally part of an RFC that proposes an incubator project (which this RFC didn't do).
That escalated quickly and I don't understand the magnitude of your response. You will note that my requested action is to update the policy to conform with your read of the situation. I was trying to put myself in the shoes of the author when I re-read the policy, and it seemed to me that it would be a perfectly reasonable thing to read it and assume that an RFC had been raised to the LLVM community via this thread.
I also independently argued this direction, although with less directness. My reasons for doing so remain: I do not believe we have reached consensus on the design and I would like to see a concrete proposal prior to engaging in upstream development. Especially given the design scope that is likely to be traversed in the related areas to this over the coming months, I do believe that elaborating it out of tree would be helpful. I had originally considered that such POC work in torch-mlir would be fine, but I do agree with @sanjoyd's desire that if going that way, somewhere neutral would be better.
I am perfectly willing to acknowledge that you disagree, and we can continue discussing, but I was taking it for granted that the authors of the RFC had spelled out a plan forward that seemed to have good support on the thread by the end. The conversation at this point read as pretty indicative of consensus to me and, it appears, to the other participants on the thread.
Having had a few hours to think about it, here is my proposal to get things moving:
- We leverage the energy that exists between the stakeholders here and get them writing code for a strong POC that we can evaluate. I was serious in that I've evaluated and talked to a number of people, and I do not believe we are at a level of detail to just start letting upstream Phab reviews fly; maybe that resolves at the scheduled ODM, and maybe (my estimate) it needs a couple of rounds of POC code we can look at and critique. Sanjoy's estimate of November doesn't seem wrong to me, having been down this path some number of times before:
- It would be great for the community if folks did this in an llvm aligned repo, and the incubator process is how we do that.
- If that turns into a big, theoretical discussion vs. its intent as a lightweight check for realness and alignment with the goals of the LLVM project, then someone creates a personal GitHub repo, invites collaborators, and writes some code there.
- We have a look at the concrete proposal and code as it evolves and evaluate whether we move it upstream and continue development there. I personally am optimistic on that point after a few rounds on it, but we'll need to discuss actuals.
- A concrete advantage to not doing this upstream is that it isn't clear to me yet whether we are talking about one dialect or a suite: if we did this upstream, the hurdle of having this discussion at each stage would bias us towards the one, and I'd rather have a friction-free design space. If it turns out we are talking about multiple things, we let the stakeholders push in those directions so we can see.
- One of us updates the policies for incubators to account for the move to discourse and clarify the nature of the RFC.
- We start a new top-level thread on MLIR project structure and contribution model and flush the queue on these topics there, rather than in what has become a tax spread over a number of threads I've observed over the last few months. There are clearly a lot of thoughts and feelings about this, and it may be more helpful to hash it out as a dedicated topic.
FWIW, I've spoken with a few folks 1-1 about my concerns and the reasons for my stance. I am happy to do so with a larger group (e.g. over Zoom or whatever) if that is helpful.
I agree with Mehdi that the incubator should technically be proposed on an LLVM-community-wide forum. I don't expect concerns, but that was the intent of the process pre-forums.
-Chris
I agree too. And on re-reading it, I felt the policy was ambiguous to the uninitiated and think we should update the verbiage: as stated, it is not surprising to me that folks would think this RFC is sufficient.
That would be appreciated. Or a summary of those discussions.
Honestly, this is not the right way of doing it. Neither 1-1 nor Zoom, as neither is permanent nor has a way to refer back to it. If you have concerns and reasons that influence community-wide decisions, then they must be in a public forum, i.e. here.
There are three ways we can go about this: an existing repo (e.g. Torch-MLIR), a new incubator, and in-tree. IIUC, most people here are strongly opposed to an existing repo, which leaves us the other two.
I don't have a strong preference for either, but I agree an actual RFC with the two clear options would make it easier to find and claim consensus.
To me, the arguments are mostly mechanical. I have no doubt that TCP will become an official in-tree dialect soon enough; there are just too many people and projects interested in it for it not to fly. But there's still the question of what its place would be in the dialect cloud, and that could converge faster with more speed and less stability.
Here's my list…
Incubator
A separate project exposing a standalone dialect that strongly depends on and interacts with existing in-tree dialects only.
PROs:
- POC can be cross-developed from very early days to a working prototype only by the people that really care about it, at a much greater speed than in-tree.
- Speed and stability can be tuned independently without upsetting our existing buildbot infrastructure (upstream and downstream).
- We can have different branches, different experiments, whatever, without affecting the in-tree policies.
- We could use pull requests directly, have issues unique to this repo and not have to create filters for it on existing main GH issues.
- Side-effect: Weād strengthen the standalone infrastructure, making it easier for people to compose dialects (new and existing) downstream.
CONs:
- Existing projects would need to add another dependency, albeit a very small and focused one, not unlike other dialect-carrying ones we all work with anyway.
- Existing projects will pick an LLVM commit via two different paths (their own dep + TCP's dep) and they'd have to be the same. It'd be hard to get all projects working with TCP to agree on the same base commit. This is perhaps the strongest CON.
- Current in-tree MLIR developers would have to work on a separate repo. This is a weak CON, as I'm assuming we all work on multiple repos anyway.
- Weād have to use it as a standalone dialect and connect to other dialects in perhaps less straightforward ways. This is a weak CON and has the positive side effect above.
- This will create the problem that we all hate: at the end, there will be a massive number of commits that someone will have to go through, sanitize, re-write to the main repo and slowly apply.
- This isn't going to be a new directory in the main repo, but a move from a standalone to an in-tree dialect, with different structures and build flags.
In-tree
Adding an experimental dialect in-tree, making sure we don't break other people's work/CI in the process.
PROs:
- Reviews on the same place, still on Phab (for those who like it), same cadence as existing MLIR.
- Zero abstraction cost: dialect composition is in-tree and flows with the rest of the MLIR development process and pace.
- There are no multiple dependencies to care about. Projects working on TCP will just pick straight from LLVM at a point when the necessary commits have landed.
- We can treat this exactly like experimental targets in LLVM, for which there are extensive policies and which we know how to handle well.
CONs:
- Projects will be forced to pick newer commits of LLVM as a whole just because TCP has changed. This is the same as when the incubator project updates LLVM, but more fine grained and never stopping.
- It's much easier to break other dialects/infra in MLIR with a commit to TCP (that leaks changes elsewhere) if the commit is in the monorepo, especially if done by experienced developers who are certain that they're doing the right thing.
- To avoid buildbot instability (up/downstream), we'd have to create a way to not build this dialect by default, and I'm not sure we have that in MLIR yet.
- It'd be harder to experiment, create different branches, share work via git remotes directly to other people's repos, etc., because the main repo is huge and the number of commits over a couple of days is large.
To be honest, my critique applies to all incubator projects (including my own) as we are yet to see an attempt at graduation. I am generally concerned that we keep spawning more such projects without clear understanding of the longer-term path forward for them. FWIW, making mlir-tcp a short-lived incubator project as suggested in
may give us valuable information on what the graduation process should look like. This is not exactly the strategy nor the criteria I was referring to, but having a deadline is a start.
I have several hats in the broader incubator/graduation discussion, but my concerns with all of them are about visibility and co-design. Specifically, at which point is the broader community expected to chime in? As one of the MLIR maintainers, should I actively follow the TCP incubator to make sure it follows our IR design practices? Will there be a design / code review upon graduation where other people can make suggestions or request changes? If so, what is the amount of material to review? If not, should we have some limits on what is explored by the project (e.g., it should not be replacing the tensor or linalg dialects)? Conversely, as an owner of an incubator project, should I proactively seek feedback from future stakeholders? Or can I just write "intentionally draft" code with the assumption that it will have to be revised/rewritten upon graduation?
The comments from @stellaraccident and others suggest that there will be an extra review, but it would be better to have clear expectations.
All my comments aside, I would want us to bias towards action and take the least worst solution, something suboptimal but acceptable, so I can just side with the majority (presumably, in favor of the incubator, maybe a quick poll?) and have a separate discussion on the project charter extension policy.
+1. Maybe lift this to the LLVM level if incubators are involved.
Thanks for all the back-and-forth and all the interesting valuable viewpoints.
I also prefer to bias towards action, short-term incubator is fine with me to get started and I hope we can roll up our sleeves and get down to business, as a community.
The main angle I expect we can agree on is that, wherever this lands, this is the forum where we should, collectively, come together and put an honest intellectual effort into co-designing.
I am expecting all interested parties that have a stake in defining this line of work to actively participate in such discussions and threads.
While I acknowledge the importance of the "how", let's not forget the "what": the deep technical discussions and cross-pollination need to start.
I think this is a big enough topic that it probably needs both (i.e. on-list discussion and face-to-face meetings). My preference would be to eagerly upgrade to in person when things get overheated. Side conversations may not be particularly helpful at this phase, but people (incl. this community) often resolve such things better with some copresence. Given my experience here and the sensitivities involved, I would definitely want to use some actual F2F discussion as a strategy to make these decisions. Those should be official, open, scheduled meetings at this phase.
I agree with what I read as your meta point about keeping the conversation public and accessible, but I just tire of big, drawn-out forum threads and have observed that we do a lot better when we keep the F2F upgrade button firmly in view and use it (even encouraging individuals to talk, hash it out and report back). That also has the side effect of slowing the conversation down, and that can be a good stabilizing effect on its own.
Absolutely! That's what I meant, not as OR but AND (an "exclusive" AND, so to speak). Sorry for not being clear… again.
I am personally in the uncomfortable position of really-really wanting this and being aligned with its intent but not yet seeing a level of concrete consensus on the key design points to give a personal thumbs up. Further, I think that some of the design points are subtle and would be better understood by all with a concrete prototype and connections.
I can't speak for anyone but myself in terms of expectations, but I can say that I'll flip from being on the fence and pushing back to enthusiastic support if I see progress on a concrete, high-quality prototype that has encountered some of the key design points that have been worked out in adjacent projects over years and that we need to surface once and for all here.
I also think that agreeing to a schedule of review could be a good grounding force for the work. Personally, I see the 9/1 ODM and the stated "November" sync as good checkpoints that seem realistic and likely to focus the results. Like anyone, I don't think we can precommit to approving the results on that schedule, but we should definitely review at each point and make a concrete decision about next steps.
+1 to bias towards action and code that we can look at and discuss concretely. This can even mean starting in a personal github repo so that work can literally start now (Torch-MLIR started in a personal github repo too).
Renato, I think you're confusing a few things.
Here I am offering to share my personal opinion, explain my position on things, and talk to folks who care. This is because I care about the community and the people in it, and I'm quite aware that I'm in a position of being the "bad guy" here and would prefer people to understand the rationale for the position I am taking. I am entitled to communicate any way that I want to, and if I happen to be going on a walk with someone 1-1, I'm not going to refuse to talk about something they care about. I am also an individual in the community here, not speaking on behalf of LLVM as a whole or any other person.
On the other hand, there is consensus building on how to move the proposal forward itself (something that I'm not doing, but the Cruise folks are). For that, I agree with you that written documents etc. are great, as are long-term collaborative discussions etc.
On the third hand, you're talking about LLVM policy and community-led decisions. These policy decisions are not made on >100 post threads in a corner of the project; they are communicated, and feedback is collected, in well-marked threads on project-wide channels. As an individual, I would of course participate in such discussions, but decisions are made by general consensus in the community.
-Chris