[RFC] MLIR interpreter framework

rengolin · July 9, 2022, 11:09am

Thanks for the summary. To be clear, I’m not against it, but I feel we’re not addressing the points that we’ve drawn from previous experience, and to me, it feels like history is repeating itself.

The main fallacies of bringing downstream code upstream are:

1. It has proven successful on our use case, it’s stable/efficient/robust, so should be good for upstream
2. Other people said they want something similar, so building from an existing implementation is simpler/faster
3. Our implementation is incomplete but has 80% of the use cases, it should be simple to finish off the remaining 20%
4. We can create an infrastructure that is generic enough, so that people can use for different cases, but low cost enough, so that everyone can maintain
5. Performance isn’t critical for the few things we do, and other use cases say the same, so it should be fine.
6. We make no promises regarding performance/applicability/usability, so people shouldn’t complain when it “doesn’t work” for them.

The first three points are fallacies because an idea isn’t good on its own. It has assumptions, implementation details and business constraints that aren’t the same for other groups. What you did involves a number of choices that may make total sense for your use cases, but not for others. Because you want to upstream your implementation of this idea, other people will have to adapt to it, or will have serious concerns, and you won’t know until they have tried to use it and brought back their constraints. An RFC isn’t enough. A diff isn’t enough.

Example:

How do you know that this would be easy to adapt to your existing code, or if it’s going to change the underlying assumptions completely? How many other features have you not implemented that could turn your simple assumptions upside down?

One of the things that the experience with the LLVM interpreter tells us is that, as the IR evolves, implementing the new features and their interactions with the existing ones becomes harder than you think.

Another, as @clattner and @stellaraccident have shown, is the assumption that layout and execution for your case are the same as (or even compatible with) those of the rest of the community.

The fourth fallacy is quite generic to any upstream development. Every new feature has a cost and the decision to add/keep/remove comes from the balance of cost vs. benefit. LLVM has a policy to remove features, even entire back-ends, if it turns out no one uses them or the cost is just too high to keep.

The LLVM interpreter, the multiple attempts at garbage collection and the few JITs we tried and threw away are good examples of things we collectively thought were a good idea, but whose implementation details showed that they were impractical.

The last two points are very common fallacies in open source.

One thing is optimisation performance, when the program runs ok but could be better, and another thing is when the program doesn’t run at all (OOM, crashes, runs forever) for anything other than simple examples.

Quite so. But I’d go further: for non-trivial cases, even the change from OOM to completing can lead to non-trivial code.

The final “we make no promises” point is the one that usually leads to removal. You begin with an experiment, you bring it into the codebase, people try to use it but it doesn’t quite work or it’s too slow, then people stop using it, and you end up being the only ones using it, but everyone else has to maintain it.

When the “no promises, as is” is done on a separate repository, no one cares. You’re the one maintaining it and once you start making promises, people will begin to care. And when you have a repository that is making promises, the reasons to include it in the main upstream repository, and the costs to keep it there, are clear, and decisions are much easier.

I know it’s a lot harder for the team proposing the change, but if we included every experiment in mainstream LLVM, then we’d never get a stable compiler in the end. A LOT of experiments around LLVM have failed (for various reasons), some of them took years of a whole downstream team’s work, and it’s sad when that happens.

But there are also examples of ideas that looked good at first, but ended up wasting a lot of the whole community’s time, creating stress and animosity across a much larger (and global) team. When you think of the scale difference between a local team and the whole LLVM community, it becomes clear that the small teams have to pay the costs of bringing ideas, not the rest of the community.

Any specific criteria in your list may work for you but may not work for others. And it would have to be agreed, and written, which would be a new round of discussions and bike-shedding. I recommend against it.

The main problem with the current state, IMHO, is that your prototype doesn’t seem to be public. If you have customers, then it can also be complicated, because if you succeed in merging it and the final implementation changes, they’ll have to accept the changes and you can’t do much about it.

The steps that I (personally) would like to see:

1. A public, out-of-tree implementation, so that people not in your group/company can see if what you did will cater to their needs. It also helps people with fears of high maintenance costs see how each change interplays with another. This could be your current repo made public.
2. All interested parties discuss what they all want from an interpreter and agree to a common overall implementation framework. Not just the idea, but the overall required changes in code. This is the actual proposal.
3. A study on the differences and similarities between the past experiences (LLVM interpreter / JIT / GC) and the MLIR interpreter proposal, and why you won’t fall into the same pitfalls. Some of the comments in this thread related to that are still unanswered.

The key here is that bringing an interpreter into MLIR is a big deal, because the maintenance cost will be spread across all dialects, upstream and downstream, and it has the possibility of being the first interaction people have with MLIR. If that doesn’t work well, people will conclude that “MLIR doesn’t work well”, which is what happened with LLVM until we removed the interpreter.

It should not be treated lightly, nor should it come as an experimental idea. If we decide it’s worth it, it will have to be a concerted effort from a lot of people (not just the ones amenable to it) to get that to good quality at a reasonable pace.
