We originally started it as a fork of the LLVM repository but transitioned to the MLIR standalone template. We found it more productive to iterate out of tree in this fashion, bumping to the latest LLVM version every week or so as needed. (Note: the ability to exist out of tree for MLIR-dependent projects is actually quite good, and the more of us who do it, the better it becomes.)
How do you deal with the problem of using the “right” LLVM version? As
somebody who spends a significant amount of time on a project that is
open-source but out-of-tree – and for good reasons that mean we’re
unlikely to want to incubate in this fashion – I find this to be a
If the goal of incubation is to eventually become part of the
llvm-project monorepo, I feel that being inside the monorepo should be
a goal early on.
Actually that would be a big problem in practice, because it means that either:
- random changes in the monorepo can put the incubator into an unbuildable state
- people changing the monorepo need to somehow build and test and fix incubator projects
I think you misunderstood. The idea isn’t to have the incubated
project in github.com/llvm/llvm-project. It’s that the incubator
project is a fork of llvm-project.
Currently, in npcomp, we have a monorepo hash that we bump periodically. That means that people can follow our README and build our project at any point by checking out the right monorepo revision. Npcomp developers have the responsibility of fixing our own code as LLVM updates.
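To make the "pinned monorepo hash" workflow concrete, here is a minimal sketch of a check that a local llvm-project checkout matches the revision a project expects. This is not how npcomp does it (the hash there lives in the README). The pin file, the function names, and the 7-character minimum for abbreviated hashes are all assumptions made up for illustration; the only external command used is `git rev-parse HEAD`.

```python
import subprocess
from pathlib import Path


def read_pinned_revision(pin_file: Path) -> str:
    """Return the first non-blank, non-comment line of a (hypothetical) pin file."""
    for line in pin_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    raise ValueError(f"no revision found in {pin_file}")


def checkout_revision(llvm_dir: Path) -> str:
    """Ask git which commit is currently checked out in llvm_dir."""
    result = subprocess.run(
        ["git", "-C", str(llvm_dir), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def revisions_match(pinned: str, head: str) -> bool:
    """True if head is the pinned commit; pinned may be an abbreviated hash.

    Assumption: abbreviations shorter than 7 characters are rejected as
    too ambiguous to trust.
    """
    pinned, head = pinned.lower(), head.lower()
    return len(pinned) >= 7 and head.startswith(pinned)


def check(pin_file: Path, llvm_dir: Path) -> bool:
    """Compare the pinned revision with the checkout and report a mismatch."""
    pinned = read_pinned_revision(pin_file)
    head = checkout_revision(llvm_dir)
    if not revisions_match(pinned, head):
        print(f"llvm-project is at {head}, but this project expects {pinned}")
        return False
    return True
```

Something like `check(Path("llvm-version.txt"), Path("../llvm-project"))` could run as a pre-build step, so a stale checkout shows up as a clear message rather than as a confusing build failure. Both paths are placeholders.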
I suppose this works, though it seems to me that this is strictly less
convenient than having the project be a fork and just merging the
llvm-project master periodically instead of changing the README and
forcing everybody to update their llvm-project checkout associated with
the project manually.
Not duplicating the monorepo helps ensure that you don’t diverge from the rest of LLVM by patching it. (You lose flexibility in development, of course, but then shouldn’t this just be in the monorepo in the first place?)
The point of incubation is to have a path for getting into
llvm-project, right? At which point you have that flexibility, but we
don’t give out that flexibility immediately as a free-for-all. Having
the incubated project be an llvm-project fork gives you the “training
wheels” for working in a way where you consider co-development of both
the incubated project and core LLVM. I agree that there’d need to be
guidelines about keeping the “local” changes to the code from
Really, my main motivation for this though is to make day-to-day
development simpler for the incubated project as per my replies above
I suspect this is going to be case by case, depending on which other top-level projects are a primary dependency.
Before the repo was open, we tried it both ways, originally starting with a fork. Then, on the advice of a collaborator who had worked on the MLIR out-of-tree template, I set some time aside to give it a try. I was expecting to need to reorganize things but was pleasantly surprised: a couple of top-level CMake changes were all that was needed. Along the way, there were a couple of other patches to the main repo’s CMake files to include missing things in the installed target, but that is working as intended in my opinion: using them is how such things get fixed.
What I wasn’t ready for was the subtle efficiency boost of working this way, though I expect that to be quite case by case: working through the structural things common to early projects is just a lot easier in a small repo of ~100s of files, where a complete reconfigure/build is measured in seconds.
Early on, we were thinking we would need to maintain a lot more non-upstreamed patches to the existing core projects, which is clearly easier in a fork, but we found the layering of MLIR to make the inverse easy and efficient. Plus there are some fringe benefits to stricter layering:
- it drives improvements back to the core projects, making them easier to use in this fashion.
- in the case of MLIR, it got us out of the bad habit of just shoving more dialects into mlir/Dialects and friends, instead building out our own tree for local dialects, transforms and conversions. With a fork, it is almost too easy to just put things in the easiest place, and for something you actually want to grow up some day, better organization early can be pretty important.
- it gave us more license to think about the identity of this project as a distinct entity, which, again, was a subtle pressure but, I think, a positive one.
- (minor) the repo has its own readme at the top level, helping visually distinguish it from all of the forks without staring at the directory tree to see if it had an “npcomp” directory.
I suspect there are hurdles we haven’t faced yet, but it’s been a good experience so far. It seems like at some point a project will pass a critical mass where transitioning it back to a fork will be important, but that can also be just a git-surgery script that merges it back together, which we iterate on until it works.
At the outset, I didn’t think I’d be advocating for out of tree as a starting point. I expect that projects that depend on parts of LLVM that tend to force you into a long-term development branch mode will have an entirely different experience, and would likely benefit from choosing a different starting point.