RFC: Graduate CIRCT to monorepo?

Leaving aside monorepo angst, I see this as a very strong point of the proposal. I think we need to audit whether all of CIRCT would come in or be chopped in some ways (some parts seem more “integration-ey” than others), but personally, I think MLIR would benefit from such a user as this in-tree. The maintenance cost/benefit goes both ways: too little in-tree usage and it is hard to understand the impact of changes and breadth of actual usage. Too much and many of us have scars from past experience on that.

Personally, I think that at least some portion of CIRCT, using MLIR canonically as a frontend for an important domain would actually help us get to more of a “just right” place with respect to cost/benefit.

2 Likes

One of our engineers has picked up skills on git history rewriting that may prove valuable before we actually do a merge – just to make sure things are tidy. In another repo, we picked through and were surprised at some of the gunk we had from the early days before the project structure was stabilized. Big moves like this, where you are already breaking the development flow can be a good opportunity to browse through and tidy up. Let us know if you’d like help/scripts/etc.

2 Likes

One benefit of the mono repo is that you can do atomic updates on clang and llvm.

1 Like

+1.

Dev teams behind new projects (which would like to be added to monorepo) would heavily decrease their engineering cost to keep their projects up to date with upstream LLVM/MLIR, but in reality those costs would be just shifted to core MLIR/LLVM developers teams.

For example, a MLIR developer changes some core MLIR API (and here we can assume just trivial API change) → now the developer needs to fix all call sites in CIRCT as well (so setup everything for CIRCT, build it, test it). There are efforts to use github pull requests instead of phab to make things “easier” for developers (questionable, but let it be), on the other hand, we introduce a lot of work for core developers; force them to spend some time to work on unrelated projects … or not contribute anything at all.

As many people said, the LLVM project needs to draw a line.

1 Like

Thanks, yeah, I think that would be helpful when if and when it comes time to clean up the git history.

A number of people have raised concerns about the growth of the LLVM monorepo. I just wanted to chime in on the other side, to say that I’ve not personally had problems with it (admittedly, only really working with LLVM and Clang). It’s true that mid-air collisions are possible when rebasing+pushing, but given LLVM’s velocity this is going to be an issue even if llvm/ is split out (and there are solutions to consider that don’t involve splitting out into smaller repos, e.g. handing off the final merging to a bot a la bors).

To be clear, I’m not trying to dismiss those who have had issues, just thought I’d say a few words on the other side. I should also say that LLVM is the largest monorepo I’ve done much work with, so others are sure to have broader experience.

Before declaring the monorepo “full”, could there not be a more extensive exploration of possible development models (other than “the kernel model doesn’t work 1:1 here”)?

For example, what if there was a step between an incubator and “any commit that breaks the new subproject breaks all of us” - say, the subproject still could become a top-level folder in llvm-project and yet still use it’s own development branch (i.e. not main) + CI. This would largely absolve other contributors of both the higher commit traffic and the CI breakages that people are worried about.

It would then put the responsibility of staying up-to-date with main onto the subproject, and similarly for the merge-back-to-main** – a bit like a materialized git submodule. In that way, a subproject could still gain most of the benefits of being in the monorepo, without burdening other contributors.

The last graduation step (if ever?) would then be when the subproject has stabilized to the point that the community is comfortable with changes going into main directly, with all that entails.

There are obviously many variations on this scheme, and the above is but a sketch, yet I feel this topic could be beneficial to explore in more depth.

** this would obviously be helped by having GH PRs (and allowing merges of larger series of commits), but IMO the flang upstreaming effort shows that going through phab for someting like this is still feasible.

1 Like

I think that there is a lot of conflation of three largely independent things in this discussion and it would be worth disentangling them:

  • Should CIRCT be part of the LLVM umbrella? This one is already addressed, it is in the GitHub org.
  • Should CIRCT be integrated with LLVM CI infrastructure such that we get automatic notifications if an LLVM commit breaks CIRCT?
  • Should the code for CIRCT live in the same git repo as LLVM?

The second two are completely separable concerns. There should be nothing stopping any project that consumes LLVM libraries from having CI that automatically updates the LLVM submodule on commits to LLVM and flags commits that have broken things. It’s a project policy question, not a technical question, whether an LLVM commit that breaks CIRCT should be reverted, should be expected to come with a fix for CIRCT, or should just let the CIRCT developers know that they have some more work to do. This is a trade-off between the extra developer load from LLVM developers fixing bugs that don’t affect their use cases and the extra benefit from a broader testing environment and I don’t know enough about CIRCT to have opinions here.

I was one of the people (roughly 50%, from the survey) in the community who did not want the monorepo in the first place and has suffered from the fact that it exists and I have to clone a huge repo to make a change to libc++, so I have some pre-existing biases here.

Most of my objections to the monorepo in the first place also apply to adding a new project, independent of the merits of CIRCT:

  • It adds burden to the people who are contributing to the smaller subprojects.
  • It encourages tight coupling between LLVM components, which is antithetical to our goal of producing reusable modular libraries.
  • It emphasises the two-tier ecosystem where projects that build on LLVM but are not in the monorepo are considered second-class citizens and should expect to not compile with a random trunk revision if they built correctly with the last upstream release. This harms LLVM both by making it less likely that projects will adopt LLVM and by encouraging downstream consumers to test only after releases are branched and so we don’t get good testing from things outside of the monorepo.

It seems that CIRCT is suffering from the second two points and, rather than trying to fix that problem systematically for downstream consumers, the proposed solution is to increase coupling. This does not feel sustainable. The logical conclusion of this strategy is that Mesa, rustc, ponyc, ldc, tensorflow, and so on all end up in the monorepo (the more likely outcome is that they get tired of being second-class citizens of the LLVM ecosystem and move to something else).

2 Likes

I think your breakdown and analysis here is spot on. For me, the discomfort is more about increased coupling and the negative ecosystem effects from that being the norm. I have always been puzzled by llvm’s built in contradiction of being focused on modular, reusable libraries/etc, and what I have observed as a systemic disregard for the projects using it (I would soften your first class/second class characterization, but generally agree with the sentiment). That is just not how software library projects run: they can still move fast and frequently break things if needed, but there needs to be accepted community norms around making choices that limit the impact of that where possible. The difference is in the details of how the project is run over time vs any one thing (it is more about instilling biases vs rules). The project itself needing to cope with the pain of using itself is a way to make sure the bias is present.

With all of that said, in this one case, I still think that at least some of CIRCT’s dialects rise to the level of being in the monorepo and would be a net gain to both the project and the domain: no matter what the development model, limiting diamond dependency skew for interop components (which is what at least some of CIRCT is) is a worthy thing to look at in its own right.

+1 - if we are primarily focused on coupling, this is separable from whether the code lives in one git repo or multiple.

Where I work has a somewhat (in)famous monorepo and supporting automation platform to keep it green, and what I have observed over the last ten years is that people who mainly know that present state forget how such things got to where they are and that there are perfectly reasonable middle points that are suitable for different types of situations, even in the quest to full on at head consistency (which may or may not be a worthy goal). As llvm grows, I would rather see it evolve tools and processes that do not require global, single commit consistency – I just don’t see how that scales given the state of the code, community, supporting tooling, and amount of money/time available for the level of infrastructure needed to make that work for real.

As just one example: we couldn’t always rely on single commit consistency and even in the cases where we had the technical ability to do so, in the early days it was often not worth the cost for a specific project to buy in. We lived with “sync to green”, milestone commits, carveouts of core vs leaf dependencies, and eventual consistency – and the ability to do so was driven by a few pieces of infra that all of the devs used. We built big, successful teams and products for many years in such a state. It works, and you just deal with some of the inconveniences. The reason I’m mentioning this is because there is a lore that has built up about this topic and some of that derives from my employer’s case and activism. It was just one path of many and we got the job done without the fabulously expensive global consistency that often gets conflated with “at head” development.

The cost of just pushing more into the monorepo (with global consistency) when there is friction is that we never grow those processes and development model refinements. For things that rest in that justification, I think that as painful as it is, the friction is working as intended (and I do manage multiple downstreams and acknowledge that the pain is real and we should look to limit it with tooling and process investments).

[quote=“stellaraccident, post:37, topic:61890, full:true”]

I think you make some good points here.
I think it is not so much the close coupling, as the API between the components keeps being broken.
I think the Linux kernel development module could apply here in its statement “never break user space”. We could apply this to the API between LLVM and all the sub-projects.
I have seen cases where a new LLVM API call is added and the old one deleted that is completely unnecessary to delete. One could have left the old one in there without any problem.
If LLVM was actually modular as it pretends to be, the wish for everything to go into a huge mono-repo would be unnecessary.

It’s important to note that this was really a by-design part of LLVM from very early on - having a stable API means you get stuck with bloat that has to be carried for a long time (assuming you follow some version of semver). Heck even the C API wasn’t that stable until Eric and others took a run at stabilising it a number of years ago.

It causes us (Unity) some pain when updating, but I think that pain is worth it so that the core LLVM contributors can keep making the technology better.

3 Likes

My personal opinion: discussions around this in llvm tend to fall into all or nothing thinking, when that is not actually the reality. That is why I phrased it as biases to instill, not rules. As you note (and there is more evidence of in other parts of the project), when care is taken, the churn can be reduced a lot. And you can scale care-taken-by-a-few-individuals into norms, often with development process, tools, some experience, and good expectations/communication around what is under active development and design.

I think most of us want the software to not be debt laden, and I support many patches towards that end, even though painful. I just think there are middle roads and llvm is at a scale where we need to find them.

3 Likes

I was not talking about a stable API. I was talking about not breaking stuff. There is a difference.
For example, recently, someone changed a method call from having 3 parameters to having 4 parameters. So, I had to manually go though all my code adding a NULL for that extra parameter as the NULL was a perfectly reasonable default. They could have left the method with 3 parameters in the API, and just make it internally call the 4 parameter version with the extra parameter set to NULL, but no, they decided to delete the 3 parameter version. Its unnecessary breaking the API, without it really causing bloat.
They seem to manage the API correctly in other places. For example, the migration from pointer types to opaque pointers. Both are available until the old one is depreciated.
Similarly, correctly done for the old pass manager vs the new pass manager. I.e. both the old method and the new method work for a sensible period of time.
This approach permits less close linkage between components and is far more scalable.
People are free to improve their own components without impacting other components.

1 Like

I see this problem a lot in the release branch, where we are not allowed to change the API. There are many API change patches that can’t be backported (or need to be rewritten) due to what I consider to be unnecessary API changes (e.g. adding an extra parameter to a function and not preserving the old signature.) If we had better policy around this as a project it would make it easier to fix more bugs on the release branch.

Then document LLVM’s policy for supporting downstream projects.

It is interesting that you mention these as “good examples” while I consider both of them as catastrophic counter examples of what we should strive for: they are massive projects that dragged painfully for many years and still aren’t really completed today.

The counter point is that “bloat” and “technical debt” are just making the project more complex and harder to evolve/maintain for not enough added value to the project.

3 Likes

This is a fascinating and important discussion, but I would like to pop back up to the specific topic of CIRCT and one thought I’ve been mulling over.

From these comments’ sentiment, it sounds to me like there is tentative support from the MLIR community to at least have part of CIRCT upstream. This is maybe not the outcome CIRCT had in mind for this RFC, or as an incubator project in general, but what about this…

Rather than upstreaming “the CIRCT project” into the LLVM monorepo as a top-level sub-project (like Flang), maybe the “production quality” dialects and transformations could graduate to upstream MLIR. This would definitely take into account (and potentially help inform) any restructuring of MLIR itself ([RFC] Restructuring of the MLIR repo).

MLIR already has a reasonable process for adding new upstream dialects. Maybe the solution here is to go through that process for the parts of CIRCT we want upstream.

It seems like that would still achieve the goals from the original post and focus additional MLIR maintenance burden on the core CIRCT components. From LLVM’s perspective, this should add similar maintenance burden as other new MLIR dialects.

I am just speaking for myself here, so I’m curious what the other CIRCT community members think.

One counterpoint is it would be more annoying to contribute to both “production” CIRCT components and “experimental” CIRCT components, but again, I don’t think that is the end of the world. I’ve had to do this style of development while evolving the MLIR Python bindings at the same time as I used them in CIRCT, and it is just another process to learn IMHO.

Can I 1,000,000x +1 this? I don’t think opaque pointers has been nearly as rough as the pass manager, but people seem to forget that the “new” pass manager migration is not done, and while in the incomplete state it is in people have been prevented from making reasonable improvements to the “legacy” pass manager. The original pass manager was added in 2002, marked “legacy” in 2013, and is still not replaced in 2022.

There is a really difficult balance between breaking things for yucks and being able to correct past mistakes. I’m certainly guilty of leaving some past mistakes around the LLVM codebase, and I would hate to think that we’re stuck with the code “Yesterday @beanz” wrote forever because that guy was not smart.

More to the topic at hand: I think adding CIRCT will impose minimal burden to the community, and I see no reason it shouldn’t be included in the monorepo if it conforms to LLVM developer policy and methods (read: Phabricator and lab.llvm.org, not PRs).

Enabling PRs on the LLVM repository even if they are only intended to be used for one part of the codebase will make things confusing for new contributors and harder for developers who work across project boundaries on CIRCT. I’m 100% in favor of moving to PRs, but CIRCT can’t push the community to make that change.

2 Likes

Should we start another thread to discuss possible changes to how we evolve the APIs in the project? There have been a lot of good points made in this thread, and I would be interested in discussing more, but I don’t want to get too much off-topic in this thread.

This would certainly break the perception that MLIR is tied to machine learning quite dramatically!

1 Like