Certainly: the cost/benefit goes both ways and it will be a (in this case gentle) gradient towards maturity. If what you’re actually saying is that MLIR is so experimental that it is not ready to take on any production users, then we should probably put that at the top of the docs.
I don’t think we are litigating whether the door is open to everything in this RFC. In fact, the next incubator in the rota (torch-mlir) is one that I started and would most likely argue against inclusion in the monorepo (both based on maturity now and perhaps indefinitely based on bulk and dependencies).
This debate that we are having is part of the process and why I have more confidence than not that this scales: the rate of aligned, production grade “CIRCT-shaped” projects is low and we will litigate each one of them on the merits.
I am definitely not advocating a lowering of the bar or an open door.
I strongly disagree with the way you’re phrasing it here: “API stability” has nothing to do with “experimental” or “production users”.
This is just like how many folks shipped LLVM compilers by forking and freezing it at a particular version (or updating once a year). Would you say that clang is “experimental” because any compiler based on a fork of clang has to suffer from API instability?
I agree we’re not debating the merits of CIRCT here, but regardless of how much CIRCT is up to date with the latest MLIR “good practice”: the cost will end up being the same in the long term, and this shouldn’t be discounted.
Sigh… OK, you brought up the elephant in the room. Honestly, I don’t see how to scale MLIR further without more API stability in general… The difference between in-tree and out-of-tree projects is how the technical debt is managed. In tree, the technical debt is quite visible and in most scenarios has to get paid down quickly, and often more efficiently. With out-of-tree code, the technical debt becomes more difficult to keep visible and quickly accumulates. I fear that without a better sense of ‘what is stable’ vs ‘what is not’, projects not in the monorepo will eventually and inevitably fall flat on their face, living on a development treadmill that never ends. Maybe this would be a good roundtable discussion topic for the next LLVM meetup.
I don’t think that providing arbitrary backward compatibility really solves much, either for in-tree or out-of-tree code: architecting and verifying ‘backward compatibility’ comes with its own development costs, and without really understanding the true needs of clients, that cost may be unnecessary.
In my mind, I distinguish less between the needs of so-called “MLIR core developers” and “MLIR users”. MLIR exists to help its users solve problems. MLIR core development gets leveraged among lots of users (Great!), and MLIR users identify problems and goals of different types, contributing their experiences back to help others benefit (Also great!). Promoting an “Us” vs “Them” mentality by valuing progress of the core infrastructure (perhaps with a cost X) over the incurred technical debt on N projects (perhaps with a cost N*X) seems to divide the community rather than bring it together. As a community, we need to work together to balance these things. Technical debt (usually) has to get paid somewhere… the question is just when and by whom? I’m happy to hear suggestions for what we can do as a community to make sure that progress continues without the burden falling on only a few people. Even with the current state of the monorepo, this would seem to be worthwhile.
You end up in the same place, I just took a shortcut here: some of the MLIR Core developers know all too well that there is a de facto difference in balance between “core developers” and “users”. But also don’t forget that all (?) “core developers” are also “users”, so when we differentiate it is really about “users” who don’t engage/contribute upstream.
I believe it is fairly fundamental, and as such I’ll have to disagree with you on this:
Most of the improvements in MLIR will break “some” existing users who won’t see any benefits (so they will consider it unnecessary churn, and push back against it if they can), while they’ll benefit other users, open up to future users, or even just help keep MLIR Core healthy.
I’d be interested to hear what it means to “bring the community together”?
IMO there is the community of people who are actively contributing to Core (all of them also being users), and there are folks who are just users and don’t invest in developing MLIR Core; that is fine as-is from my point of view.
The people who develop Core are the ones who balance the development model (and they also pay the maintenance price in their own projects when they have their “user” hat, so they aren’t “disconnected” from the actual cost of evolving MLIR!).
When I have my “core dev” hat on, I try to set aside the cost of updating my own project (TensorFlow), and I review patches and RFCs based on their merit and what they bring to the project: that is, I look at a patch wondering “is this pushing MLIR in a better direction?” and I avoid thinking “is the cost of updating TensorFlow for this patch worth this change?”.
Hopefully my fellow core developers are looking at it this way as well.
Big +1 to all of this. I am likely one of the most active “core” MLIR developers, and I have also always been a user and maintained (sometimes many, many) users of MLIR. I have always paid the cost of API updates, both for my own changes and for others’. I don’t look at patches from the perspective of “well, that might be annoying for me to integrate”, but rather whether the patch is a good direction for the project as a whole.
Ugh, I really dislike having this discussion, but I got nerd-sniped. When I have my “core dev” hat on, I also apply this methodology. But I also ask “would I like it if this patch landed on me as-is on a Tuesday afternoon?” Asking that question modulates the “how”, not the “what”, and often there is a spectrum of perfectly good “hows” – some of which are as simple as getting a second opinion on whether we really want to break this. My main experience here is with the C APIs, Python, and the build system, and I can say that asking that question has never caused me to compromise on pushing the project in a better direction, but I believe it has resulted in a gentler experience among the population of user/devs of these components. The all-or-nothing attitude in these parts on this point has always puzzled me, and I dislike the logic knots we tie ourselves in trying to parse it. On a practical note: I’ve noticed that projects which talk/ask and are considerate on these points can find that it pulls more bystanders in as devs. I don’t know why… probably because it is more humanizing.
I’ve really valued the CIRCT folks stepping up and in over the years on the parts of the core codebase we’ve collaborated on. They were some of our first users of some of the APIs from back when things were wildly unstable (versus just “wobbly” today, I would say). I’ve really valued their contributions, flexibility, and help on the infra. I wouldn’t be advocating nearly so strongly for the inclusion of their project if it hadn’t been such a good two-way street.
This. Not only that, but also patch reviews, research grants, etc. That has certainly prevented me from expressing my opinion in the past.
Another aspect to take into consideration is non-professional developers, such as students, professors, and hobbyists. I hear folks complaining all the time about how hard it is to hire compiler devs. Are we doing anything to attract and retain new talent? Or are we pushing the contribution bar so high that only professional developers can work on LLVM?
LLVM has very few students/professors involved with it. If we want to keep them around and increase the pool, we cannot ask people to fix a dozen unrelated codebases (LLVM, MLIR, clang, flang, CIRCT, lld) when they just want to commit a fix to LLVM. That doesn’t align with academic interests (papers), and so most academics will give up in the process.
Plus many non-professionals don’t have high-end workstations. It may take them a day to compile LLVM. I’m not exaggerating.
Adding more stuff to the monorepo will only make things worse. More dependencies, increased compilation time, increased testing time, more reverted commits, etc.
I agree. I’ve mentioned that it is awkward for me because, in general, I am anti-monorepo, and I am effectively arguing the other side because that is what we have agreed to and it is important that we be “consistently in or out” on this point. If we are monorepo-based, we should have our aligned, accepted, engaged, production-grade projects in it (and layered so as to preserve optionality to a reasonable extent) – not half in and half out based on feelings of fullness. I used four strong adjectives there because that sets the bar high, and I think it matches both what we can sustain and the qualities that people have latched on to in this thread.
If there were a new referendum on the monorepo and project organization, I would likely be on the other side. That said, I also identify with the commenter UT who asked whether, before declaring the monorepo full, we should look at processes and tooling. My opinion is that how a monorepo is represented “on disk” is separable from how it is used. We’ve conflated the two, and they need not be conflated if we are looking for a middle road.
I believe you. When I did work on gcc as a non professional back in the day, it took 12 hours to build on my desktop. I had a friend who literally let me unrack a server from his datacenter and I had it under my futon for months, whining in the background all day, in order to make the build time reasonable. That was almost twenty years ago, and it is interesting that we are at the same relative position.
We certainly have better options today that can avoid the persistent, physical toll of this (i.e. getting a bigger machine): Goma, RBE, cloud services, pre-built Docker dev images. I wonder if there is something financial that can be done to increase the accessibility of those options for students/researchers?
Thanks for all the thoughts everyone. In general, I haven’t heard any specific feedback about CIRCT that would prevent it from being upstreamed in some form, and some people are quite positive.
On the other hand, there are a number of concerns about adding anything to the monorepo, particularly something that is relatively loosely coupled to ‘LLVM’. In fact, some people expressed that they believe the monorepo to have been a bad decision in the first place… At the same time, the relatively loose coupling is a good thing, since it is highly unlikely that any change in /llvm or any other subproject other than /mlir would affect CIRCT at the moment.
Ultimately, it seems to me that a fundamental issue is really ‘dependencies in large software’ and how these dependencies affect the ability of the software to change and evolve. Without care, monolithic software tends to grow dependencies between different parts that are difficult to disentangle. Change becomes difficult because the effect of a change cannot be easily determined. Modular software design encourages well-defined interfaces between different components to limit these dependencies, partly with the goal of enabling more change. Several different approaches to managing change were brought up in the discussion:
The ‘Linux’ approach: Interfaces (with user space) are eternal. New interfaces can be added, as long as old interfaces don’t change. Existing design errors/bugs must be preserved. The implicit assumption is that changing the users of these interfaces is extraordinarily costly or impossible.
The ‘package manager’ approach: Interfaces are well defined, but can change with little notice. Code is generally not built together, but each component declares what version of the other components it requires (see the sketch after this list). Changes generally ripple from one project to the next as new versions are incorporated. The implicit assumption is that changing the users of an interface may be costly, but that cost is largely placed on the users of the interface.
The ‘monorepo’ approach: Interfaces are loosely defined and can change often. Code is built and tested together. Dependencies need to be considered more carefully, particularly if different components have different levels of maturity. Changes can be made atomically, eliminating the ripple of changes through different projects. The effects of changes can often be seen more easily because important dependencies exist in the monorepo.
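As a minimal sketch of what the ‘package manager’ approach looks like for an out-of-tree LLVM user: the project records the exact upstream version it builds against and pays the cost of interface changes only when it deliberately moves that pin (the tag below is just an illustrative example):

# Record the exact upstream version this project builds against.
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout llvmorg-12.0.0   # moving this pin is where API churn gets paid for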
I was a little curious about how many changes affect multiple subprojects. I wrote a little script:
#!/bin/bash
# For each of the last 5000 commits, print the set of top-level
# directories it touched, one commit per output line.
for i in {0..5000}
do
  # --dirstat=files prints lines like "  12.5% llvm/lib/IR/"; keep only
  # the first path component, de-duplicate, and join onto one line.
  git log -1 HEAD~$i --dirstat=files | grep -e "^[[:space:]]*[0-9]*\.[0-9]%" | sed -e 's/.*% \([^/]*\).*/\1/' | sort | uniq | paste -sd ' '
done
and processed the result with | sort | uniq -c | sort -rn, resulting in:
Clearly the vast majority of recent patches are not crossing project boundaries within the monorepo… Furthermore, although the monorepo is built together, it reflects a modular organization more than a monolithic one. There are probably other useful interface boundaries (e.g. the interface for targets in LLVM) that aren’t captured here.
I’m not sure you can draw any useful conclusions from the data you presented. I think it’s fairly obvious that the vast majority of changes are not going to touch more than one project, because the vast majority of changes do not touch the public API.
There are also a few more things that this doesn’t capture:
People working on LLVM will usually not run Clang tests for each change. It is somewhat common that a change to LLVM ends up breaking a Clang test, which will get fixed up in a separate commit afterwards. These two should have ideally landed together.
Just the possibility of such a thing happening means that contributors need to build and test more subprojects than they really ought to. I will usually only test LLVM, but when making changes for which I suspect potential impact in other areas, I may temporarily expand my build configuration to additionally build/test Clang, Compiler-rt, MLIR, Polly, LLDB, and/or LLD (see the sketch below). Often it turns out that there is indeed no impact, but this is still work that needs to be done. (The only consolation here is that I have never had cause to build Flang as part of routine development work.)
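A rough sketch of what that expansion looks like (the project set here is illustrative; pick whichever subprojects the change might plausibly affect):

cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS="clang;mlir;polly;lldb;lld" \
  -DLLVM_ENABLE_RUNTIMES="compiler-rt"
ninja check-llvm check-clang check-mlir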
@nikic: I would also add to your points that this existed before the git monorepo, when we were on SVN. When I was breaking clang by changing LLVM 7 or 8 years ago, the clang developers were already asking me to check out clang and build/test my change with it! Breaking LLDB or LLD was not a common occurrence, so we were getting away with it.
More than “monorepo” vs “non-monorepo”, the real cost (what I believe @nlopes and others referred to) is the “support contract” between these subprojects. If we’re not in a monorepo but I still have to clone clang and test it before my change to LLVM, I’m not sure not having a monorepo would be a win.
I wanted to +1 the idea that we have to draw a line somewhere to avoid unbounded growth of the monorepo.
However, given that this is the infrastructure and set of practices that we have today, I don’t think it makes sense to declare that CIRCT is where we should draw the line. If people are willing to do the work and the community wants to be here, we should merge in the project.
I also wanted to add that, given how things have developed, I think I’ve changed my mind a bit about whether the monorepo was a good idea. When these decisions were being made, I was working on MSVC C++ ABI functionality that cut across Clang, LLVM, and compiler-rt, and I really valued the ability to land commits across all of those projects. I had that ability with SVN and I didn’t want to lose it. At the time, I was very focused on C/C++, and I really thought of LLVM as everything you need in a C/C++ toolchain: compiler, linker, debugger, assembler, etc. I subscribed to the idea that it was nice that parts of LLVM were reusable library components, but in reality, those priorities were secondary and could be compromised to build a better toolchain.
Since then, the scope of the monorepo has increased in ways that I couldn’t foresee, and maybe we’d be on better footing now to grow the project if we’d made different decisions then. However, we had good reasons for our decisions then, we are here now today, and the grass always seems greener on the other hill. So, let’s make the best of it going forward, and try to be open to change in the future.
I apologize for continuing to discuss the monorepo in this thread despite the desire to fork it out, but it seems like it really is the critical issue for most folks here.
This is a huge issue that a lot of people care about, but it is a hard problem and while I don’t discount the impact of the large (and growing) monorepo on this issue, many of the problems here are orthogonal and probably not on-topic for this thread.
I think there is a really difficult balance in keeping contribution barriers low while keeping the quality bar high. I don’t personally think we’re doing a great job here as a community, so it is something I think we should work on.
I can clean build LLVM on a $1000 MacBook Air in less than 15 minutes. I think there is a bigger gap in documenting and communicating reasonable workflows than in the technical limitations of building LLVM. That said, if you’re also building Clang, the runtime libraries, LLDB, MLIR… yea, clean builds get slow, but incremental builds still aren’t bad, and tools like sccache make it very reasonable to develop on affordable machines. I believe this is more a gap in documentation than a technical one.
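For example, a minimal LLVM-only configuration with a compiler cache is enough for a lot of day-to-day work on modest hardware (this sketch assumes sccache is installed; ccache works the same way through the same launcher variables):

cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_TARGETS_TO_BUILD=host \
  -DCMAKE_C_COMPILER_LAUNCHER=sccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=sccache
ninja check-llvm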
Yea, that’s true, and has always been the trajectory that LLVM was on. Not saying it is right, but that is how we’ve been going for years.
Ditto on the cheapest M1 Mac (~30m for a universal build). On the flip side, we are often seeing build times in excess of 4h on various capped CI systems that may be closer to the 4-5 year old commodity systems that I imagine are still in wide use. And we regularly get grumbling about build/test times from people with professional grade equipment too. I think there are improvements to be made, and I don’t think that this proposal will move the needle really at all in either direction for people not directly working on it.
FWIW I agree here. I don’t think CIRCT is the thing that will bring everything crashing down, and I personally see some value in bringing it in (from an MLIR perspective, CIRCT is one of the few production projects that can be used to benchmark/test/etc. against). My previous posts were intended to strongly push for us to be cognizant of how fragile a monorepo can be (once you go too far, it’s nearly impossible to go back).
I would put myself somewhere between neutral and a soft +1 in terms of integrating CIRCT. I’m sympathetic to both sides of the discussion, but I don’t want to be blocking either way at this point.
My opinion is that CIRCT (and pretty much anything else, to be honest) should not be added to the monorepo. I think the monorepo is already too big, and the technical challenges of hosting so many projects in a single repository are going to start to negatively impact the project very soon (if they have not already).
FTR - our policy does explicitly encourage a trajectory towards the monorepo (to be mediated by a discussion like the one we are having here). I would not be opposed to revisiting this policy (in fact, I would be quite supportive, if it helped us get to a place where we are better able to technically handle a larger umbrella of LLVM-aligned projects), especially in light of the clear reservations of a part of the community.
But I am against having a policy that doesn’t just set a high bar, but an impossible one. People make ~years of investments with such policies in view (as CIRCT has done), and I believe that we should honor that. If we are going to declare the monorepo full with respect to new projects, then we need to set expectations and policy accordingly.
For myself, I would consider it a good outcome to evaluate CIRCT under the current expectations, without considering it precedent setting or “case law” and also start a simultaneous discussion to revisit the monorepo as the ultimate destination for all umbrella projects.