I really like the idea of having a green nightly tag for upstream. I’ve mentioned this on other threads, but we pick a ‘green’ commit every night to do nightly builds for Fedora, and it would be great if the community could standardize on a single commit every night. It would save a lot of duplicated effort.
What kind of testing happens in the Fedora nightly? One of the things that keeps us rooting around in Google repos to pick a green commit is the level of testing that we know goes into those commits. There are never any guarantees, but it is always nice if the thing you are syncing to has some real-world mileage on it. Wondering if Fedora’s nightly has any fringe benefits on that front?
Right now, we are just building and running make check. Our build system uses low-power machines, so there is a limit to how much we can do in a day. We really just want to produce nightly binaries that users can easily install, and starting with a green commit, even if it’s only been build-tested, helps cut down on our build failures a lot.
A while ago I wrote a script that takes a range of commits and a set of buildbots, finds the sub-ranges where most of the bots are green, and sorts them by the number of green bots in each sub-range. I used this to find the best commit to merge our project into.
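For illustration only, here is a minimal sketch of that kind of sub-range search, assuming the per-bot green/red results have already been fetched; the bot names, data shape, and scoring below are invented, and the real difficulty (bots building at different commits) is ignored:

```python
# Hypothetical sketch, not the original script: given per-bot green/red results
# keyed by commit position, find contiguous sub-ranges where enough bots are
# green and rank them by how many bots were green throughout the range.
from itertools import groupby

def green_subranges(results, min_green_bots):
    """results: dict mapping commit position -> set of bot names green there."""
    good = [(pos, len(bots) >= min_green_bots) for pos, bots in sorted(results.items())]
    ranges = []
    for is_good, run in groupby(good, key=lambda item: item[1]):
        run = list(run)
        if is_good:
            # Score the range by the smallest number of green bots inside it.
            score = min(len(results[pos]) for pos, _ in run)
            ranges.append((score, run[0][0], run[-1][0]))
    return sorted(ranges, reverse=True)

# Invented example: commits 100-104, three bots reporting.
results = {100: {"clang", "mlir"}, 101: {"clang", "mlir", "lld"},
           102: {"clang"}, 103: {"mlir", "lld"}, 104: {"clang", "mlir", "lld"}}
print(green_subranges(results, min_green_bots=2))  # best-scoring ranges first
```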
It sounds simple, but given that each bot’s build time is essentially random, it gets hideously complex. After having to fix the script every time I rebased, I gave up and just took a leap of faith based on visual checks.
If the bots had flags (mlir, clang, specific targets), we could more easily say things like: this target’s most stable range yesterday was A…B. With that, anyone doing a merge who worries about target X and MLIR would just find the intersection of those ranges, going back a day at a time, until one is found.
My idea was to run that script every day and just mark the days when we had a good intersection, so that later we could pick a commit based on what new feature we wanted.
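A toy example of the intersection step, assuming each target/flag publishes its most-stable range per day as a pair of commit positions (all names and numbers here are invented):

```python
# Hypothetical sketch: intersect published "stable ranges" for the targets you
# care about, walking back one day at a time until a non-empty overlap exists.
def intersect(ranges):
    """ranges: list of (start, end) commit positions; returns overlap or None."""
    lo = max(r[0] for r in ranges)
    hi = min(r[1] for r in ranges)
    return (lo, hi) if lo <= hi else None

# Per-day stable ranges for the two things we care about, newest day first.
published = [
    {"target-x": (5000, 5040), "mlir": (5060, 5090)},   # yesterday: no overlap
    {"target-x": (4950, 5010), "mlir": (4990, 5030)},   # the day before: overlap
]
for day in published:
    overlap = intersect([day["target-x"], day["mlir"]])
    if overlap:
        print("good range:", overlap)
        break
```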
If this is to be used across teams that don’t normally interact with each other, then I suggest we do something similar and publish a list of stable sets per LLVM sub-project, plus one combined set for the projects that strongly interact with the others.
Of course, this is getting close to releases, but with weaker promises and far less work by a lot of people. I still want to believe it’s doable, but I have given up on doing it on my own. Hopefully other people have more luck than I did?
I’ve been slow to keep up with the recent conversation on the binary serialization format, but wanted to mention that TOSA has this problem too - there are live paths from TF & TFLite, Torch-MLIR, and ONNX-MLIR to TOSA right now.
We can sort of get away with it for now, despite all three of these projects sharing the common builder code described in this RFC: [RFC] Moving TOSA construction lib into MLIR core. I’m in the process of implementing this, but it moves all the dependencies into a single point of coordination within llvm-project and thus runs straight into this particular problem - this was one of the reasons I held off on it.
Another factor that would impact TOSA (and ONNX, Torch, and TF) in MLIR is the need for production support for multiple major versions and backward compatibility between them. For example, TOSA 1.0 will be live soon. There’ll be a 2.0 at some point in the future, and it’s expected that both will be active formats with some defined PLC on the older version(s).
Note: it explicitly excludes stability/versioning right now.
This is a problem that is not automatically solved by a binary format (renaming an op in a dialect, or adding a new attribute or a new type to an op, won’t be automatically handled like this).
(similarly: I don’t see how “shallow dialects” would solve anything on their own either)
See also for versioning purpose this previous approach: [RFC] IR Versioning
I don’t think there is an expectation that anything is automatically solved, but that we can see a path along which, for the things we care about, we can build further guarantees. I haven’t grokked the whole “shallow dialects” thing, but I suspect it is aiming to provide some “interface decoupling” for things we care about elevating more towards an API.
I guess I don’t even see the path then. But I’ll wait to see any proposal in this direction!
This is partially how we’ve approached the problem. We depend on MLIR as well as clang and lld internally, and it’s difficult to pick a green commit based on just the buildbots that are available, so we have an internal bot which builds all of the configurations we care about based on main once a day. Then once a week we’ll merge the latest green commit we’ve identified this way (or if the most recent one is red, we’ll generally follow up to fix whatever issue we’ve identified). This largely solves the problem of how to identify the correct commit for us, but it does not help with the actual work of merging that commit into the various projects where we need it.
Exactly, this is the key.
This is something that we discussed in the community long before the monorepo, and the problem is still largely the same: it’s hard to define what is a “good commit” with so many different projects depending on each other.
For example, your project depends on MLIR, clang and lld, but if a user of your project also needs compiler-rt, and you pick a commit that breaks RT, that user can’t use your project until you update the commit to one that doesn’t. The only way out is to not update your project on their side, but with more projects depending on each other, this turns into a dependency hell.
On the other side, trying to find a commit that doesn’t break “something” is really hard and only comes with releases. Twice a year, some people work for a month to make sure all projects build together, with release candidates and a lot of downstream testing, so picking a release is a sure way to work across projects without dependency problems.
But that doesn’t cut it for tightly coupled LLVM side projects, especially in-development downstream MLIR dialects that need cutting-edge core code.
The only middle ground I can see is for a part of the LLVM community that cares about a few selected projects (ex. MLIR+clang+lld) to create a CI and publish a series of “known good commit ranges” for that particular combination. If we get enough of those sub-communities, we may even do intersection analysis and find good ranges across more than just this or that group of sub-projects.
So, while this sounds a bit selfish, it can scale and turn into a larger effort if it needs to, without a significant effort from any one group. At least I hope this is the case. In my experience, I haven’t been able to turn that into action, but I blame myself (and the time I had for it), not the idea itself.
This is the script we use for finding a ‘green’ commit: https://github.com/kwk/llvm-daily-fedora-rpms/blob/main/github/get-good-commit.py
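(This is not the actual logic of that script, just roughly the shape of the approach: walk back along main and take the first commit whose reported combined status is green. The endpoints and the “success” criterion below are simplified assumptions, and unauthenticated GitHub API calls are rate-limited.)

```python
# Rough sketch only, not the real get-good-commit.py: walk recent commits on
# main and return the first one whose combined CI status reports "success".
import requests

API = "https://api.github.com/repos/llvm/llvm-project"

def find_green_commit(max_commits=50):
    commits = requests.get(f"{API}/commits",
                           params={"sha": "main", "per_page": max_commits}).json()
    for commit in commits:
        sha = commit["sha"]
        status = requests.get(f"{API}/commits/{sha}/status").json()
        if status.get("state") == "success":   # simplistic "green" criterion
            return sha
    return None

if __name__ == "__main__":
    print(find_green_commit())
```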
In this case I’d probably just try, every day, the latest version TensorFlow is synced to (to take advantage of the work the LLVM integration folks at Google have done and the number of targets tested), and only go searching if that fails. Given TensorFlow syncs roughly twice a day, that’s a pretty recent snapshot. An added benefit is that the change which bumped the version also shows all the changes needed.
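A small sketch of reading that pin, assuming TensorFlow keeps the pinned LLVM commit in a workspace file at the path below (the exact path is an assumption and may move over time):

```python
# Rough sketch: read the LLVM commit TensorFlow is currently pinned to.
# The file path below is an assumption about where the pin lives and may change.
import re
import requests

URL = ("https://raw.githubusercontent.com/tensorflow/tensorflow/master/"
       "third_party/llvm/workspace.bzl")

def tf_pinned_llvm_commit():
    text = requests.get(URL).text
    # Look for a line of the form: LLVM_COMMIT = "<40-char sha>"
    match = re.search(r'LLVM_COMMIT\s*=\s*"([0-9a-f]{40})"', text)
    return match.group(1) if match else None

print(tf_pinned_llvm_commit())
```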
As far as I can tell, the TensorFlow updates are much more minimal than what is needed in onnx-mlir or torch-mlir, and they are not guaranteed to work either. For example, TensorFlow does not maintain any shared library builds, but onnx-mlir does, and we’ve run into scenarios where picking up a new version of MHLO plus the matching LLVM still breaks onnx-mlir because the shared library builds are now broken.
I suppose what we could do is run our own testing on whatever commit TensorFlow picked, so that we can validate whether it is “green” and then go from there, but I think it would be cleaner if we had a way to pick commits that everyone updates to, rather than picking a commit that someone else happened to update to.
There hasn’t been much discussion about that yet, but at Google we’ve been internally bootstrapping something called StableHLO - a version of MHLO that aims to provide compatibility guarantees (and also a specification, a test suite and a reference implementation, although that’s probably offtopic for this thread).
By next week, StableHLO will be switching to GitHub-first development process - the code will be developed via pull requests, there will be a GitHub-based test suite, GitHub Issues will be used to track the work, and GitHub Discussions / Discord will be used for discussions. We’re in the final stages of approvals for all this, and I expect that we’ll be able to tell (and show) more shortly. https://github.com/tensorflow/community/pull/419 talks a bit more about the context behind all this.
The overall goal for StableHLO is to create a community to build an amazing portability layer between ML frameworks and ML compilers, and compatibility will be a very important discussion topic. I’m looking forward to chatting about what we all would like to see from MHLO compatibility-wise.
Correct, I meant it as a seed set rather than choosing an arbitrary commit based on time of day (e.g., if today you are just doing a daily build based on an arbitrary commit, then this is a more filtered set and you have a much higher chance of MHLO working).
While that’s true, we have a shared library build for MLIR which we actively monitor, so the LLVM side is not broken for long, and most of those breakages, AFAIK, are CMake-related changes when the shared lib has been broken. So the delta from there to a green commit is small in my mind (but I don’t have data to back that up). MHLO could be lagging, and I don’t recall whether its shared library build is checked that actively.
I think step 1 here is to externalize more of the needs. E.g., there should be a dashboard that everyone can see and that covers the projects/backends/features (e.g., ASAN clean & Bazel working, compiled on Linux, Mac & Windows, compiling with projects X, Y, Z, for backends Foo, with hardware support Zeta). It would probably have to run periodically, as the number of configs is large. Getting folks to agree on what that set is, is the tricky part, as Renato alluded to. Heck, even whether a broken Bazel or shared library build should block everyone may be controversial, while being important to some.
It still is a scenario of dominoes as currently positioned, though: someone has to pick one of these commits, and others then have to pick what that someone picked, or hope things work at some other commit that also works for the rest (but the first group may have already picked before the others discover an issue, and be at a completely different revision when informed). So following the groups that pick and update more frequently sets up a tree of updates, where the slower-updating groups following the faster-updating ones get a set of potentially useful spots. But the fast-updating groups may be constrained on a given day or week and only be able to find a commit that works for all of their own needs rather than everyone else’s (that’s a function of how active folks are in fixing issues, indeed, but some changes are big and require many days of updates, so the faster updater may end up at a commit not far ahead of the previous one, while a slower-updating group may have fewer dependent projects and so could handle a bigger change more easily). At least without some additional cherry-picking (which different groups will feel differently about). Which is a long way of saying I agree with Geoffrey. I’m also sure someone has done theoretical studies on such problems.
But this is all hypothetical; data would be more useful. The only externalized signals today are the buildbots and the commits which the different repos get updated to. I’m suggesting the latter is a better signal and also includes all the changes needed for an update (it would have been nice to even have multiple commits per update, corresponding exactly to the upstream changes, to make patching or intermediate commits usable, but that’s a high bar). Perhaps the simple solution works most of the time, and for everything else there are cherry-picks, or release repos, or …
It is probably a bit off topic, but this is also going to create quite a negative externality for non-enterprise contributors:
As currently positioned, all of these things are just projects in the world, and I think we are reaching for something that really is going to need more cohesion for the domain we are working on. If I were just bequeathed a GitHub org with all of these pieces in it and asked to make sense of it, I would probably first create a couple of tiers, distinguishing between foundation and leaf dependencies for the MLIR ML compilation ecosystem.
I would focus on the foundation dependencies first – these are the transitive deps which define the interfaces. In this world, this tier would consist of torch-mlir, MHLO, and onnx-mlir. I’d be aiming for a few things:
- A super CI/forge which advances all of the foundation deps to new LLVM commits at the same time, publishing a new numerical version tag on green and blocking further updates on breakage (i.e. pending a patch) – a rough sketch of this loop follows the list.
- For the foundation dialects, make sure that the dialects are versioned and developed towards compatibility principles, using efficient serializations as the primary way of coupling.
- Organize the foundation dialects so that they are ready to vendor into leaf projects. I’m thinking of literal vendoring: copy them into leaf dependencies, including as little of the surrounding project infra as needed to couple via serialization. If a leaf dependency includes foundation version 10 dialects, it can load and interop at that version level. It can stay pegged there for as long as it desires and can even advance its LLVM commit – albeit with increasing patching required the further it diverges (encouraging upgrading the vendored deps versus doing local patch management).
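A very rough sketch of the “advance together, tag on green” loop from the first bullet; the repo names, build command, and version scheme are placeholders rather than a real design:

```python
# Hypothetical sketch of a super CI that advances all foundation repos to the
# same LLVM commit, tagging on green and blocking further advances on breakage.
import subprocess

FOUNDATION_REPOS = ["torch-mlir", "mhlo", "onnx-mlir"]   # placeholder checkouts

def builds_green(llvm_commit):
    """Build and test every foundation repo against the candidate LLVM commit."""
    for repo in FOUNDATION_REPOS:
        # build_and_test.sh is a placeholder for whatever the forge would run.
        result = subprocess.run(["./build_and_test.sh", repo, llvm_commit])
        if result.returncode != 0:
            return False            # block further advances until a patch lands
    return True

def advance(llvm_commit, next_version):
    if builds_green(llvm_commit):
        tag = f"foundation-v{next_version}"
        # Publish one version tag across all foundation repos at this LLVM commit.
        for repo in FOUNDATION_REPOS:
            subprocess.run(["git", "-C", repo, "tag", tag])
        return tag
    return None
```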
A two-tier hierarchy of foundation vs leaf – with a stronger culture of API-like consistency for the foundation dialects and serializations – would both establish the center of mass, giving us versions and a commit clock, and enable the leaf projects to decouple and manage their schedules locally within bounds. Making sure that we’ve organized the foundation code for lightweight vendoring gets us out of multi-level submodule commit-hash jail and lets the leaves synchronize to a “slower clock”, which typically benefits OSS projects and CI infra.
We are all paying a lot for individual foundation components. I expect that creating more of a commons and donating a bit of maintenance and support would go a long way – if we could agree on what goes into this basic level of projects.
My 2 cents. Might not be feasible to get to that level of organization, but I feel like if we did, we would be really effective at scaling the utility of this ecosystem.
In my experience, these projects come and go faster than a community this large can organise around. They aren’t necessarily cancelled, but they lose focus or importance, or get replaced by something else.
Also, there are so many of them that even inside the same company (e.g. Google, Microsoft) there are competing projects, and teams, over what’s more important than what, what replaces what, and when.
At the risk of major bike-shedding for little gain, and the possibility of people getting offended very easily, I’d recommend against it. The social side-effects will be even harder to navigate than the already nontrivial engineering effort.
Reading the thread, and reflecting on the past of LLVM, I see two main issues to solve:
- Main branch stability, with the added complexity of the monorepo bringing multiple communities together.
- Coordination of user projects (outside of the monorepo) trying to find a stable point in the monorepo.
The first point dictates the second. The harder it is to keep the monorepo “stable”, the harder finding a green window will be. So there are really two forces at play: the people needing a more stable main branch and the people needing to collaborate across vastly different groups and projects on the same repo.
[Side note: submodules would only make this version hell a lot worse, so I’m not advocating against the monorepo, just outlining the inherent problems of developing around LLVM]
If finding a green window in the main repo is hard, we may try to tackle it from the other side: how can we make it more stable, so that it’s easier for user projects to use LLVM HEAD, which is what we all want and recommend?
The main reasons why the repo is “unstable” are:
- The code base is too vast and there are too many people working on it
- Some people work on small changes in local sub-projects, while others refactor large parts
- Our commit policy is very permissive to allow all of these people working together
This could be made much better with CI, but:
- There are just too many targets and it’s hard to do a meaningful pre-commit CI
- Our post-commit CI is not just too slow, it has wildly different time frames for different targets/tests
I would personally focus on two main topics to help make the green windows larger and more useful:
- Try to create a more focused commit policy for the majority of cases. This is being discussed in other threads (around the git issues migration) so I won’t replicate it here. This isn’t about restricting what we can do, but about focusing on the “standard” way that the majority of commits will come in as, and letting the rest be exceptions.
- Focus on pre-commit CI for first-class targets. Nowadays, at least x86 and Arm have fast enough builders to perform basic pre-commit CI tests on every PR change. If other targets don’t, we could perhaps set up cross-builds on fast machines. We also need a build of all the projects in the monorepo that interact with each other, to make sure we don’t get silly build issues.
[Side note: we could also have a soft policy that we don’t commit breaking changes on Friday/weekends, which allows time for us to fix the bots and keep them green until Monday]
If we get these right, we may get a cadence of green windows and then the whole discussion of how to find them or who will get what will be a lot less important.
I also think that these solutions are easier to converge, both the engineering and social parts, than trying to synchronise versions of side-projects across each other and LLVM.
[Foot note: these issues apply to projects like MLIR, but less so to stable targets, where user projects can take releases, as nothing changes much between them. In time, MLIR will be more stable naturally, so this isn’t just about MLIR, it’s about any project in LLVM that isn’t stable yet]
This is the main reason that forces projects with a dependency on MLIR to effectively rely on a rolling release policy.
But you are right: more generally, this is true for every project with a rolling dependency on an LLVM monorepo sub-component that cannot rely on stable releases.
Beyond the CI, this is going to impact many non-enterprise third-party contributors, because a rolling LLVM dependency creates a huge compilation overhead: LLVM is not so lightweight a dependency that it can be recompiled frequently on the average developer’s hardware.