We have been discussing for a while now how to reduce notification noise from long running bots. One of the ideas which looks quite attractive is to have one or a set of “gate keeper” builders to do quick checks and all not so fast builders build only revisions for which the “gate keepers” reported the green status. The main idea behind this is such dependency filters simple problems and supposedly address cases when long running bots are unfortunate with commit/revert or commit/fix pairs.
I have played with this idea for a while and have a draft of the patch for llvm-zorg.
I live out of the scope of this RFC the idea of having the “gate keeper” builders do actual commits. We can discuss this in a separate thread if we want to.
First of all, let me go through pros and cons of this setup.
Feedback to trivial problems will be fast, selective (since each commit will be built), and will not have a long tail of failure notifications keep arriving sometime days after the problem in question got resolved. This is a big win, up to me.
Long running builds wouldn’t fail on trivial problems if they will not build those revisions in the first place. This wouldn’t make blame list shorter, though, just eliminate “oh, this issue is known, and has been reverted or fixed hours ago” cases.
Questionably it would save slower builders resources, since they will not waste time building known to be bad commits.
We add a new layer of dependencies. Some builders will depend on the “gate keeper” builders, so the “gate keeper” builders become a point of failure. No revision will be built on a significant number of builders if “gate keeper” builders fail on hardware or platform issues, shortage of resources, service outage, and such. I hope this wouldn’t be often, but it could be quite annoying for everyone if doesn’t work well.
Not running slower builders on known to be bad commits will not make the blame lists shorter, it will make them longer in fact, because skipped revisions will add to the blame list.
“Gate keeper” builders are “gates”, so they are essentially all-or-nothing. The list of hosts is going to be very limited at the beginning. Like x86_64 Linux only, I think. However, this could be a motivating point to have more fast-running reliable builders.
If we move further with this idea, we would get the “gate keeper” builders in place, running for a while to prove the correctness and reliability, then add the dependencies on them for slower running builders.
Notifications fom the “gate keeper” builders will have distinct subject line to attract attention, I think.
There are things I’d like to discuss before moving further:
What should be “gate keeper” build configurations?
We want these builders to be fast, build each and every commit to llvm-project, provide short (about 3 minutes or better) response time, and be representative for catching those “easy to catch” problems in commits. Partially this depends on how fast is a particular worker, and how much resources it has. But also depends on what exactly we build and what tests we run. Obviously, less we build and check - shorter the response time. Ccache helps to make builds “incremental” in a sense. We can make more granular builders to guard particular projects and build only changes to those projects. Effectively that would reduce the testing footprint, as LLVM projects is a good test bench by itself.
Do we want to have some kind of a policy for how long the “gate keeper” builders could be broken before a patch in question gets reverted?
Longer we wait less trivial revert is going to be in general, since some dependencies on that change could be introduced. Since the “gate keeper” builders are going to be a critical infrastructure point, they must be green most of the time and failures must be addressed quickly.
How that policy will be enforced?
Options I see:
“Gate keeper” builders do auto-reverts. And manually address only cases when auto-revert is not possible for any reason.
Repository lock and “code sheriffs” who will be on a shift and would do a revert or fix it quickly.
No repository lock, but “code sheriffs” step up if an author didn’t revert or fix the issue in time.
I think the kind of failures we should work the hardest to prevent are failures that block other people from doing work. The worst kinds of failures, in my opinion, are build failures, because then, I, as a developer can’t do local development on top of trunk. Or as a distributor I can’t do my daily or weekly snapshot.
I think for “gate keeper” configurations, we should have build-only jobs that build all (or a subset of the projects based on the files touched by the commit) projects for the 3 major OSes that people develop on (Mac, Windows, Linux).
I’m not opposed to adding ‘make check’ configurations as well, but I think we should be very selective about this. Trying to make check all projects can introduce a lot of noise in my experience, so I would prefer if we kept the testing limited (e.g. to just llvm or llvm, clang, and lld).
I don’t think solving this problem is particularly urgent. Reverting is a human decision today, and it makes sense to establish a norm to revert more aggressively for gate keeper breakage, and I expect that’s enough for the time being.
If we do want to solve it properly, IMHO that would involve something like bors which maintains a commit queue and uses tests to gate pushing to the main branch in the first place. (As opposed to reverting after the fact.)
We could call them something else having one or more smoke-test bots seems like an improvement over the current situation. Let’s not have the excellent be the enemy of the better. Implementing a smoke-test bot doesn’t affect anyone’s workflow, while improving the quality of life. A true gate-keeper would be a distinct workflow change (one I would like to have, but is a bigger deal for the community).
The smoke-test bot(s) would inherently be very selective? I don’t see make-check adding noise, rather it would reduce noise. I made ~50 small commits over the past couple of months, and while I didn’t keep statistics, I think build-fail notices were far less frequent than test-fail notices. Most of those were not actionable on my part, or I had already take the necessary action, so a make-check gate would have distinctly reduced the noise I had to deal with.
On this topic: it would be ideal to not have a single “gate” but multiple. For example the slow Linux clang builders should be gated by a fast Linux clang build, the slow Mac clang gated by a fast MacOS clang builder, and the slow flang builder by a fast flang build, etc.
Something annoying is if a slow builder is not running because of an unrelated failure to what this slow builder is testing, which then just increases the blamelist for the slow builder unnecessarily.