Aggregating builders and deprecating Buildbot as the one ring

While answering a question in a Buildbot review, it occurred to me that, given how distributed our CI currently is (across Buildbot, Jenkins, Buildkite, GitHub), we really don’t want to be discussing how we’re going to “bring them all together”. We should rather be thinking about how to aggregate their results in a way that lets people create whatever CI they’re most comfortable with, as long as they follow some minimal guidelines.

Many years ago I wrote a buildbot monitor (that is still being maintained by Linaro) because the main page was a nightmare to follow.

Because we had our own master, I added support for multiple servers, and it seems the monitor now also supports Buildkite. In all the years since I wrote that script, the need for an aggregator hasn’t gone away. If anything, it’s probably more relevant today than it was then.

There’s a lot of talk about buildbot quality, pre-merge CI, noise, etc., and these are all hard problems to solve. Trying to bring them all under a single CI technology doesn’t help.

At the very least, it stifles innovation because we’re not allowed to experiment with new builders. But it also forces people to use a CI technology they’re not used to, so they end up creating sub-par tests (like I did many times over the years) because doing better involves spending weeks learning a new CI tool that they’ll never use again.

So I ask: Is there such a thing as a CI aggregator?

And can we just give up relying on a single Buildbot Master instance and a single person to manage it all?

About pre-merge CI, Github is good enough to register many sources (I’ve used DevOps, Buildkite and Github Actions, so far), so it doesn’t matter what we use once we move to PRs.

Is there a general discussion / work group looking into this?

@pogo59 @dblaikie @tstellar @gkistanova

Buildbot’s Multimaster mode (section 2.5.22 of the latest Buildbot documentation) exists, but I’ve not seen it in practice. I assume it presents a single web interface, perhaps with categories per master.

I think Python uses Buildbot too, maybe we can find an existing example.

I guess you could take inspiration from that, but pre-github the problem is what system to push the results to.

What kinds of questions are you looking to ask this system?

I use the Linaro monitor for:

  • Are all <architecture> bots failing
  • Are all <project> bots failing
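Those two questions can be sketched as a filter over normalized results. This is only an illustration: the `BotResult` record, its field names, and the sample bots are made up for the example, and it is not how the Linaro monitor is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotResult:
    # Hypothetical normalized record; each CI system would map into this.
    name: str
    source: str   # e.g. "buildbot", "buildkite", "jenkins"
    arch: str
    project: str
    passing: bool


def all_failing(results, **criteria):
    """True if at least one bot matches the criteria and every match is failing."""
    matched = [
        r for r in results
        if all(getattr(r, key) == value for key, value in criteria.items())
    ]
    return bool(matched) and not any(r.passing for r in matched)


# Made-up sample data, mixing results from different CI systems.
results = [
    BotResult("clang-aarch64-quick", "buildbot", "aarch64", "clang", False),
    BotResult("lld-aarch64", "buildkite", "aarch64", "lld", False),
    BotResult("clang-x86_64", "jenkins", "x86_64", "clang", True),
]
```

With a shape like this, `all_failing(results, arch="aarch64")` answers the first question and `all_failing(results, project="clang")` the second, regardless of which CI system produced each result.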

And as a committer, “how is my commit doing generally” is good. Buildbot has a view for that (albeit slow at times) but as you say that’s just one part of the testing universe.

This sounds a bit like LNT. For instance we have Buildbots running LNT, some people use Jenkins, I’ve heard of TeamCity being used, all pushing to LNT servers.

I plan to work on pre-commit testing infra in the next year, with the assumption that it will be triggered from github PRs. No group as of yet but I got a good level of interest at and after llvm dev.

One status page to rule them all would be awesome. As things stand, I look at buildbot and never think about anything else unless I get fail-mail.
I do kind of like the buildbot page’s grid in principle, as a comparatively compact presentation, although there’s really too much data to present coherently in one view.

I was not aware of this, thanks!

Huh. When I look at the linaro page, I see only arm/aarch64 bots. If there’s a way to get other architectures, I don’t see it.

That was the whole point! I’m glad it’s still valid as is! :smiley:

Ah, well, he probably meant “all Arm architectures”. :slight_smile:

If we could extend it (or choose some more professional tool), we could have that for all arches.

I’d also love to have a shopping basket where I can filter per feature (test-suite, clean build, csky) and only list the bots, or only the failures, etc.

That still doesn’t fix the other problems:

  • What happens when Buildbot gets deprecated (like Phabricator)?
  • What about people/company that have to use other CI tech (Jenkins, Buildkite)?

But I still vote for a hybrid system of builders. It’s easier to get people/companies to contribute. If one system does something better than another, we can use each for what it does best. And it’s easy to get them all under GitHub Actions for both pre- and post-commit CI.

+1. Requires some investment in web UI to manage the filtering, but maybe someone can take inspiration from a shopping website where this is common practice.

I was hoping there was something, perhaps not as full featured as that. Maybe Github Actions?

GitHub Actions supports steps: build, check clang, check lld, check llvm, …

But we have to build the LLVM Github App for data analytics.

@tstellar was looking into using PRs and had positive experience with github actions as I remember.
I am in the process of moving all debian / unix agents for premerge from one GCP to another (and upgrading them in process). Will be happy to migrate to github actions or update buildkite to work with PRs.

We’ve been using Buildkite with Github for a while now, it’s pretty straightforward. There’s a plugin and you get the auth code and that’s it. Then you can search the builds in the branch protection settings and select the pre-merge CI you want to happen. I’m guessing the same is true for other tools.

Github Actions also has “pages”, which I’m guessing exist to help build the kind of monitor we’re talking about. At least it’s hosted by Github and if it does 1/10 of what we want (in this thread), it’d already be an improvement.

Unfortunately it is Jekyll and not PHP. It is for static pages.

My 2c, as someone who tends towards authoritative/singular solutions - I think it’d be unfortunate to lean into a diversity of testing tools, as it’d risk making them even more opaque than they are to developers trying to understand what’s going on and having to deal with different UIs, etc, even if there is a rollup UI.

I worry about folks jumping on the next new thing and no one being incentivized to pay down the technical debt of old infrastructure/porting it to the new thing.

GitHub checks can serve as an aggregator. That seems like a nice thing - it is tied to the PR, it will be something which you can easily lookup, and it has a singular reporting site. If there are non-GHA builders involved, there is an API to post the result. While I think that @dblaikie has valid points, it is something which alleviates some of the concern - a singular UI that will point you to the logs is very helpful. It also would help prevent the “next shiny thing” syndrome since you cannot control it to the same degree (it is part of GitHub).
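As a rough sketch of what posting a result from a non-GHA builder could look like, here is a request for GitHub’s commit status REST endpoint (`POST /repos/{owner}/{repo}/statuses/{sha}`). I’m using commit statuses as the simplest stand-in; whether statuses or check runs are the right fit is an open question. The helper name, the repository, the builder context, the URL and the token are all placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

VALID_STATES = {"error", "failure", "pending", "success"}


def build_status_request(owner, repo, sha, state, context, target_url, token):
    """Build (but do not send) a request that sets a commit status on GitHub."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    payload = {
        "state": state,
        "context": context,        # identifies the builder in the checks list
        "target_url": target_url,  # link back to the builder's own logs
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values only; nothing here refers to a real builder.
req = build_status_request(
    "example-org", "example-repo", "0123abc", "failure",
    "jenkins/example-builder", "https://ci.example.com/build/42", "TOKEN",
)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` with a token that is allowed to write statuses on the repository. The `target_url` is what gives you the single UI that points to the logs.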

Fair points @compnerd - yeah, if it’s mostly common UI down to the raw log file, that’s OK enough for me.

I mean even within buildbot we have varied quality at that level - some configurations use more distinct buildbot level actions, while the sanitizers run the whole thing in a single shell script/one action which makes it harder to understand the logs.

But seems like we’d be no worse off with the GHA sort of solution, so fair enough.

For post-merge CI, the things we don’t get out of the box on some systems but that are critical:

  • Dashboards (we could build one, but it seems better not to maintain infrastructure ourselves): in particular the kind of view Buildbot provides with all the changes and all the builders, and also the ability to see the history of the builds for a particular builder. This is critical to quickly find when a problem was introduced.
  • Blame list: when a build fails, which changes did it include since the previous build?
  • Email notification to the author (and committer?) when something breaks (and not when the build was already broken…).
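The last two items are simple to state in code, which is part of why it hurts when a system lacks them. A minimal sketch, assuming the aggregator knows the ordered commit history and each builder’s previous result (the function names and commit ids are invented for illustration):

```python
def blame_list(commits, last_built, current):
    """Commits after `last_built`, up to and including `current`.

    `commits` is ordered oldest to newest.
    """
    start = commits.index(last_built) + 1
    end = commits.index(current) + 1
    return commits[start:end]


def should_notify(previous_passed, current_passed):
    """Notify only on a green-to-red transition, not while already red."""
    return previous_passed and not current_passed


# Made-up history: the builder last ran at "a1" and now ran at "c3".
history = ["a1", "b2", "c3", "d4"]
suspects = blame_list(history, last_built="a1", current="c3")
```

Here `suspects` holds the changes that went into the failing build since the previous one, and `should_notify` keeps an already-broken builder from mailing every later author.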

I don’t know how much GitHub Actions provides these days. Historically it wasn’t providing much from this point of view as far as I saw (but my experience with it is pretty outdated).

For presubmit it is less of an issue, even though I would say that whatever config runs in a particular pre-merge should also exist in post-merge: it should be very easy to see the status at HEAD (and the history) for a config failing pre-merge testing.
These post-submit configs are also likely to be high-priority to keep “green” (as they affect subsequent pull-requests).

I think that GHA is one aspect, but the other aspect is the GH UI itself. The checks page on the PR allows you to track the status of a build both pre and post commit. You can go to a particular commit which triggered a set of CI checks and it will list the checks that ran and their results. This effectively gives you a singular view of pre and post execution of the checks. So this should alleviate the concern about the pre-commit vs post-commit checks.

I am not sure I see the value of a dashboard for the current state - that is effectively the state of the checks at HEAD. However, this is something that the GH UI also is able to provide through the checks system. It provides a green :heavy_check_mark: or a red :negative_squared_cross_mark: to indicate the build status. If you click on it it will list the full set of checks effectively giving you the benefits of the dashboard.

To be clear, I am not arguing that this is all obvious or intuitive or could not be improved by different tooling. There are clearly going to be adjustments for all of us as it is a different UX. I am just not convinced that the time of most of us is best spent on such a project. If someone is motivated and wishes to do this as a separate project, that would be fine. But spending LLVM resources on this is what I am questioning.

As to the blame list … yeah, that would be nice, as the only way that I currently know of to find the regression point is to just scroll through the commit statuses.

For the record, I totally agree with this statement. This isn’t at all what I was proposing. I was just asking if people know of a good way to aggregate, and gave my feeble attempt as an example of what I mean.

Correct, however, a single green/red mark is useless when there are always broken bots reporting in. We’ll literally have 100% red marks.

My view is that GH UI is useful for:

  1. Pre-merge checks (so that we can force branch protection to only merge when all pre-checks are green). We need extremely high quality and quick tests only for this one.
  2. High quality post-merge checks that are mostly green and only break when a new commit breaks things.

These two categories do not cover all the tests that we want to run, not even all the ones we want to make noise. There is a third category: slower but stable bots, for less popular targets or built in less popular ways, that we really want to be green but, given their speed, will report with some delay.

Another concern is adding/removing bots from the “high quality” pool. If a bot is noisy and we want to silence it or remove it from the GH marks, someone with admin rights has to change the branch protection. This is a heavy hammer, prone to mishaps, on the main branch.

I would create a tiered class of buildbots, and only the highest quality, most stable over time, can be in the GH marks/PRs. There could be a graduating scheme, but that’s a slow, months-long process, etc. so that’s only done sparingly.
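To make the tiering idea concrete, here is a minimal sketch in which only the top tier decides the overall mark and everything else is reported without gating. The tier numbers, bot names and the `rollup` function are assumptions for illustration, not an existing tool.

```python
def rollup(results, tiers, gating_tier=1):
    """Overall mark from gating-tier bots only; others are listed, not counted.

    results: bot name -> True (green) / False (red)
    tiers:   bot name -> tier number; bots missing from `tiers` never gate
    """
    gating = {
        bot: ok for bot, ok in results.items() if tiers.get(bot) == gating_tier
    }
    return {
        "green": all(gating.values()),
        "gating_red": sorted(bot for bot, ok in gating.items() if not ok),
        "informational_red": sorted(
            bot for bot, ok in results.items() if not ok and bot not in gating
        ),
    }


# Made-up bots: two fast, stable ones in tier 1 and a slow one in tier 3.
summary = rollup(
    {"fast-x86": True, "fast-arm": True, "slow-exotic": False},
    {"fast-x86": 1, "fast-arm": 1, "slow-exotic": 3},
)
```

In this shape, the red `slow-exotic` bot stays visible under `informational_red` without turning the overall mark red, and graduating or demoting a bot is an edit to the tier table.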

But there are still a lot of other useful bots, even if they’re silent, or slow, because they may test things that not all others do. Those, if we cut most out of the GH UI, will effectively never be looked at. I want to give them a platform in which to be useful without impairing our testing quality.

I also want to allow people/companies to test in the best way they can and share that with the community, and that may very well not be Buildbots (or whatever).

Rust calls it platform support:
https://doc.rust-lang.org/nightly/rustc/platform-support.html
E.g. x86_64-apple-darwin is Tier 1: the buildbots must be green, otherwise the change is reverted immediately, and binaries are provided for each release.
powerpc64-ibm-aix is Tier 3: if the buildbot is red, so what.

I believe that GitHub Actions is great for pre-merge checks. For post-merge CI, Buildbot is the better tool.
