RFC: Prototyping Pre-commit Testing Using Buildbot

First up, this is more of a call for interest than a request for comments on particular work. You’ll see why.

In my presentation at the last US LLVM dev (https://llvm.org/devmtg/2022-11/slides/TechTalk5-WhatDoesItTakeToRunLLVMBuildbots.pdf) I ended with a proposal to move llvm’s testing into pre-commit.

There was a decent amount of interest (some of it recorded in the notes for the subsequent roundtable, “Buildbots roundtable notes US LLVM Dev 2022”).

Therefore I had planned to have something for folks to look at early this year and start a discussion about the next steps.

Unfortunately due to my personal schedule and other work commitments I haven’t been able to do that and I won’t be able to work on this until around July at the earliest. And ideally, we’d have something prototyped by around the time of the switch to Github PRs late in the year.

So that I am not blocking anyone else, here’s what I had intended to try. I appreciate there will be a lot of debate (perhaps too much) over vague plans like this, but I’d rather be open so others can work on it if they wish.

Prototyping Pre-Commit

We already have build bots and there are other projects that use them in pre-commit. For example, Buildbot itself (though I am not 100% sure, because Github has hidden the older checks; see for example “Github hook: Include "previous filename" key in list of files” by DavidSpickett, Pull Request #6807 in buildbot/buildbot on GitHub).

Even if no one has done it, I’m sure we can use a webhook to trigger from Github somehow.

Most importantly, pre-commit is basically an extension of builds on demand. So the first thing I’d build is a way to manually ask a bot to build a particular PR. Then build on top of that.

The minimum demo system would be:

  • Some simple repo with a single python script that for example prints the value of an environment variable (so you know it ran on the right machine).
  • A few buildbots with different values of that variable set.
  • A command on the Github side that can trigger a build on a specific bot.
  • Some way to report those results back to Github. A URL at first, a Github native status box even better.

Then you can experiment by saying “/buildbot build buildbot_with_1” and seeing results on the PR.
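
To make that concrete, here is a minimal sketch of what the Buildbot side of the demo could look like. The worker names, passwords, repo URL and the MACHINE_FLAVOUR variable are all invented for illustration, and a force scheduler stands in for the “build on demand” piece until a Github-triggered one exists.

```python
# master.cfg -- a trimmed-down sketch, assuming a recent Buildbot 3.x.
# Worker names, passwords, the repo URL and MACHINE_FLAVOUR are invented
# for illustration only.
from buildbot.plugins import worker, util, steps, schedulers

c = BuildmasterConfig = {}

# Two workers that differ only in the value of one environment variable,
# so the test script can prove which machine it ran on.
c['workers'] = [
    worker.Worker("buildbot_with_1", "password1"),
    worker.Worker("buildbot_with_2", "password2"),
]
c['protocols'] = {'pb': {'port': 9989}}

def make_factory(flavour):
    f = util.BuildFactory()
    # Check out the toy repo containing the single Python test script.
    f.addStep(steps.Git(repourl="https://github.com/example/precommit-demo.git",
                        mode="incremental"))
    # Run the script; in the real demo the variable could instead be set
    # in the worker's own environment.
    f.addStep(steps.ShellCommand(command=["python3", "print_env.py"],
                                 env={"MACHINE_FLAVOUR": flavour}))
    return f

c['builders'] = [
    util.BuilderConfig(name="demo-1", workernames=["buildbot_with_1"],
                       factory=make_factory("1")),
    util.BuilderConfig(name="demo-2", workernames=["buildbot_with_2"],
                       factory=make_factory("2")),
]

# "Build on demand": a force scheduler is the closest existing Buildbot
# feature to "manually ask a bot to build a particular PR".
c['schedulers'] = [
    schedulers.ForceScheduler(name="force", builderNames=["demo-1", "demo-2"]),
]

c['www'] = {'port': 8010}
c['buildbotURL'] = "http://localhost:8010/"
```

The Github-side command would then be a thin wrapper that forces a build on the named builder and posts the resulting build URL back to the PR.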

Perhaps later one could add quality-of-life commands like “/buildbot owner buildbot_with_1”.

There’s a whole mess here about access control and preventing what are effectively denial-of-service attacks using this, but I figure that’s for a later stage (and Github does have some features for that, e.g. requiring a maintainer to approve checks for first-time contributors).

At some point you could start running it on a fork of llvm to approximate the real experience.

Resources Needed

You would need:

  • A Github repo.
  • A Buildbot master.
  • A few build bots connected to that (I would expect it’s possible to run all of these as Docker containers on a single host).

Reference Points

This is again vague because I haven’t had the time to do the research, but:

Conclusions

I think it would be relatively easy to prototype an on demand bot system for Github PRs, but I don’t have the time to do the work for the next few months. If anyone else agrees and has the time, I don’t want to stand in their way.

I don’t think that putting any effort into Phabricator makes sense right now either; Github is the future.

There are alternatives to Buildbot such as Buildkite, used very successfully by libcxx. I personally wanted to explore keeping the existing infrastructure first, but ultimately have not made up my mind on that. I welcome someone trying Github PR → Buildkite integration (in fact it probably has official plugins).

I am still very much into the idea of pre-commit testing, so I will put time in where I can find it, including pursuing this proposal if it has not progressed by then.

So let me know:

  • What do you think of the proposed first prototype step?
  • Do you have other things you’d rather try?
  • Do you already use something like this in another project?
  • Would the pre-commit topic benefit from a wider discussion now, or later when there are prototypes we can reference?

cc @luken-google @sqlbyme who I had talked to over email about this.


Thanks for working on this! It has the potential to be greatly impactful.

There is a major benefit to using buildbot (assuming it can be done “as conveniently” as buildkite or others): we don’t fragment the CI, and bot owners can set up one infrastructure for pre- and post-merge testing.

Something not clear to me with your experiment is how one would make an existing post-commit config available for pre-merge testing. Alternatively: in the future, how would we create and maintain configs so we don’t duplicate work between pre- and post-merge environments?
(I assume you can’t just reuse a post-merge “buildbot builder” for pre-merge testing because it would mess with the history, notification, blamelist, etc.)

Yes. My guess is that pre-commit bots will also need more resources, and using a separate builder will help there too (so you don’t block post-commit builds).

Speaking of, @kbeyls you have some scripting for Phabricator I think. Would it be possible to get an idea from that of the average number of reviews and updates to those reviews across the last months/year?

For example if Linaro wanted clang-armv8-quick to also run in pre-commit, we would add a clang-armv8-quick-pre-commit builder to zorg. To prevent duplicating the config we can reference the existing bot’s config in this file https://github.com/llvm/llvm-zorg/blob/main/buildbot/osuosl/master/config/builders.py#L32.
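
Roughly what I have in mind, as a sketch rather than real zorg code (the actual shape of builders.py differs, and the helper below is hypothetical):

```python
# Hypothetical helper: assume builders.py describes each builder as a dict
# with at least a 'name' and a 'factory'.
def make_precommit_variant(builder):
    """Clone an existing post-commit builder so the same factory (checkout,
    cmake flags, test steps) is reused, but results go to a separate builder
    that doesn't touch post-commit history, notifications or blamelists."""
    precommit = dict(builder)
    precommit['name'] = builder['name'] + '-pre-commit'
    # Tag it so the master can attach it to a different (PR-triggered)
    # scheduler and a different set of reporters.
    precommit['tags'] = builder.get('tags', []) + ['pre-commit']
    return precommit

# e.g. all_builders.append(make_precommit_variant(clang_armv8_quick))
```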

We do want very high reliability from any final system, so the configuration may end up very different in the end. That’s a question we can answer by running all this against a fork of llvm later on.

Hi there,

I must say that I’m happy to see a tendency towards Buildbot. As you know, I talked about a PoC a while ago. While that PoC was fun, I came up with another PoC that I am working on right now, and maybe there’s some interest in talking about this at EuroLLVM in Glasgow if anyone is going. I am.

The only remnants of my previous PoC are the locally running containers for a buildbot master and some workers. I decided to build a simple Github App that I’m going to share here as soon as it is a bit more ready and working. :wink:

In the diagram below I’ve sketched out the architecture of my approach. I have a Github repo foo/bar (think llvm/llvm-project) that I’ve installed my app in. Installing a Github app doesn’t mean that you have to create the app. It’s a bit weird but also cool. Having a Github app on github.com basically only means you let Github know what permissions your app needs and which events it is going to receive. Speaking of receiving, you need to tell Github where to send webhook events. For this purpose I’m currently using ngrok as a way to receive events from Github on my local developer machine. Et voilà, you are in control of everything you need: access to the buildbot master and workers, and you can talk back to Github (as an app and not as a user!).

If you want to deploy something like this you essentially only need to deploy the red box and tell github to send events to that “box”, aka its service URL.

As for the developer machine it can be set up with podman or docker quite easily.

Oh, btw, the red box below is essentially a web server listening locally on port 8081 that handles various event payloads to distinguish the various Github events that can arrive (e.g. OnIssueComment or OnPullRequestReviewComment).
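
To give an idea of the shape of that web server, here is a minimal sketch (not the actual app): a Flask handler that routes Github webhook events and, for a “/buildbot build <builder>” comment, asks the Buildbot master to force a build. The ports, names and force scheduler parameters are assumptions.

```python
# A minimal sketch of the "red box", not the actual app: ports, names and
# the force scheduler parameters are assumptions.
from flask import Flask, request
import requests

app = Flask(__name__)
BUILDBOT = "http://localhost:8010"

def force_build(builder, pr_number):
    # Buildbot exposes force schedulers over its REST API via JSON-RPC 2.0.
    # In a real setup the numeric builderid would be looked up first
    # (e.g. via /api/v2/builders); the exact params also depend on how the
    # scheduler is configured.
    payload = {
        "jsonrpc": "2.0",
        "method": "force",
        "id": 1,
        "params": {"builderid": builder, "reason": f"PR #{pr_number}"},
    }
    requests.post(f"{BUILDBOT}/api/v2/forceschedulers/force", json=payload)

@app.route("/webhook", methods=["POST"])
def on_event():
    event = request.headers.get("X-GitHub-Event", "")
    body = request.get_json(silent=True) or {}
    if event == "issue_comment" and body.get("action") == "created":
        # PR comments arrive as issue_comment events; look for our command.
        comment = body["comment"]["body"].strip()
        if comment.startswith("/buildbot build "):
            force_build(comment.split()[2], body["issue"]["number"])
    elif event == "pull_request" and body.get("action") in ("opened", "synchronize"):
        # A fully automatic flow would trigger a default set of builders here.
        pass
    return "", 204

if __name__ == "__main__":
    app.run(port=8081)
```

Reporting back to Github (a comment, or a native check/status) would then be a call to the Github REST API using the app’s installation token.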

I hope you like this idea and I look forward to feedback.

I like it a lot, it’s a good chunk of the work done already!

I am not, but some of my colleagues are; anyone from Linaro will be happy to discuss it for sure.

So the only difference between “pre-commit” and a manual “/buildbot build” on whatever bot is what is sent via the webhook? Very neat. I guess in future that webhook destination could run in the llvm lab.

Have you also looked at having custom commands on the Github side, like /buildbot ...? Perhaps we can learn from our existing release automation here (done by Tom Stellard, I think?).

Thank you.

I look forward to it.

Github sends payloads to the webhook that finally terminates in the github app on my developer machine. It doesn’t differentiate between a comment like /buildbot ... on a PR and the creation of the PR itself. Both result in payloads being sent to the github app. Hence, if the github app listens to new PRs being created it can test them without any manual intervention but through the same underlying messaging.

In other words, it is a matter of parsing the webhook payload for a string like /buildbot .... This is what @tstellar is doing in the release management as well; creating new commands is done like this.

I hope this clarifies some of your questions.

It does. The Github commands are a lot more free-form than I thought, pretty exciting what you could do with that. Plus we can manage the source of it just like we do llvm-zorg today without everyone needing to be some kind of admin on the Github side.

If you are able to share the setup scripts for your proof of concept it could be the base for working on that.

In terms of some of the existing practice for LLVM-based compilers, perhaps the following are worth looking at:

Swift: Swift.org - Swift Continuous Integration

Rust: Contribution Procedures - Rust Compiler Development Guide

r+
After someone has reviewed your pull request, they will leave an annotation on the pull request with an r+. It will look something like this:
@bors r+
This tells @bors, our lovable integration bot, that your pull request has been approved. The PR then enters the merge queue, where @bors will run all the tests on every platform we support. If it all works out, @bors will merge your code into master and close the pull request.

Rust: How Rust is tested

In Rust we do CI in a very particular way, and one which we are very proud of. Rust’s creator, Graydon, originally described it in a blog post, “The Not Rocket Science Rule”. The thing we do differently from most is that we run the full test suite against every patch, as if it were merged to master, before committing it into the master branch, whereas most CI setups test after committing, or if they do run tests against every PR, they do so before merging, leaving open the possibility of regressions introduced during the merge.

https://graydon2.dreamwidth.org/1597.html

The Not Rocket Science Rule Of Software Engineering:
automatically maintain a repository of code that always passes all the tests

  • Bors-NG implements a continuous-testing workflow where the main branch never breaks. It integrates GitHub pull requests with a tool like GitHub Actions that runs your tests.
    https://github.com/bors-ng/bors-ng
  • Homu: A bot that integrates with GitHub and your favorite continuous integration service
    https://github.com/servo/homu

I know of other organizations using the not-rocket-science tactic. For at least one of them it’s optional; but people who don’t use it, and break the build, are subjected to serious razzing.

The main prerequisite for this, though, is a build/test cycle that can keep pace with the rate of new commits. The LLVM project overall gets, what, 40-50 per day? So if the bot cycle takes longer than, say, 15 minutes, it won’t be able to keep up. Even a 15-minute pace will tend to get backed up during the most active parts of the day.


Doesn’t this just mean that there’s a gap between a patch being approved and it hitting the main branch? If you depend on changes that you’ve just got approved landing in main, I’d say that this is something every developer can cope with by keeping a local patch on which you base your next change.

And don’t we actually save ourselves some turnarounds if we avoid false positive build statuses on patches that only fail because something in main is broken?


Hi @MattPD, we can absolutely implement things like this with the approach I’ve presented. Like I’ve mentioned before: if our Github app inspects the payload of comments and looks for @bors r+, it can ask buildbot to do whatever we want. So it’s really only a matter of our imagination what practices we want to copy or try out.

Like @DavidSpickett mentioned earlier in this thread, we should start with a simple use case of explicitly asking for a buildbot worker to build a PR on. From there onwards it is a matter of generalizing the semantics of PR comment commands and the conventions we want to follow.

For example, if @bors r+ shall mean “test a patch on all supported architectures”, it is just a matter of asking a defined set of workers to build your patch. I hope you see the beauty in the flexibility this brings us if we have an architecture similar to what I proposed.
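
As a tiny sketch of such a convention (all builder names here are hypothetical):

```python
# Map comment commands to the set of builders they should trigger.
# Builder names are invented for illustration.
ALL_ARCH_BUILDERS = ["clang-x86_64-quick", "clang-armv8-quick", "clang-aarch64-quick"]

def builders_for(comment):
    """Return the builders a PR comment should trigger, or [] for no-ops."""
    words = comment.split()
    if words[:2] == ["@bors", "r+"]:
        # "approved": test on all supported architectures.
        return ALL_ARCH_BUILDERS
    if words[:2] == ["/buildbot", "build"] and len(words) >= 3:
        # Explicit request for one named builder.
        return [words[2]]
    return []

assert builders_for("@bors r+") == ALL_ARCH_BUILDERS
assert builders_for("/buildbot build clang-armv8-quick") == ["clang-armv8-quick"]
assert builders_for("LGTM, thanks!") == []
```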

And all that can be tested against a mock of the real API so people can hack on it as they please.

I was thinking more about promptness of feedback. Today if I make a mistake, likely one of the fast builders will nag me fairly quickly. I won’t have that if the build queue is backed up by several hours, and might have gone home for the day.

OTOH, with pre-commit qualification, if my commit gets bounced, there are no consequences to anyone else and the fail-mail is guaranteed to be specifically about my commit. So that’s a plus.

Thinking about how this interacts with slow builders: The slow builders obviously can’t be part of the prequalification step, or development will come to a screeching halt. One of the biggest complaints currently about the slow builders is that they (a) run into the same fails that the fast builders do, except up to a day later, and (b) they spam all authors in a batch about the fail. A prequalification scheme will mitigate (a), which means (b) will happen less often and be more specifically about the environments used by the slow builders (I’m mainly thinking about things like expensive-checks and sanitizer bots), so still spam but higher-quality spam, more likely to be actionable.

So, we’re back to a question that I don’t remember ever being resolved in past discussions: Just what are the criteria for the pre-qual builds and tests?

It’s not clear to me why not?
So far we haven’t discussed making anything mandatory, I’ve seen only references to using commands on the pull-request to trigger bots (or group of bots). If I broke a slow bot and my patch was reverted, it can be useful to me to be able to trigger this config before retrying to land my patch!

I was responding to the “not rocket science” “rust does full testing pre-commit” sub-thread, which basically was talking about being mandatory.

I agree that being able to ask for an arbitrary bot to run an arbitrary commit would be incredibly useful. Although it means letting people push branches so the bot can find the commit; pushing branches is not something we’ve been too excited about.

We’re talking about PR testing: if there is a PR, we don’t need another branch I believe; the commit can be reached in the repo explicitly. (Also, I’m not sure what would be different between “on-demand” and “mandatory” from the point of view of needing a branch?)

Thanks, I missed this: I’m quite strongly opposed to shooting for a mandatory flow before it proves itself as an opt-in system (we don’t even have mandatory pre-merge code review right now!).

Agreed, trust in the system is easily lost.

We start on a test project, then try an llvm fork, then an opt in group on the real llvm and so on. Anyone (like me) who is into the idea can put their time on the line to prove that it is useful (and fix it when it isn’t).


The project gets around 100 commits per day.

Speaking of, @kbeyls you have some scripting for Phabricator I think. Would it be possible to get an idea from that of the average number of reviews and updates to those reviews across the last months/year?

I’ve written a script, and for 2022, the average number of diff updates per hour is as follows:

For the year 2022, number of new or updated diffs on reviews.llvm.org per hour:
0h UTC: 16.9 diffs per hour on average, i.e. roughly one diff update every 3.5 minutes
1h UTC: 14.1 diffs per hour on average, i.e. roughly one diff update every 4.3 minutes
2h UTC: 11.9 diffs per hour on average, i.e. roughly one diff update every 5.0 minutes
3h UTC: 10.2 diffs per hour on average, i.e. roughly one diff update every 5.9 minutes
4h UTC: 9.6 diffs per hour on average, i.e. roughly one diff update every 6.2 minutes
5h UTC: 6.9 diffs per hour on average, i.e. roughly one diff update every 8.6 minutes
6h UTC: 7.0 diffs per hour on average, i.e. roughly one diff update every 8.6 minutes
7h UTC: 8.0 diffs per hour on average, i.e. roughly one diff update every 7.5 minutes
8h UTC: 8.9 diffs per hour on average, i.e. roughly one diff update every 6.7 minutes
9h UTC: 12.1 diffs per hour on average, i.e. roughly one diff update every 5.0 minutes
10h UTC: 14.2 diffs per hour on average, i.e. roughly one diff update every 4.2 minutes
11h UTC: 15.1 diffs per hour on average, i.e. roughly one diff update every 4.0 minutes
12h UTC: 12.2 diffs per hour on average, i.e. roughly one diff update every 4.9 minutes
13h UTC: 13.5 diffs per hour on average, i.e. roughly one diff update every 4.4 minutes
14h UTC: 15.3 diffs per hour on average, i.e. roughly one diff update every 3.9 minutes
15h UTC: 17.6 diffs per hour on average, i.e. roughly one diff update every 3.4 minutes
16h UTC: 20.4 diffs per hour on average, i.e. roughly one diff update every 2.9 minutes
17h UTC: 22.6 diffs per hour on average, i.e. roughly one diff update every 2.7 minutes
18h UTC: 21.4 diffs per hour on average, i.e. roughly one diff update every 2.8 minutes
19h UTC: 23.0 diffs per hour on average, i.e. roughly one diff update every 2.6 minutes
20h UTC: 21.9 diffs per hour on average, i.e. roughly one diff update every 2.7 minutes
21h UTC: 20.2 diffs per hour on average, i.e. roughly one diff update every 3.0 minutes
22h UTC: 21.3 diffs per hour on average, i.e. roughly one diff update every 2.8 minutes
23h UTC: 20.0 diffs per hour on average, i.e. roughly one diff update every 3.0 minutes

I think that during the busy times of the day, a precommit bot, to be able to keep up with the influx of new diffs to be tested, should have a turn-around time of not much more than 2 minutes per diff, assuming it can test only 1 diff concurrently?
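
As a back-of-the-envelope check of that number, using the peak rate from the table above:

```python
# Peak hour in the 2022 data: 23 diff updates per hour (19h UTC).
peak_diffs_per_hour = 23.0
concurrent_builds = 1          # assume the bot tests one diff at a time
budget_minutes = 60.0 / peak_diffs_per_hour * concurrent_builds
print(f"{budget_minutes:.1f} minutes per diff")   # ~2.6 minutes

# The budget scales linearly with parallelism: with 4 concurrent builds
# a roughly 10 minute turn-around would still keep up at peak.
```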

Does that help?

Two aspects are worth noting. First, I may iterate with 5 to 10 revisions on Phabricator today before landing, so each commit in main may need potentially 10 runs of the CI.

Another aspect, in the other direction, is that the bots aren’t all testing all the commits; for example, llvm bots don’t get triggered on changes to mlir or to flang.
So we should look at bot statistics instead of the git log here.