[RFC] Add a warning when bypassing the premerge testing

mehdi_amini · March 12, 2024, 2:29am

Hi,

Following-up on:

I’d like to re-enable partially the checks that Tom added and that were reverted.

Since there was some issues with the code-formatter and the “changes requests” part of the checks last time, I’d like to leave them out and focus on the pre-merge testing in this RFC (see below).

Goal

The goal is to reduce the amount of breakage of our bots, and in particular keep the pre-merge config as green as possible.

What is the change

First what this isn’t: there is no policy change, no existing practice should be affected. We can add as much documentation to make this explicit (suggestions welcome).

The goal here is to avoid people breaking the build inadvertently. As a bot maintainer I find every week a few PRs that lands with a broken pre-merge testing. The author often just didn’t notice that the build was broken. This is time consuming as I end up chasing which PR to revert and have to do it. It is also disruptive for everyone who try to open a PR rebased at HEAD when it is broken.

GitHub has a feature to help preventing this: when clicking on the merge button it can display a warning before allowing to merge:

This does not prevent merging: one can still inspect the pre-merge failure, looks at the buildbot, and consider that the build is already broken at HEAD in the same way and merge. The actually change is this one extra box to tick before merging!

Also we’re updating the developer guide, see the proposed changes.

Previous problems that led to the rollback / Why is this different?

The change implemented by @tstellar included two other checks:

Warn on the code formatter (clang-format): this created concerns because of false positive. Let’s keep this out-of-scope for the sake of making progress: we’ll open another RFC for the code formatter.
Warn on trying to merge without getting approval (this was brought up by @rengolin in Block Github merge on changes requested?)
This was added with a side-effect: to be able to enable the warning when someone wants to merge without approval, we had to also tick the box which says “Require a pull request before merging”. This didn’t prevent direct pushes to main, but added the following annoying message:

This made the author of the push think they did something wrong by pushing to main without a pull-request.
We can’t get around removing this message because the way GitHub is structured.
So similarly to the code formatter, for the sake of making progress on the CI checks, I propose that we keep this for another RFC and keep this focused exclusively on the CI pre-merge check.

We will still get as a side effect of this that folks who push directly to main will see this message from git:

remote: Resolving deltas: 100% (1/1), completed with 1 local object.
remote: Bypassed rule violations for refs/heads/main:
remote:
remote: - Required status check “buildkite/github-pull-requests” is expected.

Pinging some folks who chimed in the previous discussion: @tstellar @rengolin @preames @AaronBallman @clattner @rnk

jh7370 · March 12, 2024, 8:32am

As someone who doesn’t contribute their own PRs these days, the impact on me would be the tiny number of occasions when I have to merge a new contributor’s PR, so please take my opinion with a pinch of salt.

I wonder whether comments like those posted upon failure of the clang-format check would work even better than the checkbox (with the added advantage that the user doesn’t have to explicitly tick the box on an unrelated failure). The advantage is that these will get emailed to the user (assuming that they have notification emails enabled, which I expect is the norm). If these comments (and emails) could contain the list of tests that failed, even better. I think part of the problem with the current system is that a user doesn’t get notified by a failing check, so unless they explicitly look at the appropriate section in a PR, they’ll not even realise that there was one. Even if they did see one, they’re likely to assume it’s an unrelated failure, since (based on PRs I’ve looked at) this seems to be the case 90% or more of the time. A comment in the thread flags it up to both the PR author and any reviewers.

nikic · March 12, 2024, 8:49am

For reference, this is how this currently looks like:

If pre-merge tests passed:
checks_passed

If pre-merge checks failed:
checks_failed

Regarding point two in “previous problems”: Wouldn’t enabling a required status check still produce a minor variation of that message when pushing to main? I haven’t tried it, but I’d expect to see something like this:

Bypassed rule violations for refs/heads/main:

- 1 of 1 required status checks are expected.

rengolin · March 12, 2024, 10:25am

I strongly support this.

This is not about forcing people to do anything, just not giving them a green button when there are errors. People will make the right choices on their own, but we have to give them the right tools.

Right, this was my first reaction, too. I’ve learned to ignore it, but first authors are the ones most likely to panic.

I’d be happy with any movement towards making buildbots green, even if we tackle each one of these points at a time.

AaronBallman · March 12, 2024, 2:01pm

I think this seems reasonable given the contentious bits have been removed; I’ve definitely seen HEAD get into a bad state from pushing things that were caught by precommit CI. But at the same time, Precommit CI has false positives that make me worry for how annoying this will be in practice. For example, I ran into this one this morning: (review) [Clang] [Parser] Support [[omp::assume]] by Sirraide · Pull Request #84582 · llvm/llvm-project · GitHub (builtkite claiming it failed) Github pull requests #46165 (actual problem was entirely unrelated to changes in the PR) clang-ci #13755 It would be nice to resolve this sort of thing before we move forward with the stricter controls around merging, but that said, I think the benefit outweighs the annoyance if we wanted to move forward with this first.

preames · March 12, 2024, 2:35pm

I am unopposed at the moment. You seem to have addressed the major concerns from the last time. I’m not entirely convinced that this will improve the situation in the way you hope, but I’m certainly fine with the attempt being made. As usual, if there are unexpected impacts, we’ll reevaluate.

jyknight · March 12, 2024, 3:39pm

Yes, we should do this. Then we should also as a quick follow-on do the same (warn but don’t prohibit) for missing approvals.

Given that (AFAICT) the problems last time were NOT really with the change itself, but rather based on a misunderstanding/miscommunication as to expectations, should we explicitly document this in the developer docs before implementing it?

mehdi_amini · March 12, 2024, 7:58pm

You’re right! I thought this was triggered by the “requires a pull-request” check, but it actually applies to the other checks in isolation. I edited the original message to mention this, thanks for catching this!

mehdi_amini · March 12, 2024, 8:01pm

As far as I can tell, this isn’t a “false positive”, it is the “build is broken at HEAD” isn’t it? If so, it kind of makes my point: someone broke the test by merging with a broken CI before which then gets noisy for everyone else

rnk · March 12, 2024, 11:01pm

We should do this. This change is entirely advisory, and as you say, there is no policy change. This is really just nudging people to wait for premerge tests to finish and pass before they merge.

I recall there was some discussion of a “merge queue”-like feature, where you push a button, and your change lands after tests pass. Do we have that already, does this move us closer to having that capability, or do we have to do a lot more to get there?

mehdi_amini · March 12, 2024, 11:19pm

I was planning to bring this in the next RFC, trying to lock-in some small incremental progress

rengolin · March 13, 2024, 10:57am

IIUC, this isn’t a nice as it could be. I think what it does is just merge if the current tests pass. It will not try to test again when main starts passing, though that would be a really cool feature.

Currently, we’d still need to trigger CI again on every PR failing because main was broken, once it gets fixed. We do that on our repo, at least.

AaronBallman · March 13, 2024, 12:49pm

No, the project is not broken at HEAD because the project is not required to be fully clang-format compliant and have no trailing whitespace. This is disruptive for code reviewers because the patch in question was being flagged as failing precommit CI due to totally unrelated changes and I’m suggesting that making that kind of false positive “failure” more prominent is not helpful (what would be more helpful is using git-clang-format to only check the changes from the PR).

That said, I still think we should go forward with the change you are suggesting as I don’t think it will be so disruptive as to make the current situation significantly worse. But hopefully we can also fix these sort of false-positives in the near future (for example, we have test files that purposefully do not follow clang-format rules and never will, so checking formatting for only the changed lines in a patch is a behavior we need regardless of whether we format the “whole” codebase in the future or not).

pogo59 · March 13, 2024, 1:14pm

At the risk of diverging from the main point:

I was under the impression that /test/* files were not checked. Maybe I’m wrong, and the clang-format hook should be changed. If these files are in a context that should be checked, could they be marked as clang-format off?

In any case, checking only changed lines is preferable. Checking a whole file will encourage people to include random formatting fixes in their functional patch, which is something we explicitly do not want.

rengolin · March 13, 2024, 1:14pm

For the particular case of clang-format, I agree. We do not have that requirement and other people’s formatting issues should not stop your patch from being merged.

Especially because clang-format doesn’t have stability guarantees. In my projects, when we have that requirement, we force people to always use the same version and make it a required step into ninja check-all. We don’t have that, let’s not add this to Github’s requirements.

Here, I think we can enable the clang-format action to run on all PRs but not add it to the branch protection as a requirement. So it always runs, but never blocks the merge.

This is not particularly hard to do, but it is clumsy and LLVM is a very large project, with very different sub-projects. I’d decouple this effort from even the other merge checks and postpone it to a time where we’re all happy with our merge checks.

AaronBallman · March 13, 2024, 1:23pm

Personally, I’d strongly prefer to keep clang-format comments out of the source tree (they’re distracting and it’s a slippery slope to allowing other markings like NOLINT, etc). Either we format the file or we don’t format the file, but I don’t know that we should encourage “special situations” in source. That said, not checking the formatting of files in the test directory makes a lot of sense to me (same for docs, www, unittests, *.td files, *.txt files, and others).

+1, and it also eats up (a tiny amount of) CI resources that could be better spent elsewhere compared to checking only changed lines.

mehdi_amini · March 13, 2024, 7:54pm

Sorry but I don’t agree with your characterization right now.
From the point of view of the check, the project is broken at HEAD. You disagree with the check, but that is a different discussion…
This is specific to the clang-ci and if you think this isn’t the right check, please send a patch for llvm-project/clang/utils/ci/run-buildbot at 744a23f24b08e8b988b176173c433d64761e66b3 · llvm/llvm-project · GitHub
This is orthogonal to the discussion here IMO.

This is the very definition of “the project is broken at HEAD”, and I’ll repeat: this is exactly what this RFC is helping to address: the “unrelated changes” broke the presubmit and other people see the failures on their PR.

This isn’t a clang-format failure, this is a clang-CI failure.

To be clear this RFC does not elevate clang-format checks to be warnings here, only the buildkite failure.
Clang somehow diverges from the rest of the project (not sure why) and someone decided that adding some formatting change to clang-ci (this is not clang-format) is a good idea, I’m not sure why either. You’re pretty involved in clang I believe, you should know better than I do what’s right to do in clang-ci

JonChesterfield · March 14, 2024, 12:30am

My impression is that pre-merge testing is usually broken, though maybe it’s unusually bad in the GPU/cross-compiling world. Just checked the status of one of my PRs and it’s failing the pre-merge on windows falling over:

CMake Error at C:/BuildTools/Common7/IDE/CommonExtensions/Microsoft/CMake/CMake/share/cmake-3.20/Modules/CMakeDetermineCompilerId.cmake:777 (file):
file STRINGS file
“C:/ws/src/build/build-clang-windows/CMakeFiles/3.20.21032501-MSVC_2/CompilerIdC/CMakeCCompilerId.exe”
cannot be read.

Drawing more attention to something which is usually spuriously broken is a bad thing - it teaches people to ignore warnings. Much like that message from GitHub on commit does.

mehdi_amini · March 14, 2024, 1:53am

Can you link your PR? This looks like a bot problem! @lnihlen would likely want to know about such failures.

I tracked closely the “greenness” of these config for the last two weeks now and I don’t see any excessive failures so far.
I’m also getting an email for every failure of these bots at HEAD now:

Builders/premerge-monolithic-linux
Builders/premerge-monolithic-windows

(For pull-requests, the history can be explored here: buildkite pull-requests)

nikic · March 15, 2024, 9:24am

I’m generally fine with this change, but I do want to see a developer policy (or other documentation) update land ahead of it, that specifies:

It’s fine to ignore the bypass message when pushing to the repo – you did nothing wrong, it’s just GitHub being dumb!
It’s fine to bypass the branch protection when merging, if the pre-merge failure looks unrelated to the change.
Finally – this point might be controversial, but this is why it’s important to clarify – it’s fine to bypass the branch protection if the Linux buildkite job succeeded, and Windows is still pending. I think people should be able to do this in good conscience, as it’s not reasonable to delay merges for many hours to wait on Windows results.

Topic		Replies	Views
Block Github merge on changes requested? GitHub	44	1362	January 25, 2024
Pre-merge checks updates May 2023 Project Infrastructure	6	393	May 15, 2023
Pre-merge testing and transition to PRs Project Infrastructure	0	161	August 31, 2023
Pre-merge testing, running checks on main branch Infrastructure Working Group	5	514	February 8, 2022
Pre-merge testing enabled by default LLVM Dev List Archives	1	79	February 28, 2020

[RFC] Add a warning when bypassing the premerge testing

Goal

What is the change

Previous problems that led to the rollback / Why is this different?

Related Topics