Clarification on expectations of buildbot email notifications

Hi all,

Over the past year or so, all of us have broken the buildbots on many occasions. Usually we get notified on IRC, or via an buildbot email notification sent to everyone on the blamelist.
If I happen to be on IRC I’ll see the notification, but if not, the next best thing is an email that was automatically sent to me (along with everyone else on the blamelist) from the buildbot with information about the failure.
And then finally, I’ll occasionally get a response to my commit message telling me that it’s broken, and the patch may be reverted with information in the commit message explaining which bot was broken and providing a link to it.

However, we have some buildbots on the public waterfall which are specifically configured not to send emails to people. In some cases it’s because the bots are experimental, but there are a handful where the reasoning I’ve been given is that it “wastes peoples time and contributes to build blindness”, but we are still expected to keep them green (usually by people manually reaching out to us when they fail, or patches getting reverted and us getting notified of the revert).

It is this last case that I’m concerned about, as it appears to be in direct conflict with our own developer policy [https://llvm.org/docs/DeveloperPolicy.html#id14], which states this

Hi all,

Over the past year or so, all of us have broken the buildbots on many occasions. Usually we get notified on IRC, or via an buildbot email notification sent to everyone on the blamelist.
If I happen to be on IRC I'll see the notification, but if not, the next best thing is an email that was automatically sent to me (along with everyone else on the blamelist) from the buildbot with information about the failure.
And then finally, I'll occasionally get a response to my commit message telling me that it's broken, and the patch may be reverted with information in the commit message explaining which bot was broken and providing a link to it.

However, we have some buildbots on the public waterfall which are specifically configured not to send emails to people. In some cases it's because the bots are experimental, but there are a handful where the reasoning I've been given is that it "wastes peoples time and contributes to build blindness", but we are still expected to keep them green (usually by people manually reaching out to us when they fail, or patches getting reverted and us getting notified of the revert).

It is this last case that I'm concerned about, as it appears to be in direct conflict with our own developer policy [https://llvm.org/docs/DeveloperPolicy.html#id14], which states this
-----
We prefer for this to be handled before submission but understand that it isn’t possible to test all of this for every submission. Our build bots and nightly testing infrastructure normally finds these problems. A good rule of thumb is to check the nightly testers for regressions the day after your change. Build bots will directly email you if a group of commits that included yours caused a failure. You are expected to check the build bot messages to see if they are your fault and, if so, fix the breakage.

Commits that violate these quality standards (e.g. are very broken) may be reverted. This is necessary when the change blocks other developers from making progress. The developer is welcome to re-commit the change after the problem has been fixed.

-----

I'm sending this email to get a sense of the community's views on this matter. If I'm correctly reading between the lines in the above passage, buildbots which do not send emails should not be subject to the revert-to-green policy. To be honest, it's actually not even clear from reading the above passage where the burden of fixing a "broken" patch on a silent buildbot lies at all - with the patch author or with the bot maintainer.

Would anyone care to weigh in with an unbiased opinion here?

I think for any regressions whether they affect buildbots or not, the
patch author should be responsible for fixing the issue. In my experience,
this is also usually what happens when I report a regression.

I think the responsibility of the regression reporter (which could be
a buildbot) is to provide a reasonably prompt notification of failure along
with clear instructions for reproducing the issue. If that criteria is met,
then I think it is always fair to ask for a revert if the issue can't be fixed
in a few days.

Even without a prompt notification, though, I still think reverts are appropriate in
some cases and that the patch author should take the lead in fixing the issue.

The buildbots that automatically send notifications,though, are a little
bit of a special case, because when they are broken it affects everybody.
I think for those bots, having an immediate revert to green policy, like
we do now is fine.

I don't think it's reasonable to expect developers to monitor nightly testers,
so maybe this part of the developer policy should be changed. There is
almost always at least one one bot that is red.

-Tom

I don’t think whether a buildbot sends email should have anything to do with whether we revert to green or not. Very often, developers commit patches that cause regressions not caught by our buildbots. If the regression is severe enough, then I think community members have the right, and perhaps responsibility, to revert the change that caused it. Our team maintains bots that build chrome with trunk versions of clang, and we identify many regressions this way and end up doing many reverts as a result. I think it’s important to continue this practice so that we don’t let multiple regressions pile up.

I think what’s important, and what we should, after this discussion concludes, put in the developer policy, is that the person doing the revert has the responsibility to do their best to help the patch author reproduce the problem or at least understand the bug.

This can take many forms. They can link directly to an LLVM buildbot, which should be self-explanatory as far as reproduction goes. It can be an unreduced crash report. If they’re nice, they can use CReduce to make it smaller. But, a reverter can’t just say “Revert rNNN, breaks $RANDOM_PROJECT on x86_64-linux-gu”. If they add, “reduction forthcoming” and they deliver on that promise, I think we should support that.

In other words, the bar to revert should be low, so we can do it fast and save downstream consumers time and effort. If someone isn’t making a good faith effort to follow up after a revert, then authors have a right to push back.

I agree with Paul that we should remove the text about checking nightly builders. That suggestion seems a bit dated.

I think we could/should be a little bit more precise here:

… any regressions whether they affect buildbots or not, the
patch author should be responsible for fixing the issue.

especially if we say that the bar for a revert is low. That is, the “any regression” needs a bit more clarifications. Assuming we are talking about performance regressions (not language conformance or correctness):

  1. We sometimes see regressions where code generation is (almost) the same, but the code layout is different. Some micro-architectures are more sensitive to this than others, causing significant regressions. We always thought it was unfair to ask for a revert for these kind of regressions, and thus never ask for that.

  2. We also sometimes see that patches that cause regressions actually do the right thing, but have all sorts of knock on effects e.g. causing different codegen and regressions. Sometimes this is just unlucky (e.g. regalloc making different decisions), but sometimes other passes can’t handle the IR or machine code less efficient and something need to be actually fixed. But we also very rarely ask for a revert in these cases.

  3. The obvious and straightforward case is when a patch is not doing the right thing or e.g. forgets certain cases. Usually what we do is leave a comment on the Phab ticket, and when the author responds fast and works on a fix we can live with the regression for a few days (but it looks like we could be a bit more aggressive with reverts if we wanted to).

The straightforward cases are 1) and 3), where the former is not worth a revert (but it would be good to be explicit about this), and 3) is definitely worth a revert.

  1. is the tricky one, because it has a lot of grey areas. I guess the reason why we are not very aggressive with reverts is that we don’t want to stop others from making progress, and also thought that in some cases it was just our problem and not the author’s. In the example of knock on effects and some heuristic making a different/wrong decision, I thought it was unfair to the author to ask for a revert. A more aggressive revert policy here could easily lead to people not making any progress or a lot less fast.

Cheers,
Sjoerd.

I don’t think whether a buildbot sends email should have anything to do with whether we revert to green or not. Very often, developers commit patches that cause regressions not caught by our buildbots. If the regression is severe enough, then I think community members have the right, and perhaps responsibility, to revert the change that caused it. Our team maintains bots that build chrome with trunk versions of clang, and we identify many regressions this way and end up doing many reverts as a result. I think it’s important to continue this practice so that we don’t let multiple regressions pile up.

I think what’s important, and what we should, after this discussion concludes, put in the developer policy, is that the person doing the revert has the responsibility to do their best to help the patch author reproduce the problem or at least understand the bug.

This can take many forms. They can link directly to an LLVM buildbot, which should be self-explanatory as far as reproduction goes. It can be an unreduced crash report. If they’re nice, they can use CReduce to make it smaller. But, a reverter can’t just say “Revert rNNN, breaks $RANDOM_PROJECT on x86_64-linux-gu”. If they add, “reduction forthcoming” and they deliver on that promise, I think we should support that.

In other words, the bar to revert should be low, so we can do it fast and save downstream consumers time and effort. If someone isn’t making a good faith effort to follow up after a revert, then authors have a right to push back.

I really strongly endorse this approach. This, IMO, is the crux of revert-to-green: somewhat regardless of the source of green vs. red, we need to revert quickly and with relatively low bar. The result of a revert is a shared obligation between reverter and author to find a path forward and Reid nicely outlines how the reverter can address their end of the bargain.

I want to emphasize that “quickly” here often (but definitely not always) needs to be much shorter than “a few days” or even “a day” due to the rate of incoming patches and the need to minimize compound failures hiding precise regression signal.

Anyways, +1 =]
-Chandler

I also agree with the approach, but was just trying to say that it’s perhaps not as simple as any regression. There are a lot of grey areas here, but I appreciate it is difficult to capture all of that in a policy. Reid’s proposal does indeed seem to capture the process well. IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Reid said:

I don't think whether a buildbot sends email should have anything to do
with whether we revert to green or not. Very often, developers commit
patches that cause regressions not caught by our buildbots. If the
regression is severe enough, then I think community members have the
right, and perhaps responsibility, to revert the change that caused it.
Our team maintains bots that build chrome with trunk versions of clang,
and we identify many regressions this way and end up doing many reverts
as a result. I think it's important to continue this practice so that
we don't let multiple regressions pile up.

My team also has internal bots and we see breakages way more often than
we'd like. We are a bit reluctant to just go revert something, though,
and typically try to engage the patch author first.

Engaging the author has a couple of up-sides: it respects the author's
contribution and attention to the process; and once you've had to fix
a particular problem yourself (rather than someone else cleaning up
after your mess) you are less likely to repeat that mistake.

I think what's important, and what we should, after this discussion
concludes, put in the developer policy, is that the person doing the
revert has the responsibility to do their best to help the patch author
reproduce the problem or at least understand the bug.

This can take many forms. They can link directly to an LLVM buildbot,
which should be self-explanatory as far as reproduction goes. It can be
an unreduced crash report. If they're nice, they can use CReduce to make
it smaller. But, a reverter can't just say "Revert rNNN, breaks
$RANDOM_PROJECT on x86_64-linux-gu". If they add, "reduction forthcoming"
and they deliver on that promise, I think we should support that.

In other words, the bar to revert should be low, so we can do it fast
and save downstream consumers time and effort. If someone isn't making
a good faith effort to follow up after a revert, then authors have a
right to push back.

We have been on the wrong side of a revert where it was "this broke us"
and then nothing. I was inclined to just re-apply the patch, but that's
my "Mr Grumpy" avatar talking. How do we address failure to conform to the
community norms?

I agree with Paul that we should remove the text about checking nightly
builders. That suggestion seems a bit dated.

That was Tom Stellard, not me, but I agree with him.
--paulr

Reid said:

I don't think whether a buildbot sends email should have anything to do
with whether we revert to green or not. Very often, developers commit
patches that cause regressions not caught by our buildbots. If the
regression is severe enough, then I think community members have the
right, and perhaps responsibility, to revert the change that caused it.
Our team maintains bots that build chrome with trunk versions of clang,
and we identify many regressions this way and end up doing many reverts
as a result. I think it's important to continue this practice so that
we don't let multiple regressions pile up.

My team also has internal bots and we see breakages way more often than
we'd like. We are a bit reluctant to just go revert something, though,
and typically try to engage the patch author first.

Engaging the author has a couple of up-sides: it respects the author's
contribution and attention to the process; and once you've had to fix
a particular problem yourself (rather than someone else cleaning up
after your mess) you are less likely to repeat that mistake.

In my opinion engaging the author first is the best approach for internal
bots, and I think this should be captured in some way in the developer guide.

We don't want to send the message that is OK to revert patches at any
time just because one of your internal tests failed. In my experience
most community members do engage with the author first, like Paul describes,
so this isn't usually a problem, but new contributors may not be familiar
with this precedent.

-Tom

This is kind of what I was getting at with my original email, so thank you for wording it better than I did.

If we can agree that “contact the author first for internal bots” is better than “revert automatically, even for internal bots” (which may not be the case, I don’t want to speak for others), then the problem becomes one of defining what an “internal bot” is. In my opinion, an internal bot it is one that does does not satisfy both conditions: a) on the public waterfall, b) automatically sends notifications, but perhaps not everyone agrees with this definition.

Hi,

maybe another interesting case is when there is disagreement over
whether there is a regression.

For instance, one of my patches made clang emit an additional warning
when compiling a popular project. This was not intentional by my
patch, but due to an inconsistent implementation of the warning in
clang. However, the warning was legitimate. I reverted the patch (it
had another problem), but before I recommitted it, I put up a patch
for review that fixed the inconsistent implementation such that the
warning is always emitted.

My question here: Should the patch be reverted even if it did not have
the other problem?

Michael

Zachary Turner via llvm-dev <llvm-dev@lists.llvm.org> writes:

> Reid said:
>
>> I don't think whether a buildbot sends email should have anything to do
>> with whether we revert to green or not. Very often, developers commit
>> patches that cause regressions not caught by our buildbots. If the
>> regression is severe enough, then I think community members have the
>> right, and perhaps responsibility, to revert the change that caused it.
>> Our team maintains bots that build chrome with trunk versions of clang,
>> and we identify many regressions this way and end up doing many reverts
>> as a result. I think it's important to continue this practice so that
>> we don't let multiple regressions pile up.
>
> My team also has internal bots and we see breakages way more often than
> we'd like. We are a bit reluctant to just go revert something, though,
> and typically try to engage the patch author first.
>
> Engaging the author has a couple of up-sides: it respects the author's
> contribution and attention to the process; and once you've had to fix
> a particular problem yourself (rather than someone else cleaning up
> after your mess) you are less likely to repeat that mistake.
>

In my opinion engaging the author first is the best approach for internal
bots, and I think this should be captured in some way in the developer
guide.

We don't want to send the message that is OK to revert patches at any
time just because one of your internal tests failed. In my experience
most community members do engage with the author first, like Paul
describes,
so this isn't usually a problem, but new contributors may not be familiar
with this precedent.

-Tom

This is kind of what I was getting at with my original email, so thank you
for wording it better than I did.

If we can agree that "contact the author first for internal bots" is better
than "revert automatically, even for internal bots" (which may not be the
case, I don't want to speak for others), then the problem becomes one of
defining what an "internal bot" is. In my opinion, an internal bot it is
one that does does not satisfy both conditions: a) on the public waterfall,
b) automatically sends notifications, but perhaps not everyone agrees with
this definition.

I would argue that "internal vs external" is the wrong division here.
It does come up that internal bots or weird local configurations find
significant problems in practice sometimes, and reverting to green can
be completely reasonable for these cases. Obviously some discretion is
necessary, but reverting more or less any change that causes issues
severe enough to legitimately block you or seriously hamper your ability
to notice further issues is fair game in my eyes.

Like Reid says, the important part about reverting is the contract
between the person doing the revert and the original committer. When a
patch is reverted, the reverter has a responsibility for making sure the
originator has the means to fix whatever problem they found. Any revert
that isn't tied to a public bot absolutely must come with reasonable
instructions to reproduce the problem, and the reverter needs to
actively engage with the originator to get the patch back in in a form
that doesn't hit the problem.

I would say not necessarily. If a new warning is added that fires on a popular project and the warning is working as intended, that may be a signal that the warning shouldn’t be on by default (or shouldn’t be part of -Wall). We obviously need to allow ourselves the freedom to add new warnings over time. Just because a project uses “-Werror -Wall” doesn’t mean that their code will compile cleanly with new compilers. However, if the warning really is low value, then we may want to remove it. If someone wants to revert a new warning because it is too noisy or has false positives, they need to actively engage the patch author to support their position.

Elaborating on your question of what constitutes a regression in general, I would say that everything is case-by-case, but we have some clear cut ones, like miscompiles or new compiler crashes. People often misdiagnose UB as a miscompile, so to revert for a miscompile, you need some evidence that the code doesn’t exercise UB. Reverting for a performance regression would need to come with a strong rationale to motivate why the slowdown outweighs the supposed benefits of the patch in question.

+1 to what Chandler and Reid said

+1 to Justin's comment

The only standard for revert should be: it's broken, and here's a reproducer. Nothing else should matter.

... says the person w/a ton of internal regression testing which regularly finds crashes upstream, and is terrified of the implied effort of having to engage each author of a broken patch individually while being unable to ship or see what other breakage is coming in. Admittedly, self serving perspective. :slight_smile:

Philip

p.s. Speaking personally, I actually *prefer* having a patch of mine simply reverted w/o discussion when there's a problem. This way there's no expectation of timely response on my side, and if I happen to be out of office, or on vacation, I can get to it when I get back and have time. I do expect to be *informed* of the revert of course.

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of
Philip Reames via llvm-dev
Sent: Wednesday, February 20, 2019 11:26 PM
To: Justin Bogner; Zachary Turner via llvm-dev
Subject: Re: [llvm-dev] Clarification on expectations of buildbot email
notifications

+1 to Justin's comment

The only standard for revert should be: it's broken, and here's a
reproducer. Nothing else should matter.

... says the person w/a ton of internal regression testing which
regularly finds crashes upstream, and is terrified of the implied effort
of having to engage each author of a broken patch individually while
being unable to ship or see what other breakage is coming in.
Admittedly, self serving perspective. :slight_smile:

(I haven't noticed you doing tons of reverts either, but maybe I just
haven't been looking for them.)

This is why my crew has a staging branch for merging in upstream stuff,
in proper Theory of Constraints "buffer" style. Between upstream breaks
and merge issues, it's actually kind of rare to have a clean automated
merge onto our master branch. We usually end up looking for a "mostly
harmless" merge point and manually bringing it in, then fix up the issues
later, just so we can stay sort-of caught up.

But, it does give us some breathing room to engage authors and give them
a reasonable period to fix things up themselves, before it materially
affects our actual work branches.
--paulr

+1 to Justin’s comment

The only standard for revert should be: it’s broken, and here’s a
reproducer. Nothing else should matter.

+1 here.

With only some wiggle room for “is this regression only in some esoteric mode that only this person/group cares about & it’s unreasonable to expect other people to care about/be able to reproduce” (even something like: I can only reproduce this while running on certain hardware, where it’s unreasonable to expect someone else to have that hardware - then it gets into weird cases where the esoteric hardware owner can reasonable slow down the rest of the project by having a patch reverted while they investigate, or if they should take the cost of having their branch broken while they investigate & the project continues on regardless, to some degree)

… says the person w/a ton of internal regression testing which
regularly finds crashes upstream, and is terrified of the implied effort
of having to engage each author of a broken patch individually while
being unable to ship or see what other breakage is coming in.
Admittedly, self serving perspective. :slight_smile:

Philip

p.s. Speaking personally, I actually prefer having a patch of mine
simply reverted w/o discussion when there’s a problem. This way there’s
no expectation of timely response on my side, and if I happen to be out
of office, or on vacation, I can get to it when I get back and have
time. I do expect to be informed of the revert of course.

Yeah, I’m of two minds about this personally - I personally get frustrated (but honestly, it’s probably as much frustrattion with myself than anyone else) when a patch is reverted for relatively trivial reasons - like a mistyped CHECK line (oh, this should have “requires: asserts”? OK). I’d personally appreciate some courtesy from folks - and tend to extend the same in return (you broke the -Werror build? Oh, I’ll just add that cast-to-void to fix your unused variable warning in non-asserts builds, etc).

  • Dave

It was the -Wformat family of warnings which we probably want to have
in -Wall. The problem was/is, ngettext (GNU gettext) has multiple
__format_arg__ attributes, but clang only checked one of the format
strings. My patch changed which one was checked, so it warned about
format strings that had been ignored before.

Michael

+1 to all of Dave's comments (and agreements)