please stabilize the trunk

We've had a lot of churn in all the trunks (llvm, llvm-gcc, clang) recently, and the testing buildbots have been failing repeatedly.

I spoke with Chris this AM, and he suggested we have a "stabilization day." Please avoid large, destabilizing changes for about twenty-four hours. We would like for the testing bots to begin working again.

Thanks,

stuart

OK.
I wonder if we might be able to automate the stabilization somewhat. I'm not at all sure this can be done without introducing worse problems than it solves, but here's some discussion fodder:

Have the buildbots (or, probably better, one Master Buildbot) do auto-reversion when they see a new failure. They would need to be able to detect bogus failures, such as temporary inability to connect to the svn server.

Have checkins go to a branch, and have the buildbots automove them into mainline only after passing regression checks on the branch.

If the procedures go wrong I can easily imagine the tree getting into a state where nobody knows what's in it very quickly, so we need to be careful...
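To make the first idea a little more concrete, here is a rough sketch of the reverting half only, assuming the master can already map a red build to the revision(s) it pulled in. The helper name and commit-message format are made up, and the hard part (deciding that a failure is real rather than bogus) is deliberately left out:

    import subprocess

    def revert_revisions(revisions, working_copy, dry_run=True):
        """Back the given svn revisions out of a clean working copy."""
        for rev in revisions:
            # Reverse-merge the suspect revision into the working copy.
            subprocess.check_call(
                ["svn", "merge", "-c", "-%d" % rev, ".", "--non-interactive"],
                cwd=working_copy)
        message = "Auto-revert of r%s: broke the build" % (
            ", r".join(str(r) for r in revisions))
        if dry_run:
            print("Would commit: " + message)
        else:
            subprocess.check_call(
                ["svn", "commit", "-m", message, "--non-interactive"],
                cwd=working_copy)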

I'm not too keen about seeing buildbots play with trunk :wink:

How about starting simple, and just auto-tagging builds that work?
Could be done per OS/arch, and one global tag when all buildbots pass.
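To sketch what that tagging step might look like (the repository URL and tag layout below are placeholders, not an existing scheme), the master could run something like this once every builder reports green for a revision:

    import subprocess

    REPO = "https://llvm.org/svn/llvm-project/llvm"   # placeholder URL

    def tag_known_good(rev, arch="all"):
        # Server-side copy; cheap in svn, and it records which revision
        # last passed on this OS/arch (or globally, when arch == "all").
        subprocess.check_call([
            "svn", "copy", "--parents",
            "%s/trunk@%d" % (REPO, rev),
            "%s/tags/known-good/%s/r%d" % (REPO, arch, rev),
            "-m", "Buildbots green at r%d (%s)" % (rev, arch),
            "--non-interactive",
        ])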

It would also be useful to have an SVN pre/post-commit hook display the number of current buildbot failures, as a reminder that trunk is broken.
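For the hook idea, a minimal sketch of a pre-commit hook follows; note that Subversion only shows hook output to the committer when the hook exits non-zero, so a pure reminder would have to piggyback on a rejection or live in a client-side wrapper instead. The status URL and the threshold are invented for illustration:

    #!/usr/bin/env python3
    # Hypothetical pre-commit hook sketch; not existing LLVM infrastructure.
    import sys
    import urllib.request

    STATUS_URL = "http://buildmaster.example.org/failing-builders.txt"
    MAX_RED_BUILDERS = 3

    def main():
        # argv[1] is the repository path and argv[2] the transaction name;
        # this sketch does not need either of them.
        try:
            failures = int(urllib.request.urlopen(STATUS_URL, timeout=10).read())
        except Exception:
            return 0  # never block commits just because the status page is down
        if failures > MAX_RED_BUILDERS:
            sys.stderr.write(
                "Trunk currently has %d failing buildbots; please help fix the\n"
                "build before piling on more changes.\n" % failures)
            return 1
        return 0

    if __name__ == "__main__":
        sys.exit(main())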

Best regards,
--Edwin

I don't think auto-reversion is ever going to be a very good idea, and
I don't envy anyone trying to implement it correctly. Usually build
breakages can be fixed easily; when they can't, the author is usually
the person best placed to revert them safely.

An alternative approach to improving the current status is to make the
buildbots detect failures sooner. Right now the Apple bootstrap
buildbot takes two hours. This means a four-hour latency between what
may be a failure and what may be a fix. A simple improvement is to
add a new buildbot that tests only one particular Apple-style build,
instead of the six or however many it currently builds.
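For illustration only, such a single-configuration builder is a small addition to a Buildbot master.cfg; the sketch below uses the 0.7-era API with placeholder names for the slave, URL, and configure flags, so treat it as a sketch rather than our actual setup:

    from buildbot.process.factory import BuildFactory
    from buildbot.steps.source import SVN
    from buildbot.steps.shell import Configure, Compile, Test

    f = BuildFactory()
    f.addStep(SVN(svnurl="http://llvm.org/svn/llvm-project/llvm/trunk",
                  mode="update"))
    f.addStep(Configure(command=["./configure", "--enable-optimized"]))
    f.addStep(Compile(command=["make", "-j4"]))
    f.addStep(Test(command=["make", "check"]))

    # 'c' is the BuildmasterConfig dict defined earlier in master.cfg.
    c['builders'].append({
        'name': "llvm-x86_64-fast",    # one configuration only, for low latency
        'slavename': "fast-slave-1",   # placeholder slave name
        'builddir': "llvm-x86_64-fast",
        'factory': f,
    })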

- Daniel

I wonder if we might be able to automate the stabilization somewhat.
I'm not at all sure this can be done without introducing worse
problems than it solves, but here's some discussion fodder:

Have the buildbots (or, probably better, one Master Buildbot) do auto-
reversion when they see a new failure. They would need to be able to
detect bogus failures, such as temporary inability to connect to the
svn server.

Have checkins go to a branch, and have the buildbots automove them
into mainline only after passing regression checks on the branch.

If the procedures go wrong I can easily imagine the tree getting into
a state where nobody knows what's in it very quickly, so we need to be
careful...

I'm not too keen about seeing buildbots play with trunk :wink:

Nor am I; normally I'd be the last person to suggest something like this. But in the last few days we've seen just how bad a job humans can do...

We would need a much, much more sophisticated testing system before we
could do something automated with reverting patches. One unfortunate
side effect of auto-reverting is that it could revert a patch whose fix
is already in the pipeline, leading to unnecessary churn from the build
bots.

The core problem, in my opinion, is that people *don't* pay attention
to the build bot failure messages that come along. This is as much a
systemic problem as it is a community problem. Community, in that we
need to foster the submission of well-tested patches. Systemic, because
this is not how people hacking on open source projects typically work.

I like Daniel's idea of throwing more machines at the problem. It's a
brute force method, but there you go.

-bw

I wonder if we might be able to automate the stabilization somewhat.
I'm not at all sure this can be done without introducing worse
problems than it solves, but here's some discussion fodder:

Have the buildbots (or, probably better, one Master Buildbot) do
auto-reversion when they see a new failure. They would need to be able
to detect bogus failures, such as temporary inability to connect to the
svn server.

Have checkins go to a branch, and have the buildbots automove them
into mainline only after passing regression checks on the branch.

If the procedures go wrong I can easily imagine the tree getting into
a state where nobody knows what's in it very quickly, so we need to be
careful...

This seems like the right juncture to remind everyone of Wyland's First Law of Automation:

  Anything that can be done /for/ you, automatically,
  can be done /to/ you, automatically.

:wink:

I'm not too keen about seeing buildbots play with trunk :wink:

Nor am I; normally I'd be the last person to suggest something like
this. But in the last few days we've seen just how bad a job humans
can do...

We would need a much, much more sophisticated testing system before we
could do something automated with reverting patches. One unfortunate
side effect of auto-reverting is that it could revert a patch whose fix
is already in the pipeline, leading to unnecessary churn from the build
bots.

The core problem, in my opinion, is that people *don't* pay attention
to the build bot failure messages that come along. This is as much a
systemic problem as it is a community problem. Community, in that we
need to foster the submission of well-tested patches. Systemic, because
this is not how people hacking on open source projects typically work.

I like Daniel's idea of throwing more machines at the problem. It's a
brute force method, but there you go.

Adding more buildbots sounds good to me.

stuart

That's largely because of the number of false positives.

-Eli

There have been fewer and fewer of these in recent times.

-bw

That depends on what you call a false positive. The public buildbot
regularly fails because of failing Frontend tests, and I have had
continuous failures of some DejaGNU tests for a long time on some
builders. It's not a false positive per se, but one starts to ignore
the failures because they aren't unexpected.

- Daniel

That depends on what you call a false positive. The public buildbot
regularly fails because of failing Frontend tests, and I have had
continuous failures of some DejaGNU tests for a long time on some
builders. It's not a false positive per se, but one starts to ignore
the failures because they aren't unexpected.

Yes. Probably the only way this will work better is if we get the testsuite to 0 failures, everywhere, conditionalizing as necessary to get rid of expected failures. Then regressions will be more visible. I doubt that will happen unless we freeze the tree for a while and get everybody to fix bugs, or disable tests, instead of doing new stuff (at least, that was the case for gcc).

2009/7/15 Dale Johannesen <dalej@apple.com>

That depends on what you call a false positive. The public buildbot
regularly fails because of failing Frontend tests, and I have had
continuous failures of some DejaGNU tests for a long time on some
builders. It's not a false positive per se, but one starts to ignore
the failures because they aren't unexpected.

Yes. Probably the only way this will work better is if we get the
testsuite to 0 failures, everywhere, conditionalizing as necessary to
get rid of expected failures. Then regressions will be more visible.
I doubt that will happen unless we freeze the tree for a while and get
everybody to fix bugs, or disable tests, instead of doing new stuff
(at least, that was the case for gcc).

This is exactly what we’re supposed to do for releases, and in theory, all of the time.

We've been having a lot of churn lately. This is a good thing overall, since it means there are lots of contributions going into the project. What's different about this is that we have a lot of large-scale, sweeping changes that touch a lot of code. In the past we've generally serialized this sort of thing between contributors, or broken changes up to be extremely incremental. The reason this is happening less now is that we, as developers, are growing more ambitious with our fixes to systematic problems in LLVM, and doing so on a tighter schedule. Once again, this is a good thing.

There are two issues with buildbots. Firstly, we need more buildbots on more platforms. For example, there are no Darwin buildbots, so if I commit a change that breaks Darwin I won't get immediate notice about it, nor a log of the failure. We could even consider having a buildbot a prerequisite to being a release-blocking platform. The other is that we need some level of quality control on buildbots. We can accomplish this by publishing a few buildbot guidelines (e.g., don't install llvm-gcc on your buildbot machine, because it will cause false positives as llvm and llvm-gcc get out of step) and by enhancing the buildbot system to let us mark problems as expected. We already have part of that by XFAILing tests.

Even so, better buildbots will improve visibility into how the tree is progressing on a commit-by-commit basis, but they do nothing to prevent breakage in the first place. I suspect most of our grief will go away as some of the current major changes finish. If not, we'll have to come up with a better way to handle so many large changes, maybe something like a "schedule of merges" so that committers don't step all over each other. I think GCC does something like this already?

We've deferred imposing structure like that until we discover that we need it, and I'm not convinced we're quite there yet, but perhaps it's time to start thinking about it.

Nick

> That depends on what you call a false positive. The public buildbot
> regularly fails because of failing Frontend tests, and I have had
> continuous failures of some DejaGNU tests for a long time on some
> builders. It's not a false positive per se, but one starts to ignore
> the failures because they aren't unexpected.

Yes. Probably the only way this will work better is if we get the
testsuite to 0 failures, everywhere, conditionalizing as necessary to
get rid of expected failures. Then regressions will be more visible.
I doubt that will happen unless we freeze the tree for a while and get
everybody to fix bugs, or disable tests, instead of doing new stuff
(at least, that was the case for gcc).

This is exactly what we're supposed to do for releases, and in theory, all of the time.

We've been having a lot of churn lately. This is a good thing overall, since it means there are lots of contributions going into the project. What's different about this is that we have a lot of large-scale, sweeping changes that touch a lot of code. In the past we've generally serialized this sort of thing between contributors, or broken changes up to be extremely incremental. The reason this is happening less now is that we, as developers, are growing more ambitious with our fixes to systematic problems in LLVM, and doing so on a tighter schedule. Once again, this is a good thing.

There are two issues with buildbots. Firstly, we need more buildbots on more platforms. For example, there are no Darwin buildbots, so if I commit a change that breaks Darwin I won't get immediate notice about it, nor a log of the failure.

This isn't 100% true. :slight_smile: We have a series of build bots at Apple building in various ways. Failures are sent to the mailing list, but they are not very meaningful to non-Apple employees because they don't have access to the machines and log files. We monitor them very closely, so we will pester people about any breakages. :slight_smile: Normally, a breakage on our build bots will also break on the Google ones. It's not always the case, but it happens most of the time.

Things get really out of hand (and I tend to lose my temper and write hotly worded emails) when things obviously break, and the build bots send out emails about these breakages, but people ignore them, and the build is broken for half a day or more. This morning, I got to the office and couldn't build TOT, it was so bad.

We could even consider having a buildbot a prerequisite to being a release-blocking platform. The other is that we need some level of quality control on buildbots. We can accomplish this by publishing a few buildbot guidelines (e.g., don't install llvm-gcc on your buildbot machine, because it will cause false positives as llvm and llvm-gcc get out of step) and by enhancing the buildbot system to let us mark problems as expected. We already have part of that by XFAILing tests.

I think that a policy guideline for build bots would be a very Good Thing(tm). I'm a novice at creating the build bot configuration file, but Daniel and I can probably summarize how the build bots are run at Apple, which would be a good first step towards this.

Even so, better buildbots will improve visibility into how the tree is progressing on a commit-by-commit basis, but they do nothing to prevent breakage in the first place. I suspect most of our grief will go away as some of the current major changes finish. If not, we'll have to come up with a better way to handle so many large changes, maybe something like a "schedule of merges" so that committers don't step all over each other. I think GCC does something like this already?

We've deferred imposing structure like that until we discover that we need it, and I'm not convinced we're quite there yet, but perhaps it's time to start thinking about it.

I don't think we need to impose a constrictive structure on people. We just need to foster good programming practices. The GCC people require that patches be run through the GCC testsuite with no regressions. That testsuite is *huge* and doesn't run cleanly for us. But our modest regression testsuite is a good first step. For major changes, running some subset of the llvm-test directory is appropriate. There are other things too, of course...

-bw

Bill Wendling wrote:

Things get really out of hand (and I tend to lose my temper and write
hotly worded emails) when things obviously break, and the build bots
send out emails about these breakages, but people ignore them, and the
build is broken for half a day or more. This morning, I got to the
office and couldn't build TOT, it was so bad.

So... what's the situation at the moment? Should both LLVM and llvm-gcc
from trunk build without errors on GNU/Linux 32-bit x86? I get linker
errors related to LLVMContext all over the place in llvm-gcc, and I
think I'm fully up to date w.r.t. svn.

Or should I wait a while before attempting this in these turbulent times? :slight_smile:

Bye,
Paul

We've talked about this before and I've been working on setting up
such a system. Unfortunately, I can't figure out why my buildbots
fail to configure llvm-gcc.

Is there a link to the buildbots on the website? I can't find it.

                             -Dave

I've experienced that in my own local copies. For example, I've had
19-21 unexpected failures for weeks that no one else seems to see.

Something about our test infrastructure is fragile to the point that
changing environments somehow causes different results.

                                  -Dave

How about starting simple, and just auto-tagging builds that work?
Could be done per OS/arch, and one global tag when all buildbots pass.
    
We've talked about this before and I've been working on setting up
such a system. Unfortunately, I can't figure out why my buildbots
fail to configure llvm-gcc.
  
I didn't know that it still fails.
Can you send me your full buildbot config? I'll have a look and try to
reproduce the failure on my machine.

Is there a link to the buildbots on the website? I can't find it.
  
The Google buildbots are here; I don't see any link to them on llvm.org:
http://google1.osuosl.org:8011/waterfall

Best regards,
--Edwin

Paul Melis wrote:

Bill Wendling wrote:
  

Things get really out of hand (and I tend to lose my temper and write
hotly worded emails) when things obviously break, and the build bots
send out emails about these breakages, but people ignore them, and the
build is broken for half a day or more. This morning, I got to the
office and couldn't build TOT, it was so bad.
    
So... what's the situation at the moment? Should both LLVM and llvm-gcc
from trunk build without errors on GNU/Linux 32-bit x86? I get linker
errors related to LLVMContext all over the place in llvm-gcc, and I
think I'm fully up to date w.r.t. svn.

Or should I wait a while before attempting this in these turbulent times? :slight_smile:
  

I think I've tracked it down. The llvm-gcc build picked an old Release
build of LLVM to link against (which didn't have any LLVMContext stuff),
while I'm actually building LLVM Release-Asserts these days. The build
seems to get further after deleting the old build and doing a clean
llvm-gcc build.

Paul

> That depends on what you call a false positive. The public buildbot
> regularly fails because of failing Frontend tests, and I have had
> continuous failures of some DejaGNU tests for a long time on some
> builders. It's not a false positive per se, but one starts to ignore
> the failures because they aren't unexpected.

Yes. Probably the only way this will work better is if we get the
testsuite to 0 failures, everywhere, conditionalizing as necessary to
get rid of expected failures. Then regressions will be more visible.
I doubt that will happen unless we freeze the tree for a while and get
everybody to fix bugs, or disable tests, instead of doing new stuff
(at least, that was the case for gcc).

This is exactly what we're supposed to do for releases, and in theory, all
of the time.

We've been having a lot of churn lately. This is a good thing overall, since
it means there are lots of contributions going into the project. What's
different about this is that we have a lot of large-scale, sweeping changes
that touch a lot of code. In the past we've generally serialized this sort
of thing between contributors, or broken changes up to be extremely
incremental. The reason this is happening less now is that we, as
developers, are growing more ambitious with our fixes to systematic
problems in LLVM, and doing so on a tighter schedule. Once again, this is a
good thing.

+1

There are two issues with buildbots. Firstly, we need more buildbots on more
platforms. For example, there are no Darwin buildbots, so if I commit a
change that breaks Darwin I won't get immediate notice about it, nor a log
of the failure.

I plan to solve this this week by serving a Darwin buildbot off my
home machine. I also hope to add an MSVC/CMake-based (and very slow)
buildbot relatively soon.

That will bring me up to serving a total of 4 buildslaves out of my
house, so if anyone else wants to contribute, please step up.

However, as Bill notes, we have lots of internal bots, and it's fair that
Apple people have to maintain them (even if the breakage is due to an
external commit).

We could even consider having a buildbot a prerequisite to
being a release-blocking platform. The other is that we need some level of
quality control on buildbots.

I'm not really sure what this means. The llvm-gcc problem I regard as
a bug in the LLVM test suite.

We can accomplish this by publishing a
few buildbot guidelines (e.g., don't install llvm-gcc on your buildbot
machine, because it will cause false positives as llvm and llvm-gcc get out
of step) and by enhancing the buildbot system to let us mark problems as
expected. We already have part of that by XFAILing tests.

What actual enhancements would we need?

Even so, better buildbots will improve visibility into how the tree is
progressing on a commit-by-commit basis, but they do nothing to prevent
breakage in the first place. I suspect most of our grief will go away as
some of the current major changes finish.

I agree.

- Daniel

How about starting simple, and just auto-tagging builds that work?
Could be done per OS/arch, and one global tag when all buildbots pass.

We've talked about this before and I've been working on setting up
such a system. Unfortunately, I can't figure out why my buildbots
fail to configure llvm-gcc.

I plan to check my buildbot configuration "stuff" into the
llvm-project repository soon. I have to refactor things a bit to
support use by other people, but currently I have configurations for
llvm, clang, and llvm-gcc in various target / self-hosting
permutations. I also have a regular buildbot-driven nightlytest run
(the smoosh-01 results on llvm-testresults).

My hope is that, once it's checked in, Danny and I can merge the current
public buildbot configuration and my configuration so that all
buildbots are driven from the same master configuration (with
additional local configuration magic). This should make it easier for
other organizations to set up their own buildbots, and to add new
buildbot configurations (like MSVC/cmake).
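To sketch the shape this could take (the module and function names below are invented, not the actual configuration being described), the shared master.cfg might pull common builder definitions from a checked-in module and the site-specific bits from a local one:

    # master.cfg -- hypothetical split between shared and local configuration.
    import llvmbuild      # shared builder/factory definitions (checked in)
    import localconfig    # per-site slaves, passwords, enabled builders

    c = BuildmasterConfig = {}
    c['slaves'] = localconfig.get_slaves()
    c['builders'] = llvmbuild.get_builders(
        enabled=localconfig.ENABLED_BUILDERS)
    c['status'] = localconfig.get_status_targets()   # mail / web reporters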

Is there a link to the buildbots on the website? I can't find it.

Chris updated the website; it's now listed under "Dev. Resources".

Please consider contributing a buildslave for the platform you care about.

- Daniel