> That depends on what you call a false positive. The public buildbot
> regularly fails because of failing Frontend tests, and I have had
> continuous failures of some DejaGNU tests for a long time on some
> builders. It's not a false positive per se, but one starts to ignore
> the failures because they aren't unexpected.
Yes. Probably the only way this will work better is if we get the
testsuite to 0 failures, everywhere, conditionalizing as necessary to
get rid of expected failures. Then regressions will be more visible.
I doubt that will happen unless we freeze the tree for a while and get
everybody to fix bugs, or disable tests, instead of doing new stuff
(at least, that was the case for gcc).
This is exactly what we're supposed to do for releases, and in theory, all of the time.
We've been having a lot of churn lately. This is a good thing overall, since it means there are lots of contributions going into the project. What's different about this is that we have a lot of large-scale, sweeping changes that touch a lot of code. In the past we've generally serialized this sort of thing between contributors, or broken changes up to be extremely incremental. The reason this is happening less now is that we, as developers, are growing more ambitious with our fixes to LLVM's systematic problems, and doing so on a tighter schedule. Once again, this is a good thing.
There are two issues with buildbots. Firstly, we need more buildbots on more platforms. For example, there are no public Darwin buildbots, so if I commit a change that breaks Darwin I won't get immediate notice about it, nor a log of the failure.
This isn't 100% true. We have a series of build bots at Apple building in various ways. Failures are sent to the mailing list, but they are not very meaningful to non-Apple employees because they don't have access to the machines and log files. We monitor them very closely, so we will pester people about any breakages. Normally, a breakage on our build bots will also break on the Google ones. It's not always the case, but it happens most of the time.
Things get really out of hand (and I tend to lose my temper and write hotly worded emails) when things obviously break, the build bots send out emails about the breakages, but people ignore them and the build stays broken for half a day or more. This morning, I got to the office and couldn't build TOT; it was that bad.
We could even consider having a buildbot be a prerequisite for being a release-blocking platform. The other issue is that we need some level of quality control on buildbots. We can accomplish this either by publishing a few buildbot guidelines (e.g., don't install llvm-gcc on your buildbot machine, because it will cause false positives as llvm and llvm-gcc get out of step) or by enhancing the buildbot system to let us mark problems as expected. We already have part of that by XFAILing tests.
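For anyone unfamiliar with the mechanism: the regression tests carry their expectations in comment lines that the test harness parses. A minimal sketch (the RUN command and the `darwin` target filter here are illustrative, not taken from any particular test):

```llvm
; RUN: llvm-as < %s | llc
; XFAIL: darwin

define i32 @foo() {
  ret i32 0
}
```

With the XFAIL line in place, a failure on a matching host is reported as expected (XFAIL) rather than as a new regression, so the buildbot output stays meaningful.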
I think that a policy guideline for build bots would be a very Good Thing(tm). I'm a novice at creating the build bot configuration file, but Daniel and I can probably summarize how the build bots are run at Apple, which would be a good first step towards this.
Even so, better buildbots will improve visibility into how the tree is progressing on a commit-by-commit basis, but it does nothing to prevent breakage in the first place. I suspect most of our grief will go away as some of the current major changes finish. If not, we'll have to come up with a better way to handle so many large changes, maybe something like a "schedule of merges" so that committers don't step all over each other. I think GCC does something like this already?
We've deferred imposing structure like that until we discover that we need it, and I'm not convinced we're quite there yet, but perhaps it's time to start thinking about it.
I don't think we need to impose a constrictive structure on people. We just need to foster good programming practices. The GCC people require that patches be run through the GCC testsuite with no regressions. That testsuite is *huge* and doesn't run cleanly for us. But our modest regression testsuite is a good first step. For major changes, running some subset of the llvm-test directory is appropriate. There are other things too, of course...
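Concretely, the pre-commit practice being suggested looks something like the following from an LLVM build tree (the exact make targets and paths are illustrative and may differ between setups):

```sh
# Run the regression testsuite before committing; it should be
# clean, modulo tests already marked XFAIL.
make check

# For major changes, also run some subset of llvm-test, e.g. one
# of its source directories:
cd projects/llvm-test/MultiSource
make TEST=nightly report
```

This is deliberately lighter-weight than the GCC requirement: the regression suite runs in minutes, and llvm-test is only expected for changes with broad impact.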