buildbot failure in LLVM on clang-native-arm-cortex-a9

The Buildbot has detected a new failure on builder clang-native-arm-cortex-a9 while building cfe.
Full details are available at:
  http://lab.llvm.org:8011/builders/clang-native-arm-cortex-a9/builds/14552

Buildbot URL: http://lab.llvm.org:8011/

Buildslave for this Build: as-bldslv1

Build Reason: scheduler
Build Source Stamp: [branch trunk] 198489
Blamelist: alp

BUILD FAILED: failed compile
The bug is not reproducible, so it is likely a hardware or OS problem.
make[5]: *** [/home/buildslave/slave_as-bldslv1/clang-native-arm-cortex-a9/llvm/tools/clang/lib/ASTMatchers/Dynamic/Release+Asserts/Registry.o] Error 1
make[5]: Leaving directory `/home/buildslave/slave_as-bldslv1/clang-native-arm-cortex-a9/llvm/tools/clang/lib/ASTMatchers/Dynamic'
make[4]: *** [Dynamic/.makeall] Error 2
make[4]: Leaving directory `/home/buildslave/slave_as-bldslv1/clang-native-arm-cortex-a9/llvm/tools/clang/lib/ASTMatchers'
make[3]: *** [ASTMatchers/.makeall] Error 2

Would it be possible to skip sending mail on hardware/OS/out-of-disk messages?

I imagine this is just a matter of checking the process exit code from the build system: 0 for success, 1 for build failure that sends notifications, everything else is an admin problem.
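
As a rough sketch of that idea, assuming the notifier can see the build process's return code (the function and category names are hypothetical, not taken from Buildbot or zorg):

```python
def classify_exit_code(returncode):
    """Map a build process exit code to a notification category.

    Follows the scheme proposed above: 0 is success, 1 is a real build
    failure, anything else is treated as an admin problem.
    """
    if returncode == 0:
        return "success"          # nothing to send
    if returncode == 1:
        return "build-failure"    # mail the blamelist
    return "admin-problem"        # mail only whoever maintains the bot
```

Note that in the log above make itself reports both "Error 1" and "Error 2" for the same compile failure, so the exit code alone may not separate the cases cleanly.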

If the script in use has no code owner, I'll appreciate a pointer to what's sending the mails and I'll see if someone can look into it and submit a patch.

We should be more proactive and disable noisy build servers until a technical solution is available rather than the other way round, given how they drown out real problems.

Thanks

Alp.

    Would it be possible to skip sending mail on hardware/OS/out-of-disk
    messages?

    I imagine this is just a matter of checking the process exit code from
    the build system: 0 for success, 1 for build failure that sends
    notifications, everything else is an admin problem.

No, exit codes don't tell the whole story. One would have to grep for
specific messages like "disk full" or "not reproducible".
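
A minimal sketch of such a check, assuming the build log is available as plain text. The first two phrases are the ones named above; the third is the usual out-of-space message on Linux and is an assumption about what the bots print. The names are hypothetical.

```python
import re

# Phrases that suggest an infrastructure problem rather than a bad commit.
# "disk full" and "not reproducible" come from this thread; the last
# pattern is an assumed addition.
INFRA_PATTERNS = [
    re.compile(r"disk full", re.IGNORECASE),
    re.compile(r"not reproducible", re.IGNORECASE),
    re.compile(r"no space left on device", re.IGNORECASE),
]


def is_infrastructure_failure(log_text):
    """Return True if any known infrastructure phrase appears in the log."""
    return any(pattern.search(log_text) for pattern in INFRA_PATTERNS)
```

Run against the failure above, the "not reproducible" line in the compiler output is what would trigger the match.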

    If the script in use has no code owner, I'll appreciate a pointer to
    what's sending the mails and I'll see if someone can look into it and
    submit a patch.

I have no idea where this code is, or who is responsible.

    We should be more proactive and disable noisy build servers until a
    technical solution is available rather than the other way round, given
    how they drown out real problems.

It's not that simple. The ARM boards we have been using are all development
boards, built with the quality you'd expect from evaluation hardware. The
only production hardware you can find with an ARM chip inside are mobile
phones, tablets and the Samsung Chromebook (which we use at Linaro), but
they are not fit for being servers by a long shot. The only server-grade
ARM hardware, Calxeda, went bankrupt last month. :(

Unfortunately, those bots are our only solution for now, and we'll have to
keep them running the best we can. We must fix the problem (grep on errors,
and all the other things we discussed last week), not turn off the only
buildbots we have.

cheers,
--renato

    Would it be possible to skip sending mail on
    hardware/OS/out-of-disk messages?

    I imagine this is just a matter of checking the process exit code
    from the build system: 0 for success, 1 for build failure that
    sends notifications, everything else is an admin problem.

No, exit codes don't tell the whole story. One would have to grep for specific messages like "disk full" or "not reproducible".

    If the script in use has no code owner, I'll appreciate a pointer
    to what's sending the mails and I'll see if someone can look into
    it and submit a patch.

I have no idea where this code is, or who is responsible.

So, I did some digging:

zorg/buildbot/commands/StandardizedTest.py has logic that converts logs and status reports into actionable test results.

    We should be more proactive and disable noisy build servers until
    a technical solution is available rather than the other way round,
    given how they drown out real problems.

It's not that simple. The ARM boards we have been using are all development boards, built with the quality you'd expect from evaluation hardware. The only production hardware you can find with an ARM chip inside are mobile phones, tablets and the Samsung Chromebook (which we use at Linaro), but they are not fit for being servers by a long shot. The only server-grade ARM hardware, Calxeda, went bankrupt last month. :(

Unfortunately, those bots are our only solution for now, and we'll have to keep them running the best we can. We must fix the problem (grep on errors, and all the other things we discussed last week), not turn off the only buildbots we have.

I didn't realise these bots were the last line of defence for ARM support! In that case let's keep them in commission and focus on the grep fix you suggest. Agree that stderr is a more practical informant than exit codes.

The most spammy patterns are predictable and relate to SVN outages, network failures, out-of-disk-space and non-deterministic results presumably related to the hardware flakiness you described. Those should only be sent to the device admins and maybe the module owner, never to individual committers, for whom they're unactionable.
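
The routing could be sketched roughly as below. The four categories are the ones listed above; the match strings are guesses that would need to be collected from real bot logs, and the function and table names are hypothetical rather than anything in zorg.

```python
import re

# Category -> pattern. The strings are assumptions, not taken from real logs.
SPAM_CATEGORIES = {
    "svn-outage": re.compile(r"svn: .*unable to connect", re.IGNORECASE),
    "network-failure": re.compile(
        r"connection (timed out|refused)|could not resolve host", re.IGNORECASE
    ),
    "out-of-disk": re.compile(r"no space left on device|disk full", re.IGNORECASE),
    "flaky-hardware": re.compile(r"not reproducible", re.IGNORECASE),
}


def choose_recipients(log_text, committers, admins):
    """Return (category, recipients) for a failed build.

    Failures matching a known infrastructure category go to the admins
    only; everything else goes to the committers on the blamelist.
    """
    for category, pattern in SPAM_CATEGORIES.items():
        if pattern.search(log_text):
            return category, list(admins)
    return "build-failure", list(committers)
```

Adding the module owner to the admin list for selected categories would be a small extension of the same table.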

Think we have a handle on this now but a "pong, XXX owns this module" would be appreciated from anyone in the know.

Alp.

A better way of looking at it would be that most LLVM developers only
"pre-test" their patches on their own development machines, which tend to
be either x86-64 or x86 machines. While ARM is mostly similar, there are
things that are done differently at both the LLVM and clang levels (e.g.
the variant of the Itanium C++ ABI used on ARM does some things differently
from the x86-based ABIs). In a way the bots are a lot more of a first line
of defence, trying to ensure that all commits get at least superficial ARM
testing regardless of what platform the developer is using.

It is a bit annoying that none of the available ARM boards suitable for
putting in a board farm have quite the reliability level that would be
ideal for buildbot use.

Cheers,
Dave