buildbot failure in LLVM on clang-ppc64-elf-linux2

This buildbot appears to have been failing for several weeks now ( http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/19490 ). Does anyone know/own/care about it?

[+Bill and Bill]

> From: "David Blaikie via llvm-dev" <llvm-dev@lists.llvm.org>
> To: "llvm-dev" <llvm-dev@lists.llvm.org>
> Sent: Tuesday, September 29, 2015 12:39:02 PM
> Subject: [llvm-dev] Fwd: buildbot failure in LLVM on clang-ppc64-elf-linux2
>
>
>
> This buildbot appears to have been failing for several weeks now (
> http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/19490
> ). Does anyone know/own/care about it?

Answer: yes.

We've had it under investigation since the problem started happening.
It was difficult to track it down. Eventually we traced it back to the
shrink-wrap enablement, and the developer provided a fix. The fix has
proven to be insufficient; it fixed a problem with a gcc-built Clang,
but we still have a problem with a self-hosted Clang. The developer is
currently verifying that disabling shrink-wrap on a self-hosting build
will resolve the problem. If so, the change to enable shrink-wrap for
PowerPC will be reverted at once. If not, I believe we will have to
XFAIL the test to let the bot start working again until we can figure
out another cause.

So, people have been working on this, but it is Not Good that the bot
has been disabled for this long, and we will try to make sure this
doesn't happen again. The problem has been figuring out exactly what to
revert, which has been more difficult than usual this time.

Thanks,
Bill

> [+Bill and Bill]
>
> > From: "David Blaikie via llvm-dev" <llvm-dev@lists.llvm.org>
> > To: "llvm-dev" <llvm-dev@lists.llvm.org>
> > Sent: Tuesday, September 29, 2015 12:39:02 PM
> > Subject: [llvm-dev] Fwd: buildbot failure in LLVM on clang-ppc64-elf-linux2
> >
> >
> >
> > This buildbot appears to have been failing for several weeks now (
> > http://lab.llvm.org:8011/builders/clang-ppc64-elf-linux2/builds/19490
> > ). Does anyone know/own/care about it?

> Answer: yes.
>
> We've had it under investigation since the problem started happening.
> It was difficult to track it down. Eventually we traced it back to the
> shrink-wrap enablement, and the developer provided a fix. The fix has
> proven to be insufficient; it fixed a problem with a gcc-built Clang,
> but we still have a problem with a self-hosted Clang. The developer is
> currently verifying that disabling shrink-wrap on a self-hosting build
> will resolve the problem. If so, the change to enable shrink-wrap for
> PowerPC will be reverted at once. If not, I believe we will have to
> XFAIL the test to let the bot start working again until we can figure
> out another cause.

Thanks for the update!

> So, people have been working on this, but it is Not Good that the bot
> has been disabled for this long,

It's more than that the bot has been disabled - were it just disabled, I
wouldn't have seen it, nor written that email. The bot has been red and
prone to (though not always - as Renato's pointed out, a red bot that
remains red may not send email, but purple (exception)->red (even if it
were red before the exceptional result) does send email) send email on some
of its failures, which adds noise to the system and makes it hard for
developers to trust/act on buildbot results.

> and we will try to make sure this
> doesn't happen again. The problem has been figuring out exactly what to
> revert, which has been more difficult than usual this time.

Is there any reason the test couldn't've been XFAIL'd from the start, while
investigation continued?

- Dave

> > So, people have been working on this, but it is Not Good that
> > the bot has been disabled for this long,
>
> It's more than that the bot has been disabled - were it just disabled,
> I wouldn't have seen it, nor written that email. The bot has been red
> and prone to (though not always - as Renato's pointed out, a red bot
> that remains red may not send email, but purple (exception)->red (even
> if it were red before the exceptional result) does send email) send
> email on some of its failures, which adds noise to the system and
> makes it hard for developers to trust/act on buildbot results.
>
> > and we will try to make sure this doesn't happen again. The problem
> > has been figuring out exactly what to revert, which has been more
> > difficult than usual this time.
>
> Is there any reason the test couldn't've been XFAIL'd from the start,
> while investigation continued?

None. We'll educate folks on this and attempt to ensure we do better
next time.

Bill

> > > So, people have been working on this, but it is Not Good that
> > > the bot has been disabled for this long,
> >
> > It's more than that the bot has been disabled - were it just disabled,
> > I wouldn't have seen it, nor written that email. The bot has been red
> > and prone to (though not always - as Renato's pointed out, a red bot
> > that remains red may not send email, but purple (exception)->red (even
> > if it were red before the exceptional result) does send email) send
> > email on some of its failures, which adds noise to the system and
> > makes it hard for developers to trust/act on buildbot results.
> >
> > > and we will try to make sure this doesn't happen again. The problem
> > > has been figuring out exactly what to revert, which has been more
> > > difficult than usual this time.
> >
> > Is there any reason the test couldn't've been XFAIL'd from the start,
> > while investigation continued?
>
> None. We'll educate folks on this and attempt to ensure we do better
> next time.

Do we have a best practices page on this topic? This seems like something which would be good to document.

Does anyone other than the bot owner get value out of this email? It seems like we should not be emailing contributors on an exceptional->error condition. Making this change might cut down the noise on the bots materially.

Philip

> > The bot has been red and prone to (though not always - as Renato's
> > pointed out, a red bot that remains red may not send email, but purple
> > (exception)->red (even if it were red before the exceptional result)
> > does send email) send email on some of its failures, which adds noise
> > to the system and makes it hard for developers to trust/act on
> > buildbot results.
>
> Does anyone other than the bot owner get value out of this email? It
> seems like we should not be emailing contributors on an
> exceptional->error condition. Making this change might cut down the
> noise on the bots materially.

It would certainly help if the bots emailed/irc-complained only their owners when svn issues happen. Very rarely (if ever?) could the committer legitimately be blamed for that.

Jon

We’ve discussed conditional emails based on previous status and type of error before, and we all agreed it was a good thing to do. But the lack of interest due to people moving to Jenkins got us nowhere.

I now see that I’m not the only one liking Buildbots more, so I think we ought to start that thread again.

Cheers,
Renato
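
For illustration only, the "conditional emails based on previous status and type of error" idea discussed above could be sketched roughly as follows. This is a standalone Python sketch, not Buildbot's actual API or the lab.llvm.org configuration: the `Status` enum, the `recipients` function, and the owner/committer lists are all hypothetical names. It assumes that exception (purple) results are infrastructure problems that only owners can act on, and that a red result should reach committers only when the last non-exception result was green.

```python
from enum import Enum


class Status(Enum):
    GREEN = "success"
    RED = "failure"
    PURPLE = "exception"


def recipients(history, current, owners, committers):
    """Return who to email for a build that just finished as `current`.

    `history` lists earlier results for the same builder, oldest first.
    All names here are hypothetical; this is not a Buildbot API.
    """
    if current is Status.GREEN:
        return []
    if current is Status.PURPLE:
        # Infrastructure trouble (e.g. an svn error): owners only.
        return list(owners)
    # current is RED: look past exceptions to the last real result.
    # An empty or all-purple history is treated as previously green.
    previous = next(
        (s for s in reversed(history) if s is not Status.PURPLE),
        Status.GREEN,
    )
    if previous is Status.RED:
        # Bot was already red, so this is not a fresh breakage.
        return list(owners)
    # Green -> red: the committers on the blame list can act on it.
    return list(owners) + list(committers)
```

With this policy, a red -> purple -> red sequence notifies only the owners, which is the case described earlier in the thread as a source of noise. Whether owners should also be spared repeated red -> red mail is a separate choice the sketch does not settle.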