clang-ppc64be-linux-lnt flakiness

Hi Bill, just a heads up: this bot seems to have spuriously failed on a build that only pulled in my commit r272505, which is unrelated.

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/4883

– Sean Silva

Also, Clang Tools :: include-fixer/merge.test seems to be a bit flaky:

http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/1304
http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/1315

Ben, it looks like you’re the one that added that test. Is there any chance it could be nondeterminism or something similar in the code?

– Sean Silva

I threw all the sanitizers I had access to on this test and didn't
find anything. The merging uses threads so I can't rule out
nondeterminism. It's strange that it only happens on ppc64le and only
on stage 2, so an actual miscompile wouldn't surprise me either.

Thanks for taking a look. The flaky ASan test failure in the OP was on stage 1, so it sounds like there may be multiple problems :(

– Sean Silva

This just failed again: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/1579

Bill, could you take a look at this? This is like the 3rd time I’ve been incorrectly pinged by this buildbot due to this issue.

– Sean Silva

I looked at this a bit a while back when you first asked but I didn't see anything obviously wrong.

Benjamin, at least one of the referenced failures was on BE powerpc64. I don't remember if that one (http://lab.llvm.org:8011/builders/clang-ppc64be-linux-lnt/builds/4883) failed on stage 1 or 2 and the results pages are no longer available.

BTW, I tried testing the same revision many times on the same machine and other BE and LE powerpc machines and never saw this.

Since the test case relies on the ThreadPool class this is probably
related to https://llvm.org/bugs/show_bug.cgi?id=25829. We could
disable that test on ppc but it would be much nicer if we could figure
out why the ThreadPool doesn't work reliably on PPC. I couldn't get
this to reproduce on a ppc64 machine :(

Ouch! Thanks for digging in though.

– Sean Silva

The failure just happened to me again: http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/1635/steps/ninja%20check%201/logs/FAIL%3A%20Clang%20Tools%3A%3Amerge.test

:(