Marking lit::shtest-format.py unsupported on PS4?, Re: buildbot failure in LLVM on llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast

Should “lit :: shtest-format.py” (from check-lit) be marked unsupported on PS4? It seems flakey there.

This evening, it failed on my commit, r337514, and I’m fairly confident it wasn’t my commit’s fault. Then it recovered on the next commit.

http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/33502

http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/33503

"Duncan P. N. Exon Smith via llvm-dev" <llvm-dev@lists.llvm.org> writes:

Should "lit :: shtest-format.py" (from check-lit) be marked
unsupported on PS4? It seems flakey there.

I've had a suspicion for a while that it's unstable *everywhere*. At
least, I've seen it fail on out-of-tree bots intermittently but haven't
had time to investigate. It certainly seems to fail very infrequently
though.

FWIW, I’ve seen it fail on some of my commits too, but I don’t remember whether it was on the PS4 bot exclusively or not. Anyway, my understanding is that this test shouldn’t inherently have different behaviour on PS4 specifically, but I could be mistaken. I suspect it’s something more general to do with the configuration of the bot.

James

It fails probably a couple times a day on our internal merge bots, but I think that’s still in the Linux-target stage. I haven’t been tracking closely. It’s annoying but hasn’t been irritating enough to track down yet, if you know what I mean. We just finished finding and fixing the random xray failure, which was more frequent.

I’ll prod our merge monitors about this, but if anybody else comes up with a fix, we won’t mind. :)

–paulr

Hi,

We’ve given this an eyeball, and reckon there’s interference between the lit “shtest-format” and “shtest-xunit-output” tests – the following two run commands:

RUN: not %{lit} -j 1 -v %{inputs}/shtest-format > %t.out

RUN: not %{lit} -j 1 -v %{inputs}/shtest-format --xunit-xml-output %t.xml

Jeremy Morse via llvm-dev <llvm-dev@lists.llvm.org> writes:

We've given this an eyeball, and reckon there's interference between the
lit "shtest-format" and "shtest-xunit-output" tests -- the following two
run commands:

    # RUN: not %{lit} -j 1 -v %{inputs}/shtest-format > %t.out
    # RUN: not %{lit} -j 1 -v %{inputs}/shtest-format --xunit-xml-output %t.xml

appear in the tests respectively, using the
utils/lit/tests/Inputs/shtest-format directory as the target test suite.
For external-shell tests in that suite however, lit produces a
'.script.bat' file in the build directory to be run by the external shell:

    $ pwd
    /cygdrive/d/build/utils/lit/tests/Inputs/shtest-format/external_shell/Output
    $ ls
    fail.txt.script.bat fail_with_bad_encoding.txt.script.bat pass.txt.script.bat

Two concurrent lit invocations running on the same test suite will both
write to those files, creating racy filesystem behaviour and likely the
errors that have been seen. I've found at least one shtest-xunit-output
test run with a "Permission denied" error when writing to the script file
(on Windows). The race is still present on non-Windows systems but
is presumably much rarer.

I'm not very familiar with lit, and presumably making the external-shell
scripts temporarily named files would fix this, but there may be other
assumptions lit makes about a test suite not being operated on concurrently.
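
A minimal sketch of the temporarily-named-file idea, for illustration only.
This is not lit's code: the helper name write_script_unique and its
parameters are hypothetical, and it only shows how two writers targeting the
same Output directory can avoid sharing a script path.

```python
import os
import tempfile


def write_script_unique(output_dir, test_name, commands):
    """Hypothetical helper (not part of lit): write a test's commands to a
    uniquely named script file so concurrent runs never share a path."""
    os.makedirs(output_dir, exist_ok=True)
    # mkstemp creates the file atomically under a name no other caller gets.
    fd, path = tempfile.mkstemp(
        prefix=test_name + ".", suffix=".script", dir=output_dir, text=True
    )
    with os.fdopen(fd, "w") as script:
        script.write("\n".join(commands) + "\n")
    return path
```

Two calls for the same test name return two different paths, so neither
writer can clobber or lock the other's file. The caller would be responsible
for deleting the script once the external shell has finished with it.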

Thanks for looking into this. You're right, the problem is that lit
writes scripts into its test suite, so the "inputs" directory here is
kind of poorly named. I think the simplest short-term fix is to just
merge the check lines from shtest-format and shtest-xunit-output into
one test. I've gone ahead and done that for now in r337718.

Arguably it'd be more correct to copy the test suites from inputs/ into
paths in %t somewhere to properly isolate things, but I think that makes
the tests slower and more complicated without very good reason.
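
For reference, the isolation approach could look roughly like the sketch
below. It is an assumption-laden illustration, not what r337718 does: the
helper name isolated_suite_copy and the scratch_root parameter (standing in
for a per-test directory such as %t) are hypothetical.

```python
import os
import shutil
import tempfile


def isolated_suite_copy(suite_dir, scratch_root):
    """Hypothetical helper: copy a test suite into a fresh directory under
    scratch_root so each run writes its scripts into a private tree."""
    os.makedirs(scratch_root, exist_ok=True)
    # A fresh parent directory per call keeps concurrent copies apart.
    parent = tempfile.mkdtemp(prefix="suite-", dir=scratch_root)
    dest = os.path.join(parent, os.path.basename(os.path.normpath(suite_dir)))
    shutil.copytree(suite_dir, dest)
    return dest
```

Each test would then point its lit invocation at its own copy rather than at
the shared inputs directory. The extra copy per test is the cost in speed and
complexity mentioned above.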