[RFC] Switching make check to use 'set -o pipefail'

We currently don't use pipefail when running test under make check.
This has the undesirable property that it is really easy for tests to
bitrot. For example, something like

llc %s | FileCheck %s

will still pass if llc crashes after printing what FileCheck was
looking for. It is also easy to break the tests when refactoring. I
have fixed tests that were doing

%clang_cc1 -a-driver-options ... | not grep

clearly the test was changed from %clang to %clang_cc1 and we missed
the fact that the option also had to be updated.

Currently to check a command output and that it doesn't crash we have to do

llc %s > %t
FileCheck %s < %t

I would like to switch to using pipefail instead. That would meant that a simple

llc %s | FileCheck %s

would check both llc return value and output. I have already cleared
all the tests, so all that is missing is changing lit itself. Any
objections to doing so?

Cheers,
Rafael

Hi Rafael,

I think you saw my other e-mail, but just in case you haven't, do you
have any thoughts about making this an option that could be easily
disabled on the command line without maintaing a patch to lit? I think
it would help out-of-tree target maintainers to transition, since I'm
sure there will be a lot of similarly broken tests to fix.

Stephen

Hi Rafael,

I think you saw my other e-mail, but just in case you haven't, do you
have any thoughts about making this an option that could be easily
disabled on the command line without maintaing a patch to lit? I think
it would help out-of-tree target maintainers to transition, since I'm
sure there will be a lot of similarly broken tests to fix.

I don't think it is all that many since it was less than one day of
work for the in tree ones. But if there is the desire for such an
option I can try to add it. What should I use? An environment
variable?

Stephen

Cheers,
Rafael

I don't think it is all that many since it was less than one day of
work for the in tree ones. But if there is the desire for such an
option I can try to add it. What should I use? An environment
variable?

Hmm, I don't know LLVM's Makefile system well enough to know the
easiest way to implement an option; if it's non-trivial then maybe
it's not worth it.

I also don't know the workflow of most people doing out-of-tree work,
so I'm not sure how much impact this might have. It can obviously be
temporarily reverted locally pretty easily, but it assumes people are
paying attention to LLVMDev/llvm-commits and know what's going on.
Also, the commit causing all the new failures might not be as obvious
down the line to someone updating their tree irregularly (which is
probably true for a lot of academic LLVM users, I'm guessing.)

Just curious, does using pipefail give any information about where in
the pipe the failure actually comes from? Some kind of message would
be useful for debugging purposes, in addition to explaining what's
going on to someone who wasn't watching dev lists and commit messages
carefully.

Anyway, perhaps none of this is as big of a deal as I'm making it out
to be; I'll leave it to someone with more awareness of downstream
workflows to comment further.

Cheers,
Rafael

Stephen

Hmm, I don't know LLVM's Makefile system well enough to know the
easiest way to implement an option; if it's non-trivial then maybe
it's not worth it.

That is my impression at least. These errors are somewhat easy to
introduce, but also easy to fix.

I also don't know the workflow of most people doing out-of-tree work,
so I'm not sure how much impact this might have. It can obviously be
temporarily reverted locally pretty easily, but it assumes people are
paying attention to LLVMDev/llvm-commits and know what's going on.
Also, the commit causing all the new failures might not be as obvious
down the line to someone updating their tree irregularly (which is
probably true for a lot of academic LLVM users, I'm guessing.)

Just curious, does using pipefail give any information about where in
the pipe the failure actually comes from? Some kind of message would
be useful for debugging purposes, in addition to explaining what's
going on to someone who wasn't watching dev lists and commit messages
carefully.

With the external (shell based) testing we don't get anything. With
the internal (python based) one we get the last failing command line.
It doesn't say which component of it failed, but the vast majority of
the pipes we have are of the form "command | FileCheck", so it is
clear that "command" must be the one failing since the pipe was
passing before enabling pipefail.

Cheers,
Rafael

Hi,

I am in favor of using pipefail by default since it will reduce the
potential for false positives in the test suite.

> Hmm, I don't know LLVM's Makefile system well enough to know the
> easiest way to implement an option; if it's non-trivial then maybe
> it's not worth it.

That is my impression at least. These errors are somewhat easy to
introduce, but also easy to fix.

FWIW, I don't think there is much value in adding a command line option
for disabling pipefail. It shouldn't be too difficult for out of tree
targets to fix their tests.

-Tom

Hi Rafael,

Did this discussion ever get a conclusion? I support enabling
pipefail. Fallout for out of tree users should be easy to fix. As we
learned from LLVM tests, almost all tests that start to fail actually
indicate a real problem that was hidden.

Dmitri

Hi Rafael,

Did this discussion ever get a conclusion? I support enabling
pipefail. Fallout for out of tree users should be easy to fix. As we
learned from LLVM tests, almost all tests that start to fail actually
indicate a real problem that was hidden.

So far I got some positive feedback, but no strong LGTM from someone
in the area :frowning:

Dmitri

Cheers,
Rafael

+1 more for pipefail, if that helps. :slight_smile:

The only standing objection has to do with out-of-tree target maintainers,
and honestly I think they'll be fine.

> Hi Rafael,
>
> Did this discussion ever get a conclusion? I support enabling
> pipefail. Fallout for out of tree users should be easy to fix. As we
> learned from LLVM tests, almost all tests that start to fail actually
> indicate a real problem that was hidden.

So far I got some positive feedback, but no strong LGTM from someone
in the area :frowning:

+1 more for pipefail, if that helps. :slight_smile:

Another +1.

The only standing objection has to do with out-of-tree target maintainers,
and honestly I think they'll be fine.

I'm sure they will be. This is not the worst thing they have to live
with (the API churn is massive) & that's a tradeoff we/they
deliberately make for this project: trunk moves forward.

Ok, here is a strong LGTM. =]

Please make the change, and do the following things to aid out-of-tree
maintainers:

1) Add a flag to lit and an option to configure/make (I don't care about
CMake here as that is much less frequently used for out-of-tree work) to
disable pipefail.

2) Add a way to disable this behavior using the lit '.cfg' files,
especially the 'lit.local.cfg' files. This way out-of-tree targets can put
such a .cfg file in their target's test directory and ignore this change.

3) Add some significant documentation for what this means to both the lit
documentation and to the release notes so that folks are aware of the test
infrastructure change when they pick up a giant pile of changes in the
release

Also, please send a note (in a new thread) to llvmdev when the switch is
flipped with a reminder about how to disable the new behavior for folks
that can't update their test suite. You'll probably want to flip the switch
when you have time to track down lots of build bot failures. =D

Thanks for making the test infrastructure better,
-Chandler

Ok, here is a strong LGTM. =]

Please make the change, and do the following things to aid out-of-tree
maintainers:

1) Add a flag to lit and an option to configure/make (I don't care about
CMake here as that is much less frequently used for out-of-tree work) to
disable pipefail.

I have just fixed the last failures on windows. I have also added
documentation and the support for disabling pipefail in a directory.
The one thing I have not implemented yet is the configure change.

The reason is that after thinking a bit about it it looks like
something we don't want to have. What we want to provide is an easy
way for people doing out of tree work to get their tests passing after
an upgrade. We do want upstream tests to fail for them if they, for
example, break opt so that it crashes on exit. This is exactly what
the lit.local.cfg provides.

A new patch is attached. What do you think?

Also, please send a note (in a new thread) to llvmdev when the switch is
flipped with a reminder about how to disable the new behavior for folks that
can't update their test suite. You'll probably want to flip the switch when
you have time to track down lots of build bot failures. =D

OK.

Thanks for making the test infrastructure better,
-Chandler

Cheers,
Rafael

t.patch (3.56 KB)

> Ok, here is a strong LGTM. =]
>
> Please make the change, and do the following things to aid out-of-tree
> maintainers:
>
> 1) Add a flag to lit and an option to configure/make (I don't care about
> CMake here as that is much less frequently used for out-of-tree work) to
> disable pipefail.
>

I have just fixed the last failures on windows. I have also added
documentation and the support for disabling pipefail in a directory.
The one thing I have not implemented yet is the configure change.

The reason is that after thinking a bit about it it looks like
something we don't want to have. What we want to provide is an easy
way for people doing out of tree work to get their tests passing after
an upgrade. We do want upstream tests to fail for them if they, for
example, break opt so that it crashes on exit. This is exactly what
the lit.local.cfg provides.

I can see your argument here and am fine with it on further thought. Thanks.

A new patch is attached. What do you think?

Looks good. I might add in the release notes a quick "paste the following
into a lit.local.cfg file in your test subtree to turn this off if you need
to" bit? I'm just trying to avoid grumbling by giving a recipe for ignoring
this change on old out-of-tree targets that aren't likely to be updated to
test correctly.

> Also, please send a note (in a new thread) to llvmdev when the switch is
> flipped with a reminder about how to disable the new behavior for folks
that
> can't update their test suite. You'll probably want to flip the switch
when
> you have time to track down lots of build bot failures. =D

OK.

If you can give the recipe in th erelease notes, I'd paste it into the
email as well.

Thanks for working on this.