llvm-lit: 2>&1 and FileCheck

Hi all,

Quite a few tests use the pattern "2>&1 | FileCheck %s". AFAIK the
order in which stdout and stderr are merged into a single character
stream is undefined and depends, e.g., on whether stdout is buffered.
I think we are often saved by the fact that standard output is written
only at the end of the program while stderr is unbuffered, so stderr
output always appears before stdout.
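
To make the ordering effect concrete, here is a minimal sketch in Python
(not LLVM code). The child program and the name merged_lines are made up
for illustration. The child writes to stdout first and to stderr second,
with stdout going through a block-buffered wrapper and stderr written
raw. The merged stream shows stderr first unless stdout is flushed early.

```python
import subprocess
import sys

# Hypothetical child program. It writes to stdout through a block-buffered
# wrapper and to stderr with a raw, unbuffered write.
CHILD = r'''
import os, sys
out = open(1, "w", buffering=8192, closefd=False)  # block-buffered stdout
out.write("stdout: written first\n")
if sys.argv[1] == "flush":
    out.flush()                                    # push stdout out early
os.write(2, b"stderr: written second\n")           # unbuffered stderr
out.close()                                        # flushes what is left
'''


def merged_lines(flush_stdout_early):
    """Run the child with stderr merged into stdout, as 2>&1 would do."""
    mode = "flush" if flush_stdout_early else "noflush"
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # both streams share one pipe
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


print(merged_lines(False))  # ['stderr: written second', 'stdout: written first']
print(merged_lines(True))   # ['stdout: written first', 'stderr: written second']
```

The program order is the same in both runs; only the flush point differs.
That is the sense in which the merged order depends on buffering.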

A lot of tests disable stdout using either "-o /dev/null" or
"-disable-output", but not all. For instance,
test/Transforms/SLPVectorizer/X86/reduction_unrolled.ll does not. It
checks for output from stdout and stderr using the same FileCheck. The
stderr it is checking even comes from -debug, which has an additional
buffering layer (circular_raw_ostream).

The testing guide [1] does not mention how to test stderr.

My questions:

1. Are these tests, e.g. reduction_unrolled.ll, fragile? Maybe I am
missing something that says that interleaving stdout and stderr (and
llvm::dbgs()) is well-defined in llvm-lit.

2. Can -debug (or -debug-only) be used in regression tests at all? I
understood them as debugging aids only. I would not like
adding/changing DEBUG(dbgs() << ...); lines to cause regression tests
to fail.

3. What are the canonical ways to test...
3a) opt -stats output (e.g. "2>&1 | FileCheck\n; REQUIRES: asserts")
3b) A statistic from -stats being zero
3c) stderr only (and be sure that no lines from stdout will be
interleaved with it)
3d) stdout and stderr at the same time, but independently.
3e) the output of DEBUG(dbgs() << ...) lines, if allowed to do so.
3f) If not, how to replace it? E.g. how to test whether a source code
line has been executed.

Thanks in advance,
Michael

[1] http://llvm.org/docs/TestingGuide.html

Rough guesses, not based on a broad review of test cases: all of this seems OK except for the interleaved case(s) you mentioned.

3f: It’s generally best not to test whether a particular source code line has executed. That would make the test fragile; the observable behavior should be tested instead. I imagine it still comes up sometimes as the best thing to do in a bad situation.

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of
Michael Kruse via llvm-dev
Sent: Thursday, February 23, 2017 6:53 AM
To: llvm-dev
Subject: [llvm-dev] llvm-lit: 2>&1 and FileCheck

My questions:

1. Are these tests, e.g. reduction_unrolled.ll, fragile? Maybe I am
missing something that says that interleaving stdout and stderr (and
llvm::dbgs()) is well-defined in llvm-lit.

I'd consider them fragile, but obviously their behavior has been
consistent across a variety of bots for some time. So the fragility
is a bit pedantic/theoretical. "The behavior is undefined but I know
what I'm doing!"

When running tests I have at times seen stdout and stderr text
interleaved, just not in the text that a CHECK was looking for, so I
think people are getting lucky in at least some cases.
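
One way to take ordering out of the picture is to keep the two streams
apart and check each on its own. The sketch below shows the idea in
Python; the child program and the name run_split are made up for
illustration. In a lit RUN line the analogous approach would presumably
be to redirect stderr to a temporary file (e.g. "2> %t.err") and run
FileCheck once per stream with different --check-prefix values, but
treat that as an assumption rather than an established convention.

```python
import subprocess
import sys

# Hypothetical child program that produces output on both streams.
CHILD = r'''
import sys
print("result on stdout")
print("diagnostic on stderr", file=sys.stderr)
'''


def run_split():
    """Run the child and capture stdout and stderr in separate pipes."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,  # separate pipes, so no interleaving is possible
        text=True,
        check=True,
    )
    return proc.stdout.splitlines(), proc.stderr.splitlines()


out_lines, err_lines = run_split()
print(out_lines)  # ['result on stdout']
print(err_lines)  # ['diagnostic on stderr']
```

Because each stream has its own pipe, the checks on one stream cannot be
affected by when the other stream is flushed.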

2. Can -debug (or -debug-only) be used in regression tests at all? I
understood them as debugging aids only. I would not like
adding/changing DEBUG(dbgs() << ...); lines to cause regression tests
to fail.

The line between "debugging aid" and "event logging" is not clear, but
I have written tests relying on logging-style output; I think that's ok.
As always you want your CHECKs to be specific enough to avoid false
matches but not so specific that they become too fragile.

  • In general you should try hard not to use dbgs() output for unit testing; it is certainly an anti-pattern.

  • We do indeed not have any other logging mechanism (and I am not convinced that we need one).

  • There are areas where we should improve: for example, someone should implement the equivalent of ‘opt -analyze’ for llc so we can use Pass::print() in codegen tests.

  • If all else fails I still consider using dbgs() for testing okay. It’s easy enough to run the tests and see if you broke something when you changed a DEBUG() line.

- Matthias

Right.
Where we can, we prefer to have separate printing passes, but that only
really works well for analysis and preparation transformations.