[PATCH] A "very verbose" mode for FileCheck

Hi Dmitri,

This has long been on my list of nice things to solve, thanks for tackling it.

I would actually prefer to solve this problem in a completely different way, one that is not specific to FileCheck.

This is how I would like this to work:

  1. We extend 'lit' to support associating files with test results. This is something that is good to have for other reasons anyway.

  2. We provide a mode to lit which captures verbose test output. In that mode, it does several things (see the sketch after this list):

     a. Any pipes are replaced with temporary files.
     b. When tests fail, any temporary files (either from pipes or from %t markers) are included with the test results.

  3. We extend the 'lit' test output to have a clear way to delineate additional output files, and include them in the test output (maybe with some size limits).

  4. We extend the buildbot 'lit' test runner to be smart enough to extract these files out of the 'lit' test output, and attach them to the buildbot results.
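To make step 2a concrete, here is a rough sketch of the pipe-to-file
rewrite. This is illustrative only - run_pipeline_with_files and the
'.pipeN' naming are made up for the example, not lit's actual internals:

    import shlex
    import subprocess

    def run_pipeline_with_files(run_line, tmp_base):
        # Run a shell pipeline, routing each pipe through a temporary
        # file so intermediate output survives for later collection.
        # 'run_line' is one already-substituted RUN line, e.g.
        # 'opt -instcombine %s -S | FileCheck %s'; 'tmp_base' is the
        # per-test temporary prefix (what %t expands to). Splitting on
        # '|' is naive (it would break on quoted pipes); a real version
        # would reuse lit's own shell parser.
        stages = [shlex.split(s) for s in run_line.split('|')]
        tmp_files = []
        prev_output = None
        rc = 0
        for i, argv in enumerate(stages):
            out_path = '%s.pipe%d' % (tmp_base, i)
            tmp_files.append(out_path)
            stdin = open(prev_output, 'rb') if prev_output else None
            with open(out_path, 'wb') as out:
                rc = subprocess.call(argv, stdin=stdin, stdout=out)
            if stdin:
                stdin.close()
            if rc != 0:
                break  # stop like pipefail; keep files for the report
            prev_output = out_path
        return rc, tmp_files

On failure, the accumulated '.pipeN' files are exactly what step 2b
would attach to the test result.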

While this proposal probably involves more work than yours, here are the reasons I prefer it:

  1. It extends to all ‘lit’ uses, not just FileCheck.

  2. It doesn’t require modifying the tests.

  3. Some of the extra work (like extending lit to associate files with test results) is useful infrastructure for future extensions.

What do you think?

- Daniel

Note that as far as places to put temporary files go, the right place is alongside the other test outputs in the test output "sandbox" directory.

Somewhat orthogonal, but we should also fix up lit to purge those sandboxes before it starts a new test run.
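For what it's worth, the purge itself should be small. A sketch,
assuming each test's outputs land in a per-test "Output" directory
under the build tree (roughly how the %t paths are laid out):

    import os
    import shutil

    def purge_sandboxes(exec_root):
        # Delete stale per-test "Output" directories before a new run,
        # so leftover temporaries can't be mistaken for fresh results.
        for dirpath, dirnames, _ in os.walk(exec_root):
            if 'Output' in dirnames:
                shutil.rmtree(os.path.join(dirpath, 'Output'))
                dirnames.remove('Output')  # deleted; don't descend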

- Daniel

> Note that as far as places to put temporary files go, the right place is
> alongside the other test outputs in the test output "sandbox" directory.
>
> Somewhat orthogonal, but we should also fix up lit to purge those
> sandboxes before it starts a new test run.

SGTM - though if we are going to go with the plan/features you've
outlined, it might not be so unreasonable to gather those log files
directly, rather than tunnelling them through the lit output and
splitting them back out. That's a bit more work in the buildbot
configuration - enough that I didn't think I could justify it given
Dmitri's original proposal - but it would simplify this change and
leave us roughly where I was discussing earlier. (Your suggestion of
generalizing this over all of lit, rather than just FileCheck, is a
step beyond what I'd proposed, but it sounds good/right.)

I asked about how to do this in the freenode buildbot channel & they
mentioned that it is possible to name log files dynamically to be
retrieved from the slave - I haven't looked into it in detail because
it did sound a bit more complicated, but should be achievable.

The drawback to your approach is that we'd have to enable this feature
unconditionally - rather than having the optimization advantage of
only dumping files on failure (see my question earlier in the thread,
Eli's concern that dumping output would be expensive, and Dmitri's
response that we'd only be dumping on failure anyway). Given that it
seems the vast majority of our failures aren't flaky, we could set lit
up to rerun failures in a "create all the temporary files" mode,
though missing flakes would be unfortunate.


> SGTM - though if we are going to go with the plan/features you've
> outlined, it might not be so unreasonable to gather those log files
> directly, rather than tunnelling them through the lit output and
> splitting them back out. That's a bit more work in the buildbot
> configuration - enough that I didn't think I could justify it given
> Dmitri's original proposal - but it would simplify this change and
> leave us roughly where I was discussing earlier. (Your suggestion of
> generalizing this over all of lit, rather than just FileCheck, is a
> step beyond what I'd proposed, but it sounds good/right.)

I'm fine not having the files dumped into the output, but I at least want
the buildbot to get the names of the files to collect from the lit output,
not just by scanning the folder.

> I asked about how to do this in the freenode buildbot channel & they
> mentioned that it is possible to name log files dynamically to be
> retrieved from the slave - I haven't looked into it in detail because
> it did sound a bit more complicated, but should be achievable.

Yes, it is possible. I can help with the buildbot implementation if need be.
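For reference, the fixed-name form is already easy with buildbot's
ShellCommand: the 'logfiles' argument tails named files from the
slave's build directory and attaches them to the step. The step name,
command, and file name below are placeholders; dynamically discovered
names would need a custom step.

    from buildbot.steps.shell import ShellCommand

    lit_step = ShellCommand(
        name='run-lit-tests',
        command=['python', 'bin/llvm-lit', '-v', 'test'],
        # Tail this file on the slave; attach it to the step results.
        logfiles={'lit.failures': 'lit-failures.log'},
        lazylogfiles=True,  # only attach the log if the file appears
    )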

> The drawback to your approach is that we'd have to enable this feature
> unconditionally - rather than having the optimization advantage of
> only dumping files on failure (see my question earlier in the thread,
> Eli's concern that dumping output would be expensive, and Dmitri's
> response that we'd only be dumping on failure anyway). Given that it
> seems the vast majority of our failures aren't flaky, we could set lit
> up to rerun failures in a "create all the temporary files" mode,
> though missing flakes would be unfortunate.

In my experience, there isn't actually that much performance difference
(and sometimes a loss) from using a file instead of a pipe. With all the
other stuff going on in the test suite I would be surprised if this
added much impact. Note that I personally often write my tests to use
temporary files instead of pipes anyway, just because I like how the
RUN lines look, so it's not exactly like we aren't already storing many
of these files.

There would be some code simplification advantages internally to lit too if
it avoided actually using pipes, although I'm not sure I want to go as far
as dropping that support.

If it is measurable, then my proposal is to do as you say and just rerun
the test with the collection enabled. I actually think that lit by default
should rerun failing tests so that it can diagnose flaky tests, so this
fits in with that.

- Daniel


> I'm fine not having the files dumped into the output, but I at least want
> the buildbot to get the names of the files to collect from the lit output,
> not just by scanning the folder.

While I'd tend to agree - is there a particular reason you're not in
favor of collecting from the folder?

(One reason I can think of is that it might mean we'd lose (or have to
work harder to maintain) the association between test files and
temporary files - but we do name the temp files in a structured way
relative to the test file name, so that doesn't seem terribly hard. And
we'll have to use a similar naming structure for the output file names
on the buildbot anyway, to make it easier to pick out the output files
associated with a failure.)

> Yes, it is possible. I can help with the buildbot implementation if need be.

Yep, mightn't hurt.

> In my experience, there isn't actually that much performance difference
> (and sometimes a loss) from using a file instead of a pipe. With all the
> other stuff going on in the test suite I would be surprised if this
> added much impact. Note that I personally often write my tests to use
> temporary files instead of pipes anyway, just because I like how the
> RUN lines look, so it's not exactly like we aren't already storing many
> of these files.

Fair enough.

Eli - any counterargument/other views on this issue? (since you'd
mentioned some concern previously)

> There would be some code simplification advantages internally to lit
> too if it avoided actually using pipes, although I'm not sure I want
> to go as far as dropping that support.
>
> If it is measurable, then my proposal is to do as you say and just
> rerun the test with the collection enabled. I actually think that lit
> by default should rerun failing tests so that it can diagnose flaky
> tests, so this fits in with that.

SGTM


> While I'd tend to agree - is there a particular reason you're not in
> favor of collecting from the folder?

Just explicit vs. implicit: I want the test output to be explicit about
what files are related, and the buildbot to only pick up things it was
explicitly told about.

Also, I don't want an implicit hidden dependency where the buildbot
runners "know" how lit lays out its test results.

> > If it is measurable, then my proposal is to do as you say and just
> > rerun the test with the collection enabled. I actually think that
> > lit by default should rerun failing tests so that it can diagnose
> > flaky tests, so this fits in with that.
>
> SGTM

Cool!

- Daniel

> Eli - any counterargument/other views on this issue? (since you'd
> mentioned some concern previously)

I agree with Daniel's suggestion to benchmark it. I guess that no one
really wants our regression tests to run much slower as a result of
this change, so it's OK to wait and actually measure the performance
impact of such a change.
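For a quick ballpark ahead of any lit change, a crude harness along
these lines would do; 'wc -c' merely stands in for a consumer like
FileCheck, so this measures only the plumbing cost:

    import subprocess
    import tempfile
    import time

    def time_plumbing(use_file, n=200, payload=b'x' * (1 << 20)):
        # Feed a 1MB payload to the consumer n times, through either a
        # temporary file or a pipe; return the elapsed wall time.
        start = time.time()
        for _ in range(n):
            if use_file:
                with tempfile.TemporaryFile() as tmp:
                    tmp.write(payload)
                    tmp.seek(0)
                    subprocess.call(['wc', '-c'], stdin=tmp,
                                    stdout=subprocess.DEVNULL)
            else:
                proc = subprocess.Popen(['wc', '-c'],
                                        stdin=subprocess.PIPE,
                                        stdout=subprocess.DEVNULL)
                proc.communicate(payload)
        return time.time() - start

    print('file: %.2fs  pipe: %.2fs' % (time_plumbing(True),
                                        time_plumbing(False)))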

Eli

One contributing factor to the continued use of grep may be that the
section "Writing New Regression Tests" [1] mentions grep multiple
times, and FileCheck 0 times.

Besides that, I suspect that most new uses of grep come from cargo
culting, so just giving them nothing to cargo cult is an easy and
mechanical way to avoid that part of the issue altogether (i.e. just
removing all uses of grep in the current tests). Also, removing `not`
and `count` from utils/ would incapacitate grep significantly w.r.t.
writing tests with it, discouraging its use and making people look for
a different way.
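To illustrate the kind of conversion that would force (the IR pattern
here is invented for the example):

    ; Old style, with grep and count:
    ; RUN: opt < %s -instcombine -S | grep "ret i32 0" | count 1

    ; FileCheck style, matching patterns against the output:
    ; RUN: opt < %s -instcombine -S | FileCheck %s
    ; CHECK: ret i32 0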

The cargo culting also points to a lack of adequate documentation
about how to add a new test (`HowToAddATest.rst`? You might be a good
person to write that :wink:
<http://www.llvm.org/docs/SphinxQuickstartTemplate.html>). HowTo
documents are a great way to disseminate best practices.

[1] <http://llvm.org/docs/TestingGuide.html#writing-new-regression-tests>

-- Sean Silva

Good points, Sean. I'll take a look at what I can do there later this week.

Eli