RFC: Improving lit's debug output

Summary

D154984 and D156954 propose changes to improve the usability of lit’s debug output, which is used by many lit users every day to understand lit test failures. Based on reviewer feedback so far, I have made multiple adjustments to those patches to avoid breaking use cases I was previously not aware of. The purpose of this RFC is to invite the broader LLVM community to check that we have not overlooked other important use cases.

Motivation

Currently, lit’s debug output for a test looks like the following:

 Script:
 --
 : 'RUN: at line 1'; echo hello | FileCheck bogus.txt && echo success
 --
 Exit Code: 2

 Command Output (stdout):
 --
 $ ":" "RUN: at line 1"
 $ "echo" "hello"
 # command output:
 hello

 $ "FileCheck" "bogus.txt"
 # command stderr:
 Could not open check file 'bogus.txt': No such file or directory

 error: command failed with exit status: 2

 --

(The “Caveat” section below discusses some cases when it doesn’t look like that.)

Notice that there are multiple sections:

  • A Script: section shows all shell commands from the test’s RUN lines after lit substitution expansion.
  • A stdout section includes an execution trace that ends at the failed command.
  • A stderr section would be printed next if it had content.

There are a number of issues with the current behavior:

  • Redundant output: All shell commands that actually execute appear in the debug output twice, once in the Script: section, and once in the execution trace. Especially for tests with many RUN lines, the result seems unnecessarily noisy. When searching through the output for a particular shell command, it is easy to get lost and mistake shell commands under Script: for shell commands that actually executed.
  • Unfaithful shell commands: In the execution trace, shell commands are not printed faithfully. They drop redirections. They break apart RUN lines at &&, |, etc. Instead of using proper shell quoting, each command-line argument is blindly enclosed in double quotes.
  • Poorly delineated stdout/stderr: In the execution trace, shell command stdout, stderr, exit status, etc. are interleaved with the shell commands and are not clearly delineated.
  • Difficult copy/paste of shell commands: While debugging a test, some lit users copy and paste shell commands into a terminal to reproduce results. For this purpose, copying shell commands from the execution trace seems convenient because the previous results of those commands are co-located with them. However, again, the commands are not always printed faithfully here. Moreover, because of the interleaved results, and because of the $ at the start of every command, commands must be copied one at a time. Some lit users thus currently depend on the Script: section for copying and pasting commands, but results are not co-located there for convenient comparison.
  • Execution trace disabled by default: Even when a test’s output is shown (requires -v for failed tests, or -a for all tests), the execution trace is sometimes (see “Caveat” section below for when) disabled by default. In that case, to determine which command actually failed, the test usually needs to be run again with -vv.
  • Future lit extensions: D154987 proposes PYTHON lines that can appear among RUN lines. How should they be represented in the Script: section, which has previously been just a shell script? Adding info about PYTHON lines in the execution trace seems more straight-forward and more useful, and D154987 already implements that. (I hope we can avoid debating PYTHON lines in this particular RFC, but I think it’s worthwhile to disclose the RFC’s full context.)

Changes

D154984 and D156954 convert the above output into the following:

 Exit Code: 2

 Command Output (stdout):
 --
 # RUN: at line 1
 echo hello | FileCheck bogus-file.txt && echo success
 # executed command: echo hello
 # .---command stdout------------
 # | hello
 # `-----------------------------
 # executed command: FileCheck bogus-file.txt
 # .---command stderr------------
 # | Could not open check file 'bogus-file.txt': No such file or directory
 # `-----------------------------
 # error: command failed with exit status: 2

 --

They address the above issues as follows:

  • The Script: section has been dropped entirely.
  • The entire execution trace can be copied and pasted in bulk to a terminal for correct execution of the RUN lines, which are printed intact as they appeared in the original RUN lines except lit substitutions are expanded. Everything else in the execution trace appears in shell comments so it has no effect in a terminal.
  • Each of the RUN line’s commands is repeated (in shell comments) as it executes to show (1) that the command actually executed (e.g., echo success above didn’t) and (2) what stdout, stderr, non-zero exit status, and output files are associated with the command, if any. Shell quoting in the command is now correct and minimal.
  • The start and end of the contents of stdout, stderr, or an output file is now delineated clearly in the trace.
  • If a test’s debug output is shown by -v or -a, the execution trace is always enabled, and -vv has no additional effect. Without these patches, -vv implied -v, so these patches make them aliases for backward compatibility.

Caveat

Lit can be configured to execute RUN lines using either lit’s internal shell or an external shell (e.g., bash, sh, window’s cmd). Lit’s internal shell is usually best for test portability because it’s implemented in python. To configure a test suite to use lit’s internal shell, make sure the following line appears in the test suite’s lit.cfg or other configuration file:

config.test_format = lit.formats.ShTest(execute_external=False)

It is important to qualify the changes these patches make based on the shell that lit is configured to use:

  • The changes these patches make for -vv affect external shells only. Lit’s internal shell already has the desired behavior that -v and -a always produce the execution trace and that -vv is an alias for -v.
  • All debug output examples shown above are for lit’s internal shell. The changes these patches make for the execution trace format affect lit’s internal shell only. In most cases, we do not know how to force external shells to produce execution traces in the manner we want, and the output looks quite different than shown in the examples above.
  • These patches drop the Script: section for all shells.
15 Likes

THANK you for working on this! I have been working with LLVM for years and I still get confused about the lit verbose output. This looks so much easier to read.

I am not really qualified to review these patches - but I really hope they land!

4 Likes

Context to readers who are unfamiliar (I was in this category before I read some code of lit to try reviewing some patches):

% rg 'REQUIRES:.*shell' -l | wc -l
165

I am in favor of the the formatting change (⚙ D156954 [lit] Improve test output from lit's internal shell), even if it is specific to the internal shell.

I am also in favor of ⚙ D154984 [lit] Drop "Script:", make -v and -a imply -vv , as path/to/llvm-lit -v a.test now tells the failed command. Previously I always use -vv to tell the failed command (Script: output lists all commands; when there are many, it’s unclear which one failed).

1 Like

This is a huge improvement compared to what we have today, big +1 from me.

Also massive thank you for working on this @jdenny-ornl ! :pray:t2:

1 Like

Nice ideas being presented here!

One nit: Prefixing the non-command output with # makes for easy copy-paste to a bash-like shell, but not to a Windows cmd terminal, where that’s not a comment character. It would need a rem prefix on Windows.

Prefixing all the output lines seems a bit weird at first, but lit scripts mainly don’t output to the terminal so it’s probably fine. Although it would be nice to know how it works for unittests, which do tend to write to the terminal a fair amount. Even there it’s probably okay, but it would be nice to see an example.

Thanks for this pointer. If I understand correctly, in the case that lit’s internal shell is enabled, REQUIRES: shell disables a test because the test requires an external shell. Is that right? Hopefully LIT_USE_INTERNAL_SHELL=1 is used in that case. I think lit.formats.ShTest(execute_external=False) would permanently disable those tests.

Good point. It should be straight-forward to change the comment sequence for windows.

I’m not a regular windows user. Do shell commands from lit’s internal shell reliably work in windows cmd?

Sorry, I’m afraid I’m not following. What do you mean by “output to the terminal”? Write to stdout/stderr?

Actually, using a colon as the prefix might be enough. I think that’s why the current lit prints out the script with them:
: 'RUN: at line 1'; echo hello | FileCheck bogus.txt && echo success
Does that work for bash as well?

I don’t know, I never do that. I’m aware that lit has a thing called the internal shell, but I haven’t a clue how that affects the construction of commands or what gets printed by -vv. Typically I set myself up in a scratch directory and work out commands to use by hand.

Yes, sorry I could have been clearer there.

Lit doesn’t use the colon command in windows. Instead, it uses echo 'RUN: at line 1' > nul. My recollection is that the colon command doesn’t exist in cmd.

Another problem with the colon command is it requires shell-quoting the argument, which could really mangle some debug output.

Okay. Typing a colon in front of arbitrary stuff on my cmd shell appears to act like a comment. It appears to throw away everything on the line, it’s not terminated by a semicolon as in the example. But we can fine-tune that as things go on. The basic ideal looks pretty good.

Which shell you use affects what you can put in RUN lines. Lit’s internal shell is mostly (maybe completely) a subset of bash.

That’s usually been my habit as well, but other lit users say they copy and paste from lit debug output, and I can certainly see how that’s convenient.

: is apparently a goto label in cmd. It seems we should stick with rem in windows.

Something that I’m missing with lit output is an easy ability to know which command fails (but I didn’t know about the internal shell either).
Would it be possible for the command output to actually show also a tag [SUCCESS] and [FAILED: <ret code>] for each invocation?

3 Likes

I think the OP example under “Changes” makes that pretty clear? The output of each individual command is preceded by an “executed command” line.

You’re right! I saw the Exit Code: 2 at the top on the first line, but didn’t see that each command also had its own results (beyond the output).

Just tried something locally (without the new patches), and one minor thing is that stdout is printed before stderr, while it’s probably more useful to read first the error before the standard output?

Anyway, great work!

I am wondering if we should turn on the internal shell by default in the MLIR test suites, what’s the downside to that? Why isn’t it the default for lit in general?

1 Like

Lit’s internal shell isn’t a full bash implementation. Many bash shell constructs won’t work.

1 Like

When they fail, they can write a ton to stderr, especially if the failing command is FileCheck.

I don’t believe these patches have any effect on unit tests, which I believe always use lit.formats.GoogleTest not lit.formats.ShTest.

Thanks for the effort here. I think the new format looks really useful.

To chime in on the sub-thread with @pogo59, it’s a little more complex than “Windows” or “not Windows” regarding the comment marker. By default these days, the Windows command prompt that you open is in fact PowerShell, which recognises # as a comment marker, but not REM. However, there are many cases where cmd is used, so we still need to handle that somehow, ideally. I would argue that it would be better to support PowerShell by default, if it’s too difficult to figure out which is actually being used, but I don’t feel strongly about it.

One downside with PowerShell is that the > operator is “reserved” and isn’t stdin redirection, for some unfathomable (to me) reason. You have to use a completely different invocation to get stdin redirection with PowerShell. Where the redirection is important to the test, I usually just drop into cmd instead.

Thanks for pointing this out.

Of course, one possibility is to stick with # and thus require cmd users to just copy and paste one command line at a time. Reminder: The point of shell-commenting all debug output except full command lines is to permit copying and pasting multiple command lines at once.

If that’s not good enough, we need to know the right comment sequence. It sounds like there’s no way to predict what shell (cmd, PowerShell, bash in Windows Subsystem for Linux, etc.) a lit user in windows might paste commands into. In that case, they’ll have to tell lit, probably via a lit command-line option. Would something like --windows-cmd-comment be best? Or would it be better to make it general like --shell-comment=rem where rem can be replaced by any string?

Interesting. Issues like this make me wonder whether copying shell commands written for lit’s (bash-like) internal shell even makes sense when pasting to windows shells. I’m not experienced enough with windows shells to make this call.

It’s not? Are you confusing it with <? > Generally works fine.

Don’t remember if they’ve implemented it in PowerShell Core 7 (I know they’ve added && and || in that version), but it doesn’t ship by default so a bit irrelevant anyway.