Multiple documents in one test file

Sometimes it is convenient if we can specify multiple independent tests
in one file. To give an example, let's discuss test/MC/ELF/debug-md5.s and
test/MC/ELF/debug-md5-err.s (.file directive in the assembler).

a) An invalid .file makes the whole file invalid. Because errors lead to a
non-zero exit code, we have to use `RUN: not llvm-mc %s` for the whole file.
This often leads to users placing good (`RUN: llvm-mc %s`) and bad tests (`RUN:
not llvm-mc %s`) in separate files. For some features, having both good and bad tests
in one file may improve readability.
b) .debug_line is a global resource. Whenever we add a (valid) .file, we
contribute an entry to the global resource. If we want to test some
characteristics when include_directories[0] is A, and other characteristics
when include_directories[0] is B, we have to use another test file.

The arguments apply to many other types of tests (opt on .ll, llc on .ll and .mir, clang on .c, yaml2obj on .yaml, etc).

I have a patch teaching llvm-mc about an option to split input: D83725 "[llvm-mc] Add --doc-id=<id> to support multiple documents in a file" (https://reviews.llvm.org/D83725)
(30+ lines)

In a comment, Richard Smith asked whether we can add a separate extractor utility:

# RUN: extract bb %s | llvm-mc - 2>&1 | FileCheck %s --check-prefix=BB

or

# RUN: extract bb %s -o %t.bb
# RUN: llvm-mc %t.bb 2>&1 | FileCheck %t.bb

The advantage is its versatility. The downside is somewhat verbose syntax.

Some questions:

1. What do people think of the two approaches? An in-utility option vs a standalone utility.
2. For llvm-mc, if we go with an option, is there a better name than --doc-id? David Blaikie proposed --asm-id
   (This is my personal preference, trading 30+ lines in a utility for simpler syntax)
3. If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract.)

(+Richard - it’s handy to include folks from previous discussions explicitly so everyone can more easily keep track of the conversation)

In a comment, Richard Smith mentioned whether we can add a separate extractor utility:

# RUN: extract bb %s | llvm-mc - 2>&1 | FileCheck %s --check-prefix=BB

or

# RUN: extract bb %s -o %t.bb
# RUN: llvm-mc %t.bb 2>&1 | FileCheck %t.bb

Could make “extract” work a bit like “tee” so it can still be one line:

RUN: extract bb %s -o %t.bb | llvm-mc - 2>&1 | FileCheck %t.bb

(could even make it a bit shorter for convenience - ‘ex’ or something)

The advantage is its versatility. The downside is somewhat verbose syntax.

Some questions:

  1. What do people think of the two approaches? An in-utility option vs a standalone utility.
  2. For llvm-mc, if we go with an option, is there a better name than --doc-id? David Blaikie proposed --asm-id
    (This is my personal preference, trading 30+ lines in a utility for simpler syntax)
  3. If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish ‘extract’ from llvm-extract.)

I think some of the truly internal utilities are named without the llvm prefix - isn’t stuff like “not” actually implemented as a local tool? hmm, guess not, maybe that’s a built-in inside lit.
Only risk I can think of with the name is the auto-name expansion of lit replacing any token ‘ex’ with the full path to the tool, so you might have to be careful about not using that character sequence as a standalone argument on a RUN line - but that seems OK.

`extract` +1 for consistency.

We have a similar option (-split-input-file) in mlir-opt: https://github.com/llvm/llvm-project/blob/master/mlir/test/Dialect/Affine/invalid.mlir

With a single RUN: lit invocation the tool itself will loop over all the split sections in the file. This is convenient to test error cases where the tool would abort at the first error otherwise. I don’t think we can easily achieve this with a single pipe and a separate extract command though?

Interesting. `-split-input-file` is indeed terse and performant for tests with similar run lines. To generalize the functionality a bit, I'm not sure whether people have had the idea of letting lit do the splitting using some common separator lines like `# SEP: #`, `// SEP: //` etc.
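
To make the "loop over all the split sections" idea concrete, here is a minimal sketch in Python of splitting a test file at separator lines and visiting every chunk. The function name `split_documents` and the separator `# -----` are placeholders chosen for illustration; they are not what mlir-opt or lit actually use.

```python
def split_documents(text, separator="# -----"):
    """Yield (first_line_number, chunk) for each separator-delimited chunk.

    `separator` is a hypothetical marker; a real tool would pick its own.
    """
    chunk, start = [], 1
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip() == separator:
            # Close the current chunk; the next one starts after the separator.
            yield start, "\n".join(chunk)
            chunk, start = [], lineno + 1
        else:
            chunk.append(line)
    # The text after the last separator is the final chunk.
    yield start, "\n".join(chunk)


if __name__ == "__main__":
    demo = 'nop\n# -----\n.file 1 "bad"\n# -----\n.file 2 "also-bad"\n'
    for first_line, doc in split_documents(demo):
        # A tool would process each chunk independently here, so an error
        # in one chunk does not hide the diagnostics of the next one.
        print(first_line, repr(doc))
```

Because each chunk carries the line number where it started, a tool looping this way could still report diagnostics relative to the original file.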

FWIW, the way I've done this in llvm-mc so far is via a combination of
"--defsym CASE<N>" command line argument and ".ifdef" asm directives.
This has the advantage that individual "documents" don't need to be
fully standalone (though they can be), so you can put the common parts
of the tests into an unconditionally compiled block.

That said, I was using this technique for constructing test cases for
other tools via llvm-mc. Things might get a bit awkward if you try to
test .ifdef processing itself this way...

pl

+1 for --defsym. I'm sure I've written pairs of tests that could have been
done more simply this way.

Regarding maskray's original post, which is about valid/invalid .file
directives in the same .s file, --defsym and .ifdef work perfectly:

.ifdef CASE1
.file 1 "./case1.c"
.endif
.ifdef CASE2
.file 1 "./case2.c"
.endif
nop

llvm-mc --defsym=CASE1=1 will emit a .debug_line with "case1.c"
and --defsym=CASE2=1 will emit it with "case2.c". So this would be
ideal for verifying the MD5 cases (good and bad) in one test file.
--paulr

My bad; I should have remembered .ifdef (I just saw it last
week in "[PATCH] RISC-V: Support GNU indirect functions").
The syntax appears to be good enough. An advantage other than being a
built-in feature is that line numbers are retained.

A note about the proposed external tool 'extract': we probably should
insert '\n' to retain the original line numbers, so that the following
can work:

#--- aa
[[#@LINE+1]]: error: .....
.....
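
The line-number padding can be sketched roughly as follows, in Python. This only illustrates the idea and is not the code in the patch; the function name `extract` and its signature are assumptions, and only the `#--- name` separator form is taken from the example above.

```python
def extract(text, part, prefix="#---"):
    """Return the section named `part`, padded with blank lines.

    Every line before the requested part (including separator lines) is
    replaced by an empty line, so line N of the output is line N of the
    input and [[#@LINE+1]]-style checks keep working.
    """
    out = []
    in_part = False
    for line in text.splitlines():
        if line.startswith(prefix):
            if in_part:
                break  # the next separator ends the requested part
            in_part = line[len(prefix):].strip() == part
            out.append("")  # the separator itself becomes a blank line
        elif in_part:
            out.append(line)
        else:
            out.append("")  # padding for lines that belong to other parts
    if not in_part:
        raise KeyError(f"part {part!r} not found")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = '#--- aa\n.file 1 "a.c"\n#--- bb\nnop\n.file 2\n'
    print(extract(demo, "bb"), end="")
```

The cost of this design is a run of empty lines at the top of every extracted part other than the first, which seems acceptable for test inputs.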

Created https://reviews.llvm.org/D83834 "Add test utility 'extract'"

Some metaprogramming in lit will be helpful.

For example, the below is adapted from a real test case. The test wants to
check several variants of Mips. The duplicated code is acceptable
because the commands are short:

   # RUN: llvm-mc -triple=mips-unknown-linux-gnu %s \
   # RUN: | FileCheck --check-prefix=ASM %s
   # RUN: llvm-mc -triple=mips64-unknown-linux-gnu %s \
   # RUN: | FileCheck --check-prefix=ASM %s
   # RUN: llvm-mc -triple=mipsel-unknown-linux-gnu %s --target-abi=o32 \
   # RUN: | FileCheck --check-prefix=ASM %s

   (Similar examples can be found in clang: -std=c++11 -std=c++14 -std=c++17)

In lld/test/ELF/ there are some bad cases, e.g. lld/test/ELF/ppc64-tls-le.s.
It needs to test both little-endian and big-endian variants. Since one
test consists of multiple commands, it looks awful:

   # RUN: llvm-mc -filetype=obj -triple=powerpc64le %s -o %t.o
   # RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=INPUT-REL %s
   ## IE
   # RUN: ld.lld -shared %t.o -o %t.so
   # RUN: llvm-readobj -r %t.so | FileCheck --check-prefix=IE-REL %s
   # RUN: llvm-objdump -d --no-show-raw-insn %t.so | FileCheck --check-prefix=IE %s
   ## IE -> LE
   # RUN: ld.lld %t.o -o %t
   # RUN: llvm-readelf -r %t | FileCheck --check-prefix=NOREL %s
   # RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=LE %s

   # RUN: llvm-mc -filetype=obj -triple=powerpc64 %s -o %t.o
   # RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=INPUT-REL %s
   ## IE
   # RUN: ld.lld -shared %t.o -o %t.so
   # RUN: llvm-readobj -r %t.so | FileCheck --check-prefix=IE-REL %s
   # RUN: llvm-objdump -d --no-show-raw-insn %t.so | FileCheck --check-prefix=IE %s
   ## IE -> LE
   # RUN: ld.lld %t.o -o %t
   # RUN: llvm-readelf -r %t | FileCheck --check-prefix=NOREL %s
   # RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=LE %s

RISC-V tests can be unpleasant as well (32-bit and 64-bit variants)

   # RUN: llvm-mc -filetype=obj -triple=riscv32 %s -o %t.32.o
   # RUN: ld.lld -pie %t.32.o -o %t.32
   # RUN: llvm-readelf -s %t.32 | FileCheck --check-prefix=SYM32 %s
   # RUN: llvm-readelf -S %t.32 | FileCheck --check-prefix=SEC32 %s
   # RUN: not ld.lld -shared %t.32.o -o /dev/null 2>&1 | FileCheck --check-prefix=ERR %s

   # RUN: llvm-mc -filetype=obj -triple=riscv64 %s -o %t.64.o
   # RUN: ld.lld -pie %t.64.o -o %t.64
   # RUN: llvm-readelf -s %t.64 | FileCheck --check-prefix=SYM64 %s
   # RUN: llvm-readelf -S %t.64 | FileCheck --check-prefix=SEC64 %s
   # RUN: not ld.lld -shared %t.64.o -o /dev/null 2>&1 | FileCheck --check-prefix=ERR %s

This can be simplified if lit supports for loop constructs:

   # RUN: %for i in 32 64
   # RUN: llvm-mc -filetype=obj -triple=riscv%i %s -o %t.%i.o
   # RUN: ld.lld -pie %t.%i.o -o %t.%i
   # RUN: llvm-readelf -s %t.%i | FileCheck --check-prefix=SYM%i %s
   # RUN: llvm-readelf -S %t.%i | FileCheck --check-prefix=SEC%i %s
   # RUN: not ld.lld -shared %t.%i.o -o /dev/null 2>&1 | FileCheck --check-prefix=ERR %s
   # RUN: %}
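
To show what such a construct would have to do, here is a rough Python sketch of the expansion, operating on commands whose `# RUN:` prefix has already been stripped. The `%for ... %}` syntax is the hypothetical one from the example above, and `expand_for` is a made-up helper name; lit does not provide this. Nested loops are not handled, and the plain textual replacement of `%i` would also clobber any other substitution that happens to begin with `%i`.

```python
def expand_for(lines):
    """Expand '%for VAR in A B ...' ... '%}' blocks by textual substitution."""
    out = []
    i = 0
    while i < len(lines):
        words = lines[i].split()
        if len(words) >= 4 and words[0] == "%for" and words[2] == "in":
            var, values = words[1], words[3:]
            # Find the line that closes this loop.
            end = next((j for j in range(i + 1, len(lines))
                        if lines[j].strip() == "%}"), None)
            if end is None:
                raise ValueError("'%for' without a closing '%}'")
            body = lines[i + 1:end]
            # Emit one copy of the body per value, replacing %VAR.
            for value in values:
                out.extend(cmd.replace("%" + var, value) for cmd in body)
            i = end + 1
        else:
            out.append(lines[i])
            i += 1
    return out


if __name__ == "__main__":
    commands = [
        "%for i in 32 64",
        "llvm-mc -filetype=obj -triple=riscv%i %s -o %t.%i.o",
        "ld.lld -pie %t.%i.o -o %t.%i",
        "%}",
    ]
    for cmd in expand_for(commands):
        print(cmd)
```

Expanding before the usual `%s`/`%t` substitutions keeps the loop purely textual, so the rest of the RUN-line machinery would not need to change.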

Clang has a `clang-offload-bundler` tool that can combine and
extract multiple IR modules in/from a single file. Implementation is
here: https://github.com/llvm/llvm-project/blob/master/clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp

Michael