[RFC] Compiled regression tests.

Hello LLVM community,

For testing IR passes, LLVM currently has two kinds of tests:
1. regression tests (in llvm/test); .ll files invoking opt, and
matching its text output using FileCheck.
2. unittests (in llvm/unittests); Google tests containing the IR as a
string, constructing a pass pipeline, and inspecting the output using
code.

I propose to add an additional kind of test, which I call "compiled
regression test", combining the advantages of the two. A test is a
single .cxx file of the general structure below that can be dumped
into the llvm/test directory. I am not proposing to replace FileCheck,
but in a lot of cases, domain-specific verifiers can be more powerful
(e.g. verify-uselistorder or `clang -verify`).

    #ifdef IR
      define void @func() {
      entry:
        ret void
      }
    #else /* IR */
      #include "compiledtestboilerplate.h"
      TEST(TestSuiteName, TestName) {
        unique_ptr<Module> Output =
            run_opt(__FILE__, "IR", "-passes=loop-vectorize");
        /* Check Output */
      }
    #endif /* IR */

That is, input IR and check code are in the same file. The run_opt
command is a replica of main() from the opt tool, so any command-line
arguments (passes with the legacy or new pass manager, cl::opt options,
etc.) can be passed. It also makes converting existing tests simpler.
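To make the mechanism concrete, here is a rough sketch of what the boilerplate header could provide. It is only an illustration under assumptions of mine: the real run_opt in D82426 replicates opt's main() and accepts its full command line, whereas this sketch only accepts the bare pass-pipeline text, takes an explicit LLVMContext, and extracts the IR between the preprocessor guards itself.

    // Hypothetical sketch of compiledtestboilerplate.h, not the code from D82426.
    #include "llvm/ADT/Twine.h"
    #include "llvm/AsmParser/Parser.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Passes/PassBuilder.h"
    #include "llvm/Support/Error.h"
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/SourceMgr.h"
    #include <memory>
    #include <string>

    // Read the test source file, take the text between "#ifdef <Tag>" and the
    // following "#else"/"#endif", parse it as LLVM-IR, and run a new-PM pass
    // pipeline (e.g. "loop-vectorize") over the resulting module.
    inline std::unique_ptr<llvm::Module>
    run_opt(llvm::LLVMContext &Ctx, const char *File, const char *Tag,
            const char *Pipeline) {
      auto BufOrErr = llvm::MemoryBuffer::getFile(File);
      if (!BufOrErr)
        return nullptr;
      llvm::StringRef Src = (*BufOrErr)->getBuffer();

      std::string Guard = (llvm::Twine("#ifdef ") + Tag).str();
      size_t Begin = Src.find(Guard);
      size_t End = Src.find("#else", Begin);
      if (End == llvm::StringRef::npos)
        End = Src.find("#endif", Begin);
      if (Begin == llvm::StringRef::npos || End == llvm::StringRef::npos)
        return nullptr;
      Begin = Src.find('\n', Begin);
      if (Begin == llvm::StringRef::npos || Begin >= End)
        return nullptr;
      llvm::StringRef IR = Src.slice(Begin + 1, End);

      llvm::SMDiagnostic Err;
      std::unique_ptr<llvm::Module> M = llvm::parseAssemblyString(IR, Err, Ctx);
      if (!M)
        return nullptr;

      // Minimal new-pass-manager setup, analogous to what opt does.
      llvm::PassBuilder PB;
      llvm::LoopAnalysisManager LAM;
      llvm::FunctionAnalysisManager FAM;
      llvm::CGSCCAnalysisManager CGAM;
      llvm::ModuleAnalysisManager MAM;
      PB.registerModuleAnalyses(MAM);
      PB.registerCGSCCAnalyses(CGAM);
      PB.registerFunctionAnalyses(FAM);
      PB.registerLoopAnalyses(LAM);
      PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
      llvm::ModulePassManager MPM;
      if (llvm::Error E = PB.parsePassPipeline(MPM, Pipeline)) {
        llvm::consumeError(std::move(E));
        return nullptr;
      }
      MPM.run(*M, MAM);
      return M;
    }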

The top-level structure is C++ (i.e. the LLVM-IR is removed by the
preprocessor) and compiled with cmake. This allows a
compile_commands.json to be created such that refactoring tools,
clang-tidy, and clang-format can easily be applied to the code. The
second argument to run_opt is the preprocessor directive for the IR
such that multiple IR modules can be embedded into the file.
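As an illustration of the multi-module case (the directive names IR_FOO/IR_BAR, the function names, and the pass are placeholders of mine, not part of the proposal), a test file could embed two independent modules and select one per test:

    #ifdef IR_FOO
      define void @foo() {
      entry:
        ret void
      }
    #endif /* IR_FOO */

    #ifdef IR_BAR
      define void @bar() {
      entry:
        ret void
      }
    #endif /* IR_BAR */

    #include "compiledtestboilerplate.h"

    TEST(TestSuiteName, Foo) {
      unique_ptr<Module> Output = run_opt(__FILE__, "IR_FOO", "-passes=instcombine");
      /* Check @foo Output */
    }

    TEST(TestSuiteName, Bar) {
      unique_ptr<Module> Output = run_opt(__FILE__, "IR_BAR", "-passes=instcombine");
      /* Check @bar Output */
    }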

Such tests can be compiled in two modes: either within the LLVM
project, or as an external subproject using llvm_ExternalProject_Add.
The former has the disadvantage that new .cxx files dumped into the
test folder are not recognized until the next cmake run, unless the
CONFIGURE_DEPENDS option is used. I found this adds seconds to each
invocation of ninja, which I considered a dealbreaker. The external
project searches for tests every time, but is only invoked in a
check-llvm run, no different than llvm-lit. It uses CMake's
find_package to build against the main project's results (which we
currently do not have tests for) and could also be compiled in
debug mode while LLVM itself is compiled in release mode.

The checks themselves can be any of gtest's ASSERT/EXPECT macros, but
for common test idioms I suggest adding custom macros, such as

    ASSERT_ALL_OF(InstList, !isa<VectorType>(I->getType()));

which on failure prints the instruction that does not return a vector.
Try that with FileCheck. PatternMatch.h from InstCombine can be used as
well. Structural comparison with a reference output could also be
possible (like clang-diff,
[llvm-canon](http://llvm.org/devmtg/2019-10/talk-abstracts.html#tech12),
https://reviews.llvm.org/D80916).
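For illustration, such a macro could be built on top of GTest roughly like this. This is only a sketch of one possible definition; the printToString helper is hypothetical, and the macro assumes the range contains Instruction pointers, matching the I->getType() usage above.

    #include "llvm/IR/Instruction.h"
    #include "llvm/Support/raw_ostream.h"
    #include "gtest/gtest.h"
    #include <string>

    // Hypothetical helper: render an LLVM value into a std::string so it can
    // be streamed into a gtest failure message.
    template <typename T> static std::string printToString(const T &V) {
      std::string S;
      llvm::raw_string_ostream OS(S);
      OS << V;
      return OS.str();
    }

    // One possible definition: evaluate the predicate for every element of the
    // range, binding the element to the name I, and print the offending
    // element when the predicate does not hold.
    #define ASSERT_ALL_OF(Range, Pred)                                          \
      do {                                                                      \
        for (auto &I : (Range))                                                 \
          ASSERT_TRUE(Pred) << "predicate failed for:\n" << printToString(*I);  \
      } while (0)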

Some additional tooling could be helpful:

* A test file creator, taking some IR, wrapping it into the above
structure, and writing it into the test directory.
* A tool for extracting and updating (running opt) the IR inside the
#ifdef, if not even adding this functionality to opt itself. This is
the main reason for not just putting the IR inside a string.

A Work-in-Progress differential and what it improves over FileCheck
and unittests is available here: https://reviews.llvm.org/D82426

Any kind of feedback welcome.

I’m pretty change averse - and am in this case (but doesn’t mean other folks won’t be in favor and doesn’t mean it isn’t worth trying, etc - but if it were up to me, at the moment, I’d decline)

To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn’t get spread through different threads):

> Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests.

Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than a more free-form C++ based checking proposed here.

That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn’t seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I’d worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn’t mean experimentation isn’t worthwhile, but I think it’d require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it’s not a replacement, then I think we’d need to discuss what sort of situations this new thing is suitable and what FileCheck should be used for going forward)

- Dave

> I'm pretty change averse - and am in this case (but doesn't mean other folks won't be in favor and doesn't mean it isn't worth trying, etc - but if it were up to me, at the moment, I'd decline)

> To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn't get spread through different threads):

> Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests.

> Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than a more free-form C++ based checking proposed here.

+1, I'm not sure what kind of scalability issues there are with
updating existing tests.

> That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn't seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I'd worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn't mean experimentation isn't worthwhile, but I think it'd require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it's not a replacement, then I think we'd need to discuss what sort of situations this new thing is suitable and what FileCheck should be used for going forward)

> - Dave

Roman

> I'm pretty change averse - and am in this case (but doesn't mean other folks won't be in favor and doesn't mean it isn't worth trying, etc - but if it were up to me, at the moment, I'd decline)

That's understandable. New features also come with a cost that they
need to recoup with their usefulness.

> To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn't get spread through different threads):

I added a remark to the Phabricator thread.
(I kept these arguments there to keep the RFC itself short)

>> Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests.

> Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than a more free-form C++ based checking proposed here.

> That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn't seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I'd worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn't mean experimentation isn't worthwhile, but I think it'd require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it's not a replacement, then I think we'd need to discuss what sort of situations this new thing is suitable and what FileCheck should be used for going forward)

As mentioned in the Differential, generating the tests automatically
will lose information about what actually is intended to be tested,
making it harder to understand the test.
If we had a method to just update all tests (Polly has `ninja
polly-update-format` to automatically fix `ninja
polly-check-format` failures), updating would be easier, but because of
the lack of understanding, in practice most changes will just be glossed
over.

We already have a split between unittests (e.g.
llvm/unittests/Transforms/Scalar/LICMTest.cpp) and regression tests
(llvm/test/Transforms/LICM/*.ll). New tests in D82426 are located next
to the .ll files. The unittests could be moved there too.

Michael

> I'm pretty change averse - and am in this case (but doesn't mean other folks won't be in favor and doesn't mean it isn't worth trying, etc - but if it were up to me, at the moment, I'd decline)

> That's understandable. New features also come with a cost that they
> need to recoup with their usefulness.

> To take one point from the phab review (to try to keep the discussion here rather than there - might be worth a note on the phab review to discourage commenting there so the discussion doesn't get spread through different threads):

> I added a remark to the Phabricator thread.
> (I kept these arguments there to keep the RFC itself short)

> > Because of all of the above, maintenance of all the regression tests is a nightmare. I expect it to be a serious issue for introducing opaque pointers. My prediction is that we will have a typed-pointer command line flag to not have to require updating all the write-only regression tests.
>
> Actually I already did a lot of work with the initial patches years ago for opaque pointers (that added explicit types to gep/store/etc) and used (& provided in the commit messages) python scripts to migrate all the tests, both the IR itself and the CHECK text. This is probably more readily automatable than a more free-form C++ based checking proposed here.
>
> That said, it sounds like the proposal is a lot closer to the GUnit tests, and if this testing strategy is valuable, it seems like it could be mostly achieved by adding some API utilities (like the one you proposed in the patch) to make it more convenient to run optimization passes in a GUnit test. It doesn't seem to me like an #ifdef based approach to embedding IR in C++ would result in particularly more manageable/formattable code than a raw string. Perhaps the proposed improvements could be used to reduce/remove the cost of adding new GUnit tests/the need to touch CMake files/etc. (though I'd worry about the divergence of where optimization tests are written - as unit tests or as lit/FileCheck tests - that doesn't mean experimentation isn't worthwhile, but I think it'd require a pretty compelling justification to propose a replacement to the FileCheck approach (& perhaps a timeline for an experiment before either removing it, or deciding that it is the intended future state) - and if it's not a replacement, then I think we'd need to discuss what sort of situations this new thing is suitable and what FileCheck should be used for going forward)

> As mentioned in the Differential, generating the tests automatically
> will lose information about what actually is intended to be tested,

Agreed - and I didn't mean to suggest tests should be automatically
generated. I work pretty hard in code reviews to encourage tests to be
written as stable-y as possible so they are resilient to unrelated
changes. The python scripts I wrote to update tests didn't require
tests to be written in an automated fashion but were still able to
migrate the significant majority of hand-written FileCheck test cases.

> making it harder to understand the test.
> If we had a method to just update all tests (Polly has `ninja
> polly-update-format` to automatically fix `ninja
> polly-check-format` failures), updating would be easier, but because of
> the lack of understanding, in practice most changes will just be glossed
> over.

> We already have a split between unittests (e.g.
> llvm/unittests/Transforms/Scalar/LICMTest.cpp) and regression tests
> (llvm/test/Transforms/LICM/*.ll). New tests in D82426 are located next
> to the .ll files. The unittests could be moved there too.

Yep, if there's a nice way to change lit/cmake so that unit tests can
live alongside the FileCheck tests, I think that might be an
improvement. Though for now I'd only expect to find a unit test for
something when it's not reasonably FileCheck testable, in this case it
looks like the LICMTest.cpp is testing side effects on analyses when
running LICM, which makes some sense - analyses have no textual
representation, so can't be readily FileCheck tested, only their side
effects can.

- Dave

Hi,

My argument is that it is hard, if not impossible, to test only the
relevant bits using FileCheck. CHECK-DAG, named regexes, etc. are
mechanisms making this possible, but at the same time they make the
verification more complicated.
I don't think it is feasible to write single-purpose update scripts
for most changes. Even if there is one, it is even less feasible to
ensure for all tests that they still test what was originally
intended, especially with CHECK-NOT.

I had to put a lot of effort into updating loop metadata tests.
Because metadata nodes are numbered sequentially in whatever order the
IR emitter decides to put them, it is tempting to use the form
![[NODENAME:.*]] for each occurrence so you can reorder the lines in the
order they occur, as indeed I found in many regression tests. Three
problems with this:
1. Each time the regex is specified, it overwrites the content of the
previous placeholder, i.e. if used everywhere, it is equivalent to
{{.*}}
2. With a backtracking regex engine, there are inputs with
exponential-time behaviour.
3. It will match more than needed, e.g. a list of nodes
!{![[NODENAME:.*]]} will also match !{!0, !1}, and FileCheck will
complain somewhere else that

!0 = !{!0, !1}

does not match

![[NODENAME]] = !{![[NODENAME]], ![[LOOP1:.*]]}

(if the 'regex everywhere' mistake was not made)

A "robust" test would use [[NODENAME:[0-9]+]] and CHECK-DAG, as some
tests do, but also making the CHECK lines even longer and more
cluttered. In contrast to instructions, not all metadata lines have
recognizable keywords that could indicate what it might intend to
match. CHECK-DAG will happily match with any metadata node that has
the same number of operands, deferring a mismatch report to some later
CHECK line, but due to the DAG part continuing to match with previous
lines.

Unfortunately, most tests we have do not even use placeholders for
metadata nodes, including those generated by update_test_checks.py. I
looked into improving the script in this regard, but its
implementation is function-centric, making it difficult to add
module-level placeholders.

As a result, I estimate that I had to invest about twice as much time
updating/fixing tests as writing the code changes themselves. Because of
this experience, I find updating FileCheck tests very frustrating.
I'd prefer not to test whether specific metadata nodes are present,
but whether the LLVM API to query them (such as `isAnnotatedParallel`)
returns the expected result.
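For instance, a check along those lines might look roughly like this in a unit test. This is a sketch under my own assumptions: the IR is a made-up parallel loop, and LoopInfo is computed by hand from a DominatorTree rather than through a pass manager.

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/AsmParser/Parser.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/SourceMgr.h"
    #include "gtest/gtest.h"

    // Hypothetical test: parse a module, build LoopInfo for @func, and query
    // the parallel annotation through the API instead of matching metadata
    // text with FileCheck.
    TEST(LoopMetadata, AnnotatedParallel) {
      const char *IR = R"IR(
        define void @func(i32* %A, i64 %N) {
        entry:
          br label %loop
        loop:
          %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
          %p = getelementptr i32, i32* %A, i64 %i
          store i32 0, i32* %p, !llvm.access.group !0
          %i.next = add i64 %i, 1
          %cmp = icmp ult i64 %i.next, %N
          br i1 %cmp, label %loop, label %exit, !llvm.loop !1
        exit:
          ret void
        }
        !0 = distinct !{}
        !1 = distinct !{!1, !2}
        !2 = !{!"llvm.loop.parallel_accesses", !0}
      )IR";

      llvm::LLVMContext Ctx;
      llvm::SMDiagnostic Err;
      std::unique_ptr<llvm::Module> M = llvm::parseAssemblyString(IR, Err, Ctx);
      ASSERT_TRUE(M);

      llvm::Function *F = M->getFunction("func");
      llvm::DominatorTree DT(*F);
      llvm::LoopInfo LI(DT);
      llvm::Loop *L = *LI.begin();
      EXPECT_TRUE(L->isAnnotatedParallel());
    }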

Michael

Hi,

>> Hello LLVM community,
>>
>> For testing IR passes, LLVM currently has two kinds of tests:
>> 1. regression tests (in llvm/test); .ll files invoking opt, and
>> matching its text output using FileCheck.
>> 2. unittests (in llvm/unittests); Google tests containing the IR as a
>> string, constructing a pass pipeline, and inspecting the output using
>> code.
>>
>> I propose to add an additional kind of test, which I call "compiled
>> regression test", combining the advantages of the two.

> You expand below on the mechanism you'd like to implement, but I am a bit puzzled about the motivation right now?

See https://reviews.llvm.org/D82426 and
http://lists.llvm.org/pipermail/llvm-dev/2020-June/142706.html for
more motivation.

> I'm failing to see what kind of IR-level test (unittests are relevant for data-structures and non-IR internals IMO) we would implement this way that we can't just implement with lit/FileCheck?

My argument is not that those tests cannot be written using FileCheck,
but that they tend (not all of them) to check more than is relevant, may
be truisms, and are difficult to reverse-engineer and to update when
something changes.

Michael

I thought that preprocessor directives might be easier to identify if
one needs to do a round-trip of the embedded IR through opt (e.g. to
update from older syntax).

However, clang-format leaves raw strings untouched, but tries to
format the IR between preprocessor directives, so I tend to agree: raw
strings are better.

Michael

Hi Michael,

You propose using the preprocessor for mixing C++ test code with its input code, such as LLVM IR. Of course, LIT, FileCheck, clang -verify, and unit tests all enable mixing their various forms of test code with input code. To my eyes, a difference between unit tests vs. LIT, FileCheck, and clang -verify tests is that the latter tend to make the input code more prominent (I prefer that) and make it easier to clarify which test code is associated with which input code. So far, I think the preprocessor approach is better in this regard than unit tests, but what about a more familiar syntax, like the following?

// RUN-CXX: #include "compiledtestboilerplate.h"
// RUN-CXX: unique_ptr<Module> Output = run_opt("%s", "-passes=loop-vectorize");

// RUN-CXX: /* Check func1 Output */
define void @func1() {
entry:
ret void
}

// RUN-CXX: /* Check func2 Output */
define void @func2() {
entry:
ret void
}

It seems it should be feasible to automate extraction of the RUN-CXX: code for compilation and for analysis by clang-format and clang-tidy. Perhaps there would be a new script that extracts at build time for all such uses. But there are other possibilities that could be considered: a LIT extension for RUN-CXX:, C++ JIT compilation, a clang-format-diff.py extension that greps modified files for RUN-CXX:, etc.

LIT, FileCheck, and RUN-CXX: directives should then be able to co-exist in a single test file. Thus, you might incrementally add RUN-CXX: directives to test files that already contain LIT and FileCheck directives to handle cases where FileCheck directives are difficult to use. You could keep the FileCheck directives when they are more reasonable, or you could eventually replace them. You might run opt once with a RUN: directive and then check its output .ll file with both RUN-CXX: and FileCheck directives (maybe all RUN: directives execute before all RUN-CXX: directives, or maybe C++ JIT compilation would permit them to execute in the order specified).

It’s not clear to me whether the above idea is worth the trouble, but I think I’d at least prefer the syntax to the preprocessor approach.

I also have a vague feeling that something like this has been discussed before. If so, please just point me to the discussion.

In any case, thanks for working to improve LLVM’s testing infrastructure.

Joel

> You propose using the preprocessor for mixing C++ test code with its input code, such as LLVM IR. Of course, LIT, FileCheck, `clang -verify`, and unit tests all enable mixing their various forms of test code with input code. To my eyes, a difference between unit tests vs. LIT, FileCheck, and `clang -verify` tests is that the latter tend to make the input code more prominent (I prefer that) and make it easier to clarify which test code is associated with which input code. So far, I think the preprocessor approach is better in this regard than unit tests, but what about a more familiar syntax, like the following?
>
> // RUN-CXX: #include "compiledtestboilerplate.h"
> // RUN-CXX: unique_ptr<Module> Output = run_opt("%s", "-passes=loop-vectorize");
>
> // RUN-CXX: /* Check func1 Output */
> define void @func1() {
> entry:
>   ret void
> }
>
> // RUN-CXX: /* Check func2 Output */
> define void @func2() {
> entry:
>   ret void
> }
>
> It seems it should be feasible to automate extraction of the `RUN-CXX:` code for compilation and for analysis by clang-format and clang-tidy. Perhaps there would be a new script that extracts at build time for all such uses. But there are other possibilities that could be considered: a LIT extension for `RUN-CXX:`, C++ JIT compilation, a clang-format-diff.py extension that greps modified files for `RUN-CXX:`, etc.

I think there is a significant gain in having the C++ code at the
top level, rather than the IR. There are more tools for C++, some of
which, such as IDEs, include-what-you-use, YouCompleteMe, etc., are
external, whereas LLVM-IR is project-specific.

If func1 is unrelated to func2, I really think these should not be in
the same module. Consider this:

static const char *Func1IR = R"IR(
define void @func1() {
entry:
  ret void
}
)IR";
TEST(MyTests, TestFunc1) {
  unique_ptr<Module> Output = run_opt(Func1IR, "-passes=loop-vectorize");
   /* Check func1 Output */
}

static const char *Func2IR = R"IR(
define void @func2() {
entry:
  ret void
}
)IR";
TEST(MyTests, TestFunc2) {
  unique_ptr<Module> Output = run_opt(Func2IR, "-passes=loop-vectorize");
   /* Check func2 Output */
}

Part of the reason is that not all of what makes a function is
syntactically contained within that function definition: Forward
declarations, types, function attributes, metadata, etc.
When using update_test_checks.py, these are per-function anyway.

However, for cases where it is indeed beneficial, there could be some
preprocessing of the embedded string taking place. Maybe:

static const char *Func1IR = R"IR(
define i32 @func1() {
entry:
  %two = add i32 1, 1 ; ASSERT_TRUE(isa<AddInst>(F["two"]))
  ret i32 %two
}
)IR";

// All collected ASSERT_TRUE emitted into another file
// #line preprocessor hints could point to the original line.
#include "inlineasserts.inc"

> I also have a vague feeling that something like this has been discussed before. If so, please just point me to the discussion.

I am not aware of any such previous discussion.

Michael

Hi Michael,

I’m sorry I’m late to this thread, but I would really rather not go this direction. Unit tests in general (and this sort of extension to the idea) come at very high cost to testing flows, particularly with large scale builds.

One of the major and important pieces of the LLVM design is how its testing infrastructure works. The choice to use a small number of tools (llc, opt, etc) is important for multiple reasons:

1) Link time of executables is a significant problem, particularly in large scale builds.
2) This encourages investing in testing tools (see, e.g. the recent improvements to FileCheck etc)
3) It reduces/factors the number of users of API surface area, making it easier to do large scale refactoring etc.
4) It encourages the development of textual interfaces to libraries, which aids with understandability and helps reinforce stronger interfaces (llvm-mc is one example of this, LLVM IR text syntax is another).
5) Depending on the details, this can make the build dependence graph more serialized.

Unit tests are very widely used across the industry, and it is certainly true that they are fully general and more flexible. This makes them attractive, but it is a trap. I’d really rather we don’t go down this route, and maintain the approach of only using unit tests for very low level things like apis in ADT etc.

-Chris

To illustrate some difficulties with FileCheck, let's make a non-semantic change in LLVM:

--- a/llvm/lib/Analysis/VectorUtils.cpp
+++ b/llvm/lib/Analysis/VectorUtils.cpp
@@ -642,8 +642,8 @@ MDNode *llvm::uniteAccessGroups(MDNode *AccGroups1, MDNode *AccGroups2) {
     return AccGroups1;
 
   SmallSetVector<Metadata *, 4> Union;
-  addToAccessGroupList(Union, AccGroups1);
   addToAccessGroupList(Union, AccGroups2);
+  addToAccessGroupList(Union, AccGroups1);
 
   if (Union.size() == 0)
     return nullptr;

This changes the order of access groups when merging them.
Some background: Access groups are used to mark that a side-effect (e.g. memory access) does not limit the parallelization of a loop, added by e.g. #pragma clang loop vectorize(assume_safety). Therefore a loop can be assumed to be parallelizable without further dependency analysis if all its side-effect instructions are marked this way (and it has no loop-carried scalar dependencies).
An access group represents the instructions marked for a specific loop. A single instruction can be parallel with regard to multiple loops, i.e. belong to multiple access groups. The order of the access groups is insignificant: whichever of AccGroups1 or AccGroups2 comes first, the instruction is marked parallel for both loops.

Even though the change should have no effect on correctness, ninja check-llvm fails:

test/Transforms/Inline/parallel-loop-md-merge.ll

string not found in input
; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
         ^
<stdin>:45:1: note: scanning from here
!7 = !{!"llvm.loop.parallel_accesses", !5}
^
<stdin>:45:1: note: with "ACCESSES_INNER" equal to "7"
!7 = !{!"llvm.loop.parallel_accesses", !5}
^
<stdin>:45:1: note: with "ACCESS_GROUP_INNER" equal to "4"
!7 = !{!"llvm.loop.parallel_accesses", !5}
^

The line FileCheck points to has nothing to do with the changed order of access groups.
(What's the point of annotating the !7 = ... line repeatedly? Why not point to the lines where ACCESSES_INNER/ACCESS_GROUP_INNER got their values from?)

The CHECK lines, annotated with their line numbers from parallel-loop-md-merge.ll, are:

68 ; CHECK: store double 4.200000e+01, {{.*}} !llvm.access.group ![[ACCESS_GROUP_LIST_3:[0-9]+]]
69 ; CHECK: br label %for.cond.i, !llvm.loop ![[LOOP_INNER:[0-9]+]]
70 ; CHECK: br label %for.cond, !llvm.loop ![[LOOP_OUTER:[0-9]+]]
71
72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}
75 ; CHECK: ![[LOOP_INNER]] = distinct !{![[LOOP_INNER]], ![[ACCESSES_INNER:[0-9]+]]}
76 ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}
77 ; CHECK: ![[LOOP_OUTER]] = distinct !{![[LOOP_OUTER]], ![[ACCESSES_OUTER:[0-9]+]]}
78 ; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}

And the output of FileCheck --dump-input -v is:

         38: !0 = !{!1}
         39: !1 = distinct !{!1, !2, !"callee: %A"}
         40: !2 = distinct !{!2, !"callee"}
         41: !3 = !{!4, !5}
check:72     ^~~~~~~~~~~~~~
         42: !4 = distinct !{}
check:73     ^~~~~~~~~~~~~~~~~
         43: !5 = distinct !{}
check:74     ^~~~~~~~~~~~~~~~~
         44: !6 = distinct !{!6, !7}
check:75     ^~~~~~~~~~~~~~~~~~~~~~~
         45: !7 = !{!"llvm.loop.parallel_accesses", !5}
check:76     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
         46: !8 = distinct !{!8, !9}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~
         47: !9 = !{!"llvm.loop.parallel_accesses", !4}
check:76     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To fix this regression test, one needs to solve the following questions:

  • Is !0 = !{!1} supposed to match with anything?
  • Which of the CHECK lines are the ones that are important for the regression tests?
  • The ![[SOMETHING]] = distinct !{} line appears twice. Which one is ACCESS_GROUP_INNER/ACCESS_GROUP_OUTER?
  • There are two lines with llvm.loop.parallel_accesses as well. The author of this regression test was kind enough to give the placeholders descriptive names, but do they match with what FileCheck matched?
  • LOOP_INNER and LOOP_OUTER are only distinguished by position and .i. Why does the inner loop come first?
  • Before the patch that modifies the order, the output of --dump-input=always was:
    38: !0 = !{!1}
    39: !1 = distinct !{!1, !2, !"callee: %A"}
    40: !2 = distinct !{!2, !"callee"}
    41: !3 = !{!4, !5}
    42: !4 = distinct !{}
    43: !5 = distinct !{}
    44: !6 = distinct !{!6, !7}
    45: !7 = !{!"llvm.loop.parallel_accesses", !4}
    46: !8 = distinct !{!8, !9}
    47: !9 = !{!"llvm.loop.parallel_accesses", !5}

This seems to suggest that our change caused !4 and !5 in lines 45 and 47 to swap. This is not what the patch did, which changed the order of two MDNode arguments. The only MDNode that has two other MDNodes as operands is on line 41 (!6 and !8 represent loops, as they reference themselves as their first argument; noalias scopes do this as well, but there is no noalias metadata in this example).

A hasty fix is to change CHECK lines 76 and 78 to

76 ; CHECK: ![[ACCESSES_INNER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_OUTER]]}
78 ; CHECK: ![[ACCESSES_OUTER]] = !{!"llvm.loop.parallel_accesses", ![[ACCESS_GROUP_INNER]]}

However, then ACCESS_GROUP_OUTER would belong to ACCESSES_INNER and ACCESS_GROUP_INNER to ACCESSES_OUTER, which cannot be true.

The reason is that MDNodes are emitted in the topological order of a depth-first search. The patch changed the order of ACCESS_GROUP_INNER and ACCESS_GROUP_OUTER in line 41, but because the first argument of !3 is emitted before the second argument in the emitted IR file, ACCESS_GROUP_OUTER now ends up on line 42 and ACCESS_GROUP_INNER on line 43 instead.

Even though this regression test uses proper regex-ing of metadata node numbers, it is still fragile, as it fails on irrelevant changes. Once it fails, it requires time to fix properly because it is non-obvious which lines are intended to match which output lines. The hasty fix would have mixed up ACCESS_GROUP_OUTER/ACCESS_GROUP_INNER, and I wouldn't expect someone who is working on the inliner to take the time to understand all the loop metadata. Metadata with set semantics occurs often, e.g. noalias metadata, loop properties, tbaa metadata, etc.

The difficulty gets amplified when there are multiple functions in the regression test; the metadata of all functions is appended at the end, and MDNodes with the same content are uniqued, making it impossible to associate particular MDNodes with specific functions.

Ideally the regression test would be robust and understandable, achievable with two asserts in a unittest:

Loop &OuterLoop = *LI->begin();
ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
Loop &InnerLoop = *OuterLoop.begin();
ASSERT_TRUE(InnerLoop.isAnnotatedParallel());
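For concreteness, here is a sketch of how those two asserts could be hosted in the proposed compiled-test format. Everything specific here is assumed rather than taken from the actual test: the pass pipeline, the function name @caller, and that the boilerplate header provides run_opt plus the usual using-declarations; the embedded IR itself is elided.

    #ifdef IR
      ; the IR of parallel-loop-md-merge.ll would be embedded here
    #else /* IR */
      #include "compiledtestboilerplate.h"
      #include "llvm/Analysis/LoopInfo.h"
      #include "llvm/IR/Dominators.h"

      TEST(ParallelLoopMDMerge, StillParallelAfterInlining) {
        unique_ptr<Module> Output = run_opt(__FILE__, "IR", "-passes=inline");
        ASSERT_TRUE(Output);

        // Hypothetical: the caller that received the inlined parallel loop.
        Function *F = Output->getFunction("caller");
        DominatorTree DT(*F);
        LoopInfo LI(DT);

        Loop &OuterLoop = **LI.begin();
        ASSERT_TRUE(OuterLoop.isAnnotatedParallel());
        Loop &InnerLoop = **OuterLoop.begin();
        ASSERT_TRUE(InnerLoop.isAnnotatedParallel());
      }
    #endif /* IR */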

> One of the major and important pieces of the LLVM design is how its testing infrastructure works. The choice to use a small number of tools (llc, opt, etc) is important for multiple reasons:
>
> 1) Link time of executables is a significant problem, particularly in large scale builds.

You can use dynamic linking. Unfortunately, an LLVM dynamic library is
not (yet?) supported on Windows, so we need a static linking fallback.
If this is the only issue, I'd work on a solution (at least with
gtest) that works on Windows as well.

> 2) This encourages investing in testing tools (see, e.g. the recent improvements to FileCheck etc)

Google Test is also a testing tool that is worth investing in, e.g.
more expressive ASSERT macros as in the RFC.

I think it even makes it easier to start new tools that are initially
used within a single test only, when it does not seem worth adding a
new executable or FileCheck option.

> 3) It reduces/factors the number of users of API surface area, making it easier to do large scale refactoring etc.

FileCheck makes string output (including llvm::dbgs()) part of the
interface, which becomes not only harder to change, but also harder to
extend (new lines/tags must not match existing CHECK lines).

In contrast, refactoring using tools such as
clang-refactor/clang-format is no different than refactoring the
source of LLVM itself.

In the interest of downstream users, having a well-checked API surface
should be more important.

> 4) It encourages the development of textual interfaces to libraries, which aids with understandability and helps reinforce stronger interfaces (llvm-mc is one example of this, LLVM IR text syntax is another).

While I don't disagree with good textual presentation, I think these
should be designed for human understandability and consistency, not
for machine processing.

> 5) Depending on the details, this can make the build dependence graph more serialized.

I don't see why this would be the case.

> Unit tests are very widely used across the industry, and it is certainly true that they are fully general and more flexible. This makes them attractive, but it is a trap. I’d really rather we don’t go down this route, and maintain the approach of only using unit tests for very low level things like apis in ADT etc.

Note that we already have unittests for non-low-level APIs such as
passes (VPlan, LICM, Unrolling, ...).

Can you elaborate on what the trap is?

Michael

I definitely agree that we should not be trying to do this kind of checking using textual metadata-node matching in FileCheck. The alternative already available is to add an analysis pass with some kind of verifier output. This output, not the raw metadata itself, can be checked by FileCheck. We also need to check the verification code, but at least that's something we can keep just in one place. For parallel annotations, we already have such a thing (we can run opt -loops -analyze; e.g., in test/Analysis/LoopInfo/annotated-parallel-complex.ll). We also do this kind of thing for the cost model (by running with -cost-model -analyze). To what extent would making more-extensive use of this technique address the use cases you're trying to address?

Maybe we should, also, just make more-extensive use of unit tests?

Thanks again,

Hal

The test as written is fragile because it requires a certain ordering. If the output order is not important, use CHECK-DAG rather than CHECK. This would be a failure to understand the testing tool.

My experience, over a 40-year career, is that good software developers are generally not very good test-writers. These are different skills and good testing is frequently not taught. It’s easy to write fragile tests; you make your patch, you see what comes out, and you write the test to expect exactly that output, using the minimum possible features of the testing tool. This is poor technique all around. We even have scripts that automatically generate such tests, used primarily in codegen tests. I devoutly hope that the people who produce those tests responsibly eyeball all those cases.

The proposal appears to be to migrate output-based tests (using ever-more-complicated FileCheck features) to executable tests, which makes it more like the software development people are used to instead of test-writing. But I don’t see that necessarily solving the problem; seems like it would be just as easy to write a program that doesn’t do the right thing as to write a FileCheck test that doesn’t do the right thing.

Hal’s suggestion is more to the point: If the output we’re generating is not appropriate to the kinds of tests we want to perform, it can be worthwhile to generate different kinds of output. MIR is a case in point; for a long time it was hard to introspect into the interval between IR and final machine code, but now it’s a lot easier.

–paulr

The CHECK lines in annotated-parallel-complex.ll are:

; CHECK: Parallel Loop at depth 1
; CHECK-NOT: Parallel
; CHECK: Loop at depth 2
; CHECK: Parallel Loop
; CHECK: Parallel Loop

When adding this test, I had to change LoopInfo to emit the “Parallel” in front of “Loop”. For readability, I would have preferred the parallel info as a “tag”, such as Loop (parallel) at depth 1, but this would break other tests that check “Loop at depth 1”. Later I noticed that there are regression tests that check “LoopFullUnrollPass on Loop at depth 3 containing: %l0.0.0<header>”, but it seems I got lucky in that none of these loops have parallel annotations.

“CHECK-NOT” is inherently fragile. It is too easy to make a change in LLVM that changes the text output and to overlook that this test no longer checks what it was supposed to test. For a FileCheck-friendlier output, it could emit “NonParallel” and match this. However, this clutters the output for humans, would actually break the “LoopFullUnrollPass on Loop at depth 3 …” test, and “CHECK: Parallel” would match it as well since FileCheck ignores word boundaries.

The CHECK lines test more than necessary. The first and third CHECK lines also check the “at depth” to make them match the correct loop (and not, e.g., the next inner loop), although we are not interested in the loop depths themselves. Ironically, this is the reason why there cannot be tags between “Loop” and “at depth”.

Not all of the loop metadata has a pass that prints it. For instance, there are loop properties such as llvm.loop.isvectorized. Reading those is usually done using utility functions such as getBooleanLoopAttribute(L, “llvm.loop.isvectorized”). A solution using FileCheck would be to add another pass that prints loop metadata. That pass would only be used for testing, making the release LLVM binaries larger than necessary, and it would still have the other problems.
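As an illustration of querying such a property directly through the API instead of a printing pass, here is a self-contained sketch; getBooleanLoopAttribute is the LoopUtils helper mentioned above, while the IR and the manual LoopInfo construction are my own assumptions.

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/AsmParser/Parser.h"
    #include "llvm/IR/Dominators.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/SourceMgr.h"
    #include "llvm/Transforms/Utils/LoopUtils.h"
    #include "gtest/gtest.h"

    // Hypothetical check of a boolean loop property through the API.
    TEST(LoopProperties, IsVectorized) {
      const char *IR = R"IR(
        define void @f(i64 %n) {
        entry:
          br label %loop
        loop:
          %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
          %i.next = add i64 %i, 1
          %c = icmp ult i64 %i.next, %n
          br i1 %c, label %loop, label %exit, !llvm.loop !0
        exit:
          ret void
        }
        !0 = distinct !{!0, !1}
        !1 = !{!"llvm.loop.isvectorized", i32 1}
      )IR";

      llvm::LLVMContext Ctx;
      llvm::SMDiagnostic Err;
      std::unique_ptr<llvm::Module> M = llvm::parseAssemblyString(IR, Err, Ctx);
      ASSERT_TRUE(M);

      llvm::Function &F = *M->getFunction("f");
      llvm::DominatorTree DT(F);
      llvm::LoopInfo LI(DT);
      llvm::Loop *L = *LI.begin();

      // Query the boolean loop attribute instead of text-matching the MDNode.
      EXPECT_TRUE(llvm::getBooleanLoopAttribute(L, "llvm.loop.isvectorized"));
    }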

Processing the IR through a tool can make the output more FileCheck-friendly, but it doesn’t make its problems disappear. IMHO it adds to the maintenance burden since it adds more textual interfaces.

Michael

> The test as written is fragile because it requires a certain ordering. If the output order is not important, use CHECK-DAG rather than CHECK. This would be a failure to understand the testing tool.

CHECK-DAG does not help here since what changes is within a list on the same line, and we have no CHECK-SAME-DAG or CHECK-DAG-SAME. Even if we had it, the actual line that changed is textually the same and FileCheck would need to backtrack deep into the following lines for alternative placeholder substitutions. It would look like

CHECK-SAME-DAG: ![[ACCESS_GROUP_INNER:[0-9]+]]
CHECK-SAME-DAG: ,
CHECK-SAME-DAG: ![[ACCESS_GROUP_OUTER:[0-9]+]]

which allows the comma to appear anywhere and which I don’t find readable.

My (naive?) conclusion is that textual checking is not the right tool here.

> My experience, over a 40-year career, is that good software developers are generally not very good test-writers. These are different skills and good testing is frequently not taught. It’s easy to write fragile tests; you make your patch, you see what comes out, and you write the test to expect exactly that output, using the minimum possible features of the testing tool. This is poor technique all around. We even have scripts that automatically generate such tests, used primarily in codegen tests. I devoutly hope that the people who produce those tests responsibly eyeball all those cases.

> The proposal appears to be to migrate output-based tests (using ever-more-complicated FileCheck features) to executable tests, which makes it more like the software development people are used to instead of test-writing. But I don’t see that necessarily solving the problem; seems like it would be just as easy to write a program that doesn’t do the right thing as to write a FileCheck test that doesn’t do the right thing.

IMHO having a tool that allows us to better express what is intended to be tested is already worth a lot. For instance, we usually don’t care about SSA value names or MDNode numbers, but we have to put extra work into regex-ing away those names in FileCheck tests, and as a result, most tests we have still expect the exact number for metadata nodes. This is a problem if we want to emit new metadata nodes, because then all those tests need to be updated.

This problem goes away if the test method by default ignored value names/MDNode numbers and software developers had to put in extra work only if they actually want to verify them.
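As one example of what checking that ignores names can look like today, PatternMatch.h (already mentioned in the RFC) matches structure without ever naming values. The helper below is only an illustrative sketch of mine and assumes the block comes from a module produced by something like run_opt:

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/PatternMatch.h"
    #include "gtest/gtest.h"

    // Name-agnostic structural check: does this block end in a conditional
    // branch on an integer comparison? No SSA value names or metadata numbers
    // appear anywhere in the check.
    static void expectBranchOnICmp(llvm::BasicBlock &BB) {
      using namespace llvm::PatternMatch;
      llvm::ICmpInst::Predicate Pred;
      llvm::Value *LHS = nullptr, *RHS = nullptr;
      EXPECT_TRUE(match(BB.getTerminator(),
                        m_Br(m_ICmp(Pred, m_Value(LHS), m_Value(RHS)),
                             m_BasicBlock(), m_BasicBlock())));
    }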

> Hal’s suggestion is more to the point: If the output we’re generating is not appropriate to the kinds of tests we want to perform, it can be worthwhile to generate different kinds of output. MIR is a case in point; for a long time it was hard to introspect into the interval between IR and final machine code, but now it’s a lot easier.

Can you elaborate on what makes it easier?

Michael


> The CHECK lines in annotated-parallel-complex.ll are:
>
> ; CHECK: Parallel Loop at depth 1
> ; CHECK-NOT: Parallel
> ; CHECK: Loop at depth 2
> ; CHECK: Parallel Loop
> ; CHECK: Parallel Loop
>
> When adding this test, I had to change LoopInfo to emit the "Parallel" in front of "Loop". For readability, I would have preferred the parallel info as a "tag", such as `Loop (parallel) at depth 1`, but this would break other tests that check "Loop at depth 1". Later I noticed that there are regression tests that check "LoopFullUnrollPass on Loop at depth 3 containing: %l0.0.0<header>", but it seems I got lucky in that none of these loops have parallel annotations.
>
> "CHECK-NOT" is inherently fragile. It is too easy to make a change in LLVM that changes the text output and to overlook that this test no longer checks what it was supposed to test. For a FileCheck-friendlier output, it could emit "NonParallel" and match this. However, this clutters the output for humans, would actually break the "LoopFullUnrollPass on Loop at depth 3 ..." test, and "CHECK: Parallel" would match it as well since FileCheck ignores word boundaries.
>
> The CHECK lines test more than necessary. The first and third CHECK lines also check the "at depth" to make them match the correct loop (and not, e.g., the next inner loop), although we are not interested in the loop depths themselves. Ironically, this is the reason why there cannot be tags between "Loop" and "at depth".

We can have different printing modes. There can be a more-human-friendly mode and a more-FileCheck-friendly mode. Or modes customized for different kinds of tests. I agree, however, that this does not solve the fragility problems with CHECK-NOT.

> Not all of the loop metadata has a pass that prints it. For instance, there are loop properties such as llvm.loop.isvectorized. Reading those is usually done using utility functions such as getBooleanLoopAttribute(L, "llvm.loop.isvectorized"). A solution using FileCheck would be to add another pass that prints loop metadata. That pass would only be used for testing, making the release LLVM binaries larger than necessary, and it would still have the other problems.
>
> Processing the IR through a tool can make the output more FileCheck-friendly, but it doesn't make its problems disappear. IMHO it adds to the maintenance burden since it adds more textual interfaces.

That's the interesting question... it does add to the maintenance burden. However, having textual outputs is also quite convenient when debugging things (because I can change the input and see the output quickly, without needing to create and compile another program). Obviously, at some point, this becomes ridiculous. How much is too much? I don't know. But it's also not clear to me that we're at that point yet. We could add more textual analysis outputs and still have that be a net benefit in many places.

In cases where the requisite output would just be too specific, we do have unit tests. Should we just add more? Maybe we're too happy to add lit tests instead of unit tests for some of these cases.

What I actually meant re. CHECK-DAG is to take this

72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
73 ; CHECK: ![[ACCESS_GROUP_INNER]] = distinct !{}
74 ; CHECK: ![[ACCESS_GROUP_OUTER]] = distinct !{}

and turn it into this

72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_INNER:[0-9]+]], ![[ACCESS_GROUP_OUTER:[0-9]+]]}
73 ; CHECK-DAG: ![[ACCESS_GROUP_INNER]] = distinct !{}
74 ; CHECK-DAG: ![[ACCESS_GROUP_OUTER]] = distinct !{}

except that I think I was misled by the “non-semantic” remark about the change. Did you mean that the order of the INNER and OUTER elements (line 72) has been swapped? That sounds semantic, as far as the structure of the metadata is concerned! But okay, let’s call that a syntactic change, and a test relying on the order of the parameters will break. Which it did. And the correct fix is instead

72 ; CHECK: ![[ACCESS_GROUP_LIST_3]] = !{![[ACCESS_GROUP_OUTER:[0-9]+]], ![[ACCESS_GROUP_INNER:[0-9]+]]}

is it not? To reflect the change in order?

But let’s say I’m the one doing this presumably innocuous change, and have no clue what I’m doing, and don’t know much about how FileCheck works (which is pretty typical of the community, I admit). You’ve shown issues with trying to diagnose the FileCheck results.

How would the compiled regression test fail? How would it be easier to identify and repair the issue?

Re. how MIR makes testing easier: it is a serialization of the data that a machine-IR pass operates on, which makes it feasible to feed canned MIR through a single pass in llc and look at what exactly that pass did. Prior to MIR, we had to go from IR to real machine code and infer what was going on in a pass after multiple levels of transformation had occurred. It was very black-box, and the black box was way too big.

Anyway, getting back to the original topic, it sounds like you’re proposing two rather independent things: (1) an easier API for writing optimizer unittests, (2) a way to push unittests into the main llvm/test tree. Can we separate these things?

Thanks,

–paulr