RFC: [FileCheck] CHECK-DAG for multiple occurrences of string

Hi,

Using FileCheck, I have not found a way to make a group of CHECK-DAG directives match multiple occurrences of a string. For example, I naively thought the following would match successfully:

$ cat checks.txt
// CHECK: start
// CHECK-DAG: foo
// CHECK-DAG: foo
// CHECK-DAG: bar
// CHECK-NEXT: end

$ cat input.txt
start
foo
bar
foo
end

$ FileCheck --input-file=input.txt checks.txt
checks.txt:5:16: error: CHECK-NEXT: is not on the line after the previous match
// CHECK-NEXT: end
^
input.txt:5:1: note: 'next' match was here
end
^
input.txt:3:4: note: previous match ended here
bar
^
input.txt:4:1: note: non-matching line after previous match is here
foo
^

The trouble is that both “CHECK-DAG: foo” directives match the first “foo”.

I’d like this ability for testing a parallel program that outputs a series of non-unique strings in non-deterministic order. Am I trying to push FileCheck beyond its intended domain? Is there some existing feature for this purpose that I’ve overlooked? If not, I see two potential solutions:

  1. In a CHECK-DAG group, don’t let the matches for patterns overlap.

  2. Add a new CHECK-DAG-N directive, where N is some integer, to express that a pattern must have N non-overlapping matches.

An advantage of #1 that the intuitive way (at least in my mind) of expressing multiple occurrences of a string, as in the example above, would work. An advantage of #2 is that existing CHECK-DAG functionality would not change, and so there should be no chance of impacting existing well formed tests.

To understand the issue better, I’ve prototyped #2. It still needs test cases and documentation, so it’s not ready for a formal patch review. If people like the idea, I’ll polish it up.

Thanks.

Joel

I would personally like a feature like that in FileCheck because it would make it a lot easier to write MachineOutliner tests, and would make the tests significantly smaller and easier to understand.

- Jessica

Hi Jessica,

This time I'm replying all....

I would personally like a feature like that in FileCheck because it would
make it a lot easier to write MachineOutliner tests, and would make the
tests significantly smaller and easier to understand.

How do MachineOutliner tests accomplish this now? Can you point me to an
example?

Thanks.

Joel

1. In a CHECK-DAG group, don't let the matches for patterns overlap.
2. Add a new CHECK-DAG-N directive, where N is some integer, to express
that a pattern must have N non-overlapping matches.

I think #1 is much more intuitive and easy to describe/document than #2.
Changing the meaning of DAG in that way is highly unlikely to affect any
existing test, IMO. And if it does, my first expectation is that the test
wasn't doing what the author intended anyway.
--paulr

Hi Paul,

Right now, the tests try to accomplish the following

  1. Define a sequence of instructions (e.g a,b,c)
  2. Insert that sequence into k places with an unique instruction between them to make sure the outliner will yank them out without overlaps
  3. Check for k calls to an outlined function
  4. Check that the outlined sequence still exists in the program

This can result in some pretty long tests which would be a lot easier to define if we could say something like “this function must contain k instances of an outlined call, and not contain this sequence of instructions”.

test/CodeGen/machine-outliner.mir is a pretty good example of a test that would benefit from this sort of thing.

  • Jessica

Right now, the tests try to accomplish the following

1. Define a sequence of instructions (e.g a,b,c)
2. Insert that sequence into k places with an unique instruction between
them to make sure the outliner will yank them out *without overlaps*
3. Check for k calls to an outlined function
4. Check that the outlined sequence still exists in the program

This can result in some pretty long tests which would be a lot easier to
define if we could say something like “this function must contain k
instances of an outlined call, and not contain this sequence of
instructions”.

test/CodeGen/machine-outliner.mir is a pretty good example of a test that
would benefit from this sort of thing.

I'm not sure if what I'm proposing will do what you need. CHECK-DAG is for
cases where a set of expected strings is unordered. I'm trying to extend
it to handle cases where those strings aren't unique either. In your
example, I see the non-unique property but not the unordered property. Did
I miss it?

Moreover, it seems your example could be handled now by two FileCheck calls
with different prefixes, say OUT and INS. The first would use a sequence
of k "OUT:" directives to ensure that there are k outlined calls. The
second would use "INS-NOT:" directives to ensure other instructions don't
occur. The calls could also have a common prefix, say CHECK, to use for
the function labels. Does that do what you want?

Thanks

Joel

I implemented #1, and I now have 105 new unexpected tests failures from
llvm core (58), clang (45), openmp runtime (1), and compiler-rt (1) test
suites.

I've only looked at a few failures so far. Of those, the problem is the
expected one: in a CHECK-DAG group, multiple patterns match the same line,
so not permitting overlapping matches causes some patterns never to match
successfully. However, all the overlapping matches so far seem
unintentional. That is, this new implementation seems to be catching test
errors.

Here are some scenarios I found:

S1. Patterns sometimes identical

I just stumbled across r332416, which was modifying a test with multiple
identical CHECK-DAG directives, which reminded me I needed to get back
to you about this... sorry for the slow response.

From: Joel E. Denny [mailto:jdenny.ornl@gmail.com]

1. In a CHECK-DAG group, don't let the matches for patterns overlap.
2. Add a new CHECK-DAG-N directive, where N is some integer, to express
that a pattern must have N non-overlapping matches.

I think #1 is much more intuitive and easy to describe/document than #2.
Changing the meaning of DAG in that way is highly unlikely to affect any
existing test, IMO. And if it does, my first expectation is that the test
wasn't doing what the author intended anyway.

I implemented #1, and I now have 105 new unexpected tests failures from
llvm core (58), clang (45), openmp runtime (1), and compiler-rt (1) test
suites.

Okay... obviously more than I expected. But hey, if we look at it as a
percentage of all tests, it's really small! [attempting to salvage a
degree of self-respect]

I've only looked at a few failures so far. Of those, the problem is
the expected one: in a CHECK-DAG group, multiple patterns match the
same line, so not permitting overlapping matches causes some patterns
never to match successfully. However, all the overlapping matches so
far seem unintentional. That is, this new implementation seems to be
catching test errors.

Awesome. Catching test errors is a definite plus for the project.

...
Path Forward
------------------

These examples seem like strong evidence that permitting overlapping
matches is a significant source of test errors. Moreover, we have my
original goal that some test authors were apparently assuming was
already supported: to be able to handle sets of strings that are both
unordered and non-unique. Nevertheless, it's possible that there are
use cases, such as the one suggested by S1, that we lose.

Hmmm... Either that one intended for the matches to be different, and
it's a bug that they aren't; or it's an overly flexible test, and
tightening it up does no harm. The test change to accommodate the
updated FileCheck behavior would have to go through the original test
author or some other domain expert in order to determine which it is.

I don't see the loss of flexibility as necessarily a bad thing; although
we might find cases where behavior reasonably varies across targets, and
then we would need to work out how to accommodate those differences.

Here are some potential paths forward:

P1. Give up on approach #1 (non-overlapping CHECK-DAG) and go back to
#2 (CHECK-DAG-N). I don't prefer this, but it would be less work.

P2. Make CHECK-DAG non-overlapping but fix all test cases before
committing that change. Patch reviews will surely be slow.

P3. Make non-overlapping CHECK-DAG a special mode that's off by default
until all test suites have been fixed.

I vote for P3. It's easier to divide up the work if the feature is in
upstream FileCheck, and it's easy for someone doing the work to enable
the feature in their local branch. I would not bother making it a
command-line switch, unless we really do come up with cases where the
overlapping matches are absolutely the right solution for some class of
tests.

I suggest you file a bug with the complete list of test failures that
non-overlapping CHECK-DAG finds. That will help keep track of the work,
and if anyone else feels moved to pick up some part of it, they can make
notes in the bug so we can try to avoid duplicating effort. Hopefully
I can devote some time to this myself, I think it's how CHECK-DAG should
have been done in the first place and I'm sad we didn't catch it at the
time.

By the way, I think it would be generally useful for FileCheck to have
a debugging mode that prints every match. It would be particularly
useful while migrating to a non-overlapping CHECK-DAG.

Thoughts?

Joel

A debugging mode like that sounds like a useful feature for writing a
test, independent of the migration aid. Do it.

--paulr

Paul Robinson <paul.robinson@sony.com> writes:

From: Joel E. Denny [mailto:jdenny.ornl@gmail.com]

1. In a CHECK-DAG group, don't let the matches for patterns overlap.
2. Add a new CHECK-DAG-N directive, where N is some integer, to express
that a pattern must have N non-overlapping matches.

I think #1 is much more intuitive and easy to describe/document than #2.
Changing the meaning of DAG in that way is highly unlikely to affect any
existing test, IMO. And if it does, my first expectation is that the test
wasn't doing what the author intended anyway.

I implemented #1, and I now have 105 new unexpected tests failures from
llvm core (58), clang (45), openmp runtime (1), and compiler-rt (1) test
suites.

...

Here are some potential paths forward:

P1. Give up on approach #1 (non-overlapping CHECK-DAG) and go back to
#2 (CHECK-DAG-N). I don't prefer this, but it would be less work.

P2. Make CHECK-DAG non-overlapping but fix all test cases before
committing that change. Patch reviews will surely be slow.

P3. Make non-overlapping CHECK-DAG a special mode that's off by default
until all test suites have been fixed.

I vote for P3. It's easier to divide up the work if the feature is in
upstream FileCheck, and it's easy for someone doing the work to enable
the feature in their local branch. I would not bother making it a
command-line switch, unless we really do come up with cases where the
overlapping matches are absolutely the right solution for some class of
tests.

I feel like a variant of this where this mode is on by default would
make more sense. That is,

1. Modify CHECK-DAG so that overlapping patterns aren't allow, unless
   --deprecated-allow-overlapping-check-dag or something is passed to
   FileCheck.
2. Add the flag to the 105 current test failures
3. File bugs and/or create reviews to remove the flag from each of the
   tests.

There are two reasons I prefer this approach. First, new tests are
"correct by default" while we clean up the existing problematic tests.
Second, out of tree work has an easy path forward and reasonable
forewarning. They can add the flag temporarily and fix problematic tests
when time permits, rather it suddenly breaking their CI whenever open
source flips the switch.

I suggest you file a bug with the complete list of test failures that
non-overlapping CHECK-DAG finds. That will help keep track of the work,
and if anyone else feels moved to pick up some part of it, they can make
notes in the bug so we can try to avoid duplicating effort. Hopefully
I can devote some time to this myself, I think it's how CHECK-DAG should
have been done in the first place and I'm sad we didn't catch it at the
time.

By the way, I think it would be generally useful for FileCheck to have
a debugging mode that prints every match. It would be particularly
useful while migrating to a non-overlapping CHECK-DAG.

Thoughts?

Joel

A debugging mode like that sounds like a useful feature for writing a
test, independent of the migration aid. Do it.

+1

Those are excellent points, and that approach totally makes sense.
--paulr

I just stumbled across r332416, which was modifying a test with multiple
identical CHECK-DAG directives, which reminded me I needed to get back
to you about this... sorry for the slow response.

Not a problem.

> From: Joel E. Denny [mailto:jdenny.ornl@gmail.com]
>>> 1. In a CHECK-DAG group, don't let the matches for patterns overlap.
>>> 2. Add a new CHECK-DAG-N directive, where N is some integer, to express
>>> that a pattern must have N non-overlapping matches.
>>
>> I think #1 is much more intuitive and easy to describe/document than #2.
>> Changing the meaning of DAG in that way is highly unlikely to affect
any
>> existing test, IMO. And if it does, my first expectation is that the
test
>> wasn't doing what the author intended anyway.
>
> I implemented #1, and I now have 105 new unexpected tests failures from
> llvm core (58), clang (45), openmp runtime (1), and compiler-rt (1) test
> suites.

Okay... obviously more than I expected. But hey, if we look at it as a
percentage of all tests, it's really small! [attempting to salvage a
degree of self-respect]

I didn't exactly expect it either. :slight_smile:

I've only looked at a few failures so far. Of those, the problem is
> the expected one: in a CHECK-DAG group, multiple patterns match the
> same line, so not permitting overlapping matches causes some patterns
> never to match successfully. However, all the overlapping matches so
> far seem unintentional. That is, this new implementation seems to be
> catching test errors.

Awesome. Catching test errors is a definite plus for the project.

Thanks for encouraging me to pursue this implementation.

> ...
> Path Forward
> ------------------
>
> These examples seem like strong evidence that permitting overlapping
> matches is a significant source of test errors. Moreover, we have my
> original goal that some test authors were apparently assuming was
> already supported: to be able to handle sets of strings that are both
> unordered and non-unique. Nevertheless, it's possible that there are
> use cases, such as the one suggested by S1, that we lose.

Hmmm... Either that one intended for the matches to be different, and
it's a bug that they aren't; or it's an overly flexible test, and
tightening it up does no harm. The test change to accommodate the
updated FileCheck behavior would have to go through the original test
author or some other domain expert in order to determine which it is.

I don't see the loss of flexibility as necessarily a bad thing; although
we might find cases where behavior reasonably varies across targets, and
then we would need to work out how to accommodate those differences.

Agreed.

Thanks for the feedback.

Joel

Make sense. I'll work on putting the patch in phabricator and creating the
bugzilla issue. I'll get to implementing the debugging mode eventually.

Thanks.

Joel

Sorry for the delay. It's in phabricator here:

https://reviews.llvm.org/D47106

The list of test failures has changed slightly. I'm re-running and will
create a bugzilla soon.

Thanks.

Joel

The bugzilla is here:

https://bugs.llvm.org/show_bug.cgi?id=37532

The debugging mode patch is here:

https://reviews.llvm.org/D47114

Joel