[FileCheck] Fix --strict-whitespace --match-full-lines

Hi,

this patch fixes a problem with leading/trailing whitespace matching for FileCheck --strict-whitespace --match-full-lines.

Consider a text file:
...
$ cat DUMP
bla1
bla2
  bla3
bla4
  bla5
...

with some leading and trailing spaces, made more visible like this:
...
$ sed 's/ /_/g' DUMP
bla1
bla2
_bla3
bla4_
_bla5_
...

and a FileCheck file CHECK to match DUMP:
...
$ cat CHECK
// CHECK-LABEL:bla1
// CHECK-NEXT:bla2
// CHECK-NEXT: bla3
// CHECK-NEXT:bla4
// CHECK-NEXT: bla5
...

with whitespace made more visible like this:
...
$ sed 's/ /_/g' CHECK
//_CHECK-LABEL:bla1
//_CHECK-NEXT:bla2
//_CHECK-NEXT:_bla3
//_CHECK-NEXT:bla4_
//_CHECK-NEXT:_bla5_
...

When trying the match, it fails:
...
$ cat DUMP | FileCheck CHECK --strict-whitespace --match-full-lines
CHECK:3:16: error: expected string not found in input
// CHECK-NEXT: bla3
                ^
<stdin>:3:2: note: scanning from here
  bla3
  ^
...

I expect the match to succeed, because I expect leading and trailing whitespace _not_ to be ignored, because the documentation states:
...
  --match-full-lines

     By default, FileCheck allows matches of anywhere on a line. This option will require all positive matches to cover an entire line. Leading and trailing whitespace is ignored, unless --strict-whitespace is also specified.
...

After adding some debug code to FileCheck (which I proposed here on llvm-dev ml as '[FileCheck] Add --verbose'), we can see where things go wrong:
...
$ cat DUMP | /home/vries/gt/build/./bin/FileCheck CHECK --strict-whitespace --match-full-lines --verbose
CHECK:3:16: note: RegEx string match: '^bla3$'
// CHECK-NEXT: bla3
...

The resulting regexp string is '^bla3$' instead of '^ bla3$'.

The patch fixes this, and makes the behavior match the documentation.

I ran the llvm/test/FileCheck tests, no regressions.

Any comments? OK for trunk?

Thanks,
- Tom

0002-FileCheck-Fix-strict-whitespace-match-full-lines.patch (2.06 KB)

+jyknight, who added --match-full-lines

Hi,

this patch fixes a problem with leading/trailing whitespace matching for FileCheck --strict-whitespace --match-full-lines.

The resulting regexp string is '^bla3$' instead of '^ bla3$'.

The patch fixes this, and makes the behavior match the documentation.

I ran the llvm/test/FileCheck tests, no regressions.

Should probably run the entire llvm testsuite, if not clang's too. I doubt FileCheck's own tests have good coverage of the tool itself.

Jon

Please send patches to llvm-commits not llvm-dev.

Writing FileCheck tests has pitfalls. A test along these lines:

bla0
CHECK:bla1

will actually pass, because the CHECK pattern is also part of the input
so it will readily match itself. You want the CHECK lines not to match
themselves, which you can easily do by introducing {{}} into the (middle
of the) pattern. That is:

bla0
CHECK:{{bla1}}

will still pass (incorrectly), while

bla0
CHECK:bla{{1}}

will fail (correctly). A correct FileCheck test would thus be

bla1
CHECK:bla{{1}}

(I didn't include this in my recent FileCheck Follies lightning talk,
because testing FileCheck itself is kind of an obscure corner of the
LLVM world and I was already bumping up against the 5-minute time limit.)
--paulr

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of
Jonathan Roelofs via llvm-dev
Sent: Wednesday, December 14, 2016 8:32 AM
To: Tom de Vries; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [FileCheck] Fix --strict-whitespace --match-full-
lines

+jyknight, who added --match-full-lines

> Hi,
>
> this patch fixes a problem with leading/trailing whitespace matching
> for FileCheck --strict-whitespace --match-full-lines.
>
> The resulting regexp string is '^bla3$' instead of '^ bla3$'.
>
> The patch fixes this, and makes the behavior match the documentation.
>
> I ran the llvm/test/FileCheck tests, no regressions.

Should probably run the entire llvm testsuite, if not clang's too. I
doubt FileCheck's own tests have good coverage of the tool itself.

They are smoke tests, like most of clang/llvm lit tests.
Running the full clang/llvm lit suites with the modified FileCheck
seems like a pretty reasonable exercise.
--paulr

I see. ISTM though that the FileCheck tests are written in a style that is not the typical usage of FileCheck.

As far as I've seen sofar, a typical FileCheck usage looks like:
...
<comment-token> RUN: ignore-comments-and-translate %s \
<comment-token> RUN: | FileCheck %s

... input for translation ...

<comment-token> CHECK: ... check translation ...
...

So if you run the FileCheck tests in a similar way, like this:
...
; RUN: sed 's/^;.*$//' %s \
; RUN: | FileCheck %s
...
you don't have to obfuscate the check patterns to work around this problem.

I've updated the test-case accordingly.

[ If we want to use this more often, perhaps something like this would be good:
...
diff --git a/test/lit.cfg b/test/lit.cfg
index d8728b6..80580c8 100644
--- a/test/lit.cfg
+++ b/test/lit.cfg
@@ -202,6 +202,9 @@
    lli += ' -mtriple='+config.host_triple+'-elf'
  config.substitutions.append( ('%lli', lli ) )

+config.substitutions.append( ('%filter-txt-comments',
+ 'sed \'s/^;.*$//\'') )

0002-FileCheck-Fix-strict-whitespace-match-full-lines.patch (2.22 KB)

From: Tom de Vries [mailto:Tom_deVries@mentor.com]
Sent: Thursday, December 15, 2016 2:31 AM
To: Robinson, Paul
Cc: Jonathan Roelofs; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [FileCheck] Fix --strict-whitespace --match-full-
lines

> Please send patches to llvm-commits not llvm-dev.
>
> Writing FileCheck tests has pitfalls. A test along these lines:
>
> bla0
> CHECK:bla1
>
> will actually pass, because the CHECK pattern is also part of the input
> so it will readily match itself. You want the CHECK lines not to match
> themselves, which you can easily do by introducing {{}} into the (middle
> of the) pattern. That is:
>
> bla0
> CHECK:{{bla1}}
>
> will still pass (incorrectly), while
>
> bla0
> CHECK:bla{{1}}
>
> will fail (correctly). A correct FileCheck test would thus be
>
> bla1
> CHECK:bla{{1}}
>
>

I see. ISTM though that the FileCheck tests are written in a style that
is not the typical usage of FileCheck.

As far as I've seen sofar, a typical FileCheck usage looks like:
...
<comment-token> RUN: ignore-comments-and-translate %s \
<comment-token> RUN: | FileCheck %s

... input for translation ...

<comment-token> CHECK: ... check translation ...
<comment-token> CHECK: ... check translation ...
...

So if you run the FileCheck tests in a similar way, like this:
...
; RUN: sed 's/^;.*$//' %s \
; RUN: | FileCheck %s
...
you don't have to obfuscate the check patterns to work around this
problem.

An intriguing trick, unfortunately it didn't work when I tried it on
Windows. The Windows shell doesn't strip the single-quotes, so sed
doesn't understand the command; but a Posix shell requires the quotes,
to avoid globbing the .* part of the expression. We could work around
that by using 'REQUIRES: shell' but at that point you're basically
never testing FileCheck on Windows, which seems like a pretty serious
hole in the test coverage.

I think we're stuck with the fake regex hack, if we want single-file
tests (for better maintainability).
--paulr

From: Tom de Vries [mailto:Tom_deVries@mentor.com]
Sent: Thursday, December 15, 2016 2:31 AM
To: Robinson, Paul
Cc: Jonathan Roelofs; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [FileCheck] Fix --strict-whitespace --match-full-
lines

Please send patches to llvm-commits not llvm-dev.

Writing FileCheck tests has pitfalls. A test along these lines:

bla0
CHECK:bla1

will actually pass, because the CHECK pattern is also part of the input
so it will readily match itself. You want the CHECK lines not to match
themselves, which you can easily do by introducing {{}} into the (middle
of the) pattern. That is:

bla0
CHECK:{{bla1}}

will still pass (incorrectly), while

bla0
CHECK:bla{{1}}

will fail (correctly). A correct FileCheck test would thus be

bla1
CHECK:bla{{1}}

I see. ISTM though that the FileCheck tests are written in a style that
is not the typical usage of FileCheck.

As far as I've seen sofar, a typical FileCheck usage looks like:
...
<comment-token> RUN: ignore-comments-and-translate %s \
<comment-token> RUN: | FileCheck %s

... input for translation ...

<comment-token> CHECK: ... check translation ...
...

So if you run the FileCheck tests in a similar way, like this:
...
; RUN: sed 's/^;.*$//' %s \
; RUN: | FileCheck %s
...
you don't have to obfuscate the check patterns to work around this
problem.

An intriguing trick, unfortunately it didn't work when I tried it on
Windows. The Windows shell doesn't strip the single-quotes, so sed
doesn't understand the command; but a Posix shell requires the quotes,
to avoid globbing the .* part of the expression. We could work around
that by using 'REQUIRES: shell' but at that point you're basically
never testing FileCheck on Windows, which seems like a pretty serious
hole in the test coverage.

Indeed. In general the community is moving away from using sed/awk/wc in tests, in favor of pure FileCheck tests, precisely because of these kinds of portability concerns.

I think we're stuck with the fake regex hack, if we want single-file
tests (for better maintainability).

+1

Jon

What about implicit-check-not.txt?

It contains the sed command, but with '#' instead of '/':
...
; RUN: sed 's#^;.*##' %s | FileCheck -check-prefix=CHECK-PASS -implicit-check-not=warning: %s
...

Does that also not work on windows?

Thanks,
- Tom

From: Tom de Vries [mailto:Tom_deVries@mentor.com]
Sent: Thursday, December 15, 2016 9:12 AM
To: Robinson, Paul
Cc: Jonathan Roelofs; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [FileCheck] Fix --strict-whitespace --match-full-
lines

>
>
>> From: Tom de Vries [mailto:Tom_deVries@mentor.com]
>> Sent: Thursday, December 15, 2016 2:31 AM
>> To: Robinson, Paul
>> Cc: Jonathan Roelofs; llvm-dev@lists.llvm.org
>> Subject: Re: [llvm-dev] [FileCheck] Fix --strict-whitespace --match-
full-
>> lines
>>
>>> Please send patches to llvm-commits not llvm-dev.
>>>
>>> Writing FileCheck tests has pitfalls. A test along these lines:
>>>
>>> bla0
>>> CHECK:bla1
>>>
>>> will actually pass, because the CHECK pattern is also part of the
input
>>> so it will readily match itself. You want the CHECK lines not to
match
>>> themselves, which you can easily do by introducing {{}} into the
(middle
>>> of the) pattern. That is:
>>>
>>> bla0
>>> CHECK:{{bla1}}
>>>
>>> will still pass (incorrectly), while
>>>
>>> bla0
>>> CHECK:bla{{1}}
>>>
>>> will fail (correctly). A correct FileCheck test would thus be
>>>
>>> bla1
>>> CHECK:bla{{1}}
>>>
>>>
>>
>> I see. ISTM though that the FileCheck tests are written in a style that
>> is not the typical usage of FileCheck.
>>
>> As far as I've seen sofar, a typical FileCheck usage looks like:
>> ...
>> <comment-token> RUN: ignore-comments-and-translate %s \
>> <comment-token> RUN: | FileCheck %s
>>
>> ... input for translation ...
>>
>> <comment-token> CHECK: ... check translation ...
>> <comment-token> CHECK: ... check translation ...
>> ...
>>
>> So if you run the FileCheck tests in a similar way, like this:
>> ...
>> ; RUN: sed 's/^;.*$//' %s \
>> ; RUN: | FileCheck %s
>> ...
>> you don't have to obfuscate the check patterns to work around this
>> problem.
>
> An intriguing trick, unfortunately it didn't work when I tried it on
> Windows. The Windows shell doesn't strip the single-quotes, so sed
> doesn't understand the command; but a Posix shell requires the quotes,
> to avoid globbing the .* part of the expression. We could work around
> that by using 'REQUIRES: shell' but at that point you're basically
> never testing FileCheck on Windows, which seems like a pretty serious
> hole in the test coverage.
>

What about implicit-check-not.txt?

It contains the sed command, but with '#' instead of '/':
...
; RUN: sed 's#^;.*##' %s | FileCheck -check-prefix=CHECK-PASS
-implicit-check-not=warning: %s
...

Does that also not work on windows?

Huh. Thanks for pointing that out! sed with single-quote doesn't
work directly on the command line, but it looks like 'lit' will
convert them to double-quotes on Windows, and that does work.

Windows doesn't natively provide 'sed' but we do express a
dependency on some GnuWin32 tools, and 'sed' is one of those. So
while Jon is correct that we're not keen on lots of dependencies,
and in particular you'll see occasional rounds of people removing
the use of 'grep' from tests, a search of the test tree shows that
the 'sed' is used in a variety of places (generally with slashes)
so what you've done here should be fine.

So, LGTM!
Thanks,
--paulr