BUG in FileCheck

Hello! I am a first-time contributor to LLVM, and was going through the good-first-issues list to look
for issues open to contribution.

In the process, I came across issue [FileCheck] Misleading prefix listed for NOT match #70221 in which the bug was that FileCheck listed the wrong prefixes in the diagnostic message when the file was checked against itself, and a custom prefixes were supplied by --check-prefixes, and opened PR#78412.

In particular, while digging through the source code of FileCheck, I discovered that the design was primarily around using CHECK as prefix, and hence the very first prefix given through --check-prefixes was used in the error message for the trailing NOT string.

But, soon I found out that it was a bigger problem, in that the class FileCheck keeps a pointer to a vector of FileCheckStrings, called CheckStrings, and that this vector was populated only for strings that were not DAG or NOT strings. Thus, whenever a NOT string would give an error, the first non-NOT string prefix following it would be used in the diagnostic message.

Fortunately, I have been able to implement a fix for the issue, that covers it nicely. I have three questions specifically, which I wanted to ask before I force push to the PR I opened initially(which, I am really sorry, is a solution that does not cover all the issues).

  1. Would it be alright if I open a new umbrella issue that reports all the bugs, and open a PR against it?
  2. My solution proposes placing a nested struct inside the struct FileCheckString, that stores the pattern of the NOT/DAG strings encountered and the Prefix that the string uses, and a vector holding all these pairs for each such string encountered in the input (the most minimally invasive and simple solution I could find). Is it allowable according to LLVM Coding Guidelines?
    and
  3. Some tests that use --implicit-check-not conditions rely on only a single prefix to implement the tests. Is it alright if I replace the error prefix by IMPLICIT-CHECK-NOT for all implicit-check-not errors, as it would take a huge amount of rewriting the existing code to report the actual prefix that matches the implicit pattern. This would require some changes in the existing tests, but the overall result would be a more pin-point and neater diagnostic that reports IMPLICIT-CHECK-NOT for such errors, and the correct prefixes for other NOT strings.

If I made a mistake, please bear with me. I hope I will be able to make a significant contribution to LLVM!

Thanks!

Thank you for working on FileCheck! The tools we rely on don’t always get the attention they deserve.

Regarding your questions.

  1. An umbrella issue is most often a substitute for searching for related issues. If you are asking, can you fix multiple issues in one patch, then the answer is yes if the problems are related. I have not read your code changes, just the comments, but it seems that you have followed that guideline.

  2. The coding guidelines do not really address how data structures should work. Do what seems best for solving the problem.

  3. I’m not clear what you are asking here. implicit-check-not must not fail if there is an explicit check for the same pattern. This is deliberate; if a test wants to guarantee that a particular pattern occurs exactly once, it can use an explicit check to match the one expected occurrence, and implicit-check-not of the same pattern to make sure there are no others. You might need to rewrite your question a different way so I can understand it.

The general idea of having implicit-check-not errors using a canonical prefix seems reasonable, though.

cc @jdenny-ornl

Thanks for replying!

In the 3rd question I was only asking what you did answer at the last, that would it be alright to use a canonical prefix for all implicit-check-nots.

However, what you mention about implicit-check-nots not failing when an explicit check-not asserts the absence of the same pattern is exactly the opposite of the actual behaviour! Implicit-check-nots currently will not fail if I check for different patterns, one explicitly and the other implicitly. I thought that initially strange, but assumed that maybe that was intended behaviour.

Would that warrant another fix?

Thanks!

P.S. I attach an example, please correct me if I misunderstood your statement.

$ cat err.txt
A: placeholder1
B-NOT: placeholder3

$ FileCheck --check-prefixes A,B --implicit-check-not placeholder1  --input-file err.txt err.txt
err.txt:2:8: error: B-NOT: excluded string found in input
B-NOT: placeholder3
       ^
err.txt:2:8: note: found here
B-NOT: placeholder3
       ^~~~~~~~~~~~

Input file: err.txt
Check file: err.txt

-dump-input=help explains the following input dump.

Input was:
<<<<<<
       1: A: placeholder1 
       2: B-NOT: placeholder3 
not:2            !~~~~~~~~~~~  error: no match expected
>>>>>>

Was implicit-check-not supposed to fail here?

$ cat err.txt
A: placeholder1
B-NOT: placeholder3

$ FileCheck --check-prefixes A,B --implicit-check-not placeholder3  --input-file err.txt err.txt
command line:1:22: error: IMPLICIT-CHECK-NOT: excluded string found in input
-implicit-check-not='placeholder3'
                     ^
err.txt:2:8: note: found here
B-NOT: placeholder3
       ^~~~~~~~~~~~
err.txt:2:8: error: B-NOT: excluded string found in input
B-NOT: placeholder3
       ^
err.txt:2:8: note: found here
B-NOT: placeholder3
       ^~~~~~~~~~~~

Input file: err.txt
Check file: err.txt

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: A: placeholder1 
          2: B-NOT: placeholder3 
not:imp1            !~~~~~~~~~~~  error: no match expected
not:2               !~~~~~~~~~~~  error: no match expected
>>>>>>

P.P.S I fixed some of the issues here, so this FileCheck was issued from my build directory.

I think the example is confusing for two reasons. First, the FileCheck script and the input file are the same, which is always tricky. A line such as B-NOT: something will always fail in that case. FileCheck tests that use the same file as script and input will typically use a meaningless regex so that the directive line won’t be matched. Something like this:

A: placeholder{{1}}
B-NOT: placeholder{{3}}
placeholder1
placeholder3

Now, FileCheck should detect that line 4 is an excluded string, and report that.

Second, in practice it should never be the case that you would put the same pattern both as an argument to --implicit-check-not and in an explicit -NOT directive. While this is not explicitly forbidden, it makes no sense. The --implicit-check-not is already checking for the absense of the pattern, so an explicit -NOT is redundant. I’d expect a duplicate report, which is what your second example has.

Here’s an example of my own.

F:\Dev\scratch>type a2.txt
Hello there!
Hello!

F:\Dev\scratch>type a-check.txt
CHECK: Hello there!
CHECK-NOT: Hello

F:\Dev\scratch>%fc% a-check.txt --input-file a2.txt --implicit-check-not=Hello
command line:1:22: error: CHECK-NOT: excluded string found in input
-implicit-check-not='Hello'
                     ^
a2.txt:2:1: note: found here
Hello!
^~~~~
a-check.txt:2:12: error: CHECK-NOT: excluded string found in input
CHECK-NOT: Hello
           ^
a2.txt:2:1: note: found here
Hello!
^~~~~

Input file: a2.txt
Check file: a-check.txt

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: Hello there!
          2: Hello!
not:imp1     !~~~~   error: no match expected
not:2        !~~~~   error: no match expected
>>>>>>

With an unmodified FileCheck, again I see two reports, one from the command line and one from the directive.

What is it that your patch is fixing, again?

1 Like

Oh, I understand now. The bugs that I encountered in FileCheck, were almost always from checking the file against itself, which I think is what breaks the code.

My patch does not change the existing functionality, it just tries to fix this issue of checking the same file against itself(I noticed that this is done in a number of CHECK-NOT tests in llvm/tests/FileCheck/)
and get wrong prefixes listed in the error message(this was the original issue, but along the way we noticed a number of failures in the same situations).
The breakages are more pronounced when prefixes distinct from CHECK are provided through --check-prefix(es).

Thanks!