Problem
Over the course of working with the LIT test system in LLVM, I have often been frustrated with the way FileCheck behaves. Personally, I find FileCheck to be way too “forgiving” [1], i.e. it does not really sanitize its input. As a result, tests are left open to bugs. The latest examples that I’ve encountered:
- CHEKC - typos are not “caught” and as a result the written test effectively does nothing. Examples in the main branch of LLVM: #1, #2, #3, and there are more. Common typos in my experience are CHEKC, CHECK-SAEM, etc.
- CHECK-DAG - is a very special command that doesn’t play well with anything else. For example, writing a CHECK-SAME after CHECK-DAG produces no error, yet CHECK-SAME does nothing, because support for CHECK-SAME is not implemented for CHECK-DAG (at least as far as I understand the situation). After a brief search, I do not see misuses of CHECK-DAG in LLVM but I’ve seen it multiple times in our downstream.
- CHECK-NOT - is also very special as its semantics is “the specified line is not present at this position”, i.e. between the preceding match and the following one. In a large enough test, these semantics can easily turn into an accidental test bug. Consider:
// Note: this is pseudo-MLIR example
%0 = foo(...)
%1 = bar(%0)
// succeeds:
// CHECK: bar
// CHECK-NOT: foo
// fails:
// CHECK-NOT: foo
// CHECK: bar
- CHECK-SAME-LITERAL - is a “syntax error” as the intended command is CHECK-SAME{LITERAL}. As in the case of typos, such a line seems to be ignored.
- I do not recall more misuses off the top of my head but I could imagine they exist.
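To make the CHECK-NOT ordering pitfall concrete, here is a deliberately simplified Python model of the two directives involved. It is not FileCheck code: `run_checks` is a hypothetical helper, matching is plain substring search, and only CHECK and CHECK-NOT are modelled. The one assumption it encodes is that a CHECK-NOT pattern is tested against the region between the previous match and the next one (or the end of the input).

```python
def run_checks(text, directives):
    """Toy model: `directives` is a list of (kind, pattern) pairs, where
    kind is "CHECK" or "CHECK-NOT". Returns True if every check passes."""
    pos = 0            # end of the previous CHECK match
    pending_nots = []  # CHECK-NOT patterns waiting for the next CHECK
    for kind, pattern in directives:
        if kind == "CHECK-NOT":
            pending_nots.append(pattern)
            continue
        idx = text.find(pattern, pos)
        if idx == -1:
            return False  # the CHECK pattern itself was not found
        region = text[pos:idx]
        if any(p in region for p in pending_nots):
            return False  # a forbidden pattern sits before this match
        pending_nots = []
        pos = idx + len(pattern)
    # Trailing CHECK-NOTs only cover the text after the last match.
    tail = text[pos:]
    return not any(p in tail for p in pending_nots)


TEXT = "%0 = foo(...)\n%1 = bar(%0)\n"
print(run_checks(TEXT, [("CHECK", "bar"), ("CHECK-NOT", "foo")]))  # True
print(run_checks(TEXT, [("CHECK-NOT", "foo"), ("CHECK", "bar")]))  # False
```

In the first call the CHECK-NOT only looks at the text after `bar`, so the earlier `foo` goes unnoticed, which is exactly the accidental pass described above.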
Proposal
My proposal is to enhance the FileCheck tool by adding sanity checks (such as “is there a typo in the keyword?”), syntax checks and “workflow” logical checks (such as “for CHECK-SAME is there a preceding CHECK?”) that run before the tool starts to analyse the given text.
- For typos et al., the process could follow an “is it a duck” test (i.e. “if it looks like a duck and it sounds like a duck, it’s most likely a duck”): if there’s a line that could almost be treated as a command, it is a command. Consider:
// CHEKC: ...
^ ^ ^^^^
| | the word immediately follows and is "similar" to --check-prefix value
| has a whitespace (" ") right after
starts with "//"
-> decision: IS a misspelled command
// the next line is run through CHEKC:
                                ^^^^^
                                the word doesn't immediately follow
-> decision: is NOT a misspelled command
For the “approximation” algorithm, one may use something like a Levenshtein-distance-based heuristic (actually, could we ask linting experts within LLVM for a good algorithm here? This is a task that’s generally solved by linters/IDEs/etc.).
Thus, if a misspelled command is detected, an error is produced and the test fails.
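As a rough illustration of the “is it a duck” test, the following Python sketch combines the structural cues from the diagrams (comment marker, whitespace, a word immediately followed by a colon) with an edit-distance threshold. Everything here is an assumption made for illustration: the names `classify` and `levenshtein`, the comment markers recognised, the (incomplete) suffix list, and the threshold of 2 are not taken from FileCheck.

```python
import re

# Illustrative, incomplete list of directive suffixes.
SUFFIXES = ("", "-NEXT", "-SAME", "-NOT", "-DAG", "-LABEL", "-EMPTY")

# Comment marker, whitespace, then a word immediately followed by ":".
DIRECTIVE_RE = re.compile(r"^\s*(?://|;|#)\s+([A-Za-z][\w-]*):")


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def classify(line, prefixes=("CHECK",), max_distance=2):
    """Return "directive", "misspelled" or "not-a-directive"."""
    m = DIRECTIVE_RE.match(line)
    if not m:
        return "not-a-directive"
    word = m.group(1)
    valid = {p + s for p in prefixes for s in SUFFIXES}
    if word in valid:
        return "directive"
    if min(levenshtein(word, v) for v in valid) <= max_distance:
        return "misspelled"
    return "not-a-directive"


print(classify("// CHEKC: ..."))                           # misspelled
print(classify("// the next line is run through CHEKC:"))  # not-a-directive
```

Passing the active `--check-prefix` values as `prefixes` matters: a custom prefix such as `CHECK1` would otherwise be flagged as a typo of `CHECK`.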
- The next phase, assuming typos are eliminated by the procedure above, is to ensure there are no syntax errors such as CHECK-LITERAL vs CHECK{LITERAL} or CHECK- (with an extra -).
Given that we “know” the command (it is “parsed” successfully), any later deviation is unexpected and constitutes a failure. In general, I guess this forces one to have a full-blown parser but given that the FileCheck “API” is rather minimalistic, perhaps we could have a set of “possible results” hardcoded and that would already be sufficient.
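The “hardcoded set of possible results” idea could look roughly like the sketch below: once the prefix is known, the whole directive token must match a small fixed grammar, and anything else is an error. This is a Python illustration under assumptions, not FileCheck's actual grammar: the suffix list is incomplete, `{LITERAL}` is assumed to be allowed after any of the listed forms, and `is_valid_directive` is a hypothetical name.

```python
import re

# Illustrative, incomplete list of directive suffixes.
SUFFIXES = ("NEXT", "SAME", "NOT", "DAG", "LABEL", "EMPTY")


def build_validator(prefix="CHECK"):
    """Compile a pattern for: PREFIX [-SUFFIX] [{LITERAL}] ':'"""
    suffix_alt = "|".join(SUFFIXES)
    return re.compile(
        re.escape(prefix)
        + r"(?:-(?:" + suffix_alt + r"))?"  # optional known suffix
        + r"(?:\{LITERAL\})?"               # optional modifier
        + r":"
    )


def is_valid_directive(token, prefix="CHECK"):
    """True if `token` (including the trailing colon) is well-formed."""
    return build_validator(prefix).fullmatch(token) is not None


for token in ("CHECK-SAME{LITERAL}:", "CHECK-SAME-LITERAL:",
              "CHECK-LITERAL:", "CHECK-:"):
    print(token, is_valid_directive(token))
```

Only the first token in the loop is accepted; the other three are the kinds of near-misses that would be reported instead of silently ignored.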
- To counter logical errors (such as “CHECK-SAME cannot follow CHECK-DAG”), it seems reasonable to hold a stack-based state of commands.
Upon discovery of a command (e.g. CHECK-SAME, CHECK-NEXT) that must have a preceding command, the last value on the stack of commands is verified for correctness. As is the case with the syntax errors, perhaps we could just maintain a hardcoded list of possible combinations. AFAIR something like this is already done for CHECK-NEXT?
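A minimal Python sketch of such a “workflow” check, assuming the directives have already been parsed into a list of names, might look like this. The rule tables are illustrative only: `NEEDS_PREDECESSOR` contains just the two commands named above, and the forbidden pair encodes the CHECK-DAG/CHECK-SAME observation from this post rather than verified FileCheck behaviour. `validate_sequence` is a hypothetical name.

```python
# Commands that only make sense after some earlier command.
NEEDS_PREDECESSOR = {"CHECK-SAME", "CHECK-NEXT"}

# Hardcoded (previous, current) combinations that should be rejected.
FORBIDDEN_PAIRS = {("CHECK-DAG", "CHECK-SAME")}


def validate_sequence(directives):
    """Return a list of error strings for a list of directive names."""
    errors = []
    stack = []  # commands seen so far; only the top is inspected here
    for index, name in enumerate(directives):
        if name in NEEDS_PREDECESSOR:
            if not stack:
                errors.append(f"#{index}: {name} has no preceding command")
            elif (stack[-1], name) in FORBIDDEN_PAIRS:
                errors.append(f"#{index}: {name} cannot follow {stack[-1]}")
        stack.append(name)
    return errors


print(validate_sequence(["CHECK", "CHECK-SAME"]))      # []
print(validate_sequence(["CHECK-DAG", "CHECK-SAME"]))
print(validate_sequence(["CHECK-NEXT"]))
```

Since only the most recent command is consulted, a single “previous command” variable would do for these rules; a real stack becomes useful only if some rule needs to look further back.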
- Additional commands: I think it is reasonable to either extend functionality of existing commands or introduce additional commands. For example,
- CHECK-NOT-DAG (or CHECK-DAG-NOT) to search the whole text for the presence of a line - in many cases this could be preferred over CHECK-NOT (e.g. when the compiler is supposed to completely eliminate a certain command/operation/variable/etc.)
- CHECK-DAG-SAME (as a synonym for CHECK-SAME) is the command that analyses the text right after CHECK-DAG. With additional verification of CHECK-SAME, the error could suggest the following: “CHECK-SAME cannot be used after CHECK-DAG, did you mean to use CHECK-DAG-SAME?”
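For the proposed CHECK-NOT-DAG, the intended behaviour is simple enough to state in a couple of lines of Python. The function name `check_not_anywhere` is hypothetical and plain substring matching stands in for FileCheck patterns; the point is only that the result does not depend on where the directive is written.

```python
def check_not_anywhere(text, pattern):
    """Hypothetical CHECK-NOT-DAG: pass only if `pattern` never occurs
    anywhere in `text`, regardless of surrounding directives."""
    return pattern not in text


TEXT = "%0 = foo(...)\n%1 = bar(%0)\n"
print(check_not_anywhere(TEXT, "foo"))  # False: foo is still present
print(check_not_anywhere(TEXT, "baz"))  # True: baz occurs nowhere
```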
Conclusion
Would this be something that the community is interested in?
I am not sure I would have sufficient time to implement all of this, but our downstream periodically has to deal with FileCheck inconveniences, so I think I could spare time for certain things in this list.
In any case, I felt it was worth starting a discussion and perhaps learning something new. Could it be that I missed some new developments on this topic? If so, please feel free to point me to a similar conversation / documentation / etc.
[1]: Our downstream project resides on top of LLVM 19.x. Perhaps there’s some improvement already in 20.x and/or beyond.