Proposal: Enhance FileCheck's variable matching capabilities

Hello,

FileCheck allows us to define match variables and use them on
subsequent lines. This is quite useful, but it could be even more useful
if it were possible to use a variable later on the same line where it
was defined. For example, I would like to write this:

; CHECK: bic [[REG:r[0-9]+]], [[REG]], #3

But I currently can't, because [[REG]] only matches a REG variable
defined on a _previous_ line. As the FileCheck reference manual
(http://llvm.org/docs/CommandGuide/FileCheck.html) mentions, there are
known workarounds, such as using two separate CHECK lines. However,
this is a hacky and inelegant solution that makes tests less readable.

I hope the rationale for the code above is clear: I want the
instruction to act on the same register, though I don't care which
one (and don't want the test to be affected by future register
allocation changes).

IMHO the proposed feature is the natural way to match the same
register twice on a line, and it can be useful in writing tests. I know
it would be extremely useful in the tests I'm currently writing (not
upstream yet, but they will be soon).

If the idea sounds good to people, all that's left is the
implementation. I already have it in my branches and can provide a
full implementation with tests (FileCheck has had a test/ directory of
its own since r168113) if the proposal is accepted.

The rough outline of the implementation:

To enable such matching in a natural way, our regex implementation
needs to support backreferences. This lets FileCheck find all
references to a variable defined earlier on the same line and replace
them with backreferences to the group captured by its definition.
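
A minimal sketch of that substitution, written in Python for brevity. FileCheck itself is C++ and uses LLVM's own regex library, so this is an illustration of the idea, not its actual code. It assumes the [[NAME:regex]] / [[NAME]] syntax from the example above, uses Python's named groups (?P<NAME>...) and (?P=NAME) as the backreference mechanism, and deliberately omits lookup of variables defined on earlier lines; check_line_to_regex is a hypothetical helper.

```python
import re

# Matches [[NAME:regex]] (definition) or [[NAME]] (use), as in the
# FileCheck example above. The non-greedy ".*?" stops at the first "]]".
VAR_RE = re.compile(r"\[\[([A-Za-z_][A-Za-z0-9_]*)(?::(.*?))?\]\]")


def check_line_to_regex(pattern):
    """Hypothetical helper: turn one CHECK pattern into a single regex in which
    uses of a variable defined earlier on the same line become backreferences."""
    parts = []
    defined = set()
    pos = 0
    for m in VAR_RE.finditer(pattern):
        # Literal text between variables is matched verbatim.
        parts.append(re.escape(pattern[pos:m.start()]))
        name, definition = m.group(1), m.group(2)
        if definition is not None:
            if name in defined:
                raise ValueError(f"{name} defined twice on one line")
            defined.add(name)
            # Named capture group holding the variable's value.
            parts.append(f"(?P<{name}>{definition})")
        elif name in defined:
            # Same-line use: a backreference to the group captured above.
            parts.append(f"(?P={name})")
        else:
            # Uses of variables from earlier lines are out of scope here.
            raise ValueError(f"{name} is not defined earlier on this line")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return "".join(parts)


rx = re.compile(check_line_to_regex("bic [[REG:r[0-9]+]], [[REG]], #3"))
print(bool(rx.search("bic r2, r2, #3")))  # True: same register twice
print(bool(rx.search("bic r2, r3, #3")))  # False: different registers
```

Referring to groups by name rather than by number means a definition whose own regex contains parentheses does not shift the backreference targets.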

Luckily, our regex implementation already supports backreferences,
although enabling them takes a bit of hacking. It supports both Basic
Regular Expressions (BREs) and Extended Regular Expressions (EREs), but,
following POSIX strictly in this respect, it supports backreferences
only in BREs. EREs are what we actually use (rightly). This is stricter
than many POSIX regex implementations (including the default on Linux),
which do allow backreferences in EREs.

Adding backreference support to our EREs is a very simple change in
the regcomp parsing code. I can't think of significant cases where it
would clash with existing behavior, and it brings more versatility to
the regexes we write. There's always the danger of a backreference in
a specially crafted regex causing exponential matching times, but since
we mainly use regexes for testing purposes I don't think it's a big
problem. (It could also be placed behind a FileCheck-specific flag, if
needed.)

Please share your thoughts,

Eli

If I understand correctly, the desire is to change the current
behavior in a kind of subtle way. Is there some way you could
instrument trunk's FileCheck to die if the old behavior is
encountered, and use that to definitively find all tests which rely on
the current behavior and migrate them (possibly to an interim
solution) in preparation for the change in semantics?

How much is the old (i.e. current) behavior used?

-- Sean Silva

Running through the LLVM + Clang test suites, I found a single case,
in test/CodeGen/PowerPC/i64_fp_round.ll, that uses a variable on the
same line where it is defined (and thus actually refers to its
previously defined instance).
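
For illustration, a rough Python sketch of such a scan (not necessarily how the search above was done): it flags CHECK lines that reference a variable also defined on the same line, in either order, as a conservative superset of the tests that may be affected. It assumes only the default CHECK prefix (CHECK:, CHECK-NEXT:, and so on), so tests using --check-prefix are missed; same_line_uses and scan are hypothetical helpers.

```python
import re
from pathlib import Path

# [[NAME:regex]] defines NAME; [[NAME]] uses it.
VAR_RE = re.compile(r"\[\[([A-Za-z_][A-Za-z0-9_]*)(:.*?)?\]\]")
# Assumption: only the default "CHECK" prefix (CHECK:, CHECK-NEXT:, ...).
CHECK_RE = re.compile(r"CHECK[-A-Z]*:(.*)")


def same_line_uses(pattern):
    """Return names used on a line that also defines them (either order).
    Under the current semantics such uses refer to an earlier line's value."""
    defined, used = set(), []
    for m in VAR_RE.finditer(pattern):
        if m.group(2) is not None:
            defined.add(m.group(1))
        else:
            used.append(m.group(1))
    return [name for name in used if name in defined]


def scan(root):
    """Print every CHECK line under `root` that may be affected by the change."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            m = CHECK_RE.search(line)
            if m and same_line_uses(m.group(1)):
                print(f"{path}:{lineno}: {line.strip()}")
```

Calling, for example, scan("test") from the root of a checkout would list each candidate as path:line: text for manual review.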

This isn't surprising, since the current behavior is IMHO somewhat
unintuitive and not very useful.

Naturally, when the change in behavior lands, this test will be
fixed, along with any other tests that start failing as a result of
the change.

Eli

I think that this is a good idea.

Nadav

The single test using this functionality was rewritten in r168588.

Eli