[PATCH] add regex/globbing to -verify diagnostics

The other day someone on IRC asked if -verify supported regex; attached is a patch for regex and glob-style matching. They're selected by using one of '-', '~' or '*' for string, regex and glob matching. Maybe '?' would be better for glob? Here's a few pertinent lines from the header docs:

/// Alternative matching modes may be specified via the first character which
/// follows 'expected'. The following modes are supported:
///
/// - standard string matching, case-sensitive
/// ~ regular-expression matching
/// * glob-style matching
///
/// Examples: match error "variable has incomplete type 'struct s'"
///
/// // expected-error {{variable has incomplete type 'struct s'}}
/// // expected-error {{variable has incomplete type}}
///
/// // expected~error {{variable has incomplete type 'struct .'}}
/// // expected~error {{variable has incomplete type 'struct .*'}}
/// // expected~error {{variable has incomplete type 'struct (.*)'}}
///
/// // expected*error {{variable has incomplete type 'struct ?'}}
/// // expected*error {{variable has incomplete type 'struct [stuv]'}}
/// // expected*error {{variable has incomplete type 'struct [!abcd]'}}
/// // expected*error {{variable has incomplete type 'struct *'}}
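
As a rough illustration of how the string and regex modes differ, here is a minimal sketch. It is not code from the patch: `MatchMode` and `matchesDiagnostic` are hypothetical names, `std::regex` stands in for whatever regex engine the patch uses, and plain string mode is assumed to be a case-sensitive substring search, as the second plain example suggests.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical mode selector; the names are illustrative, not from the patch.
enum class MatchMode { String, Regex };

// Returns true if `expected` is found somewhere in `diagnostic`.
// String mode: case-sensitive substring search (an assumption).
// Regex mode: std::regex_search with POSIX extended syntax as a stand-in.
bool matchesDiagnostic(MatchMode mode, const std::string &expected,
                       const std::string &diagnostic) {
  if (mode == MatchMode::String)
    return diagnostic.find(expected) != std::string::npos;
  return std::regex_search(diagnostic,
                           std::regex(expected, std::regex::extended));
}
```

In string mode regex metacharacters have no special meaning, so a pattern such as `struct .*` only matches a diagnostic that contains those literal characters.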

--mike

verify.regex0.patch (21.2 KB)

mike <mikem.llvm@gmail.com> writes:

The other day someone on IRC asked if -verify supported regex;
attached is a patch for regex and glob-style matching. They're
selected by using one of '-', '~' or '*' for string, regex and glob
matching. Maybe '?' would be better for glob? Here's a few pertinent
lines from the header docs:

The expected-error, expected~error, and expected*error are so similar as
to be confusing. Why not simply have expected-error, expected-error-re,
and expected-error-glob? It's only a few extra characters, and much more
noticeable.

Patch update uses suggested '-re' and '-glob' syntax, patch attached.

- lit tests pass with clang-debug, clang-release and selfclang-debug builds
- no measurable performance impact on lit tests

***Changes from last patch:

- updated syntax to use suffixes { -re | -glob } for regex and globbing, respectively
- added glob sequence support: first element ']' treated as literal
- reworked verify-string parser to be more resilient to errors
- added lit test/Misc/verify.c

***Examples matching error: "variable has incomplete type 'struct s'"

// expected-error {{variable has incomplete type 'struct s'}}
// expected-error {{variable has incomplete type}}

// expected-error-re {{variable has incomplete type 'struct .'}}
// expected-error-re {{variable has incomplete type 'struct .*'}}
// expected-error-re {{variable has incomplete type 'struct (.*)'}}
// expected-error-re {{variable has incomplete type 'struct[[:space:]](.*)'}}

// expected-error-glob {{variable has incomplete type 'struct ?'}}
// expected-error-glob {{variable has incomplete type 'struct [stuv]'}}
// expected-error-glob {{variable has incomplete type 'struct [!abcd]'}}
// expected-error-glob {{variable has incomplete type 'struct *'}}
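
For readers unfamiliar with the glob forms above, the following is a minimal sketch of a matcher that handles `?`, `*`, `[set]` and `[!set]`, with a `]` in first position of a set treated as a literal. It is not taken from the patch: `globMatch` and `matchSet` are hypothetical names, character ranges such as `a-z` are not handled, and the pattern is assumed to have to match the whole diagnostic text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Parses the bracket set that starts at pat[pi] == '['. On success, sets
// `next` to the index just past the closing ']' and `hit` to whether `c` is
// accepted by the set. A ']' in first position (after an optional '!') is a
// literal member of the set. Returns false if the set is unterminated.
static bool matchSet(const std::string &pat, std::size_t pi, char c,
                     std::size_t &next, bool &hit) {
  std::size_t i = pi + 1;
  const bool negate = i < pat.size() && pat[i] == '!';
  if (negate)
    ++i;
  bool found = false;
  bool first = true;
  for (; i < pat.size(); ++i, first = false) {
    if (pat[i] == ']' && !first) {
      next = i + 1;
      hit = (found != negate);
      return true;
    }
    if (pat[i] == c)
      found = true;
  }
  return false;
}

// Returns true if `pat` matches all of `text` from the given positions.
bool globMatch(const std::string &pat, const std::string &text,
               std::size_t pi = 0, std::size_t ti = 0) {
  while (pi < pat.size()) {
    const char p = pat[pi];
    if (p == '*') {
      // Try every possible length for the wildcard, including zero.
      for (std::size_t k = ti; k <= text.size(); ++k)
        if (globMatch(pat, text, pi + 1, k))
          return true;
      return false;
    }
    if (ti >= text.size())
      return false;
    if (p == '?') {
      ++pi;
      ++ti;
      continue;
    }
    if (p == '[') {
      std::size_t next = 0;
      bool hit = false;
      if (matchSet(pat, pi, text[ti], next, hit)) {
        if (!hit)
          return false;
        pi = next;
        ++ti;
        continue;
      }
      // Unterminated set: fall through and treat '[' as a literal.
    }
    if (p != text[ti])
      return false;
    ++pi;
    ++ti;
  }
  return ti == text.size();
}
```

The recursive handling of `*` is simple rather than fast, which seems acceptable for short diagnostic strings.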

--mike-m

verify.regex1.patch (23.5 KB)

I also prefer the "-re" suffix, thanks for doing that.

This is really cool, but I think it's a bit overkill. I'd rather not support glob at all; it adds a bunch of complexity for no added advantage.

You converted some loops like this:

-    std::string Msg(CommentStart, ExpectedEnd);
-    std::string::size_type FindPos;
-    while ((FindPos = Msg.find("\\n")) != std::string::npos)
-      Msg.replace(FindPos, 2, "\n");
-    // Add is possibly multiple times.
-    for (int i = 0; i < Times; ++i)
-      ExpectedDiags.push_back(std::make_pair(Pos, Msg));

to explicit for loops with switches in them. If you're going to change them, please convert them to using the StringRef API and its algorithms.
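
One way the newline substitution could look with a non-owning string view is sketched below. This is only an illustration: `std::string_view` stands in for `llvm::StringRef` so that the example is self-contained, and `unescapeNewlines` is a hypothetical name, not a function from the patch or from LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Replaces each two-character sequence backslash + 'n' with a real newline.
// The input view is only narrowed, never copied; the result is built once.
std::string unescapeNewlines(std::string_view text) {
  std::string out;
  out.reserve(text.size());
  for (;;) {
    const std::size_t pos = text.find("\\n");
    if (pos == std::string_view::npos)
      break;
    out.append(text.substr(0, pos)); // copy the part before the escape
    out.push_back('\n');             // emit the real newline
    text.remove_prefix(pos + 2);     // skip past the two-character escape
  }
  out.append(text); // whatever remains has no escapes
  return out;
}
```

Compared with repeated `std::string::replace` calls, this makes a single pass and avoids shifting the tail of the string on every substitution.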

It looks like using an invalid regex in a directive line will cause an assertion; please make it be an error from -verify mode.

Thanks for working on this!

-Chris

Hi Chris, thanks for taking a look at this patch.

Patch is updated to your specs and attached. Summary of changes since last iteration:

- removed globbing support
- invalid regex now reports proper diagnostic instead of using assert mechanism
- reworked "\\n" -> '\n' substitution with StringRef
- parameterized various verify diagnostic error definitions for { string | regex } differentiation
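
The second item, reporting an invalid regex as a diagnostic rather than asserting, could be sketched roughly as follows. This is an assumption-laden illustration, not the patch's code: `CompiledDirective` and `compileDirectiveRegex` are hypothetical names, the error text is made up, and `std::regex` stands in for the regex engine actually used.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type: a compiled regex on success, or an error
// message that the caller can turn into a -verify diagnostic.
struct CompiledDirective {
  bool valid = false;
  std::regex re;
  std::string error;
};

// Compiles the directive's pattern. A malformed pattern is reported through
// `error` instead of aborting the process.
CompiledDirective compileDirectiveRegex(const std::string &pattern) {
  CompiledDirective result;
  try {
    result.re = std::regex(pattern, std::regex::extended);
    result.valid = true;
  } catch (const std::regex_error &e) {
    result.error = "invalid regex in directive: " + std::string(e.what());
  }
  return result;
}
```

A caller would check `valid` before matching and emit `error` as a normal diagnostic otherwise, so a typo in a test file fails the test instead of crashing the compiler.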

Existing lit tests show no measurable performance impact.

--mike-m

verify.regex2.patch (21.9 KB)

looks great to me, applied in r102516, thanks!