[lit] Run a RUN line multiple times with different % replacements

MaskRay · August 30, 2022, 3:18am

For C++/ObjC++: switch to gnu++17 as the default dialect , I am updating Clang tests to work with both gnu++14 and gnu++17 -std= defaults. This is a good opportunity improving some tests to work with as many language dialects as possible.

In ⚙ D131464 [test] Make tests pass regardless of gnu++14/gnu++17 default , I use directives like %stdcxx_98-14 (C++98, C++11, C++14) and %stdcxx_17- (C++17, C++20, C++2b):

// RUN: %clang_cc1 -fsyntax-only -verify=expected,precxx17 -triple=x86_64-linux-gnu %stdcxx_98-14 %s
// RUN: %clang_cc1 -fsyntax-only -verify=expected,cxx17 -triple=x86_64-linux-gnu %stdcxx_17- %s

For a line containing a %stdcxx_* directive, it will be nice to run the command multiple times, with different -std= values. I wish that somebody familiar with lit may implement this, or suggest an alternative.

If RUN lines with continuation have multiple occurrences of %stdcxx_98-14, we probably should use the same value for all occurrences. This allows a use case for the test clang/test/CodeGenCXX/override-layout.cpp:

// RUN: %clang_cc1 %stdcxx_98-14 -w -fdump-record-layouts-simple %s > %t.layouts \
// RUN:   %clang_cc1 %stdcxx_98-14 -w -fdump-record-layouts-simple %s > %t.before \
// RUN:   %clang_cc1 %stdcxx_98-14 -w -DPACKED= -DALIGNED16= -fdump-record-layouts-simple -foverride-record-layout=%t.layouts %s > %t.after \
// RUN:   diff -u %t.before %t.after \
// RUN:   FileCheck --check-prefixes=CHECK,PRE17 %s < %t.after

@jdenny-ornl @AaronBallman @dblaikie

jh7370 · August 30, 2022, 7:30am

@jdenny-ornl has a patch up for review, put up last week which would achieve something equivalent: ⚙ D132513 [lit] Implement DEFINE and REDEFINE directives.

Basically you’d have one of these proposed DEFINE: directives and then reuse the subsequent substitution multiple times.

AaronBallman · August 30, 2022, 1:04pm

Thank you for looking into this! FWIW, I think adding this test coverage will be fantastic. The tension between adding an explicit -std= line vs adding a new test file tends to lead to us adding the -std= line to a test more often than not. That means we end up losing test coverage for other standards modes when we update the default language mode. But also, each time we bump the default language standard, we lose a huge amount of test coverage in older language modes, which is particularly dangerous given how many DRs come out of C++ and impact older language standard modes and how many extensions Clang has. I think being able to run tests in multiple language standard modes is going to really help us out here!

pogo59 · August 30, 2022, 3:01pm

See previous discussion here: Iterating 'lit' RUN lines

jdenny-ornl · August 30, 2022, 3:35pm

I believe I understand the high-level goals here, but I’m not sure I understand them well enough to know what lit features would satisfy them. Basically, I don’t know how much should be encapsulated.

The following approach encapsulates a test file’s RUN lines so you don’t have to repeat them per standard:

// DEFINE: %{cflags} = -std=c++17 -verify=cxx17
// DEFINE: %{check} = %clang_cc1 -fsyntax-only -verify=expected \
// DEFINE:                       -triple=x86_64-linux-gnu %{cflags} %s
// RUN: %{check}

// REDEFINE: %{cflags} = -std=c++98 -verify=precxx17
// RUN: %{check}

// REDEFINE: %{cflags} = -std=c++11 -verify=precxx17
// RUN: %{check}

// etc.

Multiple RUN lines can be joined with && within %{check} if you want to run other checks across that list of standards. All of the above is possible if D132513 lands.

The following approach is the same as the previous approach except it encapsulates the list of standards so it can easily be iterated multiple times, possibly across multiple test files:

// These could be placed in a specific test file, or they could be
// defined (with different syntax) in a lit configuration file to 
// share among multiple test files.
//      
// DEFINE: %{check}( CFLAGS %) =
//      
// DEFINE: %{check-stdcxx98-14} =       \
// DEFINE:   %{check}( -std=c++98 %) && \ 
// DEFINE:   %{check}( -std=c++11 %) && \ 
// DEFINE:   %{check}( -std=c++14 %)

// These would go in a specific test file.
//      
// REDEFINE: %{check}( CFLAGS %) =                              \
// REDEFINE:   %clang_cc1 -fsyntax-only -verify=expected        \
// REDEFINE:              -triple=x86_64-linux-gnu %{CFLAGS} %s
//      
// RUN: %{check-stdcxx98-14}

Notice this form requires function substitutions. I have already implemented support (except it’s not convenient yet to define function substitutions in lit configuration files). However, I haven’t proposed this in phabricator as I’m waiting to see what the feedback for D132513 will look like first.

Finally, if the list of standards to test is mostly uniform across a large set of test files, you might want to follow the approach in the openmp/libomptarget test suite. It generates multiple sub test suites across all test files. Each generated sub test suite expands substitutions differently in order to target different devices. In your case, each generated sub test suite would target a different C++ standard instead.

AaronBallman · August 30, 2022, 7:41pm

This is really neat functionality!

It’s unclear if this is the case in Clang or not, but I suspect it’s not. Many of our tests are grouped by what part of the compiler they impact (Sema, Parser, Preprocessor, etc). Trying to further group them by language standard would be tricky because no small number of our tests are run against multiple language modes (e.g. -x c++ or -x c).

I think there’s two primary aspects to what’s needed:

Some way to specify on the RUN line “I want you to run over this range of language standards” that is obvious to reviewers/authors and doesn’t otherwise obfuscate what’s happening on the RUN line.
Some way to verify test results against the different standards versions that isn’t super onerous or confusing when the test results differ by standard.

For #1, I think we don’t want to add a bunch of boilerplate before the RUN line to get it set up as that’s likely to be intimidating to newcomers, obfuscates what the RUN line is trying to test because you have to hunt around for it more, and runs the risk of being copied around in slightly different forms from test to test. However, if that boilerplate could be generalized so that lit “auto-includes” it, that could be reasonable so long as the logic isn’t something reviewers or authors are going to need to dig into all that often.

For #2, I’m not certain there’s anything we can automate just yet because some tests use -verify-style verification, others use FileCheck, and still others use ad hoc solutions, but all of them have the need to want to run multiple language standards. So I’d be wary of trying to tie the solution to #2 into the solution for #1 just yet. As an incremental first step, I think we could leave #2 entirely up to the test writer to figure out how to do and once some patterns emerge that seem to work well, we can think about how to make them more broadly available, but that only works if #1 largely leaves the RUN lines alone instead of hides the details in replacements.

jdenny-ornl · August 30, 2022, 8:23pm

I want to be sure we’re not miscommunicating here. If there are a large number of tests that run against the same set of language modes and standards (but perhaps have varying RUN lines), then a libomptarget-style solution is probably a good solution for them. If tests have lots of differences in which modes they check, then they likely need something that enables more customization per test, as DEFINE/REDEFINE offer.

So let’s say the following is defined somewhere that permits it to be used by multiple test files without having to repeat it:

// DEFINE: %{check-std}( STD %, PREFIXES %) =
//      
// DEFINE: %{check-stdcxx98-14} =                        \
// DEFINE:   %{check-std}( c++98 %, cxx98,precxx17 %) && \ 
// DEFINE:   %{check-std}( c++11 %, cxx11,precxx17 %) && \ 
// DEFINE:   %{check-std}( c++14 %, cxx14,precxx17 %)

Then a specific test file can do this:

// REDEFINE: %{check-std}( STD %, PREFIXES %) =                      \
// REDEFINE:   %clang_cc1 -fsyntax-only -verify=expected,%{PREFIXES} \
// REDEFINE:              -triple=x86_64-linux-gnu -std=%{STD} %s

// RUN: %{check-stdcxx98-14}

Or this:

// REDEFINE: %{check-std}( STD %, PREFIXES %) = \
// REDEFINE:   %clang_cc1 -std=%{STD} -w -fdump-record-layouts-simple %s > %t.layouts && \
// REDEFINE:   %clang_cc1 -std=%{STD} -w -fdump-record-layouts-simple %s > %t.before && \
// REDEFINE:   %clang_cc1 -std=%{STD} -w -DPACKED= -DALIGNED16= -fdump-record-layouts-simple -foverride-record-layout=%t.layouts %s > %t.after && \
// REDEFINE:   diff -u %t.before %t.after && \
// REDEFINE:   FileCheck --check-prefixes=CHECK,%{PREFIXES} %s < %t.after

// RUN: %{check-stdcxx98-14}

Do you feel this addresses aspect #1 while avoiding the concerns about boilerplate, obfuscation, and intimidation? We’ll have to answer the question of where the boilerplate goes (e.g., lit configuration file, or an include mechanism), but I’m confident we can find some solution there.

For aspect #2, the above solution focuses on handing the mode data (STD and PREFIXES) to the test author, who can then use that data in whatever way they need. There might be a better way to present that data so it’s more flexible, but I don’t have enough familiarity with the range of test cases in question to do that yet. Does that work?

MaskRay · August 31, 2022, 3:42am

I find it not uncommon that a test has 3 different behaviors:

C++98
C++11, C++14
C++17, C++20, C++2b

I may use a common prefix (e.g. CHECK). For the distinct part I want to use 3 different prefixes, say, cxx98, cxx11 (this covers C++14, too), cxx17 (this covers C++20/C++2b, too).

For another test it may have different behaviors for the following partition of language standards:

C++98
C++11, C++14, C++17
C++20, C++2b

We can imagine that across the whole Clang testsuite, there are many partitions of language standards and a test may exercise a different behavior for each group in a partition. Instead of having a predefined set of check prefixes, letting individual tests defining preferred check prefixes is more ergonomic. I’d also want to avoid boilerplate as much as possible. Something like

// RUN: %clang_cc1 -fsyntax-only -verify=expected,precxx17 -triple=x86_64-linux-gnu %stdcxx_98-14 %s
// RUN: %clang_cc1 -fsyntax-only -verify=expected,cxx17 -triple=x86_64-linux-gnu %stdcxx_17- %s

is the best I can come up with.

If %stdcxx_* is too ad-hoc, I am happy to switch to other forms, e.g.:

// RUN: %for stdcxx_17- { %clang_cc1 -fsyntax-only -verify=expected,cxx17 -triple=x86_64-linux-gnu %[std] %s }
// %for stdcxx_17-  runs the body multiple times, with %[std] replaced with c++17, c++20, c++2b.
// %for stdcxx_11-14  runs the body multiple times, with %[std] replaced with c++11, c++14.

stdcxx_17-, stdcxx_11-14 should be auto included as otherwise it is too much boilerplate for an individual test.

I can imagine that it may take time to have %for stdcxx_17-, so in the interim I wish that we can just use something like %stdcxx_98-14. When %for is available, we can switch to it.

AaronBallman · August 31, 2022, 1:31pm

Agreed

+1

AaronBallman · August 31, 2022, 1:35pm

This is more the case for Clang tests.

If the REDEFINE lines are hidden away from the user, then the RUN line becomes inscrutable because we don’t know what options are being passed beyond the language standard. If the REDEFINE lines are in the test file, it’s a lot of extra (potentially intimidating) boilerplate in basically every test. I think we need something “in between” the two. My hope is that we can either write RUN %clang_cc1_stdcxx14_plus ... or (better yet) RUN %clang_cc1 ... -std=%cxx14- ... so that the rest of the RUN line contents aren’t obscured but we can still signal to lit that there’s some amount of test repetition expected.

jdenny-ornl · August 31, 2022, 3:16pm

Sure, in the example I presented in my last comment, a test file can just ignore the PREFIXES argument, or it can be dropped entirely if it’s not useful anywhere.

MaskRay:

// RUN: %clang_cc1 -fsyntax-only -verify=expected,precxx17 -triple=x86_64-linux-gnu %stdcxx_98-14 %s
// RUN: %clang_cc1 -fsyntax-only -verify=expected,cxx17 -triple=x86_64-linux-gnu %stdcxx_17- %s

So, dropping PREFIXES from the substitution approach I proposed in my previous comment, that could then look like:

// REDEFINE: %{check-std}( STD %) = %clang_cc1 -fsyntax-only -verify=expected,precxx17 -triple=x86_64-linux-gnu -std=%{STD} %s
// RUN: %{check-stdcxx_98-14}

// REDEFINE: %{check-std}( STD %) = %clang_cc1 -fsyntax-only -verify=expected,cxx17 -triple=x86_64-linux-gnu -std=%{STD} %s
// RUN: %{check-stdcxx_17-}

The definitions of %{check-stdcxx_98-14} and %{check-stdcxx_17-} are the boilerplate that would be hidden away. Hopefully the names are at least as self-explanatory as the names of %stdcxx_98-14 and %stdcxx_17-, whose definitions would also be hidden away somewhere.

Moreover, IMHO, it’s pretty obvious the %{check-std} substitution is being redefined for the sake of the following RUN line, so a new user can easily guess the former is used by the latter in a manner hinted at by the names (and of course we can chose more explicit names if desired). In contrast, I find it subtle and surprising that %stdcxx_17- causes the RUN line to actually duplicate itself, and it’s unclear to me how that duplication would behave generally (e.g., if there are multiple such special substitutions in the same RUN line).

The REDEFINE lines are in the test, as shown above.

As shown above, it’s a small amount of extra code. Each RUN line is replaced by a REDEFINE directive that essentially captures the RUN line in a callback function. That’s followed by a short RUN line that expresses which standards are to be iterated.

In the REDEFINE examples above, you write something very similar to that: %clang_cc1 … -std=%{STD} and the rest of the original RUN lines aren’t obscured. The following RUN line then signals the test repetition.

It seems to me that per-test substitutions (regardless of whether my exact DEFINE/REDEFINE patches are accepted) are a more flexible feature that can be learned once (and implemented once in lit) to tackle a larger variety of problems. If you feel that special multi-valued substitutions like %stdcxx_98-14 are needed anyway, I’m not seeing why yet, so please continue to explain what I’m not seeing.

AaronBallman · August 31, 2022, 4:38pm

Yeah, this is far less boilerplate code than I was envisioning (I was thinking we’d need something more like what was posted earlier that had several lines). However, it still means we’ll be doubling the amount of test harness code in a lot of tests, and the code users have to write is basically a little DSL for macro replacement, both of which are kind of unfortunate. I think the reason lit tests are so popular (at least in Clang) is because of the ease of writing and understanding them compared to gtest-driven unit tests, to some extent. (Most of what I recall users being confused about is the difference between // RUN: clang ... and // RUN: %clang_cc1 ..., or how -verify or FileCheck work.)

I definitely think it’s a powerful general mechanism, but I think some of that power comes at a cost. I think I probably have too many “been here for ages” biases to evaluate how easy or hard such an approach would be for new contributors, but I’m a little skeptical that needing to learn a (very small) DSL to write tests is going to be easier than what we have today. However, maybe I’m wrong!

I worry that users copy and pasting the RUN lines and just changing the -std= values manually will be more clear to folks when there is a problem with the RUN line itself. e.g., the DSL is probably fine for reading an existing test, and it’s possibly even okay for modifying an existing test that already uses it, but what will the experience be when writing a wholly new REDEFINE line and there’s an issue with it (such as the syntax being wrong or an issue with the command itself failing)? Basically, do you think the extra layers of abstraction will make it harder for users who are struggling to write the RUN line in some hard-to-address way?

One other practical question is: can we still use other lit features on the RUN line? e.g., // RUN: not %{check-stdcxx_17-} | MyOtherTool %s or do those have to be in the REDEFINE code?

jdenny-ornl · August 31, 2022, 9:42pm

I’m not sure which part of the code you mean or how you’re measuring. Are you referring to per-test code? In my examples in my last comment, the extra code is surely less than double, and that’s for RUN lines that were originally fairly small.

Most lit test suites I’ve encountered use their own little DSLs that are made up of the substitutions that are defined by their lit configuration files. That DSL would be extended either by the %{check-stdcxx98-14} approach I’m suggesting or by the multi-valued %{stdcxx98-14} approach. So I don’t see a difference in that regard, but maybe I missed your point.

I’m sure that either approach we’re discussing would sometimes be harder than fully loop-unrolled RUN lines, as are typical today. However, either approach chooses to suffer a layer of abstraction in order to avoid repeated code, and that choice often makes things easier.

%{check-stdcxx_17-} would merely be a lit substitution that expands %{check-std} many times (once per C++ standard) joined by &&. In other words, it would be this:

// DEFINE: %{check-stdcxx17-} =        \
// DEFINE:   %{check-std}( c++17 %) && \ 
// DEFINE:   %{check-std}( c++20 %) && \ 
// DEFINE:   %{check-std}( c++2b %)

Thus, your example above would put not before the first %{check-std} and | MyOtherTools %s after the last %{check-std}. If you want to apply those to an individual command within your %{check-std}, then you have to specify them on that individual command within your %{check-std} (in its REDEFINE).

I have some questions about multi-valued substitutions like %{stdcxx98-14}:

What happens if I use %{stdcxx98-14} multiple times in a RUN line? Does it execute the entire RUN line once per value in %{stdcxx98-14}, and, for each execution, substitute the same value at every occurrence of %{stdcxx98-14}? In other words, does it remain a single loop over the values instead of nested loops?
What if I also use another multi-valued substitution in the same RUN line? What if that one doesn’t have the same number of values as %{stdcxx98-14}?
Is there any way to execute a RUN line for every combination of values from multi-valued substitutions in the RUN line? In other words, can I produce nested loops instead of a single loop?
What happens when I write the RUN line’s result to a file and read it with another command (e.g., FileCheck) in a separate RUN line (not joined by \)? Seems innocent enough. But will the second RUN line pick up the output file from only the last iteration of first RUN line?
Or maybe I’ve got it all wrong and the whole test file is repeated for each value of %{stdcxx98-14}? That would raise more questions about combinations of multi-valued substitutions in separate RUN lines.

I acknowledge that multi-valued substitutions like %{stdcxx98-14} might be tailor-made for the use cases originally presented in this discussion. In comparison to the approach I’m suggesting, that might make them initially easier to use once nuances are well defined and understood. But will multi-valued substitutions still be useful as those use cases become more complex? Is a user’s understanding of those nuances reusable for other use cases? Is the implementation in lit reusable for other use cases? For the approach I’m suggesting using per-test substitutions (via DEFINE/REDEFINE), my hope is that the answer is yes to all these questions. But of course, there will be nuances there too.

MaskRay · September 1, 2022, 5:41am

I agree that %{check-stdcxx_17-} making the containing command repeated is magical. I pick it mostly because it requies the minimum amount of efforts to implement…
I think using a generic design like your ⚙ D132513 [lit] Implement DEFINE and REDEFINE directives is better. I just want to understand (a) how difficult it is to support function-like macros and
(b) how long ⚙ D131464 [test] Make tests pass regardless of gnu++14/gnu++17 default should wait on ⚙ D132513 [lit] Implement DEFINE and REDEFINE directives and its follow-ups.

For D131464, If I understand correctly, we can change clang/test/lit.cfg.py this way:

config.substitutions += [
  ('%{check-std:cxx98-14}', '%{check-std}(c++98) && %{check-std}(c++11) && %{check-std}(c++14)'),
  ('%{check-std:cxx17-}', '%{check-std}(c++17) && %{check-std}(c++20) && %{check-std}(c++2b)'),
]

In a clang/test/ test, write

// REDEFINE: %{check-std}(STD) = %clang_cc1 -fsyntax-only -verify=expected,precxx17 -triple=x86_64-linux-gnu -std=%{STD} %s
// RUN: %{check-std:cxx98-14}

// REDEFINE: %{check-std}(STD) = %clang_cc1 -fsyntax-only -verify=expected,cxx17 -triple=x86_64-linux-gnu -std=%{STD} %s
// RUN: %{check-std:cxx17-}

I have noticed that your example uses // DEFINE: %{check-std}( STD %, PREFIXES %) =. Some questions and thoughts:

%, appears to be a argument separator. Is this deliberately preferred over ,?
The parameter list end uses %) instead of ). Is this better?
I try to avoid too many punctuations in %{...}, so I pick %{check-std:cxx17-} instead of %{check-std:c++17-} . Do I need to avoid +?
For now we may assume that some metacharacters such as (, ), , cannot appear in macro arguments.

lit - LLVM Integrated Tester — LLVM 18.0.0git documentation has a few %{...} uses. There aren’t many predefined substitutions, so I think overloading it with macro arguments is fine.

jdenny-ornl · September 1, 2022, 5:42pm

I think that’s very reasonable. I apologize that I’ve been caught up in debating the long-term direction, and I haven’t been considering your short-term needs. Even for the long-term direction, there might be critical issues we still need to consider for your use case. I’m happy to keep discussing how/whether the solution I’m proposing works for you.

I will attempt to prepare that patch for phabricator in the next couple of days so you can get a sense of it. From a high-level view within lit’s implementation, I think it’s a conceptually simple extension. It’s just a substitution. The regex to match uses of function substitutions are where it gets a bit subtle in the implementation.

If your D131464 works well for your use case now, and if it’s reasonable to migrate to something else later, my personal opinion is that you shouldn’t wait. Some well placed comments could hopefully warn people that multi-valued substitutions will likely be replaced later, so they should be prepared for that if they want to use them now.

I say don’t wait because I want the DEFINE/REDEFINE series to get a proper review from multiple people who have expressed interest. It’s possible someone will discover it doesn’t do what they had hoped, so we’d need to think about it more. They might have great suggestions that require major changes that would break backward compatibility if we land too early. All this might be quick or might be slow.

MaskRay:

For D131464, If I understand correctly, we can change clang/test/lit.cfg.py this way:

config.substitutions += [
  ('%{check-std:cxx98-14}', '%{check-std}(c++98) && %{check-std}(c++11) && %{check-std}(c++14)'),
  ('%{check-std:cxx17-}', '%{check-std}(c++17) && %{check-std}(c++20) && %{check-std}(c++2b)'),
]

You’d also need to provide an initial definition for %{check-std} so it can be REDEFINEd. %{check-std} is a function substitution, so its definitions in python (python pattern and replacement string) are tricky to get right. My function substitution patch will extend DEFINE/REDEFINE to do that for you, but I still need to expose that functionality to lit configurations files so they can easily do the same thing. I don’t think implementing that exposure will be challenging, and I want it too, but I haven’t gotten to it yet.

An alternative approach is to create a lit include feature so you can write out those definitions using DEFINE/REDEFINE in the same way you would write them in an individual test. I also haven’t implemented that yet.

One question for you: do you need to support every possible range within the C++ standards list? That would be a lot of lit substitutions to define. Or is there a specific set of ranges that would suffice and could be extended as needed?

I originally implemented it to use , and ). I then tried to use that in my own downstream test suites, and I quickly ran into many cases where I needed , and ) in my actual arguments. For example, an argument might be a comma-delimited list of FileCheck/-verify prefixes. An argument might contain expected output strings that contained parenthesized elements, such as (null).

I played around with permitting either set of delimiters because one is pretty and the other offers flexibility I sometimes need. However, having two syntactic forms cluttered the implementation, and I kept mixing things up when writing tests.

Ultimately, I decided just %, and %) are the best way to go. I don’t think they’re particularly pretty, but they mean there’s no limit on what can go in an argument. Moreover, they fit the general theme that % is the only special character. The %if syntax ultimately followed the same theme.

But, like anything else, the syntax can be debated.

In my current implementation, you do need to avoid +. The documentation specifies exactly what’s permitted, and lit diagnoses violations. I made it very strict. There’s a simple reason: the name is a pattern applied using python’s re.sub. That’s already how substitutions work in lit. We could implement DEFINE/REDEFINE to escape the name, but what happens when you decide to relocate the name into a lit configuration file? You then have to remember to escape it yourself. I predict people will often forget (I did before I restricted things), substitutions will quietly match text they weren’t intended to, and we’ll be lucky if tests fail. For example, . is a particularly insidious character my implementation doesn’t permit even though I really like the way it looks in names.

They can appear in arguments. See discussion above.

I’m not sure I understand the point you’re making here. Can you explain a bit more?

MaskRay · September 1, 2022, 7:31pm

jdenny-ornl:

MaskRay:

I just want to understand (a) how difficult it is to support function-like macros

I will attempt to prepare that patch for phabricator in the next couple of days so you can get a sense of it. From a high-level view within lit’s implementation, I think it’s a conceptually simple extension. It’s just a substitution. The regex to match uses of function substitutions are where it gets a bit subtle in the implementation.

MaskRay:

and
(b) how long D131464 [test] Make tests pass regardless of gnu++14/gnu++17 default should wait on D132513 [lit] Implement DEFINE and REDEFINE directives and its follow-ups.

If your D131464 works well for your use case now, and if it’s reasonable to migrate to something else later, my personal opinion is that you shouldn’t wait. Some well placed comments could hopefully warn people that multi-valued substitutions will likely be replaced later, so they should be prepared for that if they want to use them now.

I say don’t wait because I want the DEFINE/REDEFINE series to get a proper review from multiple people who have expressed interest. It’s possible someone will discover it doesn’t do what they had hoped, so we’d need to think about it more. They might have great suggestions that require major changes that would break backward compatibility if we land too early. All this might be quick or might be slow.

Thanks!

jdenny-ornl:

MaskRay:
For D131464, If I understand correctly, we can change clang/test/lit.cfg.py this way:
config.substitutions += [
  ('%{check-std:cxx98-14}', '%{check-std}(c++98) && %{check-std}(c++11) && %{check-std}(c++14)'),
  ('%{check-std:cxx17-}', '%{check-std}(c++17) && %{check-std}(c++20) && %{check-std}(c++2b)'),
]
You’d also need to provide an initial definition for %{check-std} so it can be REDEFINEd. %{check-std} is a function substitution, so its definitions in python (python pattern and replacement string) are tricky to get right. My function substitution patch will extend DEFINE/REDEFINE to do that for you, but I still need to expose that functionality to lit configurations files so they can easily do the same thing. I don’t think implementing that exposure will be challenging, and I want it too, but I haven’t gotten to it yet.

An alternative approach is to create a lit include feature so you can write out those definitions using DEFINE/REDEFINE in the same way you would write them in an individual test. I also haven’t implemented that yet.

Noted.

If we place C++98, C++11, C++14, C++17, C++20, and C++2b on an interval. There are many subintervals (O(n^2)) but not every one may be exercised by a test.Currently I have just seen the following. I I think we will just add substitutions as needed.

// Hack. Should be replaced once the function-like macro is available
+        add_stdcxx('%std_cxx98-14')
+        add_stdcxx('%std_cxx98-')
+        add_stdcxx('%std_cxx11-14')
+        add_stdcxx('%std_cxx11-')
+        add_stdcxx('%std_cxx14-')
+        add_stdcxx('%std_cxx17-20')
+        add_stdcxx('%std_cxx17-')
+        add_stdcxx('%std_cxx20-')
+        add_stdcxx('%std_cxx23-')

jdenny-ornl:

MaskRay:

I have noticed that your example uses // DEFINE: %{check-std}( STD %, PREFIXES %) =. Some questions and thoughts:

%, appears to be a argument separator. Is this deliberately preferred over ,?

The parameter list end uses %) instead of ). Is this better?

I originally implemented it to use , and ). I then tried to use that in my own downstream test suites, and I quickly ran into many cases where I needed , and ) in my actual arguments. For example, an argument might be a comma-delimited list of FileCheck/-verify prefixes. An argument might contain expected output strings that contained parenthesized elements, such as (null).

I played around with permitting either set of delimiters because one is pretty and the other offers flexibility I sometimes need. However, having two syntactic forms cluttered the implementation, and I kept mixing things up when writing tests.

Ultimately, I decided just %, and %) are the best way to go. I don’t think they’re particularly pretty, but they mean there’s no limit on what can go in an argument. Moreover, they fit the general theme that % is the only special character. The %if syntax ultimately followed the same theme.

But, like anything else, the syntax can be debated.

MaskRay:

I try to avoid too many punctuations in %{...}, so I pick %{check-std:cxx17-} instead of %{check-std:c++17-} . Do I need to avoid +?

In my current implementation, you do need to avoid +. The documentation specifies exactly what’s permitted, and lit diagnoses violations. I made it very strict. There’s a simple reason: the name is a pattern applied using python’s re.sub. That’s already how substitutions work in lit. We could implement DEFINE/REDEFINE to escape the name, but what happens when you decide to relocate the name into a lit configuration file? You then have to remember to escape it yourself. I predict people will often forget (I did before I restricted things), substitutions will quietly match text they weren’t intended to, and we’ll be lucky if tests fail. For example, . is a particularly insidious character my implementation doesn’t permit even though I really like the way it looks in names.

Thanks for being open:) My first impression is that the syntax is quite different from existing macro processors I have seen (e.g. m4, C preprocessor, make), and therefore a user accustomed to the uses may get confused when starting to use the lit feature.

In particular, when I saw %) for the first time, I wondered whether I had a missing%( or (% (i.e. unpaired %) ).
When I saw %,, I stopped but then realized it was used as parameter separator. We may imagine that a function-like macro taking more than one arguments may not be common in the testsuite. If a new user starts to define such a macro, they may trip over the different syntax.

This is just my feeling. We can collect more feedback from other interested users.

Nod.

I mean that there are predefined substitutions like %{pathsep}, %{fs-src-root}. Shall we be concerned with a macro parameter conflicting with a predefined substitution. The number of predefined substitutions isn’t large, so personally I don’t consider it a problem.

I see that %[...] was previously mentioned. If there is a concern, we can split the two uses into %[...] and %{...}. I slightly prefer %{...} for macro parameters because macro parameters may get wider adoption and ${...} is widely used in many languages. (%{pathsep} is really rare at the moment.)

jdenny-ornl · September 1, 2022, 7:56pm

Good to hear. (I was worried whether an explosion of substitutions might impact lit’s performance.)

Agreed. I’m open to changes. I just hope we can keep it general with simple rules.

I thought about %{foo}%( ARG %) so that things would look more balanced. If that’s what people prefer, I can live with that. However, I decided %{foo}( could be one token, and that was easier on my eyes. In some sense, %{foo}( ARG %) is fully balanced: there’s a % pair and a parens pair.

Agreed. However, I’ve tried to make lit carefully diagnose mistakes in the formal parameter list of function substitutions. It will specifically tell you , isn’t the separator if you try to use it. For actual argument lists in uses of function substitutions, diagnostics are not so great: the substitution just doesn’t expand. But hopefully the formal parameter list is a good reminder of the syntax.

Moreover, lit complains if a substitution’s name in a DEFINE is even a substring of an existing substitution’s pattern. That helps to avoid such mistakes.

I’m open to changes like that if they help.

Topic		Replies	Views
Iterating 'lit' RUN lines LLVM Project	11	773	August 23, 2022
[RFC] [lit] Report line numbers for failed RUN commands LLVM Dev List Archives	2	120	April 2, 2018
Best way to support multiple compilers from lit tests LLDB	0	106	November 19, 2018
Need A Clang Developer [was: Re: [llvm-commits] [llvm] r122346 - in /llvm/trunk: docs/TestingGuide.html test/lit.cfg utils/lit/lit/TestRunner.py] Clang Frontend	0	56	December 21, 2010
lit: deprecating trailing \ in RUN lines LLVM Dev List Archives	22	189	December 14, 2013

[lit] Run a RUN line multiple times with different % replacements

Related topics