[RFC] New preprocessor directive: #repeat

I’m trying to create a new preprocessor directive called #repeat. The point of this directive is to repeatedly inject tokens into the preprocessor wherever it’s used.

It takes two “arguments”, though that term doesn’t really make sense in this context.

The first is the number of times to inject the tokens into the preprocessor; the second is the actual tokens to inject.

My use case is a little convoluted, but I think this feature will be extremely useful.

#define BitBuffer_NumTests 1

#define BitBuffer_TestNames BitBuffer_Test1

_Pragma("push_macro(\"BitBuffer_TestNames\")")

_Pragma("redefine_macro(BitBuffer_TestNames BitBuffer_Test2)")

_Pragma("redefine_macro(BitBuffer_NumTests BitBuffer_NumTests + 1)")

/*
BitBuffer_NumTests now equals 2, and BitBuffer_TestNames now contains a stack of two tests, BitBuffer_Test1 and BitBuffer_Test2.
Using pop_macro on BitBuffer_TestNames, we can expand every single test name by looping with _Pragma("pop_macro(\"BitBuffer_TestNames\")").

I know that this is very convoluted, and the code will be hard to read, but it’s doable, and I strongly think it’s worthwhile to add these two features, #repeat and _Pragma(redefine_macro()).
*/
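For reference, the existing push_macro and pop_macro pragmas that this scheme builds on can already be exercised today. The sketch below uses only those two pragmas, which Clang, GCC, and MSVC support. The proposed redefine_macro is not involved: a plain #undef/#define stands in for it. The enum values and the helpers first_test and second_test are made up for illustration.

```c
/* Illustrative test identifiers; the values are arbitrary. */
enum { BitBuffer_Test1 = 1, BitBuffer_Test2 = 2 };

#define BitBuffer_TestNames BitBuffer_Test1

/* Save the current definition on the macro's internal stack. */
#pragma push_macro("BitBuffer_TestNames")

/* Stand-in for the proposed redefine_macro: replace the definition by hand. */
#undef BitBuffer_TestNames
#define BitBuffer_TestNames BitBuffer_Test2

/* Hypothetical helper: compiled while the redefinition is active. */
static int second_test(void) { return BitBuffer_TestNames; }

/* Restore the definition that push_macro saved. */
#pragma pop_macro("BitBuffer_TestNames")

/* Hypothetical helper: compiled after the original definition is restored. */
static int first_test(void) { return BitBuffer_TestNames; }
```

Each pop_macro exposes only the most recently pushed definition, which is why walking the whole stack needs some form of repetition.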

My question is technical: I’m already writing these extensions myself, and all I need is a little help understanding Clang’s implementation of the preprocessor.

So far, I’ve written a new PPKeyword in TokenKinds.def for #repeat, written HandleRepeatDirective in Clang’s Lexer, added it to the IdentifierTable, etc.

In HandleRepeatDirective, I lex and expand the first argument (the number of times to repeat), and that part is working fine.

I’m having trouble with the second part. The second argument is unusual, for example the _Pragma(redefine_macro("BitBuffer_NumTests BitBuffer_NumTests + 1")) part in the example above. How should I read these tokens?

Previously I was just doing a raw loop until eod was seen, but it turns out that macros are created by ReadOptionalMacroParameterListAndBody, so I started using that. However, I don’t want to create a new macro, so what do I actually do here? How do I expand the tokens repeatedly when Clang sees the #repeat directive during compilation?

I’ve got the tokens from the NewDefinition string literal in _Pragma(redefine_macro("MacroName NewDefinition")) with LexFromRawLexer, and I’ve built a MacroInfo to hold those tokens. How do I append the new definition to the IdentifierInfo for MacroName?

I saw a function the other day that seemed relevant; it was named something like appendMacroInfo.

The function I’m thinking of is either appendDefMacroDirective or appendMacroDirective.

What is the difference between the two?

That’s an interesting idea – I’ve seen folks asking for a preprocessor loop mechanism before, so there might be interest in such a feature.

FWIW, it’s not clear to me whether this RFC is trying to add the feature to Clang at this point or not; can you clarify?

I would consider reading the tokens into a buffer of some sort and then re-injecting them into the token stream. There is a TokenCollector object (see Pragma.cpp) that is used to implement _Pragma and does something along the lines of what you might want.
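The buffer-and-re-inject idea can be modeled outside Clang. The following is a toy sketch in plain C, not Clang code: tokens are plain strings, a NULL entry stands in for the eod token, and TokenBuffer, collect_until_eod, and reinject are hypothetical names invented here. In Clang proper, the replay step would presumably hand the collected tokens back to the preprocessor (for example through Preprocessor::EnterTokenStream) instead of copying them into an array.

```c
#include <stddef.h>

#define MAX_TOKENS 64

/* Toy token buffer: each token is just a pointer to its spelling. */
typedef struct {
    const char *tokens[MAX_TOKENS];
    size_t count;
} TokenBuffer;

/* Collect tokens until the end-of-directive marker (NULL in this model). */
static void collect_until_eod(TokenBuffer *buf, const char *const *input) {
    buf->count = 0;
    while (buf->count < MAX_TOKENS && input[buf->count] != NULL) {
        buf->tokens[buf->count] = input[buf->count];
        buf->count++;
    }
}

/* Replay the buffered tokens `times` times into `out`.
   Returns the number of tokens written, never more than out_cap. */
static size_t reinject(const TokenBuffer *buf, size_t times,
                       const char **out, size_t out_cap) {
    size_t written = 0;
    for (size_t i = 0; i < times; i++) {
        for (size_t j = 0; j < buf->count; j++) {
            if (written == out_cap)
                return written;
            out[written++] = buf->tokens[j];
        }
    }
    return written;
}
```

Collecting the body once and replaying it keeps the directive handler from having to define a named macro just to hold the tokens.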

I am trying to add the #repeat directive and the _Pragma(redefine_macro) pragma to Clang. I have two relevant branches on my fork of LLVM, one for #repeat and one for _Pragma(redefine_macro).

redefine_macro sits neatly between push_macro and pop_macro. The intent is to use those existing features to build a list inside a macro, so that lists of test cases and suites can be created by the preprocessor and tests can be registered automatically during compilation.

My LLVM fork is on my GitHub. The code isn’t quite ready to be upstreamed, and I’ve got more debugging to do and tests to write, but it’s not just some half-baked idea I’m thinking about (like _Overload lol).

Thanks for the clarification!

In terms of adding such a feature to Clang, you should be sure to see Clang - Get Involved for the criteria for adding extensions to Clang, especially #1, #2, and #4. The preprocessor is somewhat difficult to extend with new directives because you have to keep in mind that the preprocessor is shared between at least C and C++ (potentially others as well in downstream projects), so the semantics have to work in a broader set of circumstances than for usual language extensions. A compounding factor is that every compiler errors on unknown preprocessor directives, so this feature is going to be really hard to use portably without some sort of feature testing macro, but we’ve never really had feature testing macros for the preprocessor itself (a bit of a chicken and egg situation there).

Personally, I think extending the preprocessor has a higher bar for inclusion in Clang because of these sorts of considerations, so it’s especially important to have compelling use cases for this functionality that show there’s a community of users waiting on it. For example, see if you can find issues in compiler bug databases (doesn’t have to be just Clang’s) asking for the feature, workarounds people have to use and how commonly they are used in practice, etc.

I’ve wondered about feature test macros, but for me personally I’m ok with just using Clang.

But for other users (and hard erroring concerns) you’re right, we should have a feature test macro.

Maybe there should just be a _has_pp_extension(repeat)?

I don’t think the name #repeat will clash with any mainstream compiler, like Clang, GCC, or MSVC, nor does it clash with the M4 preprocessor, but idk about the more niche ones.

As for other users, I was talking to Alex Gilding and JeanHeyd Meneide about #repeat months ago and they both seemed interested in looping in the preprocessor, and like half the point of Jens Gustedt’s P99 is to enable looping.

Not to mention Boost’s preprocessor library, which also enables looping in the preprocessor and, like P99, does so by abusing recursive includes, which is far from efficient.
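For comparison, here is a minimal sketch of the kind of workaround needed today without a looping directive: a hand-unrolled chain of macros, one per repeat count. This is a simplified illustration, not how P99 or Boost.Preprocessor are actually implemented, and the REPEAT_n names and count_three are made up for the example. The count has to be a literal digit because it is pasted with ##, and the chain supports only as many iterations as were written out by hand.

```c
/* One macro per supported count; each level adds one more copy of X. */
#define REPEAT_0(X)
#define REPEAT_1(X) X
#define REPEAT_2(X) REPEAT_1(X) X
#define REPEAT_3(X) REPEAT_2(X) X
#define REPEAT_4(X) REPEAT_3(X) X

/* Select the chain entry by pasting the literal count onto the prefix. */
#define REPEAT(N, X) REPEAT_##N(X)

/* Hypothetical helper: the increment statement is expanded three times. */
static int count_three(void) {
    int x = 0;
    REPEAT(3, x += 1;)
    return x;
}
```

A directive that takes an expanded count and a token body would remove both the fixed upper bound and the need to write the chain at all.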

My biggest question is, should I try to write this as a proposal first, upstream it to Clang first, or just keep it specific to my fork and try to get it standardized?

What would be the best way to get this into the world in a usable state?

I think we should be able to reuse __has_extension for this.
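A guard along those lines might look like the sketch below. __has_extension is an existing Clang feature-checking macro, but the extension name pp_repeat and the HAVE_PP_REPEAT macro are hypothetical; no compiler defines them today. Directives inside a skipped #if group are only scanned far enough to track conditional nesting, so a guarded #repeat should not hard-error on compilers that lack it.

```c
/* Fall back to "no extensions" on compilers without __has_extension. */
#ifndef __has_extension
#define __has_extension(x) 0
#endif

/* pp_repeat is a hypothetical extension name used only for illustration. */
#if __has_extension(pp_repeat)
#define HAVE_PP_REPEAT 1
#else
#define HAVE_PP_REPEAT 0
#endif

/* Hypothetical helper so the result is visible at run time. */
static int have_pp_repeat(void) { return HAVE_PP_REPEAT; }
```

Code that wants to use the directive would then wrap it in #if HAVE_PP_REPEAT and supply a fallback in the #else branch.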

That’s good to know. I didn’t spot any conflicts with other compilers, and I verified on Compiler Explorer that preprocessor keyword tokens are not a problem to steal from the user. I was a bit worried that repeat wasn’t a reserved identifier, but I don’t think that is an issue. However, we do need to be careful not to steal a pp keyword that one of the language WGs wants to use.

There’s always a chicken-and-egg problem where WG14 wants to standardize existing practice but production compilers are pretty reticent to add experimental language extensions. My recommendation is: do the implementation work in your own fork, and once your fork is in a finished state, try to get it added to Compiler Explorer as one of the variants under Clang (like was done with #embed). Then write the proposal to WG14 to get feedback on the design and a signal that the committee wants the feature. Once the committee has signaled significant support for the feature, then we can start talking about moving the implementation from your fork into Clang to get more implementation and field experience to bring back to WG14. But we want to do that only once WG14 thinks they’re done making modifications to the feature (we do not want to get painted into a corner where we need to support #repeat and e.g. #_Repeat which are subtly different; if we’re going to be the guinea pig implementation, we need some level of assurance we’re not going to be punished for it).
