[RFC] Allow recursive macros as extension

kelbon · September 12, 2023, 5:07pm

I propose adding a new preprocessor directive as an extension. The only difference between #define and the new directive is that the new one does not prohibit recursive macro expansion.

Behavior:

The extension does not change any behavior of the #define directive.
Recursion depth is limited, but the limit is so large that it will be exceeded only by infinite recursion or by code that should never exist; because of that, I did not add an option to configure it.
It is debatable what #define2 A A should do: recurse infinitely, or replace the macro with just A (because any other behavior is useless). In the current implementation it is diagnosed as infinite recursion.

The change is very small and fits well into the existing macro expansion implementation, so it should be easy to maintain.

At this point the implementation is ready, but the spelling of the directive still needs to be discussed.

Motivation and implementation are in the pull request:

github.com/llvm/llvm-project: Extension: allow recursive macros (llvm:main ← kelbon:kelbon_recursive_macro), opened 02:59PM - 09 Sep 23 UTC by kelbon, +96 -36

Add 'define2' directive, which works as 'define', but allows recursive macros …

Motivation:

* There is a huge amount of code that uses code generation, boilerplate macros, or terribly misused templates to do what is basically "for each token", or that could otherwise be solved with this new feature:
  1. Nlohmann json: https://github.com/nlohmann/json/blob/836b7beca4b62e2a99465edef44066b7401fd704/include/nlohmann/detail/macro_scope.hpp#L320
  2. boost preprocessor: https://github.com/boostorg/preprocessor/blob/develop/include/boost/preprocessor/seq/detail/limits/split_1024.hpp
  3. boost pfr (codegen): https://github.com/boostorg/pfr/blob/develop/include/boost/pfr/detail/core17_generated.hpp
  4. data_parallel_vector: https://github.com/kelbon/AnyAny/blob/4b056be2b6cbcfa1a407f7ee75279af414e390e4/include/anyany/noexport/data_parallel_vector_details.hpp#L62
* It can easily be used for what 'magic enum' does, and in many cases it can replace reflection (because many people who want reflection actually just want to create a JSON schema without specifying names twice).
* C++20 added `__VA_OPT__`, which is designed for recursive macros, but there is no such thing as a recursive macro in C++!

Examples:

<details>
<summary>fold</summary>

```C++
#define2 $fold_right(op, head, ...) ( head __VA_OPT__(op $fold_right(op, __VA_ARGS__)) )
#define2 $fold_left(op, head, ...) ( __VA_OPT__($fold_left(op, __VA_ARGS__) op) head )

static_assert($fold_right(+, 1, 2, 3) == 6);
// the next assertion fails on purpose; the diagnostic shows the expansion:
// error: static assertion failed due to requirement '((((4) + 3) + 2) + 1) == 4'
static_assert($fold_left(+, 1, 2, 3, 4) == 4);
```
</details>

<details>
<summary>reverse token stream</summary>

```C++
#define2 $reverse(head, ...) __VA_OPT__($reverse(__VA_ARGS__) , ) head

// works as expected
constexpr int A[] = { $reverse($reverse($reverse(1, 2, 3))) };
constexpr int B[] = { 3, 2, 1 };
static_assert(A[0] == B[0] && A[1] == B[1] && A[2] == B[2]);
```
</details>

<details>
<summary>transform token stream (literally for each)</summary>

```C++
#define2 $transform(macro, head, ...) macro(head) __VA_OPT__($transform(macro, __VA_ARGS__))

#define $to_string(tok) #tok,
constexpr const char* names[] = {
  $transform($to_string, a, b)
#undef $to_string
};
static_assert(names[0][0] == 'a' && names[1][0] == 'b');
```
</details>

<details>
<summary>calculate token count</summary>

```C++
#define2 TOKCOUNT_IMPL(head, ...) (1 __VA_OPT__(+ TOKCOUNT_IMPL(__VA_ARGS__)))
// works for zero args too
#define $tok_count(...) (0 __VA_OPT__(+ TOKCOUNT_IMPL(__VA_ARGS__)) )

static_assert($tok_count() == 0);
static_assert($tok_count(1, 2, (4, 5, 6)) == 3);
```
</details>

<details>
<summary>boost pfr without code generation</summary>

```C++
// placeholders for actual calculations
template<typename T>
consteval int aggregate_size() { return 3; }
constexpr int tie(auto&... args) { return sizeof...(args); }

#define2 $try_expand(value, head, ...) \
  if constexpr (aggregate_size<decltype(value)>() == $tok_count(+1, __VA_ARGS__)) { \
    auto [head __VA_OPT__(,) __VA_ARGS__] = value; \
    return tie(head __VA_OPT__(,) __VA_ARGS__); \
  } \
  __VA_OPT__($try_expand(value, __VA_ARGS__))

constexpr auto magic_get(auto aggregate) {
  $try_expand(aggregate, _3, _2, _1);
}

struct abc { int a, b, c; };
static_assert(magic_get(abc{}) == 3);
```

Here magic_get expands to the following (screenshot from clangd built with this patch):

![image](https://github.com/llvm/llvm-project/assets/58717435/d65c2f4f-12c7-48da-b03e-147791692c64)
</details>

<details>
<summary>infinite recursion macro:</summary>

```C++
#define2 A A
// produces 'error: unknown type name 'A'' (expanded to 'A')
// A
```

![image](https://github.com/llvm/llvm-project/assets/58717435/9cc388a1-ecd8-4577-856f-313c11669999)
</details>

cor3ntin · September 15, 2023, 10:13am

I’ve been thinking about how we could make this work better as an extension.
#define2 is not a great name, and having multiple ways to define function-like macros seems a bit invasive and confusing.

And the only thing #define2 does differently is not suppress the expansion of its own name.

So I think there is another design option that is a bit easier to justify as an extension and is probably easier to implement.
We can introduce a magic preprocessor identifier, e.g. __CURRENT_MACRO__ (name subject to bikeshedding, of course), that, in a macro, would denote that macro.

Adapting your example:


```cpp
#define FOLD_RIGHT(op, head, ...) ( head __VA_OPT__(op __CURRENT_MACRO__(op, __VA_ARGS__)) )
static_assert(FOLD_RIGHT(+, 1, 2, 3) == 6);
```
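For comparison, here is a rough sketch of how the same FOLD_RIGHT can be emulated in standard C++20 today, using the deferred-expansion idiom that preprocessor libraries rely on. The recursive call is hidden behind a helper that only becomes invocable on a later rescan, and a fixed stack of EXPAND macros supplies those rescans. The helper names (PARENS, EXPAND*, FOLD_RIGHT_HELPER, FOLD_RIGHT_AGAIN) are illustrative assumptions, it assumes a conforming C++20 preprocessor with `__VA_OPT__`, and the recursion depth is bounded by how many rescans EXPAND provides.

```cpp
// Sketch only: standard C++20 emulation of a recursive FOLD_RIGHT.
// All helper names are hypothetical; depth is bounded by EXPAND.
#define PARENS ()

// Each EXPANDn layer forces additional rescans of its argument.
#define EXPAND(...)  EXPAND3(EXPAND3(EXPAND3(EXPAND3(__VA_ARGS__))))
#define EXPAND3(...) EXPAND2(EXPAND2(EXPAND2(EXPAND2(__VA_ARGS__))))
#define EXPAND2(...) EXPAND1(EXPAND1(EXPAND1(EXPAND1(__VA_ARGS__))))
#define EXPAND1(...) __VA_ARGS__

#define FOLD_RIGHT(op, ...) EXPAND(FOLD_RIGHT_HELPER(op, __VA_ARGS__))
// FOLD_RIGHT_AGAIN is followed by PARENS rather than '(', so it is not
// invoked during this scan; the next rescan turns it into FOLD_RIGHT_HELPER.
#define FOLD_RIGHT_HELPER(op, head, ...) \
  ( head __VA_OPT__(op FOLD_RIGHT_AGAIN PARENS (op, __VA_ARGS__)) )
#define FOLD_RIGHT_AGAIN() FOLD_RIGHT_HELPER

constexpr int fold_sum = FOLD_RIGHT(+, 1, 2, 3);   // ( 1 + ( 2 + ( 3 ) ) )
constexpr int fold_diff = FOLD_RIGHT(-, 10, 4, 1); // ( 10 - ( 4 - ( 1 ) ) )
static_assert(fold_sum == 6);
static_assert(fold_diff == 7);
```

The `-` case shows the fold really is right-associative: a left fold would give `(10 - 4) - 1 == 5`. The extra indirection is exactly what a `__CURRENT_MACRO__`-style identifier would make unnecessary.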

That way we could model __CURRENT_MACRO__ on __VA_ARGS__ and __VA_OPT__, in that it would only be meaningful in function-like macros.
Making it testable with __has_extension would improve portability for now.

It’s also a design that is probably easier to standardize in WG14/WG21 (because it has a smaller surface area).

I’d like other vendors opinion too!

But overall, I think this is a problem worth solving!

cor3ntin · September 15, 2023, 10:18am

@reinterpretcast @AaronBallman @shafik

kelbon · September 15, 2023, 10:40am

But if there is a second layer of recursion (two macros at once…), will __CURRENT_MACRO__ then not be expanded? That would break recursion, although in many cases it might work.
I like the idea. I would also like # __CURRENT_MACRO__(args) and ## __CURRENT_MACRO__(args) to always be expanded; the behavior of the C preprocessor in these places is annoying and forces you to create many helper macros.
I hope it won’t be impossible to implement.
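The annoyance with # and ## comes from a standard rule: operands of those operators are not macro-expanded before stringizing or pasting, so code usually needs an extra forwarding macro. A minimal standard C++ illustration of the `##` case, with hypothetical names (PASTE_DIRECT, PASTE, PREFIX, counter_value):

```cpp
// Operands of ## are pasted *before* macro expansion, so pasting a macro
// name directly uses its spelling, not its replacement.
#define PASTE_DIRECT(a, b) a##b
// A forwarding layer lets the arguments expand first, then pastes them.
#define PASTE(a, b) PASTE_DIRECT(a, b)

#define PREFIX counter_
int counter_value = 42;

// PASTE_DIRECT(PREFIX, value) would produce the undeclared identifier
// PREFIXvalue; PASTE(PREFIX, value) produces counter_value.
int read_value() { return PASTE(PREFIX, value); }
```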

AaronBallman · September 15, 2023, 1:04pm

Thank you for the RFC! This is an interesting idea.

I have some high-level concerns:

The name define2 conveys nothing to the user as to how the feature works; we’ll have to pick a more descriptive name at some point.
I think macro expansion behavior should be predictable for the user, and I think we lose that property with this specific design. The user using a macro now has to know whether that macro was defined with #define or with #define2 to understand the expansion properties of the macro, and that’s a pretty high cognitive burden for users. The suggestion from @cor3ntin helps in this regard, but another approach that might work (or might be a terrible idea) would be to wrap the macro name with a directive at the point you want to force recursive expansion. e.g., recursively_expand(MACRO) (where recursively_expand is a preprocessor operator).

Also, we have a list of criteria for adding extensions to Clang. The items I have concern with are:

Evidence of a significant user community. Macros have existed in C for a long time and you can achieve non-infinite recursion with macros using the existing preprocessor functionality (and there are libraries which help you with this, such as P99). I’d appreciate more details on why these libraries are insufficient and this requires a language feature; infinite recursion is not possible (that’s why we have to add a recursion depth limit) and I believe it is rarely necessary, so this seems like a very specific feature for a pretty uncommon problem.
A specification. The preprocessor has some very curious properties that have led to implementation divergence over the years; we should nail down the behavior of any new preprocessor extension so that it’s clear how it behaves. This also helps with the next part…
Representation within the appropriate governing organization. We expect the preprocessor to remain broadly the same between C and C++, and the standards committees typically ask for that as well. So this feature will need some sort of proposal to both WG14 and WG21. That’s a tall order, but I think it’s critical if this extension is to be adopted by users: preprocessor differences between compilers can be a source of pain for users, and standardization helps to avoid that pain. I’ve not seen a proposal like this in my time on the committees, and I can’t find evidence that someone else has already proposed this idea. Getting feedback from the committees can be tricky, though. WG14 (the C committee) wants to see implementation experience, but as implementers, we do not want to implement an extension to the preprocessor and have the committee(s) “tweak” the design such that we break users, so we want some sign from the committee that the design approach is correct. I think starting a high-level discussion on both committees’ reflectors could at least kick-start getting that design feedback. But even that is tricky because ISO has been far more strictly enforcing its rules about who can participate in standardization. I think we should circle back to figure out how best to interface with the standards committees once we think we’ve got a roughly final design for the feature.

cor3ntin · September 15, 2023, 1:44pm

I’d like to push back on that.
If we assume recursion in macros is useful (and I think the original PR had some examples, along with the mere existence of Boost PP, P99, and similar facilities; I have needed it a few times even though I try to limit my use of the preprocessor), then I think “to call a macro recursively you need a library” is a tough sell.
But assuming you find a library with a suitable license and it gets blessed for inclusion in your company code base, or you reimplement it yourself, you end up with two issues:

The best interface you can get is APPLY(F, args), which kind of works, but it would be more natural for F(args) to work.
The libraries that exist work by generating (either manually or through a script in Python, Perl, CMake, etc.) a list of N macros, N being how many nested calls they want to support.
That produces unnecessarily large files that need to be preprocessed, making the compiler do more work and leading to bad diagnostics if there is an error in your pile of macros.
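To make the second point concrete, here is a minimal sketch of the generated-list pattern, with hypothetical names and N = 3 (real libraries generate far more numbered macros). There is one numbered macro per supported length plus an argument-counting selector, so a list longer than N is simply not supported unless another FE_k is generated.

```cpp
// Hypothetical "generated" macros: one per supported list length (N = 3).
#define FE_1(M, x) M(x)
#define FE_2(M, x, ...) M(x) FE_1(M, __VA_ARGS__)
#define FE_3(M, x, ...) M(x) FE_2(M, __VA_ARGS__)

// Picks FE_k by shifting a fixed list of names by the argument count.
#define FE_SELECT(_1, _2, _3, NAME, ...) NAME
#define FOR_EACH(M, ...) \
  FE_SELECT(__VA_ARGS__, FE_3, FE_2, FE_1, unused)(M, __VA_ARGS__)

// Example element macro: emits "(x) * (x) +" for each element.
#define SQUARE_PLUS(x) (x) * (x) +

constexpr int sum_of_squares = FOR_EACH(SQUARE_PLUS, 1, 2, 3) 0; // 1 + 4 + 9 + 0
constexpr int single = FOR_EACH(SQUARE_PLUS, 5) 0;               // 25 + 0
```

The interface is also the APPLY(F, args) shape from the first point: the element macro has to be passed in as an argument rather than called as F(args).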

So the feature does sound well motivated to me, but I agree the design needs more exploration, and starting a conversation with the different vendors and the committees seems like the best next step.

AaronBallman · September 15, 2023, 2:01pm

Okay, I can see that logic, thank you! There’s a natural tension between “you can do this already and in fact people have provided libraries to do it” and “why not make this part of the language?” and the line is a bit fuzzy. I guess I see folks wanting to do less macro programming in C++ (e.g., C++'s treatment of macros in modules), so I tend to think the bar is higher here because this is in the preprocessor. However, existence of those libraries is a sign of a need in practice.

kelbon · September 15, 2023, 2:15pm

I don’t think users will be required to know more about define2/define than they do now; they can think ‘it just somehow works, maybe they generated 100’000 lines with a Python script’.

I will list the problems with the current state of recursion in preprocessor:

It complicates the implementation, which forces users to create worse solutions that are less readable or less usable.
If you use a library like Boost PP or P99, you first need to learn how to use it, and this can be very difficult, because such an implementation complicates the interface too.
Generated recursion is usually very limited in depth: 10 to 100 levels or so.
It greatly increases the volume of source code and the likelihood of errors.
Imagine you forgot to remove a ‘,’ in one line, or the generator script inserts an extra comma in one edge case: when and how will you find this error?
If you use a script to generate code, that script is now part of your project, because if you want to change the implementation in the future, you need to change the script.

tahonermann · September 15, 2023, 3:21pm

I agree with @AaronBallman that having two kinds of function-like macros could be a source of surprise and confusion. That being said, there are languages that support multiple kinds of macros with success; e.g., GNU make and its simply expanded and recursively expanded variables.

I’m more inclined towards solutions like those suggested by @cor3ntin and @AaronBallman, assuming they suffice for the desired use cases.

cor3ntin · September 15, 2023, 3:30pm

It might be useful to look at the features offered by the various preprocessor libraries, like P99 and Boost PP, and survey how the proposed recursive macro would simplify or replace them; it might help inform the design.

MarcusJohnson91 · September 16, 2023, 9:07pm

I think a better approach is to redefine macros with the _Pragma operator. My code isn’t complete, but feel free to peruse it: it’s in the _Pragma(redefine_macro) branch, and it’s written in the spirit of _Pragma(push_macro/pop_macro).

As for the recursive expansion aspect, I’m working on a directive, #repeat, that allows expansion a specified number of times; it’s in the #Repeat branch of my LLVM fork.

My primary use case is registering test cases and test suites in C, and also getting around the issues with __COUNTER__.

Opinions? @tahonermann @cor3ntin @AaronBallman

GitHub fork here: MarcusJohnson91/llvm-project

kelbon · September 24, 2023, 2:12pm

The design has changed from a new preprocessor directive to a special token, __THIS_MACRO__, which denotes a recursive call of the current macro.
It is enabled only in function-like macros. The implementation is ready; all changes are in the linked pull request.

cor3ntin · September 26, 2023, 12:20pm

I understand there are complications with a vendor extending the preprocessor.

However, the proposed changes are fairly minimal, can help remove a lot of cruft for people who only use Clang, and do not take any design space besides a reserved identifier
(we could reduce that concern further by picking a name such as __CLANG_THIS_MACRO__ or something along those lines, but that would reduce the chance of adoption by other vendors, so it’s not ideal).

Given that we are not going to solve the chicken / egg problem without laying something first, I’m personally in favor of pursuing that change.

I see two outcomes:

Either this proves to be a success and gets adopted by more vendors, at which point WG14 might adopt it or something similar;
or it’s not, and we can either keep around a small amount of code or deprecate the magic identifiers over a few releases.

It may be surprising that I am in favor of compiler extensions, but I’m unconvinced that WG14 would be able to make targeted improvements to the preprocessor without implementation experience. This change is somewhat similar in scope to __VA_OPT__, which I think was well received.

On the recursion-limit concerns, I’m not convinced the problem is new: it’s possible to use, e.g., Boost PP in a way that creates an unreasonable amount of expansion.
We could, and probably should, consider a (user-configurable) limit on how many expansions a top-level macro can perform, independently of this change.

AaronBallman · September 26, 2023, 3:38pm

I reached out to the author of P99 to ask his opinions on the proposal here. My paraphrasing of his response is:

Making recursive macros less complicated is a good move, the code in P99 to handle this is pretty fragile.
__VA_OPT__ is good precedent.
What design to go with is much less clear (whether it’s a new directive or a new preprocessor keyword, etc).
The current design is more restrictive than general recursion because it does not allow for complicated patterns with implicit recursion through several layers of function-like macros.
We should be sure to keep the broader context in mind: how much of this do we need? Would a feature that does iteration over a finite set be more appropriate?

My take is:

I agree there’s a chicken and egg problem with WG14 and implementations likely have to move first to get WG14 to standardize anything. So far, the discussion on the reflectors has been underwhelming, potentially because we’re still in the middle of balloting and so people are not supposed to discuss changes to the standard during that time. We might get more feedback after the C23 ballot closes, but I think we’d need an actual paper in front of the committee to get significant feedback. One thing that would help here is some coordination with GCC developers on the extension; they don’t have to be willing to adopt it on our timeline, but it would be good to know up front whether they have some amount of buy-in with the design or are opposed to it. That makes the standards paper more compelling as well because there’s more prior art to point to (even if GCC doesn’t implement it, some public show of support helps).
I agree that we need to keep the broader context in mind. Some of this boils down to motivating examples where we can take existing code and change it to use the new model to demonstrate what cases work and what cases won’t work. But some of this is also design-level: given that we know we cannot do truly infinite recursion (we have to have some recursion limitations as a compiler limit), would it instead be a more portable and useful feature to let the user specify the recursion limits as part of the preprocessor feature so that the behavior is then portable across compilers without needing to rely on command line switches? (Maybe that’s a bad idea for other reasons, however.)
I like the __THIS_MACRO__ form better than the #define2 form because it’s a syntactic marker in code at the point of expansion (you don’t have to look at how the macro was defined to understand its expansion behavior). I do wonder how well this design works with mutual recursion though.

kelbon · September 26, 2023, 4:34pm

Mutual recursion was the first thing I thought about before changing define2 → __THIS_MACRO__, but I couldn’t think of a scenario in which this problem would occur.

would it instead be a more portable and useful feature to let the user specify the recursion limits as part of the preprocessor

Because of that, I think the best strategy for a user who wants all possible code to compile is to set the recursion depth to the maximum possible. But if so, why don’t we just set it to the maximum without a new command-line option?

tahonermann · September 26, 2023, 8:04pm

I definitely agree with that, but I think naming still needs some work. __THIS_MACRO__ appears to me to conflate two features: 1) the ability to name the current macro without knowing its name (often desired for generic code), and 2) the ability to recursively expand a macro.

I’m not convinced that this approach solves a substantial set of use cases for recursively expanded macros since it doesn’t support recursive expansion of a macro other than the current one. In other words, it can’t handle structural nesting operations. Consider the following example (which I hope I got right).

```cpp
#define PROCESS_ELEM(X) X
#define PROCESS_LIST(...) __VA_OPT__(PROCESS(__VA_ARGS__))
#define PROCESS_NEXT(X, ...) PROCESS_##X __VA_OPT__(, PROCESS_NEXT(__VA_ARGS__))
#define PROCESS(X, ...) { PROCESS_##X __VA_OPT__(, PROCESS_NEXT(__VA_ARGS__)) }
struct S {
  int dm1;
  struct {
    int dm1, dm2;
  } dm2;
  int dm3;
};
S s = PROCESS(ELEM(0), LIST(ELEM(1), ELEM(2)), ELEM(3));
```

The intent is that the last line expand to (ignoring white space concerns):

```cpp
S s = { 0, { 1, 2 }, 3 };
```

The proposed __THIS_MACRO__ would suffice to allow recursion for the case where PROCESS_NEXT recursively invokes itself, but it doesn’t suffice to address the case where PROCESS is recursively invoked during the expansion of PROCESS_LIST. A preprocessor operator that enables unconditional (recursive or non-recursive) expansion of the macro name it is applied to would suffice to address this use case.

Using @ as a placeholder for operator syntax, the macros above could then be defined as follows and would produce the intended output for the example above.

```cpp
#define PROCESS_ELEM(X) X
#define PROCESS_LIST(...) __VA_OPT__(@PROCESS(__VA_ARGS__))
#define PROCESS_NEXT(X, ...) @PROCESS_##X __VA_OPT__(, @PROCESS_NEXT(__VA_ARGS__))
#define PROCESS(X, ...) { @PROCESS_##X __VA_OPT__(, @PROCESS_NEXT(__VA_ARGS__)) }
```
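For comparison, a sketch suggesting that this nested example can already be emulated in standard C++20 with the deferred-expansion idiom that preprocessor libraries use. Each recursive call (PROCESS_NEXT, and the re-entry into the list processing from PROCESS_LIST) is spelled through a hypothetical *_AGAIN helper followed by PARENS, so it is only invoked on a later rescan supplied by EXPAND. All helper names (PARENS, EXPAND*, PROCESS_HELPER, *_AGAIN) are assumptions, a conforming C++20 preprocessor is assumed, and nesting depth is bounded by the number of rescans.

```cpp
// Sketch: standard C++20 emulation of the nested PROCESS example.
// Helper names (PARENS, EXPAND*, PROCESS_HELPER, *_AGAIN) are hypothetical.
#define PARENS ()
#define EXPAND(...)  EXPAND3(EXPAND3(EXPAND3(EXPAND3(__VA_ARGS__))))
#define EXPAND3(...) EXPAND2(EXPAND2(EXPAND2(EXPAND2(__VA_ARGS__))))
#define EXPAND2(...) EXPAND1(EXPAND1(EXPAND1(EXPAND1(__VA_ARGS__))))
#define EXPAND1(...) __VA_ARGS__

#define PROCESS(...) EXPAND(PROCESS_HELPER(__VA_ARGS__))
#define PROCESS_HELPER(X, ...) \
  { PROCESS_##X __VA_OPT__(, PROCESS_NEXT_AGAIN PARENS (__VA_ARGS__)) }
#define PROCESS_NEXT(X, ...) \
  PROCESS_##X __VA_OPT__(, PROCESS_NEXT_AGAIN PARENS (__VA_ARGS__))
#define PROCESS_ELEM(X) X
// A nested LIST re-enters PROCESS_HELPER on a later rescan.
#define PROCESS_LIST(...) __VA_OPT__(PROCESS_HELPER_AGAIN PARENS (__VA_ARGS__))

// Deferred names: they are followed by PARENS, not '(', when produced,
// so they only expand once a later rescan reaches them.
#define PROCESS_NEXT_AGAIN() PROCESS_NEXT
#define PROCESS_HELPER_AGAIN() PROCESS_HELPER

struct S {
  int dm1;
  struct {
    int dm1, dm2;
  } dm2;
  int dm3;
};
// Expands to the equivalent of { 0, { 1, 2 }, 3 }
S s = PROCESS(ELEM(0), LIST(ELEM(1), ELEM(2)), ELEM(3));
```

This works, but only by threading every recursive call through extra helpers and a fixed rescan budget, which is the kind of boilerplate an unconditional-expansion operator would remove.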

kelbon · September 26, 2023, 9:29pm

That’s a good example, but the simplest possible solution is to just mark every macro that contains the __THIS_MACRO__ token as recursive (always allowed to expand); that would also simplify the implementation.

tahonermann · September 26, 2023, 9:47pm

I would prioritize providing a general solution that addresses more use cases over ease of implementation; particularly when the effort required might not be much greater (which I suspect is the case).

I might be wrong, but I don’t think the example I provided can be implemented using the __THIS_MACRO__ approach. If you believe differently, please show how you would write it.

kelbon · September 27, 2023, 5:21am

To support such cases, there is nothing left to do but allow the macro to expand recursively whenever it is mentioned. After that, all that remains is to decide by which criterion to select the macros for which this is allowed.
In the original version, this marker was a new preprocessor directive, but that had its drawbacks.

But the special token __THIS_MACRO__ fits this role quite well. With this we simultaneously solve several problems:

It is quite obvious to the user what is happening (__THIS_MACRO__ accurately reflects the intention to do something recursive).
We do not introduce new directives and fully reuse the macro mechanism, without adding any complexity to support this in other parts (such as clangd).

But we need to decide what to do with declarations such as:

```cpp
#define A(...) __THIS_MACRO__() A()
```

To remove the ambiguity, I would diagnose using the macro’s own name inside a macro that uses __THIS_MACRO__ as an error.

tahonermann · September 27, 2023, 5:11pm

I don’t agree. As previously indicated, we have the option of adding a preprocessor operator to opt-in to a recursive expansion; e.g., the use of @ exhibited at the end of my earlier comment.

Perhaps it is helpful to think of this operator like eval in some other languages. In fact, perhaps it would be more useful to use an operator that supports delimiters. Today, we sometimes have to introduce additional macros just to force an expansion:

```cpp
#define DO_STRINGIFY(X) #X
#define STRINGIFY(X) DO_STRINGIFY(X)
#define TEXT blah
STRINGIFY(TEXT) // expands to "blah", not "TEXT"
```

With an eval operator, perhaps STRINGIFY could instead be written:

```cpp
#define STRINGIFY(X) #__EVAL__(X)
```

Going back to my previous example, the PROCESS related macros could be defined as:

```cpp
#define PROCESS_ELEM(X) X
#define PROCESS_LIST(...) __VA_OPT__(__EVAL__(PROCESS(__VA_ARGS__)))
#define PROCESS_NEXT(X, ...) __EVAL__(PROCESS_##X) __VA_OPT__(, __EVAL__(PROCESS_NEXT(__VA_ARGS__)))
#define PROCESS(X, ...) { __EVAL__(PROCESS_##X) __VA_OPT__(, __EVAL__(PROCESS_NEXT(__VA_ARGS__))) }
```
