[RFC] Allow recursive macros as extension

urnathan · October 2, 2023, 8:09pm

my 2c.
One issue with the current solutions for recursive expansion (e.g. swansontec/map-macro on GitHub, a recursive C preprocessor macro which performs an operation on each element of a list) is that diagnostics from the midst of the expansion are unwieldy: the ‘expanded from’ traceback is large and essentially meaningless. Thus, a compiler extension seems desirable just to make that better.
Something along the lines of @cor3ntin’s, @AaronBallman’s or @tahonermann’s suggestions seems the way to go. The use cases for recursive expansion are, I think, fairly well constrained: essentially, repeatedly applying a macro over a variadic argument list. As such, something that smells similar to __VA_OPT__ and __VA_ARGS__? I don’t think the full generality of an EVAL is warranted; something that permits the re-expansion of a specific macro would do.
(If there are other uses, it would be good to be clear what they are.)
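
For readers unfamiliar with the existing workarounds, the rescan trick behind libraries such as map-macro can be sketched roughly as follows. This is a C++20 variant built on `__VA_OPT__`, not map-macro's actual code, and every macro name in it (`FOR_EACH`, `EXPAND`, `PLUS_SQUARE`, and so on) is made up for illustration.

```cpp
// Assumes a C++20 preprocessor (__VA_OPT__). All names are illustrative.
#define PARENS ()

// Each EXPAND level forces additional rescans of its argument. Three levels
// are enough for short lists, but the depth is a hard-coded cap.
#define EXPAND(...)  EXPAND2(EXPAND2(EXPAND2(EXPAND2(__VA_ARGS__))))
#define EXPAND2(...) EXPAND1(EXPAND1(EXPAND1(EXPAND1(__VA_ARGS__))))
#define EXPAND1(...) __VA_ARGS__

#define FOR_EACH(FN, ...) __VA_OPT__(EXPAND(FOR_EACH_STEP(FN, __VA_ARGS__)))

// "FOR_EACH_AGAIN PARENS" is not an invocation yet; it only becomes
// "FOR_EACH_AGAIN ()" after PARENS is replaced, so the next step is deferred
// to a later rescan and FOR_EACH_STEP never has to name itself directly.
#define FOR_EACH_STEP(FN, A, ...) \
  FN(A) __VA_OPT__(FOR_EACH_AGAIN PARENS (FN, __VA_ARGS__))
#define FOR_EACH_AGAIN() FOR_EACH_STEP

// Example element macro: contributes "+ x*x" for each list element.
#define PLUS_SQUARE(x) + (x) * (x)

constexpr int sum_of_squares = 0 FOR_EACH(PLUS_SQUARE, 1, 2, 3);
static_assert(sum_of_squares == 14, "1 + 4 + 9");
```

The list length this handles is bounded by the number of nested `EXPAND` levels, and a diagnostic raised inside `FN` tends to be reported through every one of those levels, which is the traceback problem described above.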

AaronBallman · October 12, 2023, 2:11pm

Thank you everyone for the good discussion on this RFC, and especially thank you to @kelbon for continuing to update the patch as thinking evolves (note, feel free to hold off on future changes until we’ve come closer to finalizing the design in the RFC). I’d like to make sure we come to a conclusion on the RFC, or at least keep the discussion heading towards one.

I think I see sufficient support for the RFC to continue the efforts to solve the problems in this general space, so this RFC isn’t rejected. However, I’m not seeing clear consensus on the design and there are some process-related tasks to keep in mind, so it’s not quite accepted yet either. My reading of the thread is:

There’s not support for #define2 but there is support for __SOME_SPECIAL_IDENT__ or __SOME_SPECIAL_OPERATOR__(X). I think @tahonermann and @urnathan are advocating for an operator or function-like macro interface (though there’s disagreement about how much generality is warranted), I think @cor3ntin is advocating for a special identifier along the lines of the current form of the patch. We need to nail down whether we want this to be an operator-like feature or whether we want it to be a special placeholder to represent the macro being defined.
Once we’ve nailed the design down more, we need some sort of formal description of the language feature and how it works. This description will wind up in the documentation, but it will also help with the next process bit. This specification doesn’t need to be to the level of precision for adding it to the standard, but it should be sufficiently detailed that another implementation can use it to help them add the extension to their product.
We’ve extended the preprocessor in the past for things like __has_builtin or for compatibility with other compilers, but this feature is a bit different in that it’s more novel language design (things like __has_builtin are bog standard function-like macros, so they’re more of a minor extension than a significant one, in some ways). Because of that, I think we really do want this to be presented to a standards committee. I don’t think we need to block acceptance of the functionality on the committee, but I do think we should consider putting the feature behind an opt-in -f flag. If we introduce a new operator using a punctuator like the @ suggestion earlier, we definitely need a feature flag because we don’t want to close off design space from the committees. If we introduce it with a reserved identifier, the feature flag is a bit less necessary, but may still be worth it. This helps users to know that they’re using an extension that’s definitely not portable and it also gives us a bit of an escape hatch should one of the committees ask for breaking changes to the feature.

I think the next steps for this RFC should be to make decisions on the form of the feature. Personally, I do not have strong opinions on __SPECIAL_IDENT__ vs __SPECIAL_OPERATOR__(X) but I lean towards an operator form because it gives a not-so-subtle syntactic latch for programmers to see “this is something other than the usual macro expansion” which feels like an important property for code readability.

kelbon · October 12, 2023, 5:42pm

I think it’s important to note that a special token such as __THIS_MACRO__ only affects the behavior of the macro being defined, whereas a function-like construct such as __EXPAND__ or __EVAL__ may indirectly break other macros that are used inside it, and I, as a user, definitely do not want to debug such cases.

// yes, it's UB, but it is used (I have seen such code)
#define bool bool
#define A  __EXPAND__(bool x = true)
A // infinite recursion?

Also, I think the function-like solution is proposed mostly because of the annoying expansion behavior of # and ## in the C preprocessor, which requires defining additional macros just to force expansion. But we have lived with this for more than 20 years and it is solvable. Annoying, but solvable. On the other hand, the function-like solution requires much more wording; to me it looks more like:

#define A __PREPROCESSOR_RULES_2.0__(abc)

And I don’t like such huge changes for so little benefit.

Also, I will try to write some ‘wording’ and post it on [std-proposals].

tahonermann · October 12, 2023, 8:30pm

I don’t think there is a disagreement. I suggested the eval possibility, but I’m uncertain that it is a good idea. I would be quite content with an operator that only applies to the next token-ish.

I don’t see any reason that example would lead to infinite recursion. A is only expanded once. I would expect bool to be expanded twice; once before __EXPAND__ is evaluated, and then again as part of that evaluation (during which the substituted bool token would be eligible as a macro name again). I would not expect __EXPAND__ to keep expanding until no substitutions occur.
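
For reference, the baseline that both posts rely on is the standard rescanning rule ([cpp.rescan]): while a macro's replacement list is being rescanned, the macro's own name is not replaced again. That is why self-referential definitions in the style of `#define bool bool` are harmless today. A minimal sketch with a hypothetical name, `answer`, used in place of a keyword to stay clear of the UB mentioned above:

```cpp
// A variable and an object-like macro that share one (hypothetical) name.
constexpr int answer = 41;

// The inner "answer" is not replaced again during rescanning, so from here
// on every use of "answer" becomes "(answer + 1)", where the inner name
// refers to the variable declared above.
#define answer (answer + 1)

static_assert(answer == 42, "expanded exactly once, not indefinitely");
```

Any operator that makes the inner name eligible for replacement again has to say how many times that happens, which is the question the two posts above disagree on.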

Operator precedence needs to be fully specified with any solution. In other words, it needs to be well understood at which point the recursive expansion occurs relative to use of the existing operators. It would be helpful to have a sense of where/how [cpp.replace.general] would be modified.

cor3ntin · October 13, 2023, 8:45am

I should point out that the C++ committee has a meeting in early November.
If a paper materializes (I can’t commit to help write the paper, sorry), I might be able to present it.

Another possibility would be to present a paper to the C/C++ liaison group such that members of both committees are present.
I think you might be able to attend such a meeting - they are held on telecons (somewhat irregularly at the moment)

tahonermann · October 13, 2023, 2:34pm

I like the idea of presenting a paper to the SG22 C/C++ liaison group. That seems like a great place to start! Like Corentin, I can’t commit to helping to write a paper other than to provide simple guidance on the process and expectations, but I would be happy to review one!

urnathan · October 13, 2023, 5:48pm

Thanks for the summary. I’m tending much more towards:

operator syntax
restricted reach of the recursion

As such, the minimal solution I can think of looks something like:

#define MAP(FN, ...) __VA_OPT__(MAP_(FN, __VA_ARGS__))
#define MAP_(FN, A, ...) FN(A) __VA_OPT__(__RE_EXPAND__(FN, __VA_ARGS__))

The semantics are that __RE_EXPAND__ applies the current macro to its argument list (as usual, those arguments are macro-expanded before the invoked macro is expanded). One doesn’t get a choice of which macro might be recursively expanded: it has to be the current one. I think this will have all the syntactic and semantic restrictions and scoping of __VA_OPT__, which is also tied to a specific macro expansion.

This is very localized behaviour. It won’t break macros that rely on the non-reexpansion mentioned above – which I have seen in the wild.

Does this meet the OP’s needs? It meets the one I have in mind, which is like the above example.
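
For contrast, without either rescan tricks or an extension, a `MAP` with the same intended result has to be unrolled by hand up to some fixed arity. The following is a minimal sketch, assuming C++20 and at most four list elements; `MAP_1` to `MAP_4`, `PICK_5TH` and `ADD` are hypothetical names, and this `MAP` is a stand-in for, not an implementation of, the `__RE_EXPAND__` version above.

```cpp
// Assumes a C++20 preprocessor (__VA_OPT__). All names are illustrative.

// One macro per supported list length; each one peels off the first element.
#define MAP_1(FN, A)      FN(A)
#define MAP_2(FN, A, ...) FN(A) MAP_1(FN, __VA_ARGS__)
#define MAP_3(FN, A, ...) FN(A) MAP_2(FN, __VA_ARGS__)
#define MAP_4(FN, A, ...) FN(A) MAP_3(FN, __VA_ARGS__)

// Selects the macro matching the number of elements: the more elements are
// passed in front, the further the candidate names are shifted to the right.
#define PICK_5TH(_1, _2, _3, _4, NAME, ...) NAME

#define MAP(FN, ...) \
  __VA_OPT__(PICK_5TH(__VA_ARGS__, MAP_4, MAP_3, MAP_2, MAP_1)(FN, __VA_ARGS__))

// Example element macro: contributes "+ x" for each list element.
#define ADD(x) + (x)

constexpr int total = 0 MAP(ADD, 1, 2, 3);
static_assert(total == 6, "0 + 1 + 2 + 3");
```

Supporting a fifth element means adding `MAP_5` and widening `PICK_5TH`, which is the boilerplate the two-line definition is meant to remove.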

tahonermann · October 13, 2023, 8:05pm

> Does this meet the OP’s needs? It meets the one I have in mind, which is like the above example.

I might not be understanding your explanation correctly, but it looks to me like it doesn’t meet the needs for the PROCESS example I posted. I would very much like that to be possible; workarounds are difficult.

> One doesn’t get a choice of which macro might be recursively expanded – it has to be the current one.

What do you mean by “the current one”? MAP_ in that example? While that behavior would indeed be localized, it seems quite limiting to me.

kelbon · October 13, 2023, 8:36pm

> I would very much like that to be possible; workarounds are difficult.

It works with the current solution using __THIS_MACRO__; your example is part of the tests.

tahonermann · October 16, 2023, 6:59pm

I don’t see how. Could you please demonstrate how you would write those macros in terms of __THIS_MACRO__?

kelbon · October 16, 2023, 7:17pm

The definition is here; the key idea is just to disable this rule [cpp] for the names of function-like macros whose definition contains the __THIS_MACRO__ token.
Yes, it’s simple, and I am still thinking about the desired and expected behavior in edge cases, but … it works.

kelbon · October 19, 2023, 8:32pm

Paper materialized somehow:

tahonermann · October 19, 2023, 9:25pm

It isn’t clear to me that what you are describing actually does work for the example I provided. Note that the particular test case doesn’t include a recursive expansion of PROCESS_LIST. What does your implementation produce for the following test case?

PROCESS(LIST(ELEM(0), LIST(ELEM(1), LIST(ELEM(2), LIST(ELEM(3))))))

The intended result is (modulo spaces):

{ 0, { 1, { 2, { 3 }}}}

kelbon · October 20, 2023, 7:41am

Yes, you are right: with the implementation from the example, this case will not expand to the output you gave, because the other macros are not declared as ‘recursive’.

There is a possible way to do it with the current solution if you need that:
tag each macro as recursive. It will look like this (it may seem contradictory, but it seems flexible):

#define TAG(...)
#define PROCESS_ELEM(X) X TAG(__THIS_MACRO__)
#define PROCESS_LIST(...) \
__VA_OPT__({PROCESS(__VA_ARGS__)}) TAG(__THIS_MACRO__)
#define PROCESS(X, ...) \
PROCESS_##X __VA_OPT__(,__THIS_MACRO__(__VA_ARGS__))

And this will expand

PROCESS(LIST(ELEM(0), LIST(ELEM(1), LIST(ELEM(2), LIST(ELEM(3))))))

into {0 ,{1 ,{2 ,{3 } } } }

tahonermann · October 20, 2023, 3:26pm

Hmm, that is clever, but unintuitive since the use of __THIS_MACRO__ affects how the preceding tokens are expanded. I still prefer a solution that provides more control over which parts of a replacement list are eligible for recursive expansion. My intuition is that, when recursive expansion is desired, it is also desired for all macro names that appear in the expansion. Can anyone offer a test case that exemplifies when such recursive expansion would not be desired?

Some suggestions for the paper:

I think the possibility of infinite recursion needs to be addressed. Perhaps include the following examples. I think there are three possible approaches: 1) undefined behavior, 2) implementation limits, or 3) ill-formed (this would probably require some additional constraints somehow).

#define INFINITE_RECURSION() __THIS_MACRO__()
INFINITE_RECURSION()

#define TAG(X)
#define CO_DEPENDENT1() CO_DEPENDENT2() TAG(__THIS_MACRO__)
#define CO_DEPENDENT2() CO_DEPENDENT1() TAG(__THIS_MACRO__)
CO_DEPENDENT1()

I suggest adding Nathan’s MAP and my PROCESS examples to the motivation section of the paper with a presentation and explanation of the TAG idiom.

kelbon · October 20, 2023, 5:06pm

I forgot about recursion in the proposal; I will add it (an implementation-defined value, maybe added to the limits section of the standard).

Infinite recursion is addressed in the pull request; here are the tests:

github.com/llvm/llvm-project

Extension: allow recursive macros

llvm:main ← kelbon:kelbon_recursive_macro

opened 02:59PM - 09 Sep 23 UTC

kelbon

+123 -33

Add 'define2' directive, which works as 'define', but allows recursive macros …

Motivation:

* There is a huge amount of code that uses code generation / boilerplate macros / misused terrible templates, which is basically 'for each token' or may otherwise be solved with this new feature
  1. Nlohmann json: https://github.com/nlohmann/json/blob/836b7beca4b62e2a99465edef44066b7401fd704/include/nlohmann/detail/macro_scope.hpp#L320
  2. boost preprocessor: https://github.com/boostorg/preprocessor/blob/develop/include/boost/preprocessor/seq/detail/limits/split_1024.hpp
  3. boost pfr: (codegen) https://github.com/boostorg/pfr/blob/develop/include/boost/pfr/detail/core17_generated.hpp
  4. data_parallel_vector: https://github.com/kelbon/AnyAny/blob/4b056be2b6cbcfa1a407f7ee75279af414e390e4/include/anyany/noexport/data_parallel_vector_details.hpp#L62
* It may easily be used for what 'magic enum' does; in many cases it can replace reflection (because many who want reflection actually just want to create a JSON schema without specifying names twice)
* C++20 adds `__VA_OPT__`, which is designed for recursive macros, but there is no such thing in C++!

Examples:

<details>
<summary>fold</summary>

```C++
#define2 $fold_right(op, head, ...) ( head __VA_OPT__(op $fold_right(op, __VA_ARGS__)) )
#define2 $fold_left(op, head, ...) ( __VA_OPT__($fold_left(op, __VA_ARGS__) op) head )

static_assert($fold_right(+, 1, 2, 3) == 6);
// error: static assertion failed due to requirement '((((4) + 3) + 2) + 1) == 4'
static_assert($fold_left(+, 1, 2, 3, 4) == 4);
```

</details>

<details>
<summary>reverse token stream</summary>

```C++
#define2 $reverse(head, ...) __VA_OPT__($reverse(__VA_ARGS__) , ) head

// works as expected
constexpr int A[] = { $reverse($reverse($reverse(1, 2, 3))) };
constexpr int B[] = { 3, 2, 1 };
static_assert(A[0] == B[0] && A[1] == B[1] && A[2] == B[2]);
```

</details>

<details>
<summary>transform token stream (literally for each)</summary>

```C++
#define2 $transform(macro, head, ...) macro(head) __VA_OPT__($transform(macro, __VA_ARGS__))

#define $to_string(tok) #tok,
constexpr const char* names[] = {
  $transform($to_string, a, b)
#undef $to_string
};
static_assert(names[0][0] == 'a' && names[1][0] == 'b');
```

</details>

<details>
<summary>calculate count of tokens</summary>

```C++
#define2 TOKCOUNT_IMPL(head, ...) (1 __VA_OPT__(+ TOKCOUNT_IMPL(__VA_ARGS__)))
// works for zero args too
#define $tok_count(...) (0 __VA_OPT__(+ TOKCOUNT_IMPL(__VA_ARGS__)) )

static_assert($tok_count() == 0);
static_assert($tok_count(1, 2, (4, 5, 6)) == 3);
```

</details>

<details>
<summary>boost pfr without code generation</summary>

```C++
// placeholders for actual calculations
template<typename T>
consteval int aggregate_size() { return 3; }
constexpr int tie(auto&... args) { return sizeof...(args); }

#define2 $try_expand(value, head, ...) \
  if constexpr (aggregate_size<decltype(value)>() == $tok_count(+1, __VA_ARGS__)) { \
    auto [head __VA_OPT__(,) __VA_ARGS__] = value; \
    return tie(head __VA_OPT__(,) __VA_ARGS__); \
  } \
  __VA_OPT__($try_expand(value, __VA_ARGS__))

constexpr auto magic_get(auto aggregate) {
  $try_expand(aggregate, _3, _2, _1);
}
struct abc { int a, b, c; };
static_assert(magic_get(abc{}) == 3);
```

Here magic_get expands to (screenshot from clangd built with this patch)
![image](https://github.com/llvm/llvm-project/assets/58717435/d65c2f4f-12c7-48da-b03e-147791692c64)

</details>

<details>
<summary>infinite recursion macro:</summary>

```C++
#define2 A A
// produces 'error: unknown type name 'A'' (expanded to 'A')
// A
```

![image](https://github.com/llvm/llvm-project/assets/58717435/9cc388a1-ecd8-4577-856f-313c11669999)

</details>

urnathan · October 20, 2023, 5:26pm

It would indeed be MAP_ in that example. The implementation would, I guess, emit some special token referring to MAP_ and forcing expansion (remember, the emitted tokens are subject to macro re-expansion). That is as opposed to (say) an explicit mention of MAP_ in MAP_'s own definition.
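
To see what an explicit mention of `MAP_` in its own definition does under today's rules, one can stringify the result of a single invocation. This is a minimal sketch, assuming C++20; `STR`, `WRAP` and `same_modulo_spaces` are hypothetical helpers, and the comparison ignores spaces because the exact whitespace of a stringified expansion is not something to rely on.

```cpp
// Assumes a C++20 preprocessor (__VA_OPT__). All names are illustrative.

// Two-step stringification, so the argument is macro-expanded before '#'.
#define STR_(...) #__VA_ARGS__
#define STR(...) STR_(__VA_ARGS__)

// MAP_ names itself directly instead of using a special token.
#define MAP_(FN, A, ...) FN(A) __VA_OPT__(MAP_(FN, __VA_ARGS__))
#define WRAP(x) [x]

// Compares two spellings while skipping every space character.
constexpr bool same_modulo_spaces(const char* a, const char* b) {
  for (;;) {
    while (*a == ' ') ++a;
    while (*b == ' ') ++b;
    if (*a == '\0' || *b == '\0') return *a == *b;
    if (*a++ != *b++) return false;
  }
}

// Only the first element is processed; the nested MAP_ is left behind as an
// ordinary identifier because MAP_ is already being expanded at that point.
static_assert(same_modulo_spaces(STR(MAP_(WRAP, a, b, c)),
                                 "[a] MAP_(WRAP, b, c)"),
              "the self-reference is not expanded");
```

A special token would differ from this in exactly one respect: the leftover `MAP_(WRAP, b, c)` would be eligible for expansion on rescanning.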

cor3ntin · October 24, 2023, 8:08am

I mentioned this before, but I don’t think we should do anything specific about recursion, or anything specific to this proposal.
Instead, it seems that a general limit on the number of tokens a macro can expand to should exist (or on the number of preprocessed tokens in a TU; both would have similar effects), and such a limit already exists in practice. The standard already says that its list of implementation limits is non-exhaustive.

Making it UB would go against the current effort to remove UB from preprocessing, and making it ill-formed (i.e. setting a hard limit in the standard) would prevent us from providing a flag to control the limit, which is something we do for some limits.

tahonermann · October 24, 2023, 5:09pm

The standard already specifies minimum limits for the number of macro parameters ([implimits]p2.13) and the number of macro arguments ([implimits]p2.14). Likewise, the standard specifies minimum limits for recursive constexpr function invocations ([implimits]p2.38) and recursively nested template instantiations ([implimits]p2.41). Why would a similar minimum limit not be appropriate for recursive macro expansions? Placing limits on the number of tokens produced by a macro expansion or on the total number of preprocessing tokens produced for a TU would not address the examples I provided since those examples recurse, but never contribute any tokens.
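
As a point of comparison, this is what an implementation limit on recursion depth looks like for constexpr evaluation today. It is a minimal sketch with a hypothetical function, `depth`. GCC and Clang expose the limit through `-fconstexpr-depth=N`; nothing here implies that a macro recursion limit would be spelled or enforced the same way.

```cpp
// Hypothetical helper: recurses n levels deep during constant evaluation
// and produces nothing but the depth it reached.
constexpr int depth(int n) { return n == 0 ? 0 : 1 + depth(n - 1); }

// 100 nested invocations stay well under the minimum of 512 recursive
// constexpr invocations that [implimits] recommends.
static_assert(depth(100) == 100, "well within the recommended minimum");
```

Exceeding the limit is diagnosed by the implementation rather than being undefined, and the flag lets users raise it, which is the combination of properties being debated for macros.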

kelbon · November 5, 2023, 8:11am

> I should point out that the C++ committee has a meeting in early November

Is the paper I provided enough?
