Best way to determine if a macro expansion spans a single expression? (ASTRecursiveVisitor)

Consider this code:

#define BAD_EXPANSION (1, 2)
void f(int, int);
void g()

For clang-tidy’s modernize-macro-to-enum check, the macro BAD_EXPANSION should be rejected, but currently it is accepted because we don’t examine macro expansion sites for contextual usage of the tokens in the macro. I’d like to correct this defect.

First of all, I don’t think it’s possible for a macro that expands to an integral constant expression to be used as anything other than an expression, except when the macro contains comma (,) tokens. Is there anything wrong with this assumption?

The check currently puts the tokens through a recursive descent parser to validate that the initializing expression conforms to language rules for a constant expression that is allowed as the initializer for an enumeration. This rules out macros that expand to malformed partial expressions as potential enumerations. The wrapping parentheses in BAD_EXPANSION allow it to be accepted by the parser as a valid initializing expression.

Second, my thinking is that in order to identify the case above, I need to record the SourceRange of all macro expansions and then use a RecursiveASTVisitor to visit the AST to ensure that the entire token sequence is used as an expression node and not partially matching the other tokens required for the syntax of something like a function call expression or a member call expression. If I can restrict this check to only those tokens containing commas, then I can reduce the amount of time spent in the visitor (and possibly skip it altogether if no such macros have been seen).

Am I on the right track, or is there a better way?


One case I can think of:

#define E +1
int foo() { return 1 E; }

Plus all sorts of fun with token concatenation, of course.

I can imagine situations where it’s an expression used within a declaration, but I’m not certain that’s what you’re after. e.g.,

#define DERP 0

void func(int i = DERP);

struct S {
  int : DERP;
  int huh[DERP + 1];

#define CAT(x, y) x##y
#define PASTE(x, y) CAT(x, y)

void PASTE(func, DERP)(void) {

Assuming that’s not the situation you have in mind, you’re correct: an integer constant expression is an expression and those can only show up where an expression is allowed. (I suppose technically you could claim a macro expanded into a vendor-specified attribute argument isn’t necessarily an expression, but that’s splitting hairs, eg: Compiler Explorer)

Ah, yes, this is another case that should be rejected. Here the +1 in the macro definition when parsed in isolation looks like unary operator + but when used in the expansion it becomes half of binary operator +.

With the AST analysis I am proposing, it would be properly rejected because the macro tokens don’t span an entire expression, they span a portion of the expression 1 +1.

int x : DERP; as a bitfield declaration is accepted without error by clang when DERP is an enum, so we’re OK there.

The token pasting one is a problem, but I took care of that by excluding all macros that are used as arguments to other macros. This is a slightly more restrictive than it need be, but it eliminates the possibility of generating incorrect code. There are cases where using the macro as an argument to another macro is perfectly acceptable if the macro is converted to an enum. To enable those cases, I need to do more macro analysis.

If I do the AST matching as I propose here, I don’t think it would be tripped up by the token pasting case, e.g. falsely declare that DERP could be transformed into an enum.

…getting back to the real meat of the question:

Is using a visitor the best way to associate AST nodes with macro expansion ranges?

I can’t think of a particularly better way to do it. Macro replacement lists are just a bunch of tokens rather than anything we could directly model as AST nodes, so I think you basically have to try to do a mapping like that.