I have seen a number of requests for help with the Rewriter, and many of them involve problems with macro expansions, especially when doing things like instrumentation. Would it make sense (or even be feasible) to add an option to the Rewriter that automatically expands a macro when a rewrite is requested at a SourceLocation inside that macro? It seems like this could also be a useful option for other kinds of refactoring.
It sounds like useful functionality. We don't store whether an
identifier is an expanded macro or what it expanded to in any
convenient way, though, so it would be a pain to implement.
I investigated this over the weekend and came to a similar conclusion. I have a student currently working on a code reformatting tool who wants to be able to see, from libclang, whether a macro expansion contains open or close braces. I'd assumed that this would be something easy to expose, but it seems that we don't actually have any way of finding the sequence of tokens generated by a macro expansion (it is generated by the preprocessor, but not stored anywhere). Even the HTML Rewriter, which (given the output in the static analyser) I assumed would already have code for doing this, contains a half-implemented duplicate of the macro expansion logic.
If someone's looking for a project, then factoring the macro expansion code out so that it could be rerun (the current code is destructive) would be very helpful. It would also improve diagnostics a lot if you could say exactly what the macro expansion was, not just the chain of macros that caused it.
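To make the "re-runnable" idea concrete, here is a toy sketch in C++ (not clang code): expansion is a pure function of the input tokens and a macro table, so it can be recomputed on demand without consuming anything, and the brace check from the reformatting use case becomes a simple scan of the result. The names `MacroTable`, `expand` and `containsBrace` are hypothetical; the sketch only handles object-like macros and does not model function-like macros, `#`/`##`, or clang's actual Preprocessor.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
// Hypothetical macro table: object-like macros only (name -> replacement list).
using MacroTable = std::map<std::string, Tokens>;

// Expands `input` against `macros` without modifying either argument, so the
// expansion can be recomputed as many times as needed. `active` holds the
// macros currently being expanded, which stops a self-referential macro from
// recursing forever (a simplified form of the C "disabled macro" rule).
Tokens expand(const Tokens& input, const MacroTable& macros,
              std::set<std::string> active = {}) {
  Tokens out;
  for (const std::string& tok : input) {
    auto it = macros.find(tok);
    if (it == macros.end() || active.count(tok)) {
      out.push_back(tok);  // Not a macro, or currently disabled: copy as-is.
      continue;
    }
    std::set<std::string> nested = active;
    nested.insert(tok);
    Tokens sub = expand(it->second, macros, nested);
    out.insert(out.end(), sub.begin(), sub.end());
  }
  return out;
}

// The reformatting use case: does an expansion contain a brace?
bool containsBrace(const Tokens& toks) {
  for (const std::string& t : toks)
    if (t == "{" || t == "}") return true;
  return false;
}
```

With `BEGIN` defined as `do {` and `END` as `} while (0)`, expanding `BEGIN x ; END` gives `do { x ; } while ( 0 )`, and `containsBrace` reports true; a macro such as `LOOP` defined as `LOOP + 1` expands once and then stops.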
We have investigated this possibility in the past (see
but we didn't find a suitable solution that avoids the veto on making
the Preprocessor slower in a non-negligible way.
Recently I've thought about a possibility that should have a minimal impact:
- suppose that the last two preprocessed tokens have locations Loc1
and Loc2, respectively
- if Loc1 and Loc2 come from the same FileID (i.e. their spelling
locations are consecutive in the source), nothing happens (by far
the most frequent case); otherwise a callback is invoked with Loc1
and Loc2 as arguments
- a program using the clang library can implement this callback to
store the locations in a jump table (a DenseMap)
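A minimal sketch of the recording side, under stated assumptions: `Loc` is a hypothetical stand-in for SourceLocation (a FileID plus a token index), `tokenLexed` is a hypothetical hook rather than an existing PPCallbacks method, and `std::unordered_map` stands in for `llvm::DenseMap`. It assumes, as the proposal does, that tokens within one FileID arrive consecutively.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical location model: the buffer (FileID) a token comes from and
// its token index within that buffer. Each macro expansion has its own FileID.
struct Loc {
  uint32_t fileID;
  uint32_t offset;
  bool operator==(const Loc& o) const {
    return fileID == o.fileID && offset == o.offset;
  }
};

struct LocHash {
  std::size_t operator()(const Loc& l) const {
    return std::hash<uint64_t>{}((static_cast<uint64_t>(l.fileID) << 32) |
                                 l.offset);
  }
};

// Stand-in for the DenseMap in the proposal: Loc1 -> Loc2.
using JumpTable = std::unordered_map<Loc, Loc, LocHash>;

// Hypothetical per-token hook. The common path is one FileID comparison;
// the callback only fires when two consecutive tokens change FileID.
class JumpRecorder {
 public:
  explicit JumpRecorder(std::function<void(Loc, Loc)> onJump)
      : onJump_(std::move(onJump)) {}

  void tokenLexed(Loc loc) {
    if (havePrev_ && prev_.fileID != loc.fileID) onJump_(prev_, loc);
    prev_ = loc;
    havePrev_ = true;
  }

 private:
  std::function<void(Loc, Loc)> onJump_;
  Loc prev_{0, 0};
  bool havePrev_ = false;
};
```

For `int x = N ;` (FileID 0) with `N` expanding to `4 + 2` (FileID 1), the seven emitted tokens trigger only two callbacks, one entering the expansion and one leaving it, which is why the cost on the common path stays at a single comparison.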
When the preprocessed token sequence is needed, ordinary relexing is
used, but the jump table is consulted whenever we reach a location
present in it.
This would let us know not only the exact preprocessed token stream
but also every detail about every single token expansion in the
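The replay step can be sketched the same way, again with hypothetical names and a token-index model instead of real relexing: each FileID's tokens are held in a vector (`Buffers`, an assumption standing in for the lexer), and the walk follows the jump table whenever the current location is a key in it.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Same hypothetical location model as the recording sketch.
struct Loc {
  uint32_t fileID;
  uint32_t offset;
  bool operator==(const Loc& o) const {
    return fileID == o.fileID && offset == o.offset;
  }
};

struct LocHash {
  std::size_t operator()(const Loc& l) const {
    return std::hash<uint64_t>{}((static_cast<uint64_t>(l.fileID) << 32) |
                                 l.offset);
  }
};

using JumpTable = std::unordered_map<Loc, Loc, LocHash>;
// Hypothetical stand-in for relexing: buffers[f] is the token sequence of
// FileID f (for a macro expansion, its expanded replacement tokens).
using Buffers = std::vector<std::vector<std::string>>;

// Rebuilds the preprocessed token stream starting at `start`. Inside one
// FileID it moves to the next token; after a token whose location is a key
// of `jumps` it continues at the recorded target instead. It stops when the
// current location runs past the end of its buffer.
std::vector<std::string> replay(const Buffers& buffers, const JumpTable& jumps,
                                Loc start) {
  std::vector<std::string> out;
  Loc cur = start;
  while (cur.fileID < buffers.size() &&
         cur.offset < buffers[cur.fileID].size()) {
    out.push_back(buffers[cur.fileID][cur.offset]);
    auto it = jumps.find(cur);
    if (it != jumps.end())
      cur = it->second;  // Discontinuity recorded during preprocessing.
    else
      ++cur.offset;  // Ordinary sequential relexing.
  }
  return out;
}
```

With the table recorded for `int x = N ;` (`{0,2} -> {1,0}` and `{1,2} -> {0,4}`), replaying from `{0,0}` yields `int x = 4 + 2 ;`; the macro name `N` at `{0,3}` is never emitted because the walk jumps over it.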
I hope that this time we can reach a general consensus on adding this
important missing feature.