Getting the final tokens a macro expands into

Is there a way to get the tokens that a macro ultimately expands into? For example:

#define RP )
#define ID(x) (x
int r1 = ID(1 + 2 RP ); // "ID()" expands into: "(1 + 2 )"

In the Clang Static Analyzer we currently install a custom token watcher on the Preprocessor and simply collect the tokens that expand to the same location. The problem is AST dumps/modules: as far as I know, Clang lazily loads the nodes from them without any preprocessing, so a token watcher wouldn’t help there.

clangd also has a similar token watcher that collects the tokens into a TokenBuffer (clang/lib/Tooling/Syntax/Tokens.cpp), so the problem is not unique to the static analyzer.
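To make the token-watcher approach concrete, here is a minimal standalone sketch of the grouping step: tokens arrive in the order the preprocessor emits them, and those that share an expansion location are concatenated. It does not use Clang's API. WatchedToken, ExpansionCollector and the integer offsets standing in for SourceLocation are hypothetical; in Clang itself the hook for observing emitted tokens is Preprocessor::setTokenWatcher.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a preprocessed token: its spelling plus the
// offset of the macro invocation it was expanded from (-1 for tokens that
// came straight from the file). Real Clang tokens carry a SourceLocation.
struct WatchedToken {
  std::string Spelling;
  long ExpansionOffset; // -1 if not produced by a macro expansion
};

// Groups watched tokens by expansion location, yielding the final
// expanded text of each top-level macro invocation.
class ExpansionCollector {
public:
  // Called once per token, in emission order.
  void onToken(const WatchedToken &Tok) {
    if (Tok.ExpansionOffset < 0)
      return; // not part of any macro expansion
    std::string &Text = Expansions[Tok.ExpansionOffset];
    if (!Text.empty())
      Text += ' ';
    Text += Tok.Spelling;
  }

  // Returns the expanded text for the invocation at Offset, or "" if none.
  std::string expansionAt(long Offset) const {
    auto It = Expansions.find(Offset);
    return It == Expansions.end() ? std::string() : It->second;
  }

private:
  // Ordered by offset, so expansions can be walked in source order.
  std::map<long, std::string> Expansions;
};
```

Fed the tokens of `int r1 = ID(1 + 2 RP );` with the ID invocation at offset 9, expansionAt(9) returns "( 1 + 2 )". The key point is that the grouping only works while the tokens are flowing past the watcher.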

Can’t we recover the sequence of tokens after preprocessing (in other words, without using a token watcher)?
I figured that the SourceManager must already contain this information somehow, right?

(For back reference, this question was prompted by PR #176126).

I’m not sure exactly how we did it (I’d have to ask the team member who implemented it), but in our out-of-tree Clang we added a listing file that shows macro expansions.

       1  #define RP )
       2  #define ID(x) (x
       3  int r1 = ID(1 + 2 RP );
       E           ( 1 + 2 )
       

The E indicates an expansion line from the line above it.
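As a rough illustration of the format only, a listing with E lines can be produced from the source lines plus a map from line number to expanded text. formatListing is a hypothetical helper, not the OpenVMS implementation, and unlike the real listing it does not align the expansion under the macro invocation.

```cpp
#include <cstddef>
#include <iomanip>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical listing formatter: prints each source line with its
// 1-based line number right-aligned in 8 columns, and follows any line
// that has a recorded expansion with an "E" line holding the expanded text.
std::string formatListing(const std::vector<std::string> &Lines,
                          const std::map<std::size_t, std::string> &Expansions) {
  std::ostringstream Out;
  for (std::size_t I = 0; I < Lines.size(); ++I) {
    std::size_t LineNo = I + 1;
    Out << std::setw(8) << LineNo << "  " << Lines[I] << '\n';
    auto It = Expansions.find(LineNo);
    if (It != Expansions.end())
      Out << std::setw(8) << "E" << "  " << It->second << '\n';
  }
  return Out.str();
}
```

Called with the three lines of the RP/ID example and {{3, "( 1 + 2 )"}}, it emits the three numbered lines followed by an E line for line 3.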


Thank you @JohnReagan .

Macro expansion is part of the OpenVMS listing file implementation in the OpenVMS C++ compiler.
The OpenVMS listing file is a generated file that shows the module (source file) it was generated from, its modification time, the compiler version, the compilation time, the contents of the main file and its included files (#include directives in expanded form), diagnostic messages, expanded macros, the command line, the macros predefined at compilation time, etc.
Generation of the listing file, and which parts of it are included, is controlled by command-line options and flags.

I want to give a few more examples of macro expansion:

1093  #define ADD_ONE(X) ((X) + 1)
1094  #define ADD_TWO(X) (ADD_ONE(X) + 1)
1095  #define ADD_THREE(X) ((X) + 3)
1096  #define ADD_FIVE(X) (ADD_TWO(X) + ADD_THREE(X))
1097  
1098  #if __clang__
   E  #if 1
1099  #define ADD_TEN(Y) (ADD_FIVE(Y) + 5)
1100  int gl = ADD_TEN(10);
   E           ( ( ( ( ( 10 ) + 1 ) + 1 ) + ( ( 10 ) + 3 ) ) + 5 )
  10  #if __cplusplus
   E  #if 201703L
  11  
  12  #if __has_feature(cxx_alignas)
   E  #if 1
   7  #define ADD_TEN(Y) (Y + 10)
   8  #define foo(u) 42
   9  #define bar(x, y) x(y)
  10  #define EMPTYY
  11  #define EMPTY() EMPTYY
  12  #define DEFER(...) __VA_ARGS__ EMPTY()
  13  
  14  int main() {
  15    int f = DEFER(ADD_TEN(bar(foo, -4)));
   E            ( 42 + 10 )
  16    return 0;
  17  }

A brief description of the macro expansion implementation:

I added a struct ListingFileGeneratorPPCallback derived from the PPCallbacks class. This struct holds much of the current state needed to generate the listing file, and macro expansion is one part of it. I added a new function virtual void MacroExpandsForListing(StringRef MacroExpansionStr) to the class PPCallbacks, which already has the virtual function MacroExpands(). This new PP callback is registered by the member object of class ListingFileGenerator in CompilerInstance::createPreprocessor(), the function that actually creates the Preprocessor.

The function MacroExpandsForListing() is called, under certain conditions, from the Preprocessor’s static functions CLK_Lexer and CLK_TokenLexer and from Preprocessor::HandleMacroExpandedIdentifier and Preprocessor::ExpandBuiltinMacro.
No new calls to MacroExpands() were added.

The Preprocessor has a new member std::string ListingExpansion; which holds the string currently being expanded. This member is passed to MacroExpandsForListing() and is clear()ed right after the call. MacroExpandsForListing() finalizes the macro expansion string and writes it into the listing file.
The Preprocessor also has a new function void AddListingExpansion(StringRef str); which gathers (appends) the currently expanding macro/string in many different places, such as EvaluateDefined(), EvaluateValue(), EvaluateDirectiveSubExpr(), Preprocessor::HandleMacroExpandedIdentifier(), Preprocessor::ExpandBuiltinMacro(), and CLK_TokenLexer().
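The accumulate-then-flush flow described above can be modeled in a few lines of standalone C++. This is a hedged sketch, not the OpenVMS implementation (which is not public): MacroExpandsForListing, AddListingExpansion and ListingExpansion are names taken from the post, while ListingCallbacks, RecordingListingCallbacks, ListingPreprocessorModel and finishExpansion are hypothetical, and std::string_view stands in for llvm::StringRef.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the new PPCallbacks hook.
struct ListingCallbacks {
  virtual ~ListingCallbacks() = default;
  // Receives one finished expansion string.
  virtual void MacroExpandsForListing(std::string_view MacroExpansionStr) = 0;
};

// Records finished expansions, standing in for the code that writes
// them into the listing file.
struct RecordingListingCallbacks : ListingCallbacks {
  std::vector<std::string> Written;
  void MacroExpandsForListing(std::string_view S) override {
    Written.emplace_back(S);
  }
};

// Minimal stand-in for the Preprocessor side: pieces of an expansion are
// appended as they are produced, then handed to the callback and cleared.
class ListingPreprocessorModel {
public:
  explicit ListingPreprocessorModel(ListingCallbacks &CB) : Callbacks(CB) {}

  // Mirrors AddListingExpansion(StringRef): append one piece of text.
  void AddListingExpansion(std::string_view Str) {
    if (!ListingExpansion.empty())
      ListingExpansion += ' ';
    ListingExpansion.append(Str);
  }

  // Hypothetical: called when a top-level expansion is complete. Passes
  // the accumulated string to the callback, then clears it.
  void finishExpansion() {
    if (ListingExpansion.empty())
      return; // nothing pending
    Callbacks.MacroExpandsForListing(ListingExpansion);
    ListingExpansion.clear();
  }

private:
  ListingCallbacks &Callbacks;
  std::string ListingExpansion;
};
```

Appending "(", "1", "+", "2", ")" and then calling finishExpansion() delivers the single string "( 1 + 2 )" to the callback; a second call with nothing pending writes nothing, because the buffer was cleared after the first.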

The OpenVMS C++ compiler is not publicly available, so I can’t point you to our implementation.


I don’t think you can recover that afterwards. Clang doesn’t store the final expanded token stream anywhere; macro expansion happens on the fly, and the tokens are gone unless you record them during preprocessing. SourceManager only tracks source locations (including where macros were expanded), not the expanded tokens themselves. That’s why token watchers are needed.

Okay, so if this information is not recoverable, would it be acceptable to extend the module dumps to contain it somehow?
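If the dumps were extended, one option would be a side table, recorded while preprocessing, that maps each expansion location to its final tokens. The sketch below only illustrates that round trip with a made-up text format; ExpansionTable, serializeExpansions and deserializeExpansions are hypothetical. Real Clang module files are written by ASTWriter as LLVM bitstream records, which this does not attempt to model.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical side table: expansion offset -> expanded token text.
using ExpansionTable = std::map<unsigned, std::string>;

// Writes one "offset<TAB>text" record per line.
std::string serializeExpansions(const ExpansionTable &Table) {
  std::ostringstream Out;
  for (const auto &[Offset, Text] : Table)
    Out << Offset << '\t' << Text << '\n';
  return Out.str();
}

// Reads records back; assumes the expanded text contains no newline.
ExpansionTable deserializeExpansions(const std::string &Data) {
  ExpansionTable Table;
  std::istringstream In(Data);
  std::string Line;
  while (std::getline(In, Line)) {
    std::size_t Tab = Line.find('\t');
    if (Tab == std::string::npos)
      continue; // skip malformed records
    unsigned Offset = static_cast<unsigned>(std::stoul(Line.substr(0, Tab)));
    Table[Offset] = Line.substr(Tab + 1);
  }
  return Table;
}
```

The design choice worth noting is that the table is keyed by the expansion location rather than by AST node, so a lazily loaded node could look up its expansion without re-running the preprocessor.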