Determining macros used in a function or SourceRange (using clang plugin)

Hi,

I have a project that I've been working on to determine dependencies in code and mock/stub them. Mostly I'm using a recursive visitor to datamine these things. I would really like to be able to figure out what macros are used in a function as well (or, even more helpfully, in a SourceRange). Just to be clear, I'm doing this as a clang frontend plugin (currently stopping at syntax check). I know that I can capture macro expansions with PPCallbacks and store away their SourceLocations etc. during preprocessing, but based on what is done in error messages I would imagine that the information is there somewhere already... I'd rather not reinvent the wheel if something else will work already.

For example:

#define BAR "%s"
#define FOO(x) printf( BAR, x )
#define DOY 3
void func1() {
     const char *lvar="func1";
     FOO(lvar);
}
void func2() {
     printf("%d", DOY);
}

What I want to be able to do is isolate func1() and determine that FOO and BAR were both used, and what definitions they had at the time they were used. What I'm hoping is that there is a way to iterate the SourceRange to find where the text pieces came from (with enough info to know what the macro name/definition was). I didn't see an obvious way to do this. Then again, maybe I'm going down the wrong path and there's an entirely easier direction?

Anyhow, even some hints in the right direction would be greatly appreciated.

Thanks in advance,

    -Eric

Hi Eric,

Yes, you can iterate the source range of a declaration like a function. But I don’t think you need to, as you can iterate macro expansions and check if they were expanded inside the source range of the function. The following piece of code illustrates this approach (assuming you have a declaration Decl and a source manager reference SM):

SourceLocation Start = Decl->getLocStart();
SourceLocation End = Decl->getLocEnd();
for (unsigned I = 0, E = SM.local_sloc_entry_size(); I != E; ++I) {
  const SrcMgr::SLocEntry &Entry = SM.getLocalSLocEntry(I);
  if (Entry.isExpansion() && Entry.getExpansion().isMacroBodyExpansion()) {
    SourceLocation Loc = Entry.getExpansion().getExpansionLocStart();
    if (Start < Loc && Loc < End) {
      // This Macro expansion is inside the body of the function (Decl).
    }
  }
}

You can get the name of the macro that is expanded inside a function by looking at the first token that’s at the location ‘Loc’ i.e. at the starting location of the expansion.

I hope that helps,
Alex
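
The containment test in the loop above can be tried out without a Clang build. The following is only a toy model, not Clang code: plain integer offsets stand in for SourceLocation, each expansion is assumed to have already been mapped to the file offset of its outermost use site, and the names Range, Expansion and macrosInside are hypothetical.

```cpp
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not Clang types): offsets replace SourceLocation.
struct Range {
  unsigned Begin; // offset where the function starts
  unsigned End;   // offset where the function ends
};

struct Expansion {
  std::string Name; // macro name found at the start of the expansion
  unsigned Offset;  // assumed: offset of the outermost use site in the file
};

// Returns the names of all expansions whose use site lies strictly inside R,
// in the order they were recorded. Mirrors "Start < Loc && Loc < End".
std::vector<std::string> macrosInside(const Range &R,
                                      const std::vector<Expansion> &All) {
  std::vector<std::string> Names;
  for (const Expansion &E : All)
    if (R.Begin < E.Offset && E.Offset < R.End)
      Names.push_back(E.Name);
  return Names;
}
```

With made-up offsets for the func1/func2 example (say func1 spans 80 to 140 and both FOO and BAR are recorded at 120), macrosInside reports FOO and BAR for func1 and leaves DOY to func2.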

Thanks Alex,

That gets me mostly there. Pardon if that is a dumb question, but I'm not sure how I go from a SourceLocation to a Token. I have not worked at all in the preprocessor levels before.

    -Eric

PS - Is there a good page somewhere that outlines how these layers hang together? Just don't wanna take up time if there's docs somewhere.

Something like this should work:

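    StringRef getToken(SourceLocation BeginLoc, const SourceManager &SM,
                       const LangOptions &LangOpts) {
      // getLocForEndOfToken returns the location just past the token, so the
      // text is the character range [BeginLoc, EndLoc), not a token range.
      const SourceLocation EndLoc =
          Lexer::getLocForEndOfToken(BeginLoc, 0, SM, LangOpts);
      return Lexer::getSourceText(
          CharSourceRange::getCharRange(BeginLoc, EndLoc), SM, LangOpts);
    }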

I think that the “Clang” CFE Internals Manual (
http://clang.llvm.org/docs/InternalsManual.html) might give you some of the
information that you're looking for.

Alex

Alex,

First off, thanks so much for your help (and probably patience at this point). Okay, that all works with a few tweaks. I spent most of the day trying to figure out how to get the definition. I have been looking at getSpellingLoc(), which seems to get me one end of it, but I can't figure out how to find the end of the definition. If this were just a string, I'd look until I found a line break that wasn't preceded by a \. So far I have tried constructing a Lexer and using ReadToEndOfLine() and LexFromRawLexer(), based on some things I found online. Neither seemed to work. My eventual goal is to get another SourceRange and check it for macros as well, i.e. I want to check for any macro dependency trees; right now the return type is StringRef just for debugging. I've attached the code below of what I tried. ReadToEndOfLine() seems to never advance anything, and LexFromRawLexer() seems to never come across a tok::eod. Some output is below the function clip. Maybe there's an entirely easier approach?

    -Eric

StringRef getTokensThroughEndOfDefine(SourceLocation BeginLoc,
                                      SourceManager &SM) {
  const LangOptions &LangOpts = getDefaultLangOpts();
  SourceLocation CurLoc = BeginLoc;
  SourceLocation NextLoc;
  int iter = 0;

  std::pair<FileID, unsigned> cur_info = SM.getDecomposedLoc(BeginLoc);
  bool invalid = false;
  StringRef buf = SM.getBufferData(cur_info.first, &invalid);

  if (invalid) {
    return nullptr;
  }

  // Get the point in the buffer
  const char *point = buf.data() + cur_info.second;

  // Make a lexer and point it at our buffer and offset
  Lexer lexer(SM.getLocForStartOfFile(cur_info.first), LangOpts,
              buf.begin(), point, buf.end());

  while (1) {
    // read through the end of line
    SmallString<128> text;
    lexer.ReadToEndOfLine(&text);

    if (text.back() != '\\') {
      break;
    }

    llvm::errs() << "Incomplete line, so far: " <<
        getCodeString(SM, BeginLoc, lexer.getFileLoc(), "Token") << "\n";
  }

  return getCodeString(SM, BeginLoc, lexer.getFileLoc(), "Definition");
#if 0
  Token tok;
  while (1) {
    lexer.LexFromRawLexer(tok);

    if (tok.is(tok::eof) || tok.is(tok::eod)) {
      break;
    }

    llvm::errs() << "Token[" << tok.getName() << "]: \"" <<
        getCodeString(SM, tok.getLocation(), tok.getEndLoc(), "Token") <<
        "\"\n";
  }

  return getCodeString(SM, BeginLoc, tok.getEndLoc(), "Definition");
#endif
}

Example failure on tokens: (and ignore the fact that we're sorta printing out two tokens on every line as getEndLoc() seems to really be the next token and getCodeString() seems to print on token boundaries.)

Macro name: ASSERT
Macro string: ASSERT((getFirstMatchingOnly && firstMatching != nullptr) ||
           (!getFirstMatchingOnly && (allMatchingMo != nullptr ||
                                      allMatchingMoRef != nullptr)))
Token[raw_identifier]: "ASSERT_IFNOT("
Token[l_paren]: "(cond"
Token[raw_identifier]: "cond,"
Token[comma]: ","
Token[raw_identifier]: "_ASSERT_PANIC("
Token[l_paren]: "(AssertAssert"
Token[raw_identifier]: "AssertAssert)"
Token[r_paren]: "))"
Token[r_paren]: ")" <---- I'd expect an eod token here. Guessing though.
Token[hash]: "#define"
Token[raw_identifier]: "define"
...
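
The plain-string fallback mentioned earlier in this message (scan to the first line break that is not preceded by a backslash) can be sketched without the Lexer. This is only an illustration of that idea on a std::string buffer, not the approach the thread settles on; the name findEndOfDefine is hypothetical, and the sketch deliberately ignores comments, trigraphs and \r\n line endings, which real code would have to handle.

```cpp
#include <cstddef>
#include <string>

// Returns the offset of the newline that ends the logical line starting at
// Pos, treating backslash-newline as a line continuation. Returns Buf.size()
// if the buffer ends before such a newline is found.
std::size_t findEndOfDefine(const std::string &Buf, std::size_t Pos) {
  while (Pos < Buf.size()) {
    std::size_t NL = Buf.find('\n', Pos);
    if (NL == std::string::npos)
      return Buf.size();
    // A backslash immediately before the newline continues the directive
    // onto the next physical line, so keep scanning from there.
    if (NL > 0 && Buf[NL - 1] == '\\') {
      Pos = NL + 1;
      continue;
    }
    return NL;
  }
  return Buf.size();
}
```

Given the offset of a #define inside a file buffer, the half-open range from that offset to the returned offset covers the whole directive, including continued lines.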

Hi Alex,

Thanks again for all your help. Actually I did manage to make this eventually work with a huge amount of Lexer/Token magic. Unfortunately I could not actually follow it past 1 level, so it looks like an unworkable solution for what I have in mind. I had thought the macro definitions were expanded from other macros. After digging through the SLocEntries, this is clearly not the case: they are expanded at their final uses. This means I'm going back to the PPCallbacks and digging in there. I think I can get the whole tree without annotating every macro I meet. Apparently MacroExpands gets called repeatedly at the point the macro gets used.

Kind Regards,
    -Eric
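
The "macro dependency tree" goal can be illustrated independently of Clang. The sketch below is a toy model under stated assumptions: macro definitions are given as plain name-to-body strings (in a plugin they would presumably be gathered through PPCallbacks), identifiers are found with a naive scanner rather than the Lexer, and the names MacroTable, identifiersIn and collectDeps are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using MacroTable = std::map<std::string, std::string>; // name -> body text

// Splits a macro body into C-style identifiers, skipping string literals.
// Naive on purpose: no comments, character literals or numeric suffixes.
std::vector<std::string> identifiersIn(const std::string &Body) {
  std::vector<std::string> Ids;
  std::size_t I = 0;
  while (I < Body.size()) {
    unsigned char C = static_cast<unsigned char>(Body[I]);
    if (C == '"') {
      // Skip to the closing quote, stepping over backslash escapes.
      ++I;
      while (I < Body.size() && Body[I] != '"')
        I += (Body[I] == '\\') ? 2 : 1;
      ++I;
    } else if (std::isalpha(C) || C == '_') {
      std::size_t J = I;
      while (J < Body.size() &&
             (std::isalnum(static_cast<unsigned char>(Body[J])) ||
              Body[J] == '_'))
        ++J;
      Ids.push_back(Body.substr(I, J - I));
      I = J;
    } else {
      ++I;
    }
  }
  return Ids;
}

// Adds every macro reachable from Name's body to Seen (transitively).
// The Seen set also stops the recursion if definitions refer to each other.
void collectDeps(const MacroTable &Macros, const std::string &Name,
                 std::set<std::string> &Seen) {
  auto It = Macros.find(Name);
  if (It == Macros.end())
    return;
  for (const std::string &Id : identifiersIn(It->second))
    if (Macros.count(Id) && Seen.insert(Id).second)
      collectDeps(Macros, Id, Seen);
}
```

For the BAR/FOO/DOY example from the start of the thread, collecting from FOO yields BAR, while DOY has no dependencies. A real implementation would also need to respect the definition that was active at the point of use, which this table-based sketch does not model.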

Oh, one follow on question... If I persist a MacroDefinition* or MacroInfo* in a datastructure, will this still exist when processing the AST tree or will these have been deleted out from under me? Just trying to understand how much info I need to copy in a few cases.

    -Eric

MacroInfo, MacroDirective and ModuleMacro objects should persist until the
Preprocessor object is destroyed. You can hold onto them until that happens.

MacroDefinition has value semantics, so if you want to persist a
MacroDefinition* it's up to you to store the MacroDefinition object
somewhere. But you should probably just be passing it around by value.

Thanks Richard,

Question/comment inline.

> MacroInfo, MacroDirective and ModuleMacro objects should persist until the
> Preprocessor object is destroyed. You can hold onto them until that happens.

That had been my guess. Unfortunately, as I'm a plugin, I am not creating
the Preprocessor myself. I guess my real question is then: at what stage is
the default Preprocessor object destroyed in the compiler?

> MacroDefinition has value semantics, so if you want to persist a
> MacroDefinition* it's up to you to store the MacroDefinition object
> somewhere. But you should probably just be passing it around by value.

Ohhh, gotcha! Thanks for the tip.

I believe the preprocessor should still be around for all callbacks your
plugin receives (except perhaps in destructors).

Just in case anyone ever needs to get the relevant macro text from either an ExpansionInfo or a MacroInfo to allow emitting that macro completely, I've included the code I ended up with. It was pretty hard to work out and I didn't find an easier way, but maybe I missed something. For the rest of the detection I switched to using the Preprocessor callbacks to catch all the macros emitted in groups (using the range to know what goes together). Looks like everything is working.

Thanks to everyone for all their help!
   -Eric

typedef std::pair<clang::SourceRange, clang::SourceRange> MacroDefinitionPair;

MacroDefinitionPair getMacroDefinitionPair(const SourceLocation DefBodyLoc,
                                           const SourceManager &SM,
                                           const LangOptions &LangOpts) {
  MacroDefinitionPair Ranges;
  int iter = 0;

  //SourceLocation scur = SM.getSpellingLoc(BeginLoc);
  std::pair<FileID, unsigned> cur_info = SM.getDecomposedLoc(DefBodyLoc);
  bool invalid = false;
  StringRef buf = SM.getBufferData(cur_info.first, &invalid);

  if (invalid) return Ranges;

  // Get the point in the buffer
  const char *buf_start = buf.data();
  const char *orig_point = buf_start + cur_info.second;
  const char *point = orig_point;

  // Find the point we begin this #define
  while (1) {

    // Search backwards until a new-line
    while (point > buf_start) {
      if (*(point - 1) == '\n') break;
      point--;
    }

    // Make a lexer and point it at our buffer and offset and ignore
    // comments.
    Lexer lexer(SM.getLocForStartOfFile(cur_info.first), LangOpts,
                buf_start, point, buf.end());
    lexer.SetCommentRetentionState(false);

    // Parse the first two tokens
    Token tok_h, tok_define;
    lexer.LexFromRawLexer(tok_h);
    lexer.LexFromRawLexer(tok_define);

    // If we match the beginning of a define, then we are done
    if (tok_h.is(tok::hash) && tok_define.is(tok::raw_identifier) &&
        tok_define.getRawIdentifier() == "define") {

      // Get the name token (this skips over the leading space)
      Token tok_name;
      lexer.LexFromRawLexer(tok_name);

      // The range starts at our token and goes through one token before
      // the body.
      Ranges.first.setBegin(tok_name.getLocation());
      Ranges.first.setEnd(DefBodyLoc.getLocWithOffset(-1));

      // Done processing, move on.
      break;
    }

    // Backup one more and keep looking
    point--;

    // If we can't find the beginning, return null ranges to represent
    // an invalid starting point.
    if (point < buf_start) return Ranges;
  }

  // Make a lexer and point it at our buffer and offset
  Lexer lexer(SM.getLocForStartOfFile(cur_info.first), LangOpts,
              buf.begin(), orig_point, buf.end());
  lexer.SetCommentRetentionState(false);

  // Intermediate variables
  Token tok;
  SourceLocation EndBodyLoc;

  // Advance 1 token because when we start the tokenizer it assumes that
  // we're the start of line.
  lexer.LexFromRawLexer(tok);

  // Read tokens until we find the next Start of Line.
  while (1) {
    lexer.LexFromRawLexer(tok);

    // If we're EoF or SoL, we should stop advancing
    if (tok.is(tok::eof) ||
        tok.getFlags() & Token::TokenFlags::StartOfLine) {
      //if (tok.is(tok::eof)) {
      break;
    }

    // Cache the last token's final location
    EndBodyLoc = tok.getEndLoc().getLocWithOffset(-1);
  }

  // Mark the end of the macro body
  Ranges.second.setBegin(DefBodyLoc);
  Ranges.second.setEnd(EndBodyLoc);

  // Return the ranges
  return Ranges;
}