source-code representation of an Expr

Is there a relatively painless way to get the exact source-code representation of an Expr? I’ve looked into using SourceManager::getCharacterData(E->getExprLoc()), but it’s not really what I want. I am fairly new to the clang API so I realize that I may have missed something obvious.

Thanks for your help,
Sam

Well, using the expression's SourceRange instead of a single SourceLocation would be a good start.

John.

Yes, I just don’t know what to do with it ;) In other words, once I have the SourceRange, I don’t see how to use the SourceManager API to extract the Expr’s source code. I don’t see any API calls that take a SourceRange, or beginning and ending SourceLocations, for source-code extraction.

We should probably make some API for this.

What you can do for now is something like the following:

  SourceRange range = expr->getSourceRange();
  if (range.getBegin().isMacroID() || range.getEnd().isMacroID()) {
    // the range begins or ends inside a macro instantiation; handle this case
  } else if (!sourceManager.isFromSameFile(range.getBegin(), range.getEnd())) {
    // the range begins and ends in different files; handle this case
  } else {
    // the end of the range points at the start of the last token, so move it
    // to just past that token
    range.setEnd(preprocessor.getLocForEndOfToken(range.getEnd()));
    const char *begin = sourceManager.getCharacterData(range.getBegin());
    const char *end = sourceManager.getCharacterData(range.getEnd());
    llvm::StringRef string(begin, end - begin);
    // now you can do whatever you want
  }
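
The getLocForEndOfToken step matters because the end of a SourceRange points at the first character of the last token, not one past it. Below is a minimal standalone sketch of that adjustment, using a plain std::string in place of the SourceManager's buffer. The helpers tokenLength and textOfTokenRange are hypothetical, and the lexing rule (runs of alphanumerics or '_', otherwise one character per token) is a simplifying assumption, not what Clang's lexer does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the lexer: length of the token starting at `pos`.
// Identifiers and numbers are runs of alphanumerics or '_'; any other
// character is treated as a one-character token.
std::size_t tokenLength(const std::string &buffer, std::size_t pos) {
  assert(pos < buffer.size());
  auto isWordChar = [](unsigned char c) {
    return std::isalnum(c) || c == '_';
  };
  if (!isWordChar(buffer[pos]))
    return 1;
  std::size_t end = pos;
  while (end < buffer.size() && isWordChar(buffer[end]))
    ++end;
  return end - pos;
}

// `begin` and `lastTokenStart` model a token range: each is the offset of the
// first character of a token. The extracted text runs to the end of the last
// token, which is the adjustment getLocForEndOfToken performs in the snippet.
std::string textOfTokenRange(const std::string &buffer, std::size_t begin,
                             std::size_t lastTokenStart) {
  assert(begin <= lastTokenStart);
  std::size_t end = lastTokenStart + tokenLength(buffer, lastTokenStart);
  return buffer.substr(begin, end - begin);
}
```

With the buffer "int x = foo + bar42;", a range that begins at offset 8 and whose last token starts at offset 14 yields "foo + bar42". Without the adjustment, the text would stop just before "bar42".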

John.

John,

That works perfectly. Thank you so much for your help!

Sam

We should probably add this to a FAQ of some sort that documents how to do X with Clang.

What is the right way to address the first condition in the code you suggested? I have been banging my head against my monitor long enough that I now admit defeat. I’ve been looking mostly at the APIs for clang::Preprocessor and clang::SourceManager. I’ve also read through the Clang Internals document.

The bigger issue is that I’d really just like to understand how to properly traverse the source-code representation of statements I encounter via the AST in an efficient manner. Ideally this would include the ability to examine the source-code representation of a statement (or one of its subclasses) before or after preprocessing, e.g., before and after macro expansion.

Here’s what I know so far:

  1. For an expression “Expr * e”, I can retrieve its source range via e->getSourceRange().
  2. Using the location returned by SourceRange::getBegin(), I can call SourceManager::getCharacterData().
  3. From what I can tell, SourceManager::getCharacterData returns source-code post macro-expansion.
  4. If my expression doesn’t contain any macros, the code that John McCall suggested works fine for my purposes. However, I am interested in a number of cases where there are macros involved (specifically, I am looking at function arguments, i.e., my “expressions”, that contain macros).

Can anyone help? I feel that Clang is likely capable of providing me with the information that I want as-is, but I just can’t figure this one out via API intuition. If it’s not, I’d be glad to submit a patch to the API if anyone can steer me in the right direction.

Thanks in advance,
Sam

getCharacterData will give you the data for whatever location you give it. The problem with macros is that there are multiple locations involved: there's the spelling location, where the actual token was written/formed, and there's a chain of arbitrarily many instantiation locations, where the name of some macro was written. Clang preserves this full macro-instantiation stack, and you can walk up from the spelling location (which is what's generally stored in the AST) through the chain of instantiation locations. SourceManager::getInstantiationLoc() will jump the whole way for you, or you can walk step-by-step, in which case you need to understand a bit more about how Clang's SourceLocation abstraction works.
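
To make the instantiation stack concrete, here is a toy model in standalone C++. It is only a sketch: ToyLocation, instantiationStack and outermostLocation are hypothetical names, and the table of locations is an assumption made for illustration, not Clang's data structures. Each entry records where something was written and, if it came from a macro, which entry sits one level up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; these are not Clang's types.
struct ToyLocation {
  std::string where;   // human-readable position of this location
  int instantiatedAt;  // index of the location one level up, or -1 if this
                       // is an ordinary file location
};

// Step-by-step walk: collects every location from `start` (the spelling
// location) up to the outermost instantiation location.
std::vector<std::string>
instantiationStack(const std::vector<ToyLocation> &table, int start) {
  std::vector<std::string> stack;
  for (int i = start; i != -1; i = table[i].instantiatedAt) {
    assert(i >= 0 && static_cast<std::size_t>(i) < table.size());
    stack.push_back(table[i].where);
  }
  return stack;
}

// Whole-way jump: returns only the outermost location, which is the role the
// message above gives to SourceManager::getInstantiationLoc().
std::string outermostLocation(const std::vector<ToyLocation> &table,
                              int start) {
  return instantiationStack(table, start).back();
}
```

For example, given `#define ONE 1` and `#define TWO (ONE + ONE)`, a use of TWO in main.c produces a token `1` whose stack has three levels: the `1` in the definition of ONE, the name ONE in the definition of TWO, and the name TWO in main.c.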

A SourceLocation is basically just an offset within a FileID. A FileID is either a specific inclusion of a physical file or it's a macro instantiation buffer; SourceLocation::isMacroID() tells you which one, although you can also (at much greater expense) ask the SourceManager for a location's FileID, look up the FileID's SrcMgr::SLocEntry, and then ask that. The SLocEntry for a macro location will have a SrcMgr::InstantiationInfo that will tell you the range of the expression from which the macro was instantiated, i.e. moving exactly one level up the instantiation stack.
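
The cost difference between the two checks can be sketched with a toy encoding. This is a model under stated assumptions, not Clang's implementation: a location is taken to be a 32-bit offset with one bit reserved to mark macro locations, and the entry table is a vector sorted by starting offset. ToyEntry, kMacroBit, entryIndexFor and oneLevelUp are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only; names and layout are illustrative, not Clang's.
constexpr std::uint32_t kMacroBit = 1u << 31;

struct ToyEntry {
  std::uint32_t start;             // first offset that belongs to this entry
  bool isMacroBuffer;              // false: an inclusion of a physical file
  std::uint32_t instantiatedFrom;  // raw location one level up; meaningful
                                   // only when isMacroBuffer is true
};

// Cheap check: a single bit test on the location itself.
bool isMacroID(std::uint32_t loc) { return (loc & kMacroBit) != 0; }

// Expensive check: binary-search the table (sorted by `start`) for the entry
// whose offset range contains the location.
std::size_t entryIndexFor(const std::vector<ToyEntry> &table,
                          std::uint32_t loc) {
  assert(!table.empty());
  std::uint32_t offset = loc & ~kMacroBit;
  auto it = std::upper_bound(
      table.begin(), table.end(), offset,
      [](std::uint32_t value, const ToyEntry &e) { return value < e.start; });
  assert(it != table.begin());
  return static_cast<std::size_t>(it - table.begin()) - 1;
}

// Moves exactly one level up the instantiation stack.
std::uint32_t oneLevelUp(const std::vector<ToyEntry> &table,
                         std::uint32_t loc) {
  const ToyEntry &entry = table[entryIndexFor(table, loc)];
  assert(entry.isMacroBuffer);
  return entry.instantiatedFrom;
}
```

Calling oneLevelUp repeatedly until isMacroID returns false walks from a location inside a nested macro buffer out to an ordinary file location, one level per call.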

But I can't tell you what your project should do with this information.

John.