Clang AST: how to get more precise debug information in certain cases?

Hi all,

I would like to obtain the correct source locations from the Clang AST
information but I am having some difficulties in doing that.

The attached screenshot example-1 demonstrates running clang -Xclang
-ast-dump foo.c on the following code:

int foo(int aaaaa, int bbbbb) {
  if (aaaaa == bbbbb) {
    return 1;
  } else {
    return 0;
  }
}

The source locations for the BinaryOperator only cover the "aaaaa =="
part, not the full expression "aaaaa == bbbbb".

In other cases (see example-2) only part of the second operand is
covered, but never the whole expression.

My questions are:

1) Is there a specific reason for this behavior?
2) Which API should I use to obtain more precise information about the
source locations? I have tried the getExpansionColumnNumber() and
getColumnNumber() functions of the SourceManager class, but both give
me results similar to those of -ast-dump, and I never get the precise
information.

Thanks in advance.

Stanislav Pankevich

I /think/ that column 16 in your example is the first b in bbbbb, and that Clang's AST dump prints the first column of each token (internally, the range is essentially stored in terms of tokens). The dump format could be improved to compute the size of the last token and print its end instead of its start. That said, the dump is mostly an aid for the developers of Clang, so keeping it raw and close to how the compiler represents things internally might be the right tradeoff here; I don't know for sure.
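
To illustrate the token-range idea without depending on Clang, here is a toy sketch. The names measureToken, TokenRange and textOf are made up for this example and are not Clang APIs; in Clang itself the measuring is done by the Lexer. The sketch assumes a range is stored as the offset of the first token and the offset of the start of the last token, so the true end has to be computed by measuring that last token.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for a lexer: returns the length of the token starting at
// `offset`. Identifiers and numbers are measured in full; anything else is
// treated as a one-character token (a simplification, not what Clang does).
std::size_t measureToken(const std::string &src, std::size_t offset) {
    if (offset >= src.size()) return 0;
    auto isIdent = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (!isIdent(static_cast<unsigned char>(src[offset]))) return 1;
    std::size_t len = 0;
    while (offset + len < src.size() &&
           isIdent(static_cast<unsigned char>(src[offset + len]))) {
        ++len;
    }
    return len;
}

// Hypothetical token range: both members point at the START of a token.
struct TokenRange {
    std::size_t begin;           // start of the first token
    std::size_t lastTokenBegin;  // start of the last token
};

// Recovers the full text by extending the range to the end of the last token.
std::string textOf(const std::string &src, TokenRange r) {
    std::size_t end = r.lastTokenBegin + measureToken(src, r.lastTokenBegin);
    return src.substr(r.begin, end - r.begin);
}
```

For the line "  if (aaaaa == bbbbb) {", the expression starts at 0-based offset 6 and its last token starts at offset 15 (column 16). Cutting the text at the raw offsets gives "aaaaa == ", which matches the truncated range in the question, while textOf(line, {6, 15}) gives "aaaaa == bbbbb".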

Just closing the loop here. Back then, I found the solution on Stack Overflow:

Use the Lexer module:

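#include <string>
#include "clang/AST/Decl.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Lexer.h"

clang::SourceManager *sm;  // must be set to the current SourceManager
clang::LangOptions lopt;

std::string decl2str(clang::Decl *d) {
    // Newer Clang releases name these getBeginLoc() and getEndLoc().
    clang::SourceLocation b(d->getLocStart()), _e(d->getLocEnd());
    // _e is the start of the last token; move it just past that token's end.
    clang::SourceLocation e(clang::Lexer::getLocForEndOfToken(_e, 0, *sm, lopt));
    return std::string(sm->getCharacterData(b),
                       sm->getCharacterData(e) - sm->getCharacterData(b));
}
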
Source: "Getting the source behind clang's AST" on Stack Overflow.