AST and Tokens?

Hi All,

I'm hoping that someone can help point me in the right direction - I seem to be going round in circles at the moment.

I'm using ClangTool coupled with a RecursiveASTVisitor to examine some C code [1]. In addition to the AST, I'd like to be able to examine a token stream of the source. I've found a couple of examples ([2], [3]) of using the pre-processor to lex the file. However, as the file is already being lexed in order to generate the AST, I was hoping I'd be able to hook into that process somehow rather than (what seems to be) significantly duplicating that effort.

Any suggestions gratefully accepted.

Many Thanks,

John

[1] https://github.com/bright-tools/ccsm/blob/master/src/ccsm.cpp
[2] http://amnoid.de/tmp/clangtut/tut.html
[3] https://github.com/loarabia/Clang-tutorial/wiki/TutorialOrig

What exactly do you need to do? The Clang Lexer API makes it very easy to
obtain the token at any location. For example clang::Lexer::getRawToken.
clang::Lexer::GetBeginningOfToken is also useful, as are the other
utilities around it.

Eli

Hi Eli,

Thanks for the response (BTW, I found the libTooling post on your blog really useful when I was last working with Clang) - I'll take a look at the methods you've mentioned.

My objective is to gather various statistics relating to the operators and identifiers in the code (occurrence count in the first instance). e.g. for something like:

int example_manipulation( int i, int j )
{
    return(( i + j ) / (i * j));
}

it might determine that 3 operators had been used (+, /, *) and 3 identifiers (example_manipulation, i, j). Having this as a stream of (already pre-processed) lexed symbols seemed to make the associated processing quite straightforward.
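
To make the kind of counting described above concrete, here is a minimal sketch in plain C++. It is not Clang's lexer: countTokens and TokenStats are hypothetical names, the keyword list is cut down to what the example function uses, and comments, string literals and multi-character operators are ignored. It only shows how simple the tallying becomes once a flat stream of symbols is available.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Tallies produced by the toy scanner below.
struct TokenStats {
    std::map<std::string, int> operators;    // operator spelling -> occurrences
    std::map<std::string, int> identifiers;  // identifier spelling -> occurrences
};

// Hypothetical helper (an assumption, not a Clang API): scans already
// pre-processed C text and counts identifiers and single-character
// arithmetic operators. Comments and string literals are not handled.
inline TokenStats countTokens(const std::string &source) {
    // Assumption: only the keywords needed for the example are listed.
    static const std::set<std::string> keywords = {"int", "return"};
    static const std::string operatorChars = "+-*/%";

    auto isWordChar = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) || ch == '_';
    };

    TokenStats stats;
    std::size_t i = 0;
    while (i < source.size()) {
        const char c = source[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            // Skip a numeric literal so its suffix is not read as an identifier.
            while (i < source.size() && isWordChar(source[i])) ++i;
        } else if (isWordChar(c)) {
            // Collect an identifier or keyword.
            const std::size_t start = i;
            while (i < source.size() && isWordChar(source[i])) ++i;
            const std::string word = source.substr(start, i - start);
            if (keywords.count(word) == 0) ++stats.identifiers[word];
        } else {
            // Count single-character operators; skip all other punctuation.
            if (operatorChars.find(c) != std::string::npos)
                ++stats.operators[std::string(1, c)];
            ++i;
        }
    }
    return stats;
}
```

Run over the text of example_manipulation, this sketch reports three distinct operators and three distinct identifiers, with i and j each occurring three times (once as a parameter and twice in the expression).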

It struck me that I could achieve my aim through the RecursiveASTVisitor-derived class that I already have, but that it would require me to implement a lot of methods (i.e. all of the Visit* suite) in order to make sure that everything was accounted for, and also to add processing for some of the visited nodes (e.g. for a ForStmt, to process the returns of getInit(), getCond(), getInc() and getBody()).

Having looked at the API again, though, I'm now wondering if it would be better to walk the AST in a more abstract way, using get_children() in order to visit each node and simply checking the node type in order to gather the statistics.
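
As a rough illustration of that idea, the sketch below walks a toy tree generically and tallies by node kind. Node, tally and exampleExpression are hypothetical stand-ins rather than Clang classes; only the kind names BinaryOperator and DeclRefExpr are borrowed from Clang, and a real AST for this expression would also contain nodes such as ParenExpr and ImplicitCastExpr.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node; Clang's real classes are far richer.
struct Node {
    std::string kind;            // e.g. "BinaryOperator"
    std::string spelling;        // e.g. "+" or "i"
    std::vector<Node> children;  // sub-expressions, in source order
};

using KindCounts = std::map<std::string, std::map<std::string, int>>;

// Visit every node once, depth first, without any per-kind visit method:
// the only node-specific logic is reading its kind and spelling.
inline void tally(const Node &node, KindCounts &counts) {
    if (!node.spelling.empty()) ++counts[node.kind][node.spelling];
    for (const Node &child : node.children) tally(child, counts);
}

// Hand-built tree for (i + j) / (i * j), parentheses omitted for brevity.
inline Node exampleExpression() {
    return Node{"BinaryOperator", "/", {
        Node{"BinaryOperator", "+", {
            Node{"DeclRefExpr", "i", {}},
            Node{"DeclRefExpr", "j", {}}}},
        Node{"BinaryOperator", "*", {
            Node{"DeclRefExpr", "i", {}},
            Node{"DeclRefExpr", "j", {}}}}}};
}
```

Calling tally on exampleExpression() gives three distinct BinaryOperator spellings and two references each to i and j. The trade-off is that a generic walk only sees what the tree exposes as children, so anything not modelled as a child node would still need dedicated handling.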

Thanks,

John