Syntax Parser using Clang

I’m attempting to write a syntax parser using clang that lists the tokens (variables, types, etc.) in a source file for another toolset.

I have a file specified as

#include "bogus.h"

const bogus::bogus_type bogus::bogus_val = 0;

int main(){
    int alpha = 40;
    return(-1);
}

and I would expect to get a list of something like:

bogus::bogus_type
bogus::bogus_val
alpha
main
(built in values as well)

I’m using the C API, and for simple C files it works fine. However, once I start going into C++ bodies, especially when the header file isn’t found (again, I only care about syntax, not about making sure it’s valid), it gives me only:

alpha
main

and ignores the bogus values. I’m starting to think that I’m using too powerful a tool for what I need, and that I should find something “dumber”.

My main follows (whole thing also attached)

Compile with:

g++ parse.cpp -lclang -L/usr/lib64/llvm -o parse

int main(int argc, char* argv[]){
    if(argc < 2){
        std::cerr << "usage: parse <source-file>" << std::endl;
        return 1;
    }

    init_filter();

    // excludeDeclarationsFromPCH = 1, displayDiagnostics = 1
    CXIndex index = clang_createIndex(1, 1);
    unsigned int options = CXTranslationUnit_None;

    // We don't want to expand any #include statements
    // so disable the standard include locations
    const unsigned int num_args = 2;
    const char* const args[num_args] = {
        "-nostdlibinc",
        "-nostdinc"
    };

    std::cout << "-----------------" << std::endl;

    // Parse the file
    CXTranslationUnit tu =
        clang_parseTranslationUnit (
            index,    // index to associate w/ this translation unit
            argv[1],  // source file name
            args,     // command line args
            num_args, // number of command line args
            NULL,     // unsaved files
            0,        // number of unsaved files
            options
        );

    std::cout << "-----------------" << std::endl;

    // Get a cursor into the parsed file
    CXCursor cursor = clang_getTranslationUnitCursor(tu);
    if(clang_Cursor_isNull(cursor)){
        std::cout << "Cursor was NULL!" << std::endl;
        exit(-1);
    }

    // Visit the children
    clang_visitChildren(cursor, visitor, NULL);

    // Print out the unique tokens
    std::cout << std::endl << "Unique Tokens:" << std::endl;
    for(tokenSet_t::iterator iter = token_set.begin();
        iter != token_set.end();
        ++iter)
    {
        std::cout << "\t" << *iter << std::endl;
    }

    return 0;
}

simple.cpp (136 Bytes)

Hello David,

   I once tried to do something similar, but only looking at the loops and procedures within the code. I was in a hurry because of a deadline, so I simply relied on the output of

   clang -cc1 -ast-dump <inputfile>

   which was almost complete for me. Maybe you can give this a try.

   BTW, now that I have some more spare time, I can try using clang's API as you are doing.

   Just my 0.02€.

Best regards.

This is more of a "let's try to replace a lot of nasty, ugly, awful code with something that is documented and third-party" effort. I'm just trying to tease out what I want from it :-/