libclang: Get tokens after preprocessing ("as the compiler sees it")

Dear all,

I would like to tokenize a C++ file "as the compiler sees it", i.e. after the preprocessor.

For example, given the following simple C++ program, I would like to get the tokens from the line with "MYASSERT(1 > 2)" after the preprocessor macros have been expanded.

I guess that this is not possible with the existing libclang API, but I might be wrong. How hard would it be to write such a tokenization function for libclang? What would the necessary steps be?

Bests,
Manuel

#include <cstdlib>
#include <cstdio>

#define MYASSERT(x) do { if (!(x)) { fprintf(stderr, "ASSERTION FAILED!\n"); exit(1); }} while (false)

int main()
{
     MYASSERT(1 > 2);
     return 0;
}

You can get tokens from clang's Preprocessor via:

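#include "clang/Lex/Preprocessor.h"

// pp must already be set up for the input (e.g. by a CompilerInstance),
// with pp->EnterMainSourceFile() called before the first Lex().
clang::Preprocessor* pp = /* ... */;
clang::Token token;
pp->Lex(token);  // fetch the first token
while (token.isNot(clang::tok::eof)) {
  /* do something with token */
  pp->Lex(token);  // advance to the next token
}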

I forget whether that gives you tokens before or after preprocessing. If
it's before, you can always preprocess the input first with
"clang++ -E" or "g++ -E", though there's probably a better way through the API.
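
For the "-E" fallback, here is a rough sketch of how a tool could run the preprocessor as a separate process and capture its output for later tokenization. The helper names preprocess_command and run_and_capture are made up for illustration, the code assumes a POSIX system (popen/pclose), and the quoting of the path is deliberately naive. The -P flag, accepted by both clang++ and g++, suppresses the line markers in the output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds the shell command that runs only the
// preprocessor. -E stops after preprocessing; -P drops the line markers.
// Assumption: the path contains no double quotes (no real shell escaping).
std::string preprocess_command(const std::string& compiler,
                               const std::string& path) {
  assert(!path.empty());
  return compiler + " -E -P \"" + path + "\"";
}

// Hypothetical helper: runs a command through POSIX popen() and returns
// everything the command wrote to its standard output.
std::string run_and_capture(const std::string& command) {
  FILE* pipe = popen(command.c_str(), "r");
  if (!pipe) throw std::runtime_error("popen failed: " + command);
  std::string output;
  char buffer[4096];
  std::size_t n;
  while ((n = fread(buffer, 1, sizeof buffer, pipe)) > 0)
    output.append(buffer, n);
  pclose(pipe);
  return output;
}
```

Something like run_and_capture(preprocess_command("clang++", "main.cpp")) would then return the expanded source as one string. Keep in mind that the output also contains the full contents of every #include'd header, so the expanded MYASSERT line is only a small part of it.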

-Alexei