Questions on implementing a custom preprocessor

Hello:

I just started looking at libClang recently, so please pardon me if this is a noobish question.

I want to essentially build a custom preprocessing stage and I am not sure where to get started. I want to be able to generate some code based on some code in my source files, things like:

  1. Insert statements in selected functions
  2. Insert members in a struct

And I want to be able to do this as part of the compilation. In other words, I don’t see this as refactoring: the original source file should remain intact, and all compilation error messages should reference line numbers in the original file. I’m not really concerned with keeping the intermediate/preprocessed version anywhere. Ideally this could be done as a single step.

I have found the documentation for the C API of libclang, and it looks like it is mainly for reading the AST: you can’t actually start a compilation or alter the AST.

I also found the C++ API for the “Driver” code, and that looks more functional, but it isn’t mentioned as a recommended API, so I wanted to check whether I am missing something in the C API…

There is also the plugin API, and I found some examples of how to rewrite code there using the “Rewriter” class, but that looks like it’s designed for refactoring, not preprocessing. Specifically, it doesn’t output or keep track of line markers, and after you rewrite the code there doesn’t seem to be a way to compile the new version of the code.

The best strategy that I have for moving forward is to use the “Driver” C++ API to run a custom plugin and do a preprocessor stage only, then use a “RecursiveASTVisitor” to go through the whole AST and output it into a temporary buffer/file, optionally making inserts/edits based on conditions and manually implementing line markers by using “getSourceRange”. I would loop that process until there are no more changes, then feed the temporary buffer into the compilation stage…
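
To make the manual line-marker idea concrete, here is a minimal sketch in plain C++ with no Clang dependency. It assumes the edits have already been reduced to "insert these generated lines after original line N"; the function name `emitWithLineMarkers` and its parameters are hypothetical, not part of any Clang API. After each inserted run it emits a `#line` directive so the compiler's line counter is resynchronized with the original file.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a Clang API): splice generated lines into the
// original source and emit #line directives so that diagnostics for the
// untouched lines still report the original file name and line numbers.
// `insertions` maps a 1-based original line number to the generated lines
// that should appear directly after that line.
std::string emitWithLineMarkers(
    const std::string &fileName,
    const std::vector<std::string> &originalLines,
    const std::map<unsigned, std::vector<std::string>> &insertions) {
  std::ostringstream out;
  out << "#line 1 \"" << fileName << "\"\n";
  for (unsigned i = 0; i < originalLines.size(); ++i) {
    const unsigned lineNo = i + 1;
    out << originalLines[i] << '\n';

    auto it = insertions.find(lineNo);
    if (it == insertions.end())
      continue;

    for (const std::string &generated : it->second)
      out << generated << '\n';

    // Resynchronize: the next emitted line is original line lineNo + 1.
    if (lineNo < originalLines.size())
      out << "#line " << (lineNo + 1) << " \"" << fileName << "\"\n";
  }
  return out.str();
}
```

For example, inserting `  int injected;` after line 2 of a three-line struct definition yields the original lines, the generated member, and then `#line 3 "s.h"` before the closing brace. One limitation of this sketch: errors inside the generated lines themselves would be attributed to the original line numbers that follow the insertion point, so a real tool would probably want a distinct marker for generated code.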

There is also the issue that sometimes my messages to the preprocessor aren’t actually valid statements (undeclared identifiers), and it seems that the AST Visiting functionality completely ignores any error statements… The best way that I can think to work around this is to just add an option to define them as some kind of internal function/variable that I can filter for…

Any feedback would be appreciated. Thanks :)

There is also the plugin API, and I found some examples of how to
rewrite code there using the "Rewriter" class, but that looks like it's
designed for refactoring, not preprocessing. Specifically, it doesn't
output or keep track of line markers, and after you rewrite the code there
doesn't seem to be a way to compile the new version of the code.

The plugin API is not confined to refactoring; it allows running arbitrary
transformations over the AST, so it probably fits your needs.

The best strategy that I have for moving forward is to use the
"Driver" C++ API to run a custom plugin and do a preprocessor stage only,
then use a "RecursiveASTVisitor" to go through the whole AST and output
it into a temporary buffer/file, optionally making inserts/edits based on
conditions and manually implementing line markers by using
"getSourceRange". I would loop that process until there are no more
changes, then feed the temporary buffer into the compilation stage...

If you are not concerned about distributing your product, you could patch
clang. Just put your transformation somewhere in the clang sources;
`Sema::ActOnEndOfTranslationUnit` may be a good place. Where you put the
call to your code depends on what processing you need after you have
inserted your code (for instance, whether you need template
instantiations). If your transformation produces an AST that is ready for
codegen, the call may be placed in `ParseAST`, after the parser but before
the calls to the Handle* methods of ASTConsumer. This way may be simpler to
start with.

There is also the issue that sometimes my messages to the preprocessor
aren't actually valid statements (undeclared identifiers), and it seems
that the AST Visiting functionality completely ignores any error
statements... The best way that I can think to work around this is to just
add an option to define them as some kind of internal function/variable
that I can filter for...

If you want to pass your AST to codegen, you must keep it perfect: codegen
must not see invalid statements or declarations. If new names are
introduced, they must be properly declared. Put the source code your
transformation must produce into a file and run `clang -cc1 -ast-dump
file.cpp` to see what your code must look like in the AST, then try to
build a similar tree.
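
As an illustration of that last step, a hand-written target file might look like the sketch below. Every name in it (`trace_enter`, `g_calls`, `Point`, `injected_id`, `norm1`) is hypothetical and chosen only for this example. The point is that the inserted call refers to a name that is declared earlier in the same file, so the result is valid code that can be dumped and compiled.

```cpp
// Hypothetical target output: the file as it should look AFTER the
// transformation, written by hand so it can be inspected with -ast-dump.

// The inserted statement below uses a new name, so that name needs a
// visible declaration before its first use.
static int g_calls = 0;
inline void trace_enter(const char *) { ++g_calls; }

struct Point {
  int x;
  int y;
  int injected_id; // member the transformation would insert
};

int norm1(const Point &p) {
  trace_enter("norm1"); // statement the transformation would insert
  return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```

In the dump of a file like this, the inserted member should show up as a `FieldDecl` under the record, and the inserted statement as a `CallExpr` whose callee is a `DeclRefExpr` referring to the declared function. Those are the kinds of nodes a transformation working directly on the AST would need to create.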