Source-to-source transformations using LLVM/Clang


I need some help understanding a few things related to LLVM/Clang.

What I want to do is take C source code, extract information from it such as the number of branches and the load/store count, and then instrument the source code with that information. At the moment, I can think of two approaches:

(1) Clang: Generate the AST from the source code, obtain these counts (if possible) by traversing the AST, and then transform the AST to generate instrumented source code that I can later compile with gcc.

(2) LLVM: Emit a bitcode file from clang. Write an analysis pass to count these values, then use a transformation pass to add them to the IR and, if possible, generate instrumented source code from it.

Could someone kindly suggest which option is better, given that the two approaches work at different levels? Any pointers to sample code would be really helpful.
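To make the goal concrete, here is a minimal sketch in C of what the instrumented output might look like with either approach: statically obtained per-block counts are added to running totals as each block executes. All names (est_branches, est_loads, est_stores, ACCOUNT, sum_positive) are hypothetical, and the per-block numbers are made-up placeholders, not values produced by Clang or LLVM.

```c
#include <stdio.h>

/* Hypothetical running totals that an instrumentation tool might emit. */
static unsigned long est_branches;
static unsigned long est_loads;
static unsigned long est_stores;

/* Adds one block's (assumed) static counts to the running totals. */
#define ACCOUNT(b, l, s)        \
    do {                        \
        est_branches += (b);    \
        est_loads += (l);       \
        est_stores += (s);      \
    } while (0)

/* Instrumented version of a function that sums the positive elements.
   The numbers passed to ACCOUNT are placeholders for illustration only. */
int sum_positive(const int *v, int n) {
    int total = 0;
    ACCOUNT(0, 0, 1);                /* entry block */
    for (int i = 0; i < n; i++) {
        ACCOUNT(2, 1, 0);            /* loop test plus the `if` test */
        if (v[i] > 0) {
            ACCOUNT(0, 1, 1);        /* body of the `if` */
            total += v[i];
        }
    }
    return total;
}

/* Prints the accumulated totals; call this at program exit. */
void report_estimates(void) {
    printf("branches=%lu loads=%lu stores=%lu\n",
           est_branches, est_loads, est_stores);
}
```

Because the inserted code is plain C, the result can be handed to gcc or any other C compiler. The open question in this thread is where the per-block numbers come from.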


Do you need to output instrumented C source code? Do you want to run before or after optimizations? Do you want to see register spills or not?

If you want to instrument control flow and loads and stores, then you probably want to instrument LLVM IR. There are examples of this kind of thing in llvm/lib/Transforms/Instrumentation, like the sanitizers. However, LLVM no longer has a C backend, so you won’t be able to get C source code back out without some effort.

Thanks for the reply, Reid.

I would like to generate instrumented C source code so that I can later compile it with other compilers, such as gcc or proprietary compilers, and run the optimizations at that stage. I’m not sure about register spills, though. So do you suggest going with Clang? Is there a way to count loads and stores from the Clang AST? I went through the different node types in the AST classes but was not able to find such information. For counting branches, I can rely on checking for statements such as if.
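As a rough illustration of the branch-counting idea, here is a sketch in C of what a source-level rewrite might emit if every if condition were wrapped in a counting macro. The macro COUNT_BRANCH, the counter arrays, and the two sample functions are hypothetical names chosen for this example. Note that this counts source-level if statements only and says nothing about the loads and stores a compiler will eventually generate.

```c
#include <stdio.h>

#define NUM_BRANCHES 2  /* assumed: one slot per `if` in the rewritten file */

static unsigned long branch_taken[NUM_BRANCHES];
static unsigned long branch_not_taken[NUM_BRANCHES];

/* Records the outcome of branch `id` and yields the truth value of `cond`. */
#define COUNT_BRANCH(id, cond) \
    ((cond) ? (branch_taken[(id)]++, 1) : (branch_not_taken[(id)]++, 0))

/* Original body: if (x < 0) return -x; return x; */
int abs_value(int x) {
    if (COUNT_BRANCH(0, x < 0))
        return -x;
    return x;
}

/* Original body: counts the elements of v that are greater than zero. */
int count_positive(const int *v, int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (COUNT_BRANCH(1, v[i] > 0))
            count++;
    }
    return count;
}

/* Prints how often each instrumented `if` was taken or not taken. */
void report_branches(void) {
    for (int id = 0; id < NUM_BRANCHES; id++)
        printf("branch %d: taken=%lu not_taken=%lu\n",
               id, branch_taken[id], branch_not_taken[id]);
}
```

The macro keeps the original condition intact, so a rewriter would only need to know the source range of each condition, and the rewritten file remains ordinary C.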