Reading the AST from the bitcode generated by clang

Hi all,
After reading the documentation on clang I still have the following
question:
How do I read the bitcode generated by clang from a C++ file? I need to
have all the AST information in memory.

AST has nothing to do with LLVM bitcode.

What is the best level to use for backend development? We do not need to
generate machine code, but rather an intermediate representation that will
give us the control flow graph.
Is this the llvm IR?
If so, how do I read into memory a file that was generated by clang++
-emit-llvm <source file>.

Thanks,
Sitvanit

Sitvanit Ruah
Formal Verification Group
IBM Haifa Research Laboratory
Tel: 972-4-828-1249

Hi Sitvanit,

You can create your own llvm pass as a shared library and use opt to run it. Take a look at the doc here http://llvm.org/docs/WritingAnLLVMPass.html.

Ivan

Hi Sitvanit,

I assume you want to do some sort of formal verification and/or model checking of C++ programs.
I've some experience on the subject, so let me give you some pointers.

Some time ago I also asked here on the mailing list for opinions about where to do these kind of analyses:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-May/005032.html
Daniel Dunbar also provided some great comments:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-May/005048.html

To summarize, I believe the right way to go is to directly convert LLVM IR to your intermediate representation.
The main advantages of using the LLVM IR are:
- You can use LLVM's alias analysis to cheaply discard the simple aliasing queries
- You can use LLVM to optimize the code (greatly reduces the verification time)
- It supports multiple files (it's not something that is easy to do with clang ASTs directly)
- LLVM IR is pretty low level, so all the complex C++ constructs are already desugared. For some analyses this may be a con, but for us it's great.

I would suggest you to implement a ModulePass, which allows you to iterate over all global symbols (i.e., global variables and functions). You can then have an instruction visitor that encodes instructions to your intermediate representation.

Nuno

P.S.: BTW, there's a BMC tool (named LLBMC) implemented on top of LLVM. Our model checker is not (yet) based on LLVM because of historical reasons.