The University of Parma is working on a static analyzer for the C language.
At the moment we use a home-grown C parser; we would like to switch to Clang and
use its AST output as input to our simplifier.
I am in charge of the project, and I am wondering what the best way is to begin
studying the Clang source in order to understand how it works.
Thanks, and keep up the good work.
Clang consists of a set of libraries and a command-line driver. You are free to use the libraries in your static analyzer to do parsing, semantic analysis, etc., without having to use the clang driver. You are also free to add functionality to the clang driver (the easiest way is to add additional ASTConsumers).
Probably the most direct way to get familiar with the clang source is to try to do something simple. It can be a toy example such as printing out all the variables of each function in a parsed file (you could write an ASTConsumer to do this and add it to the driver). You could also add simple checks to the semantic analyzer; for example, we currently have a check that does some simple (and quick) analysis while we build the ASTs, looking for cases where you are returning the address of a stack variable. This code was added as a simple routine in the Sema library. Doing well-defined, small, encapsulated tasks makes it much easier to get traction, as it allows you to plug in to existing infrastructure.
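To make the "print the variables of each function" exercise concrete, here is a minimal, self-contained sketch of the consumer pattern. It does not use the clang libraries at all: VarDecl, FunctionDecl, ASTConsumerSketch, runConsumer, and variablePrinterDemo are hypothetical stand-ins invented for illustration, and the real clang classes have different members and signatures. The point is only the shape of the task: the front end hands each parsed function to a consumer, and the consumer decides what to do with it.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins for AST nodes. These are NOT clang's classes.
struct VarDecl {
    std::string type;
    std::string name;
};

struct FunctionDecl {
    std::string name;
    std::vector<VarDecl> params;
    std::vector<VarDecl> locals;
};

// Hypothetical consumer interface: the front end calls handleFunction()
// once for every function it has finished parsing.
class ASTConsumerSketch {
public:
    virtual ~ASTConsumerSketch() = default;
    virtual void handleFunction(const FunctionDecl &fn) = 0;
};

// A consumer that records the parameters and locals of each function.
class VariablePrinter : public ASTConsumerSketch {
public:
    void handleFunction(const FunctionDecl &fn) override {
        out_ << fn.name << ":\n";
        for (const VarDecl &p : fn.params)
            out_ << "  param " << p.type << " " << p.name << "\n";
        for (const VarDecl &l : fn.locals)
            out_ << "  local " << l.type << " " << l.name << "\n";
    }
    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// Stand-in for the driver: feeds a hand-built translation unit to a consumer.
void runConsumer(const std::vector<FunctionDecl> &unit, ASTConsumerSketch &c) {
    for (const FunctionDecl &fn : unit)
        c.handleFunction(fn);
}

// Builds a two-function example and returns what the consumer printed.
std::string variablePrinterDemo() {
    std::vector<FunctionDecl> unit = {
        FunctionDecl{"sum", {{"int", "a"}, {"int", "b"}}, {{"int", "total"}}},
        FunctionDecl{"main", {}, {{"int", "x"}}},
    };
    VariablePrinter printer;
    runConsumer(unit, printer);
    return printer.str();
}
```

Calling variablePrinterDemo() returns one block of text per function, listing its parameters first and its locals second. In a real consumer the declarations would come from the parser rather than from a hand-built vector.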
The clang driver also supports various debugging options. For example, -ast-print pretty-prints the parsed code, and -ast-dump provides a dump of the ASTs that can help you understand the internal representation of parsed code. There are also -dump-cfg and -view-cfg if you are interested in clang's CFGs (which are built on top of the ASTs, and which you are free to use or ignore).
I would also focus on the libraries that are of interest to you; if you are interested in the details of the parser, the AST library (and maybe the Analysis library) would probably be of most interest to you. Some parts of clang are better documented than others (we are gradually working on this), but the ASTs in the Stmt.h and Expr.h files have a fair amount of comments. The Analysis library contains both a basic flow-sensitive dataflow solver (with LiveVariables and UninitializedValues analyses built on top of it) and a path-sensitive dataflow solver (useful for writing analyses to find software bugs) that is under heavy development.
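For readers new to flow-sensitive dataflow analysis, the following is a minimal sketch of a live-variables computation over a small hand-built CFG. It is a textbook-style iterative solver, not the code in clang's Analysis library: Block, Liveness, computeLiveness, and exampleCfg are hypothetical names, and the assumption is that each block already carries its "use" set (variables read before being written in the block) and "def" set (variables written in the block).

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using VarSet = std::set<std::string>;

// Hypothetical basic block (not clang's CFGBlock).
struct Block {
    VarSet use;                      // read before any write in this block
    VarSet def;                      // written in this block
    std::vector<std::size_t> succs;  // indices of successor blocks
};

struct Liveness {
    std::vector<VarSet> liveIn;
    std::vector<VarSet> liveOut;
};

// Backward dataflow, iterated to a fixed point:
//   liveOut[b] = union of liveIn[s] over all successors s of b
//   liveIn[b]  = use[b] U (liveOut[b] - def[b])
Liveness computeLiveness(const std::vector<Block> &cfg) {
    Liveness r{std::vector<VarSet>(cfg.size()), std::vector<VarSet>(cfg.size())};
    bool changed = true;
    while (changed) {
        changed = false;
        // Visiting blocks in reverse order tends to converge faster for a
        // backward analysis, but any order reaches the same fixed point.
        for (std::size_t i = cfg.size(); i-- > 0;) {
            VarSet out;
            for (std::size_t s : cfg[i].succs)
                out.insert(r.liveIn[s].begin(), r.liveIn[s].end());
            VarSet in = cfg[i].use;
            for (const std::string &v : out)
                if (cfg[i].def.count(v) == 0)
                    in.insert(v);
            if (in != r.liveIn[i] || out != r.liveOut[i]) {
                r.liveIn[i] = std::move(in);
                r.liveOut[i] = std::move(out);
                changed = true;
            }
        }
    }
    return r;
}

// CFG for:  x = 1; y = 2; while (x < 10) x = x + y; return x;
std::vector<Block> exampleCfg() {
    return {
        Block{{}, {"x", "y"}, {1}},      // B0: x = 1; y = 2;
        Block{{"x"}, {}, {2, 3}},        // B1: while (x < 10)
        Block{{"x", "y"}, {"x"}, {1}},   // B2:   x = x + y;
        Block{{"x"}, {}, {}},            // B3: return x;
    };
}
```

On this example, nothing is live on entry to B0, both x and y are live at the loop header B1 (y is still needed by the loop body), and only x is live on entry to B3. A path-sensitive solver differs in that it tracks facts per execution path instead of merging them at join points.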