Help with hacking on the Clang source code

Hi list,

I have just started reading the Clang source code.
I found that any conversion from an input file (.c, .i, .s) to an output
file (.i, .s, .o) goes through the process below.

- FrontendAction::BeginSourceFile()
- FrontendAction::Execute()
- FrontendAction::EndSourceFile()

I also drew the activity diagram below.

From the diagram, I guess that these are the main actors in the movie called Clang:

- Diagnostic client
- Action
- Preprocessor
- AST consumer and the AST
- File and Source Manager

Because Clang is well-structured C++ code, I believe there must be a
big picture behind its design.
If someone could provide hints about it, I could read Clang much more smoothly.
I'd like to know the architecture behind the following:
- FrontendAction::BeginSourceFile()
- FrontendAction::Execute()
- FrontendAction::EndSourceFile()

I don't even know where to find the starting point for compiling or assembling.

I would also like to know which design patterns are used.

Thank you very much in advance.

Journeyer J. Joh

You should look at <>,
which illustrates how the driver works. The driver is the program
responsible for deciding things like compile vs. assemble vs. link.
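
As a rough illustration of the kind of decision the driver makes, here
is a toy sketch in C++. It is not the driver's real code: `Phase`,
`phasesFor`, and `hasSuffix` are hypothetical names, and the mapping
from file extension to phases is a simplified assumption based on the
input types mentioned above (.c, .i, .s).

```cpp
#include <string>
#include <vector>

// Toy model only: these names are hypothetical, not Clang's driver API.
enum class Phase { Preprocess, Compile, Assemble, Link };

// True if `s` ends with `suffix`.
static bool hasSuffix(const std::string &s, const std::string &suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Decide which phases an input needs, based only on its extension.
// Assumption: no command-line flag asks to stop early.
std::vector<Phase> phasesFor(const std::string &file) {
  if (hasSuffix(file, ".c"))  // C source: run everything
    return {Phase::Preprocess, Phase::Compile, Phase::Assemble, Phase::Link};
  if (hasSuffix(file, ".i"))  // already preprocessed
    return {Phase::Compile, Phase::Assemble, Phase::Link};
  if (hasSuffix(file, ".s"))  // already compiled to assembly
    return {Phase::Assemble, Phase::Link};
  return {Phase::Link};       // treat anything else as a linker input
}
```

In this sketch, `phasesFor("a.i")` skips preprocessing and starts at
the compile phase; the real driver also takes command-line options
into account.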

For the actual step of parsing and compiling C/C++, this diagram may be
helpful <>. The diagram was
made just by looking at the constructor arguments of each class. The
arrows represent arguments to constructors, so you can see, for
example, that Parser depends on Sema and Preprocessor. The purple
nodes represent interfaces/abstract classes: these usually represent
points of customizability. The diagram by no means contains all the
classes involved, but it should give you a better idea of the "big
picture".

The main actors in the compilation step are Sema, Parser, and
Preprocessor. Effectively, Parser calls methods on Preprocessor to get
tokens; depending on the tokens themselves (i.e., which syntactic
constructs are found), Parser then extracts information from those
tokens and calls into Sema, which acts on that information and builds
AST nodes. In effect, Sema is a giant class with a bunch of callbacks
("semantic actions") corresponding to different aspects of
semantically analyzing C/C++ and building up the C/C++ AST.
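
To make that division of labor concrete, here is a minimal toy model
in C++. The class names echo Clang's, but everything else is an
assumption made for illustration: the `Token` and `Expr` types, the
`Lex`, `ActOnNumber`, and `ActOnAdd` members, and the tiny grammar
(numbers joined by `+`) are all hypothetical and far simpler than the
real code.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: not Clang's real classes or signatures.
struct Token {
  enum Kind { Number, Plus, End } kind;
  int value;  // meaningful only for Number
};

// Hands out tokens one at a time.
class Preprocessor {
  std::vector<Token> toks;
  std::size_t pos = 0;
public:
  explicit Preprocessor(std::vector<Token> t) : toks(std::move(t)) {}
  Token Lex() { return pos < toks.size() ? toks[pos++] : Token{Token::End, 0}; }
};

struct Expr {
  std::string text;  // printable form of the AST node
  int value;         // constant-folded value
};

// Semantic actions: each callback builds an AST node.
class Sema {
public:
  int nodesBuilt = 0;
  std::unique_ptr<Expr> ActOnNumber(int v) {
    ++nodesBuilt;
    return std::make_unique<Expr>(Expr{std::to_string(v), v});
  }
  std::unique_ptr<Expr> ActOnAdd(std::unique_ptr<Expr> l,
                                 std::unique_ptr<Expr> r) {
    ++nodesBuilt;
    return std::make_unique<Expr>(
        Expr{"(" + l->text + " + " + r->text + ")", l->value + r->value});
  }
};

// Pulls tokens from the Preprocessor and reports constructs to Sema.
class Parser {
  Preprocessor &PP;
  Sema &Actions;
  Token Tok;
  void ConsumeToken() { Tok = PP.Lex(); }
public:
  Parser(Preprocessor &pp, Sema &s) : PP(pp), Actions(s), Tok{Token::End, 0} {
    ConsumeToken();  // prime the one-token lookahead
  }

  // expr := Number ('+' Number)*   (returns nullptr on a syntax error)
  std::unique_ptr<Expr> ParseExpression() {
    if (Tok.kind != Token::Number) return nullptr;
    auto lhs = Actions.ActOnNumber(Tok.value);
    ConsumeToken();
    while (Tok.kind == Token::Plus) {
      ConsumeToken();
      if (Tok.kind != Token::Number) return nullptr;
      auto rhs = Actions.ActOnNumber(Tok.value);
      ConsumeToken();
      lhs = Actions.ActOnAdd(std::move(lhs), std::move(rhs));
    }
    return lhs;
  }
};
```

Note that the toy Parser never constructs an `Expr` itself: it only
recognizes syntax and hands the pieces to Sema, which is the split
described above.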

For example, if you look at Parser::ParseWhileStatement in
lib/Parse/ParseStmt.cpp, you can see this flow: it first consumes a
`tok::kw_while` from the Preprocessor. It knows (and asserts) that the
next token is a "while" token, because otherwise its caller,
Parser::ParseStatementOrDeclarationAfterAttributes, would not have
called it to parse a while statement. It then does some diagnostics
checking that the keyword is followed by the "(" of the while
condition. It then enters a new scope through an RAII object, which
calls Parser::EnterScope; this sets Actions.CurScope (`Actions` is a
Sema object) to a new scope whose parent is the current one. It then
recursively calls ParseParenExprOrCondition
and returns an error if that fails. If you then skip down a little
bit, you'll see that it recursively calls ParseStatement to parse the
body of the while loop. It then exits the scopes it has entered and does
some error checking. Finally, it calls Sema::ActOnWhileStmt
(`Actions.ActOnWhileStmt([...])`), which builds the AST nodes
corresponding to the "while" and performs any semantic analysis that
is needed.
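
The same flow can be sketched as a self-contained toy in C++. This is
not the code in ParseStmt.cpp: the names `ParseWhileStatement`,
`ParseStatement`, `ActOnWhileStmt`, `CurScope`, and `tok::kw_while` are
borrowed from the description above, but their signatures here are
hypothetical, the "AST" is just a string, and the `ParseScope` helper,
`Error`, `HadError`, and `maxDepth` are assumptions made for
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only: simplified, hypothetical types and signatures.
enum class tok { kw_while, l_paren, r_paren, identifier, semi, eof };
struct Token { tok kind; std::string text; };
struct Scope { Scope *parent; int depth; };

struct Sema {
  Scope *CurScope = nullptr;
  int maxDepth = 0;  // deepest scope nesting seen, for illustration
  std::string ActOnWhileStmt(const std::string &cond, const std::string &body) {
    return "while(" + cond + "){" + body + "}";  // stands in for an AST node
  }
};

class Parser {
  std::vector<Token> Toks;
  std::size_t Pos = 0;
  Sema &Actions;

  const Token &Tok() const { return Toks[Pos]; }
  Token ConsumeToken() {  // never advances past the trailing eof token
    Token t = Toks[Pos];
    if (t.kind != tok::eof) ++Pos;
    return t;
  }
  std::string Error() { HadError = true; return ""; }

  // RAII helper: the constructor pushes a new Scope whose parent is the
  // current one; the destructor restores the parent.
  struct ParseScope {
    Sema &S;
    Scope Self;
    explicit ParseScope(Sema &s)
        : S(s), Self{s.CurScope, s.CurScope ? s.CurScope->depth + 1 : 1} {
      S.CurScope = &Self;
      if (Self.depth > S.maxDepth) S.maxDepth = Self.depth;
    }
    ParseScope(const ParseScope &) = delete;
    ~ParseScope() { S.CurScope = Self.parent; }
  };

public:
  bool HadError = false;
  Parser(std::vector<Token> t, Sema &s) : Toks(std::move(t)), Actions(s) {
    Toks.push_back({tok::eof, ""});
  }

  // stmt := while-stmt | identifier ';'
  std::string ParseStatement() {
    if (Tok().kind == tok::kw_while) return ParseWhileStatement();
    if (Tok().kind != tok::identifier) return Error();
    std::string name = ConsumeToken().text;
    if (Tok().kind != tok::semi) return Error();
    ConsumeToken();
    return name + ";";
  }

  // while-stmt := 'while' '(' identifier ')' stmt
  std::string ParseWhileStatement() {
    assert(Tok().kind == tok::kw_while && "caller checked for 'while'");
    ConsumeToken();
    if (Tok().kind != tok::l_paren) return Error();  // missing '('
    ConsumeToken();
    std::string cond, body;
    {
      ParseScope WhileScope(Actions);  // enter scope
      if (Tok().kind != tok::identifier) return Error();
      cond = ConsumeToken().text;
      if (Tok().kind != tok::r_paren) return Error();
      ConsumeToken();
      body = ParseStatement();  // recursive call for the loop body
    }                           // scope exited here by the destructor
    if (HadError) return "";
    return Actions.ActOnWhileStmt(cond, body);  // hand off to Sema
  }
};
```

Because the scope is an RAII object, every early `return Error()`
inside the block still restores `Actions.CurScope`, which is the main
reason for using that pattern here.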

Hope that helps.

btw, there's a cfe-dev mailing list that would be more appropriate for
this question (CFE=="C FrontEnd", but it should really be called
"clang-dev"). I've CC'd them and un-CC'd llvmdev.

--Sean Silva

Thanks a lot.

Best regards