CFG documentation

Hi folks,
I'm trying to hack Clang to translate its Control Flow Graph (CFG) to the
native CFG for my own tool -- essentially using Clang as a frontend (and
possibly more) to a pre-existing tool. Does anybody have any documentation
on the Clang control flow graph other than the Doxygen pages? I would really
appreciate any pointers on where to start.

Regards,
Gaurav

Hi Gaurav,

There is some documentation on the CFG here:

  http://clang.llvm.org/docs/InternalsManual.html#CFG

It's a bit dated, but it explains the core concepts of how the CFG is represented. The CFG has recently been expanded so that it can contain other elements besides just Stmt*, but the design is still the same.

Note that you can dump CFGs from the command line:

$ clang -fsyntax-only -Xclang -analyze -Xclang -dump-cfg t.c

Hi Ted,

Thanks for the link. I had actually already gone through all the documentation online and am currently sifting through the source code, starting at CFG.h and branching out. I think dumping the CFG from the command line will give me a jump start, thanks.

Through my posting on the mailing list I was looking for more information on:

  1. Whether there exists a CFG API that hides the implementation details.

  2. The CFG structure and how it relates to and uses the AST. E.g. how would one look up the nature of a variable in an expression or its scope, how the symbol tables are accessed, etc. I think this would relate to how the AST gets translated to the CFG.

Regards,

Gaurav

  1. Whether there exists a CFG API that hides the implementation details.

The public methods of the CFG class should be considered as part of its API.

  2. The CFG structure and how it relates to and uses the AST. E.g. how would one look up the nature of a variable in an expression or its scope, how the symbol tables are accessed, etc. I think this would relate to how the AST gets translated to the CFG.

I think some of your questions have more to do with how some fundamental concepts are represented in Clang in general. The CFG simply models a control-flow relation between elements (i.e., statements and expressions) in the AST, no more and no less. It does not encapsulate scope or reason about anything else except statements and expressions.
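
To make the "control-flow relation between elements" idea concrete, here is a small self-contained sketch of translating a block-based CFG into a tool's own graph. It is a toy model and not Clang's API: SourceBlock, NativeGraph, translate and exampleGraph are hypothetical names invented for illustration, and elements are plain strings rather than AST nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a frontend CFG block (hypothetical, not Clang's CFGBlock):
// an ordered list of elements plus the indices of its successor blocks.
struct SourceBlock {
    std::vector<std::string> elements;
    std::vector<std::size_t> successors;
};

// Hypothetical "native" graph of the receiving tool: one node per element,
// with explicit edges between node indices.
struct NativeGraph {
    std::vector<std::string> nodes;
    std::vector<std::pair<std::size_t, std::size_t>> edges;
};

// Translate block-level control flow into element-level control flow.
NativeGraph translate(const std::vector<SourceBlock>& blocks) {
    NativeGraph g;
    std::vector<std::size_t> first(blocks.size()), last(blocks.size());

    // Pass 1: create nodes and chain the elements inside each block.
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        first[b] = g.nodes.size();
        if (blocks[b].elements.empty()) {
            // Empty blocks (such as an entry block) get a placeholder node.
            g.nodes.push_back("<empty>");
        } else {
            for (const std::string& e : blocks[b].elements) {
                if (g.nodes.size() > first[b])
                    g.edges.emplace_back(g.nodes.size() - 1, g.nodes.size());
                g.nodes.push_back(e);
            }
        }
        last[b] = g.nodes.size() - 1;
    }

    // Pass 2: connect the last node of each block to the first node of
    // each of its successors.
    for (std::size_t b = 0; b < blocks.size(); ++b)
        for (std::size_t s : blocks[b].successors)
            g.edges.emplace_back(last[b], first[s]);
    return g;
}

// Example input modeled on `if (x) y = 1; return y;`.
NativeGraph exampleGraph() {
    std::vector<SourceBlock> blocks = {
        {{}, {1}},           // 0: entry
        {{"x"}, {2, 3}},     // 1: branch condition
        {{"y = 1"}, {3}},    // 2: then-branch
        {{"return y"}, {}},  // 3: join and return
    };
    return translate(blocks);
}
```

Calling exampleGraph() yields four nodes and the edges 0->1, 1->2, 1->3 and 2->3. Note that the graph carries only ordering and branching; anything about types or scope would have to come from the elements themselves, which in the real setting are AST nodes.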

The AST encapsulates statements, expressions, and declarations. There is no symbol table per se in the Clang frontend, as all symbols are uniquely identified by their (canonical) declarations. The LLVM backend reasons about symbols differently, since it works on a strictly low-level representation of the program. Scope is currently not modeled explicitly in the AST, but it can be inferred. The parser and semantic analyzer reason about scope while building the AST, but that information is not recorded in the AST.
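
As a rough illustration of "symbols are identified by their canonical declarations", the sketch below uses a toy model rather than Clang's classes: Decl, DeclRef, canonical and sameEntity are hypothetical names, and treating the first declaration in a chain as the canonical one is an assumption of this toy. In Clang itself the corresponding accessors are DeclRefExpr::getDecl() and Decl::getCanonicalDecl().

```cpp
#include <cassert>
#include <string>

// Toy declaration node (hypothetical). Redeclarations of one entity form a
// chain through `previous`; in this toy, the first declaration in the chain
// plays the role of the canonical declaration.
struct Decl {
    std::string name;
    const Decl* previous;

    const Decl* canonical() const {
        const Decl* d = this;
        while (d->previous) d = d->previous;
        return d;
    }
};

// Toy reference expression: it points directly at the declaration it names,
// so no name-based table lookup is needed once the tree has been built.
struct DeclRef {
    const Decl* decl;
};

// Two references denote the same entity exactly when their canonical
// declarations are the same node; the spelling of the name is irrelevant.
bool sameEntity(const DeclRef& a, const DeclRef& b) {
    return a.decl->canonical() == b.decl->canonical();
}

// Example modeled on:
//   extern int counter;          // first declaration
//   int counter;                 // redeclaration of the same variable
//   void f() { int counter; }    // a different, local variable
bool exampleDistinguishesLocal() {
    Decl first{"counter", nullptr};
    Decl redecl{"counter", &first};
    Decl local{"counter", nullptr};
    return sameEntity(DeclRef{&first}, DeclRef{&redecl}) &&
           !sameEntity(DeclRef{&redecl}, DeclRef{&local});
}
```

All three declarations share the name "counter", yet only the first two resolve to the same canonical node. That pointer identity is what stands in for a symbol-table entry in this model.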

Hi Ted,

Thanks for the explanation. After a brief chat with someone who explored using LLVM for a slightly different purpose, I have determined that I might be barking up the wrong tree. Maybe I should be looking at the LLVM assembly language instead since it maintains type information anyway and should be easier to convert to my tool’s native CDFG.

Regards,

Gaurav