Visiting statements of a CFG one time

Hi all,

I'm trying to iterate over all top-level functions of a C file and perform
control-flow analysis of each function body. My current approach looks as

class MyASTVisitor : public DeclVisitor<MyASTVisitor> {
  ASTContext &Context;
  explicit MyASTVisitor(ASTContext &Context) : Context(Context) { }

  void VisitFunctionDecl(FunctionDecl *FD) {
    if (!FD->hasBody())


    const CFG::BuildOptions opts;
    auto CFG = CFG::buildCFG(FD, FD->getBody(), &Context, opts);
    CFG->dump(Context.getLangOpts(), true);
    // perform the actual analysis over the CFG

class MyConsumer : public ASTConsumer {
  MyASTVisitor Visitor;
  explicit MyConsumer(ASTContext &Context) : Visitor(Context) { }

  bool HandleTopLevelDecl(DeclGroupRef DR) override {
    for (auto *I : DR)
    return true;

What confuses me right now is how the CFG is structured. Consider the
following C code:

extern int *pi;
extern unsigned char *pc;
void foo(void) {
    *(pi=(int *)pc, pi) = 42;

The dump of the corresponding CFG looks as follows:

[B2 (ENTRY)]
   Succs (1): B1

   1: pi = (int *)pc
   2: pi (ImplicitCastExpr, LValueToRValue, int *)
   3: ... , [B1.2]
   4: *([B1.3]) = 42
   Preds (1): B2
   Succs (1): B0

[B0 (EXIT)]
   Preds (1): B1

The single statement
  *(pi=(int *)pc, pi) = 42;
is broken up into 4 statements in the CFG. That means, if I iterate over each
statement of a block like

for (const CFGBlock *block : *CFG) {
  for (const auto &I : *block) {
    if (Optional<CFGStmt> cs = I.getAs<CFGStmt>())

and the transfer functions TF recursively visit each subexpression, I will
visit some of them twice. My plan was actually to construct transfer functions
of type ConstStmtVisitor in order to examine each statement recursively. In
the past I ran into a similar problem which was related to wrongly set
CFG::BuildOptions. However, this does not seem to be the case this time.

Is it possible to construct a CFG where statements are not split? Or is there
another approach to walk over all statements of a basic block exactly one time?


I ran into exactly this problem last week. I wanted to visit top-level statements (and expressions since expression-statements are just expressions) and recursively work through their ASTs. I just wrote a small pre-pass over the CFG elements that would construct a set of top-level statements by noting which expressions occurred as subexpressions and then extract the statements from there.

There’s probably a better way to do this, but here’s what my implementation of that function looks like.

void findStatements(CFGBlock *B, llvm::SmallVectorImpl<const Stmt > &SS) {
llvm::SmallDenseMap<const Stmt
, bool> Map;

// Mark subexpressions of each element in the block.
for (auto I = B->begin(); I != B->end(); ++I) {
CFGElement E = *I;
if (auto SE = E.getAs()) {
const Stmt *S = SE->getStmt();
for (const Stmt *K : S->children())
Map[K] = true;

// Any expressions not in Map are statements.
for (auto I = B->begin(); I != B->end(); ++I) {
CFGElement E = *I;
if (auto SE = E.getAs()) {
const Stmt *S = SE->getStmt();
if (Map.find(S) == Map.end())

Andrew Sutton

Note that the surrounding statement is not necessarily in the same block. Say, your algorithm would fail to find full statement ‘x ? y : z;’ by looking at its sub-expression ‘y’.

As far as i’m aware, no existing data flow analyses operate the way you describe.

In which cases are subexpressions pulled out? Is the idea behind the
splitting to come up with something similar to expressions in three
address format?


Interesting, though I think you need to do it recursively since
children() is not recursive. In my case one of the expressions which
got pulled out had an extra ParenExpr around in the embedded expression.

I will further play around with this idea.