CFG initializers and destructors patch

Hi

I’m sending a patch with implementation of C++

  • initializers from constructor initialization list,
  • implicit destructors for objects with automatic storage duration.

For destructors I’ve taken care of:

  • block local scopes,
  • if/switch/for/while/do local scopes (in case there’s no block),
  • if/switch/for/while condition variables,
  • catch exception variable,
  • temporaries bound to const reference.

Is there something that I’ve missed?

As it have been suggested I’ve created hierarchy of CFGElements. Currently there’re two unused types for implicit destructor calls in destructor. I’ve did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn’t be covered by CFG, because it clearly needs path-sensitiveness.

Cheers
Marcin

cfg-init-dtor.patch (74.3 KB)

Hi Marcin,

Thanks again for doing this. I will try and take a look at this in detail in the next 24 hours and come back with feedback. As I mentioned to Zhongxing and Jordy in a private email, we should hold off and committing any of these changes until LLVM branches tomorrow for the 2.8 release.

I have one question/comment inline (below).

Hi

I'm sending a patch with implementation of C++
- initializers from constructor initialization list,
- implicit destructors for objects with automatic storage duration.

For destructors I've taken care of:
- block local scopes,
- if/switch/for/while/do local scopes (in case there's no block),
- if/switch/for/while condition variables,
- catch exception variable,
- temporaries bound to const reference.

Is there something that I've missed?

As it have been suggested I've created hierarchy of CFGElements. Currently there're two unused types for implicit destructor calls in destructor. I've did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast<Stmt>(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn't be covered by CFG, because it clearly needs path-sensitiveness.

How is this path-sensitive? The compiler has to generate destructors calls at the IR level without doing any path-sensitive analysis at all (and actually no flow-sensitive analysis either).

W dniu 3 września 2010 06:39 użytkownik Ted Kremenek <kremenek@apple.com> napisał:

Hi Marcin,

Thanks again for doing this. I will try and take a look at this in detail in the next 24 hours and come back with feedback. As I mentioned to Zhongxing and Jordy in a private email, we should hold off and committing any of these changes until LLVM branches tomorrow for the 2.8 release.

I have one question/comment inline (below).

Hi

I’m sending a patch with implementation of C++

  • initializers from constructor initialization list,
  • implicit destructors for objects with automatic storage duration.

For destructors I’ve taken care of:

  • block local scopes,
  • if/switch/for/while/do local scopes (in case there’s no block),
  • if/switch/for/while condition variables,
  • catch exception variable,
  • temporaries bound to const reference.

Is there something that I’ve missed?

As it have been suggested I’ve created hierarchy of CFGElements. Currently there’re two unused types for implicit destructor calls in destructor. I’ve did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn’t be covered by CFG, because it clearly needs path-sensitiveness.

How is this path-sensitive? The compiler has to generate destructors calls at the IR level without doing any path-sensitive analysis at all (and actually no flow-sensitive analysis either).

If we would want to model destructors of temporaries directly in CFG we would have to generate blocks structure mirroring that generated for full expression. That is because of expressions with control flow (&&, ||, ?:). But what do we need this for?

  • If we want to know of all destructors that can be called at the and of full expression, we can look at CXXExprWithTemporaries in AST.
  • If we want to know exactly what destructors will be called at the end of full expression we can build stack of all temporaries seen on some specific path.
    Direct modeling this in CFG won’t be easier to use by a client in any of this cases IMO.

I obviously disagree with this last point, but hopefully I can give you a good reason of why I think representing these destructors calls in the CFG is valuable.

If we don't model them in the CFG, then every client has to potentially replicate some of this reasoning, which is messy at best. Even in vanilla C, there are enough edge cases to making reasoning about control-flow tricky, especially when you compose language features that have control-flow of their own. In this case, there are control-dependencies between destructors calls of temporaries, as well as control dependencies between these destructors and other destructors, exception handling edges, etc. The whole point of the CFG is that we get it right in one place and clients don't have to think about any special cases. This is crucial, as it really provides a nice decomposition of reasoning between analyses (that just care about the properties they are reasoning about) and all the logic to reason about control-flow.

Finally, if you don't buy that argument, the other benefit for clients of having explicit CFGElements in the CFG for such destructors is that they provide convenient handles (i.e., ProgramPoints) to anchor dataflow facts, both in the path-sensitive engine and basic flow-sensitive analyses.

W dniu 3 września 2010 08:51 użytkownik Ted Kremenek <kremenek@apple.com> napisał:

If we would want to model destructors of temporaries directly in CFG we would have to generate blocks structure mirroring that generated for full expression. That is because of expressions with control flow (&&, ||, ?:). But what do we need this for?

  • If we want to know of all destructors that can be called at the and of full expression, we can look at CXXExprWithTemporaries in AST.
  • If we want to know exactly what destructors will be called at the end of full expression we can build stack of all temporaries seen on some specific path.
    Direct modeling this in CFG won’t be easier to use by a client in any of this cases IMO.

I obviously disagree with this last point, but hopefully I can give you a good reason of why I think representing these destructors calls in the CFG is valuable.

If we don’t model them in the CFG, then every client has to potentially replicate some of this reasoning, which is messy at best. Even in vanilla C, there are enough edge cases to making reasoning about control-flow tricky, especially when you compose language features that have control-flow of their own. In this case, there are control-dependencies between destructors calls of temporaries, as well as control dependencies between these destructors and other destructors, exception handling edges, etc. The whole point of the CFG is that we get it right in one place and clients don’t have to think about any special cases. This is crucial, as it really provides a nice decomposition of reasoning between analyses (that just care about the properties they are reasoning about) and all the logic to reason about control-flow.

Right, if we have destructor that throws (which we shouldn’t, but who knows) we probably want it directly in CFG. I’ll think more about this later, maybe when what is done so far will find it’s way to repository.

Hi Marcin,

Thank you for working on this. This patch is really big. I suggest we focus on the ‘CFGElement’ aspect in the first patch, and put the ‘LocalScope’ things in later patches.

About CFGElement, I think we an avoid using two pointers in it. Currently the second pointer data is used for 'the statement that triggers the dtor. I think we don’t need a pointer for it. The position where the CFGElement resides is the place where it is triggered. It is not triggered by a statement.

Then for the rest one pointer, we let it point to the VarDecl for AutomaticObjectDtor, to the initializer for Base and Member dtors, to CXXBindTemporaryExpr for temporary dtor.

2010/9/3 Marcin Świderski <marcin.sfider@gmail.com>

One thing to keep in mind is that we will want the ability to distinguish different CFGImplicitDtor's that represent the destruction of the same object (at different locations). This will be need for creating ProgramPoints, both for the flow-sensitive and path-sensitive engines, that we can use as points to anchor dataflow facts at unique locations. With Stmt's, for the most part a single Stmt* appears once in a CFG (thus creating unique ProgramPoints). That isn't the case for destructors of the same object. This means we will likely need a second field in CFGElement that has some data that uniquely distingues them. If that is the call location, then that might be unique enough.

My guiding advice is that we shouldn't worry too much about optimizing away the second pointer until we know we have this working not only in the CFG, but also the dataflow engines.

(responding to my own comment)

One thing we could do for ProgramPoints is have them point to the CFGElement itself (e.g., CFGImplicitDtor*), and use that pointer address to be the unique identifier (instead of trying to have the contents of CFGElements uniquely distinguish them). We wouldn't need to do this for all CFGElements (e.g., ones that wrap Stmt*); just the ones we can't uniquely distinguish by their contents alone. Doing this would allow us to optimize away the second pointer as Zhongxing suggested.

That said, keeping the second pointer doesn't sound so bad to me for now, as we may find it to be useful. Diagnostics, however, will want to know where the destructor occurred, so I'm happy as long as we have a way to reconstruct this information.

About CFGElement, I think we an avoid using two pointers in it. Currently the second pointer data is used for 'the statement that triggers the dtor. I think we don’t need a pointer for it. The position where the CFGElement resides is the place where it is triggered. It is not triggered by a statement.

One thing to keep in mind is that we will want the ability to distinguish different CFGImplicitDtor’s that represent the destruction of the same object (at different locations). This will be need for creating ProgramPoints, both for the flow-sensitive and path-sensitive engines, that we can use as points to anchor dataflow facts at unique locations. With Stmt’s, for the most part a single Stmt* appears once in a CFG (thus creating unique ProgramPoints). That isn’t the case for destructors of the same object. This means we will likely need a second field in CFGElement that has some data that uniquely distingues them. If that is the call location, then that might be unique enough.

Makes sense.

W dniu 6 września 2010 05:37 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

Hi Marcin,

Thank you for working on this. This patch is really big. I suggest we focus on the ‘CFGElement’ aspect in the first patch, and put the ‘LocalScope’ things in later patches.

About CFGElement, I think we an avoid using two pointers in it. Currently the second pointer data is used for 'the statement that triggers the dtor. I think we don’t need a pointer for it. The position where the CFGElement resides is the place where it is triggered. It is not triggered by a statement.

Then for the rest one pointer, we let it point to the VarDecl for AutomaticObjectDtor, to the initializer for Base and Member dtors, to CXXBindTemporaryExpr for temporary dtor.

2010/9/3 Marcin Świderski <marcin.sfider@gmail.com>

Hi

I’m sending a patch with implementation of C++

  • initializers from constructor initialization list,
  • implicit destructors for objects with automatic storage duration.

For destructors I’ve taken care of:

  • block local scopes,
  • if/switch/for/while/do local scopes (in case there’s no block),
  • if/switch/for/while condition variables,
  • catch exception variable,
  • temporaries bound to const reference.

Is there something that I’ve missed?

As it have been suggested I’ve created hierarchy of CFGElements. Currently there’re two unused types for implicit destructor calls in destructor. I’ve did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn’t be covered by CFG, because it clearly needs path-sensitiveness.

Cheers
Marcin

Should I split it and send it cfe-commits with some comments?

2010/9/6 Marcin Świderski <marcin.sfider@gmail.com>

W dniu 6 września 2010 05:37 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

Hi Marcin,

Thank you for working on this. This patch is really big. I suggest we focus on the ‘CFGElement’ aspect in the first patch, and put the ‘LocalScope’ things in later patches.

About CFGElement, I think we an avoid using two pointers in it. Currently the second pointer data is used for 'the statement that triggers the dtor. I think we don’t need a pointer for it. The position where the CFGElement resides is the place where it is triggered. It is not triggered by a statement.

Then for the rest one pointer, we let it point to the VarDecl for AutomaticObjectDtor, to the initializer for Base and Member dtors, to CXXBindTemporaryExpr for temporary dtor.

2010/9/3 Marcin Świderski <marcin.sfider@gmail.com>

Hi

I’m sending a patch with implementation of C++

  • initializers from constructor initialization list,
  • implicit destructors for objects with automatic storage duration.

For destructors I’ve taken care of:

  • block local scopes,
  • if/switch/for/while/do local scopes (in case there’s no block),
  • if/switch/for/while condition variables,
  • catch exception variable,
  • temporaries bound to const reference.

Is there something that I’ve missed?

As it have been suggested I’ve created hierarchy of CFGElements. Currently there’re two unused types for implicit destructor calls in destructor. I’ve did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn’t be covered by CFG, because it clearly needs path-sensitiveness.

Cheers
Marcin

Should I split it and send it cfe-commits with some comments?

Yes, I would like we settle down the CFGElement part first, then the CFG construction part.

W dniu 6 września 2010 07:49 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

2010/9/6 Marcin Świderski <marcin.sfider@gmail.com>

W dniu 6 września 2010 05:37 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

Hi Marcin,

Thank you for working on this. This patch is really big. I suggest we focus on the ‘CFGElement’ aspect in the first patch, and put the ‘LocalScope’ things in later patches.

About CFGElement, I think we an avoid using two pointers in it. Currently the second pointer data is used for 'the statement that triggers the dtor. I think we don’t need a pointer for it. The position where the CFGElement resides is the place where it is triggered. It is not triggered by a statement.

Then for the rest one pointer, we let it point to the VarDecl for AutomaticObjectDtor, to the initializer for Base and Member dtors, to CXXBindTemporaryExpr for temporary dtor.

2010/9/3 Marcin Świderski <marcin.sfider@gmail.com>

Hi

I’m sending a patch with implementation of C++

  • initializers from constructor initialization list,
  • implicit destructors for objects with automatic storage duration.

For destructors I’ve taken care of:

  • block local scopes,
  • if/switch/for/while/do local scopes (in case there’s no block),
  • if/switch/for/while condition variables,
  • catch exception variable,
  • temporaries bound to const reference.

Is there something that I’ve missed?

As it have been suggested I’ve created hierarchy of CFGElements. Currently there’re two unused types for implicit destructor calls in destructor. I’ve did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn’t be covered by CFG, because it clearly needs path-sensitiveness.

Cheers
Marcin

Should I split it and send it cfe-commits with some comments?

Yes, I would like we settle down the CFGElement part first, then the CFG construction part.

I won’t be able to prepare patches before the weekend. I hope it won’t be much of a problem to anyone.

As I understand each patch should introduce chunk of code that would not break the compilation?

Cheers
Marcin

2010/9/7 Marcin Świderski <marcin.sfider@gmail.com>

W dniu 6 września 2010 07:49 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

2010/9/6 Marcin Świderski <marcin.sfider@gmail.com>

W dniu 6 września 2010 05:37 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

Hi Marcin,

Thank you for working on this. This patch is really big. I suggest we focus on the ‘CFGElement’ aspect in the first patch, and put the ‘LocalScope’ things in later patches.

About CFGElement, I think we an avoid using two pointers in it. Currently the second pointer data is used for 'the statement that triggers the dtor. I think we don’t need a pointer for it. The position where the CFGElement resides is the place where it is triggered. It is not triggered by a statement.

Then for the rest one pointer, we let it point to the VarDecl for AutomaticObjectDtor, to the initializer for Base and Member dtors, to CXXBindTemporaryExpr for temporary dtor.

2010/9/3 Marcin Świderski <marcin.sfider@gmail.com>

Hi

I’m sending a patch with implementation of C++

  • initializers from constructor initialization list,
  • implicit destructors for objects with automatic storage duration.

For destructors I’ve taken care of:

  • block local scopes,
  • if/switch/for/while/do local scopes (in case there’s no block),
  • if/switch/for/while condition variables,
  • catch exception variable,
  • temporaries bound to const reference.

Is there something that I’ve missed?

As it have been suggested I’ve created hierarchy of CFGElements. Currently there’re two unused types for implicit destructor calls in destructor. I’ve did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn’t be covered by CFG, because it clearly needs path-sensitiveness.

Cheers
Marcin

Should I split it and send it cfe-commits with some comments?

Yes, I would like we settle down the CFGElement part first, then the CFG construction part.

I won’t be able to prepare patches before the weekend. I hope it won’t be much of a problem to anyone.

As I understand each patch should introduce chunk of code that would not break the compilation?

Cheers
Marcin

Hi Marcin,

I will try to review this one, since splitting this patch may cause much trouble for you.

I implemented a minimum patch that set up the CFGElement hierarchy based on Marcin’s patch.

I use 2 pointers directly in CFGElement, because this make logic simpler and we can optimize it anytime later.
The first pointer’s integer bits are used to mark the main kind of the CFGElement. The second are used to mark the kind of dtors.

Scopes are removed since we haven’t decided if to include it. Other parts are necessary adaption to the new CFGElement interface.

This should be easier to review.

2010/9/7 Zhongxing Xu <xuzhongxing@gmail.com>

cfg.patch (16.1 KB)

W dniu 7 września 2010 07:22 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

I implemented a minimum patch that set up the CFGElement hierarchy based on Marcin’s patch.

I use 2 pointers directly in CFGElement, because this make logic simpler and we can optimize it anytime later.
The first pointer’s integer bits are used to mark the main kind of the CFGElement. The second are used to mark the kind of dtors.

Scopes are removed since we haven’t decided if to include it. Other parts are necessary adaption to the new CFGElement interface.

This should be easier to review.

Thanks for doing this, I really appreciate. One thing I would add to this patch, that won’t complicate it but will make code clearer, is a rename in CFGBlock from StatementList to ElementList and Statements to Elements.

2010/9/7 Marcin Świderski <marcin.sfider@gmail.com>

W dniu 7 września 2010 07:22 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

I implemented a minimum patch that set up the CFGElement hierarchy based on Marcin’s patch.

I use 2 pointers directly in CFGElement, because this make logic simpler and we can optimize it anytime later.
The first pointer’s integer bits are used to mark the main kind of the CFGElement. The second are used to mark the kind of dtors.

Scopes are removed since we haven’t decided if to include it. Other parts are necessary adaption to the new CFGElement interface.

This should be easier to review.

Thanks for doing this, I really appreciate. One thing I would add to this patch, that won’t complicate it but will make code clearer, is a rename in CFGBlock from StatementList to ElementList and Statements to Elements.

OK, patch updated.

cfg.patch (20.1 KB)

Comments inline below on the patch. Is the plan to still ad the CFGElements for the destructors of temporary objects? Overall, this is truly fantastic work. Sorry I took so long to look it over.

One thing that would help is a general big comment (above LocalScopes perhaps) that explains the algorithm for adding destructors. I was able to tease it out of the code, but a high-level description would be really useful.

One thing that occurred to me that while I argued against putting the scope information in the CFG itself, it looks like we do all the work to reconstruct scope information (via the LocalScope objects). Perhaps that information should be exposed as a public API? Just thinking (and not something to immediately consider).

Index: include/clang/Analysis/FlowSensitive/DataflowSolver.h

Hi Marcin,

I suggest we fix the CFGElement part of design in the first patches, and leave the CFG building processes later. Attached is my patch based on your proposal. The differences are:

  • I made two pointers explicit in the CFGElement since they are simpler and as I calculated before they won’t take too much memory, and we can optimize them any time later.

  • I added dtors for temporaries. This is truly needed as we don’t want to reason about it every time we need it.

What do you think?

2010/9/3 Marcin Świderski <marcin.sfider@gmail.com>

cfg.patch (19.4 KB)

I agree with Zhongxing here. Let's gradual tackle this in smaller pieces. I really had difficulty absorbing the entire patch, and I think it's important we all understand the implications of these changes from all angles.

Thanks for looking through the patch. Commented on some of your comments inline.

W dniu 15 września 2010 06:52 użytkownik Ted Kremenek <kremenek@apple.com> napisał:

Hi

I’m sending a patch with implementation of C++

  • initializers from constructor initialization list,
  • implicit destructors for objects with automatic storage duration.

For destructors I’ve taken care of:

  • block local scopes,
  • if/switch/for/while/do local scopes (in case there’s no block),
  • if/switch/for/while condition variables,
  • catch exception variable,
  • temporaries bound to const reference.

Is there something that I’ve missed?

As it have been suggested I’ve created hierarchy of CFGElements. Currently there’re two unused types for implicit destructor calls in destructor. I’ve did not revert ability of CFGElement to cast/dyn_cast to Stmt, because it would lead to situation when cast(SomeCFGElement) would return null. I did however add method for downcasting CFGElement object to object of its implemntation class (returned by value, not pointer) with returning invalid object on invalid cast.

I did not add CFGElement for destructors of temporary objects. After giving it some thought I came to a conclusion that it shouldn’t be covered by CFG, because it clearly needs path-sensitiveness.

Cheers
Marcin

Comments inline below on the patch. Is the plan to still ad the CFGElements for the destructors of temporary objects? Overall, this is truly fantastic work. Sorry I took so long to look it over.

One thing that would help is a general big comment (above LocalScopes perhaps) that explains the algorithm for adding destructors. I was able to tease it out of the code, but a high-level description would be really useful.

One thing that occurred to me that while I argued against putting the scope information in the CFG itself, it looks like we do all the work to reconstruct scope information (via the LocalScope objects). Perhaps that information should be exposed as a public API? Just thinking (and not something to immediately consider).

This could be done. However LocalScope graph construction process works only on variables that will need destructor to be called. Adding all variables for creating full scope info could break the graph. I’ll include comment in code describing the process, so it will be clear why.

Index: include/clang/Analysis/FlowSensitive/DataflowSolver.h

— include/clang/Analysis/FlowSensitive/DataflowSolver.h (revision 112851)
+++ include/clang/Analysis/FlowSensitive/DataflowSolver.h (working copy)
@@ -273,8 +273,15 @@
void ProcessBlock(const CFGBlock* B, bool recordStmtValues,
dataflow::forward_analysis_tag) {

  • for (StmtItr I=ItrTraits::StmtBegin(B), E=ItrTraits::StmtEnd(B); I!=E;++I)
  • ProcessStmt(*I, recordStmtValues, AnalysisDirTag());
  • for (StmtItr I=ItrTraits::StmtBegin(B), E=ItrTraits::StmtEnd(B); I!=E;++I) {
  • if (const CFGStmt* SE = dyn_cast(&*I))
  • ProcessStmt(*SE, recordStmtValues, AnalysisDirTag());
  • // FIXME: (CFGElement) Should handle other cases.
  • else if (const CFGInitializer* IE = dyn_cast(&*I))
  • (void)IE;
  • else if (const CFGImplicitDtor* DE = dyn_cast(&*I))
  • (void)DE;
  • }

TF.VisitTerminator(const_cast<CFGBlock*>(B));
}
@@ -284,8 +291,15 @@

TF.VisitTerminator(const_cast<CFGBlock*>(B));

  • for (StmtItr I=ItrTraits::StmtBegin(B), E=ItrTraits::StmtEnd(B); I!=E;++I)
  • ProcessStmt(*I, recordStmtValues, AnalysisDirTag());
  • for (StmtItr I=ItrTraits::StmtBegin(B), E=ItrTraits::StmtEnd(B); I!=E;++I) {
  • if (const CFGStmt* SE = dyn_cast(&*I))
  • ProcessStmt(*SE, recordStmtValues, AnalysisDirTag());
  • // FIXME: (CFGElement) Should handle other cases.
  • else if (const CFGInitializer* IE = dyn_cast(&*I))
  • (void)IE;
  • else if (const CFGImplicitDtor* DE = dyn_cast(&*I))
  • (void)DE;
  • }
    }

It would probably be more succinct to introduce a ProcessCFGElement(), which then in turn called ProcessStmt(). That way the two loops look like:

  • for (StmtItr I=ItrTraits::StmtBegin(B), E=ItrTraits::StmtEnd(B); I!=E;++I)
  • ProcessCFGElement(*I, recordStmtValues, AnalysisDirTag());

and then have ProcessCFGElement do the dispatch logic. Right now this is basically copy-paste for the forward/backward analyses.

void ProcessStmt(const Stmt* S, bool record, dataflow::forward_analysis_tag) {
Index: include/clang/Analysis/Visitors/CFGRecStmtVisitor.h

— include/clang/Analysis/Visitors/CFGRecStmtVisitor.h (revision 112851)
+++ include/clang/Analysis/Visitors/CFGRecStmtVisitor.h (working copy)
@@ -47,6 +47,9 @@

// Defining operator() allows the visitor to be used as a C++ style functor.
void operator()(Stmt* S) { static_cast<ImplClass*>(this)->BlockStmt_Visit(S);}

  • void operator()(CFGElement E) {
  • static_cast<ImplClass*>(this)->VisitCFGElement(E);
  • }
    };

Looks good.

} // end namespace clang
Index: include/clang/Analysis/Visitors/CFGStmtVisitor.h

— include/clang/Analysis/Visitors/CFGStmtVisitor.h (revision 112851)
+++ include/clang/Analysis/Visitors/CFGStmtVisitor.h (working copy)
@@ -61,6 +61,16 @@
RetTy VisitConditionVariableInit(Stmt *S) {
return RetTy();
}
+

  • /// VisitCFGElement - Handle CFGElements: Statements, Initializers and
  • /// ImplicitDtors. Those CFGElements act like statements from CFG point of
  • /// view.
  • RetTy VisitCFGElement(CFGElement E) {
  • if (CFGStmt S = E.getAs())
  • return BlockStmt_Visit(S);
  • // FIXME: (CFGElement) Handle Initializers and ImplicitDtors
  • return RetTy();
  • }

Looks good.

/// BlockVisit_XXX - Visitor methods for visiting the “root” statements in
/// CFGBlocks. Root statements are the statements that appear explicitly in
Index: include/clang/Analysis/CFG.h

— include/clang/Analysis/CFG.h (revision 112851)
+++ include/clang/Analysis/CFG.h (working copy)
@@ -28,30 +28,234 @@
}
namespace clang {
class Decl;

  • class VarDecl;
    class Stmt;
    class Expr;
  • class CXXBaseOrMemberInitializer;
  • class CXXBindTemporaryExpr;
  • class CXXDestructorDecl;
  • class CXXExprWithTemporaries;
  • class FieldDecl;
  • class TypeSourceInfo;
    class CFG;
  • class CFGBlock;
    class PrinterHelper;
    class LangOptions;
    class ASTContext;

-/// CFGElement - Represents a top-level expression in a basic block.
+/// CFGElement - Represents a top-level expression or other language construct
+/// in a basic block.
class CFGElement {

  • llvm::PointerIntPair<Stmt *, 2> Data;
    +protected:
  • // ExternalData - External data sturcture used by implemntations to store
  • // more data then one pointer.
  • struct ExternalData {
  • llvm::PointerIntPair<void*, 2> PrimData;
  • llvm::PointerIntPair<void*, 2> SecData;
  • };
  • ExternalData& getExternalData() {
  • return static_cast<ExternalData>(Data.getPointer());
  • }
  • const ExternalData& getExternalData() const {
  • return static_cast<ExternalData>(Data.getPointer());
  • }
  • llvm::PointerIntPair<void*, 2> Data;

public:

  • enum Type { StartScope, EndScope };
  • explicit CFGElement() {}
  • CFGElement(Stmt *S, bool lvalue) : Data(S, lvalue ? 1 : 0) {}
  • CFGElement(Stmt *S, Type t) : Data(S, t == StartScope ? 2 : 3) {}
  • Stmt *getStmt() const { return Data.getPointer(); }
  • bool asLValue() const { return Data.getInt() == 1; }
  • bool asStartScope() const { return Data.getInt() == 2; }
  • bool asEndScope() const { return Data.getInt() == 3; }
  • bool asDtor() const { return Data.getInt() == 4; }
  • enum Kind {
  • Statement,
  • LvalStatement,
  • BEGIN_STATEMENTS = Statement,
  • END_STATEMENTS = LvalStatement,
  • Initializer,
  • AutomaticObjectDtor,
  • BaseDtor,
  • MemberDtor,
  • BEGIN_DTORS = AutomaticObjectDtor,
  • END_DTORS = MemberDtor,
  • BEGIN_EXTERNALS = BEGIN_DTORS
  • };
  • // Construct invalid CFGElement. Each implementation will provide default
  • // constructor for constructing invalid instance. Invalid instance won’t
  • // preserve it’s kind.
  • CFGElement() {}
  • Kind getKind() const {
  • if (Data.getInt() != BEGIN_EXTERNALS)
  • return Kind(Data.getInt());
  • return Kind(BEGIN_EXTERNALS + getExternalData().PrimData.getInt());
  • }
  • bool isValid() const { return Data.getPointer(); }
  • operator bool() const { return isValid(); };
  • bool isStmt() const {
  • Kind K = Kind(Data.getInt());
  • return K >= BEGIN_STATEMENTS && K <= END_STATEMENTS;
  • }
  • bool isInitializer() const {
  • return Data.getInt() == Initializer;
  • }
  • bool isImplicitDtor() const {
  • if (Data.getInt() != BEGIN_EXTERNALS)
  • return false;
  • Kind K = Kind(BEGIN_EXTERNALS + getExternalData().PrimData.getInt());
  • return K >= BEGIN_DTORS && K <= END_DTORS;
  • }
  • template ElemTy getAs() const {
  • if (llvm::isa(this))
  • return static_cast<const ElemTy>(this);

Could we use llvm::dyn_cast<> here? You already implement the classof() functions.

You mean?:

if (ElemTy *E = llvm::dyn_cast(this))
return *E;

We could, no real difference for me.

  • return ElemTy();

  • }

  • static bool classof(const CFGElement* E) { return true; }

+protected:

  • CFGElement(Kind K, void* P)
  • : Data(P, K) {
  • assert (K < BEGIN_EXTERNALS && “CFGElement needs external data.”);
  • }
  • CFGElement(Kind K, void* P, void* S, llvm::BumpPtrAllocator& A)
  • : Data(new (A.Allocate()) ExternalData(), BEGIN_EXTERNALS) {
  • assert (K >= BEGIN_EXTERNALS && “CFGElement doesn’t need external data.”);
  • getExternalData().PrimData.setInt(K - BEGIN_EXTERNALS);
  • getExternalData().PrimData.setPointer(P);
  • getExternalData().SecData.setPointer(S);
  • }
    +};

Looks great.

+/// CFGStmt - Represents a top-level expression in a basic block.
+class CFGStmt : public CFGElement {
+public:

  • CFGStmt() {}
  • CFGStmt(Stmt* S, bool asLValue)
  • : CFGElement(asLValue ? LvalStatement : Statement, S) {}
  • Stmt* getStmt() const { return static_cast<Stmt*>(Data.getPointer()); }
    operator Stmt*() const { return getStmt(); }
  • operator bool() const { return getStmt() != 0; }
  • bool asLValue() const { return getKind() == LvalStatement; }
  • static bool classof(const CFGElement* E) { return E->isStmt(); }
    };

Looks great.

+/// CFGInitializer - Represents a C++ initializer in constructor initialization
+/// list.
+class CFGInitializer : public CFGElement {
+public:

  • CFGInitializer() {}
  • CFGInitializer(CXXBaseOrMemberInitializer* I)
  • : CFGElement(Initializer, I) {}
  • CXXBaseOrMemberInitializer* getInitializer() const {
  • return static_cast<CXXBaseOrMemberInitializer*>(Data.getPointer());
  • }
  • operator CXXBaseOrMemberInitializer*() const { return getInitializer(); }
  • static bool classof(const CFGElement* E) { return E->isInitializer(); }
    +};

Looks great.

+/// CFGImplicitDtor - Represents implicit call to C++ object’s destructor. This
+/// is a base class for more specific implementations.
+class CFGImplicitDtor : public CFGElement {
+public:

  • CFGImplicitDtor() {}
  • CXXDestructorDecl* getDestructor() const;
  • SourceLocation getCallLoc() const;
  • SourceLocation getObjectLoc() const;

Please add doxygen comments for these two methods that return SourceLocations.

  • static bool classof(const CFGElement* E) { return E->isImplicitDtor(); }

+protected:

  • CFGImplicitDtor(Kind K, void* P, void* S, llvm::BumpPtrAllocator& A)
  • : CFGElement(K, P, S, A) {}
    +};

Looks good.

+/// CFGAutomaticObjDtor - Represents implicit call to destructor of C++ object
+/// with automatic storage duration.
+class CFGAutomaticObjDtor : public CFGImplicitDtor {
+public:

  • CFGAutomaticObjDtor() {}
  • CFGAutomaticObjDtor(VarDecl* VD, Stmt* S, llvm::BumpPtrAllocator& A)
  • : CFGImplicitDtor(AutomaticObjectDtor, VD, S, A) {}
  • CXXDestructorDecl* getDestructor() const;
  • SourceLocation getCallLoc() const;
  • SourceLocation getObjectLoc() const;

Please comment that we use our own dynamic-dispatch here for these methods since they are non-virtual and we don’t want a vptr. When I first looked at this I thought this was a bug. I think it is very important to document the design decision here.

  • VarDecl* getVarDecl() const {
  • return static_cast<VarDecl*>(getExternalData().PrimData.getPointer());
  • }

Looks great.

  • // Get statement end of which triggered the destructor call.
  • Stmt* getTriggerStmt() const {
  • return static_cast<Stmt*>(getExternalData().SecData.getPointer());
  • }

Looks great. Make the comment a doxygen comment.

  • static bool classof(const CFGElement* E) {
  • return E->getKind() == AutomaticObjectDtor;
  • }
    +};

+/// CFGBaseDtor - Represents implicit call to destructor of base class in
+/// C++ class destructor.
+class CFGBaseDtor : public CFGImplicitDtor {
+public:

  • CFGBaseDtor() {}
  • CXXDestructorDecl* getDestructor() const;
  • SourceLocation getCallLoc() const;
  • SourceLocation getObjectLoc() const;

Please comment that we use our own dynamic-dispatch here for these methods since they are non-virtual and we don’t want a vptr. When I first looked at this I thought this was a bug.

+};
+
+/// CFGMemberDtor - Represents implicit call to destructor of member in
+/// C++ class destructor.
+class CFGMemberDtor : public CFGImplicitDtor {
+public:

  • CFGMemberDtor() {}
  • CXXDestructorDecl* getDestructor() const;
  • SourceLocation getCallLoc() const;
  • SourceLocation getObjectLoc() const;

Please comment that we use our own dynamic-dispatch here for these methods since they are non-virtual and we don’t want a vptr. When I first looked at this I thought this was a bug.

+};
+
+// Dispatch methods from CFGImplicitDtor’s interface.
+#define CFG_IMPLICIT_DTOR_DISPATCH(Meth) \

  • assert (Data.getInt() == BEGIN_EXTERNALS && “Bad CFGElement kind”); \
  • Kind K = Kind(getExternalData().PrimData.getInt() + BEGIN_EXTERNALS); \
  • switch (K) { \
  • case AutomaticObjectDtor: \
  • return static_cast<const CFGAutomaticObjDtor*>(this)->Meth(); \
  • case BaseDtor: \
  • return static_cast<const CFGBaseDtor*>(this)->Meth(); \
  • default: \
  • assert (K == MemberDtor && “CFGElement not of a destructor kind”);\
  • return static_cast<const CFGMemberDtor*>(this)->Meth(); \
  • }

+inline CXXDestructorDecl* CFGImplicitDtor::getDestructor() const {

  • CFG_IMPLICIT_DTOR_DISPATCH(getDestructor);
    +}
    +inline SourceLocation CFGImplicitDtor::getCallLoc() const {
  • CFG_IMPLICIT_DTOR_DISPATCH(getCallLoc);
    +}
    +inline SourceLocation CFGImplicitDtor::getObjectLoc() const {
  • CFG_IMPLICIT_DTOR_DISPATCH(getObjectLoc);
    +}

+#undef CFG_IMPLICIT_DTOR_DISPATCH
+

Cute.

/// CFGBlock - Represents a single basic block in a source-level CFG.
/// It consists of:
///
@@ -77,11 +281,11 @@
/// &&, || expression that uses result of && or ||, RHS
///
class CFGBlock {

  • class StatementList {
  • class ElementList {
    typedef BumpVector ImplTy;
    ImplTy Impl;
    public:
  • StatementList(BumpVectorContext &C) : Impl(C, 4) {}
  • ElementList(BumpVectorContext &C) : Impl(C, 4) {}

Looks good.

typedef std::reverse_iteratorImplTy::iterator iterator;
typedef std::reverse_iteratorImplTy::const_iterator const_iterator;
@@ -89,6 +293,11 @@
typedef ImplTy::const_iterator const_reverse_iterator;

void push_back(CFGElement e, BumpVectorContext &C) { Impl.push_back(e, C); }

  • reverse_iterator insert(reverse_iterator I, size_t Cnt, CFGElement E,
  • BumpVectorContext& C) {
  • return Impl.insert(I, Cnt, E, C);
  • }

CFGElement front() const { return Impl.back(); }
CFGElement back() const { return Impl.front(); }

@@ -110,8 +319,8 @@
bool empty() const { return Impl.empty(); }
};

  • /// Stmts - The set of statements in the basic block.
  • StatementList Stmts;
  • /// Elements - The set of elements in the basic block.
  • ElementList Elements;

/// Label - An (optional) label that prefixes the executable
/// statements in the block. When this variable is non-NULL, it is
@@ -140,33 +349,33 @@

public:
explicit CFGBlock(unsigned blockid, BumpVectorContext &C)

  • : Stmts(C), Label(NULL), Terminator(NULL), LoopTarget(NULL),
  • : Elements(C), Label(NULL), Terminator(NULL), LoopTarget(NULL),
    BlockID(blockid), Preds(C, 1), Succs(C, 1) {}
    ~CFGBlock() {}
  • // Statement iterators
  • typedef StatementList::iterator iterator;
  • typedef StatementList::const_iterator const_iterator;
  • typedef StatementList::reverse_iterator reverse_iterator;
  • typedef StatementList::const_reverse_iterator const_reverse_iterator;
  • // Element iterators
  • typedef ElementList::iterator iterator;
  • typedef ElementList::const_iterator const_iterator;
  • typedef ElementList::reverse_iterator reverse_iterator;
  • typedef ElementList::const_reverse_iterator const_reverse_iterator;
  • CFGElement front() const { return Stmts.front(); }
  • CFGElement back() const { return Stmts.back(); }
  • CFGElement front() const { return Elements.front(); }
  • CFGElement back() const { return Elements.back(); }
  • iterator begin() { return Stmts.begin(); }
  • iterator end() { return Stmts.end(); }
  • const_iterator begin() const { return Stmts.begin(); }
  • const_iterator end() const { return Stmts.end(); }
  • iterator begin() { return Elements.begin(); }
  • iterator end() { return Elements.end(); }
  • const_iterator begin() const { return Elements.begin(); }
  • const_iterator end() const { return Elements.end(); }
  • reverse_iterator rbegin() { return Stmts.rbegin(); }
  • reverse_iterator rend() { return Stmts.rend(); }
  • const_reverse_iterator rbegin() const { return Stmts.rbegin(); }
  • const_reverse_iterator rend() const { return Stmts.rend(); }
  • reverse_iterator rbegin() { return Elements.rbegin(); }
  • reverse_iterator rend() { return Elements.rend(); }
  • const_reverse_iterator rbegin() const { return Elements.rbegin(); }
  • const_reverse_iterator rend() const { return Elements.rend(); }
  • unsigned size() const { return Stmts.size(); }
  • bool empty() const { return Stmts.empty(); }
  • unsigned size() const { return Elements.size(); }
  • bool empty() const { return Elements.empty(); }
  • CFGElement operator[](size_t i) const { return Stmts[i]; }
  • CFGElement operator[](size_t i) const { return Elements[i]; }

All looks great.

// CFG iterators
typedef AdjacentBlocks::iterator pred_iterator;
@@ -240,14 +449,24 @@
}

void appendStmt(Stmt* Statement, BumpVectorContext &C, bool asLValue) {

  • Stmts.push_back(CFGElement(Statement, asLValue), C);
  • }
  • void StartScope(Stmt* S, BumpVectorContext &C) {
  • Stmts.push_back(CFGElement(S, CFGElement::StartScope), C);
  • Elements.push_back(CFGStmt(Statement, asLValue), C);
    }
  • void EndScope(Stmt* S, BumpVectorContext &C) {
  • Stmts.push_back(CFGElement(S, CFGElement::EndScope), C);
  • void appendInitializer(CXXBaseOrMemberInitializer* I, BumpVectorContext& C) {
  • Elements.push_back(CFGInitializer(I), C);
    }
  • // Destructors must be inserted in reversed order. So insertion is in two
  • // steps. First we prepare space for some number of elements, then we insert
  • // the elements begining at the last position in prepared space.
  • iterator beginAutomaticObjDtorsInsert(iterator I, size_t Cnt,
  • BumpVectorContext& C) {
  • return iterator(Elements.insert(I.base(), Cnt, CFGElement(), C));
  • }
  • iterator insertAutomaticObjDtor(iterator I, VarDecl* VD, Stmt* S,
  • BumpVectorContext& C) {
  • *I = CFGAutomaticObjDtor(VD, S, C.getAllocator());
  • return ++I;
  • }

Looks great.

};

/// createBlock - Create a new block in the CFG. The CFG owns the block;
/// the caller should not directly free it.
@@ -321,11 +568,12 @@
//===--------------------------------------------------------------------===//

template

  • void VisitBlockStmts(CALLBACK& O) const {
  • void VisitCFGElements(CALLBACK& O) const {
    for (const_iterator I=begin(), E=end(); I != E; ++I)
    for (CFGBlock::const_iterator BI=(*I)->begin(), BE=(*I)->end();
  • BI != BE; ++BI)
  • BI != BE; ++BI) {
    O(*BI);
  • }
    }

Cool.

//===--------------------------------------------------------------------===//
@@ -398,18 +646,6 @@

namespace llvm {

-/// Implement simplify_type for CFGElement, so that we can dyn_cast from
-/// CFGElement to a specific Stmt class.
-template <> struct simplify_type {

  • typedef ::clang::Stmt* SimpleType;
  • static SimpleType getSimplifiedValue(const ::clang::CFGElement &Val) {
  • return Val.getStmt();
  • }
    -};

-template <> struct simplify_type< ::clang::CFGElement>

  • : public simplify_type {};

Makes sense to remove, since we have getAs<> now and a much richer CFGElement hierarchy.

// Traits for: CFGBlock

template <> struct GraphTraits< ::clang::CFGBlock* > {
Index: include/clang/Analysis/ProgramPoint.h

— include/clang/Analysis/ProgramPoint.h (revision 112851)
+++ include/clang/Analysis/ProgramPoint.h (working copy)
@@ -117,10 +117,6 @@
const CFGBlock* B = getBlock();
return B->empty() ? CFGElement() : B->front();
}

  • const Stmt *getFirstStmt() const {
  • return getFirstElement().getStmt();
  • }

static bool classof(const ProgramPoint* Location) {
return Location->getKind() == BlockEntranceKind;
@@ -136,11 +132,6 @@
return reinterpret_cast<const CFGBlock*>(getData1());
}

  • const Stmt* getLastStmt() const {
  • const CFGBlock* B = getBlock();
  • return B->empty() ? CFGElement() : B->back();
  • }

const Stmt* getTerminator() const {
return getBlock()->getTerminator();
}

Index: include/clang/Checker/PathSensitive/GRCoreEngine.h

— include/clang/Checker/PathSensitive/GRCoreEngine.h (revision 112851)
+++ include/clang/Checker/PathSensitive/GRCoreEngine.h (working copy)
@@ -259,7 +259,9 @@

/// getStmt - Return the current block-level expression associated with
/// this builder.

  • const Stmt* getStmt() const { return B[Idx]; }
  • const Stmt* getStmt() const {
  • return B[Idx].isStmt() ? B[Idx].getAs().getStmt() : NULL;
  • }

Looks great.

/// getBlock - Return the CFGBlock associated with the block-level expression
/// of this builder.
Index: lib/Analysis/CFG.cpp

— lib/Analysis/CFG.cpp (revision 112851)
+++ lib/Analysis/CFG.cpp (working copy)
@@ -52,6 +52,113 @@
Kind k;
};

+/// LocalScope - List of automatic variables declared in current
+/// local scope and position in previous local scope.
+class LocalScope {
+public:

  • typedef std::vector<VarDecl*> AutomaticVarsTy;

A SmallVector might reduce overall malloc() traffic.

  • typedef AutomaticVarsTy::const_reverse_iterator AutomaticVarsIter;
  • class const_iterator {
  • const LocalScope* Scope;
  • AutomaticVarsIter VarIter;
  • // Guard for “valid” invalid operator.
  • static LocalScope GuardScope;
  • public:
  • /// Create invalid iterator.
  • const_iterator()
  • : Scope(&GuardScope), VarIter(Scope->Vars.rbegin()) {}
  • /// Create valid iterator.
  • const_iterator(const LocalScope* Scope, AutomaticVarsIter VarIter)
  • : Scope(Scope), VarIter(VarIter) {}
  • VarDecl* operator*() const { return *VarIter; }
  • VarDecl* const* operator->() const { return &*VarIter; }
  • const_iterator& operator++() {
  • ++VarIter;
  • if (VarIter == Scope->Vars.rend())
  • *this = Scope->Prev;
  • return *this;
  • }
  • const_iterator operator++(int) {
  • const_iterator prev = *this;
  • ++*this;
  • return prev;
  • }
  • bool operator==(const const_iterator& rhs) const {
  • return Scope == rhs.Scope && VarIter == rhs.VarIter;
  • }
  • bool operator!=(const const_iterator& rhs) const {
  • return !(*this == rhs);
  • }
  • int distance(const_iterator L);
  • friend int distance(const_iterator F, const_iterator L);
  • };
  • friend class const_iterator;

+private:

  • // Automatic variables in order of declaration.
  • AutomaticVarsTy Vars;
  • // Iterator to variable in previous scope that was declared just before
  • // begin of this scope.
  • const_iterator Prev;
  • // Creates invalid scope that can serve as guard scope for iterator.
  • LocalScope()
  • : Vars(1, NULL)
  • , Prev(this, Vars.rbegin()) {}

+public:

  • LocalScope(const_iterator P)
  • : Vars()
  • , Prev(P) {}
  • // Begin of scope in direction of CFG building (backwards).
  • const_iterator begin() const { return const_iterator(this, Vars.rbegin()); }
  • void addVar(VarDecl* VD) {
  • Vars.push_back(VD);
  • }
    +};

+LocalScope LocalScope::const_iterator::GuardScope;
+
+/// distance - Calculates distance from F to L. L must be reachable from F (with
+/// use of ++ operator). Cost of calculating the distance is linear w.r.t.
+/// number between F and L.
+int distance(LocalScope::const_iterator F, LocalScope::const_iterator L) {

  • return F.distance(L);
    +}

+int LocalScope::const_iterator::distance(LocalScope::const_iterator L) {

  • int D = 0;
  • const_iterator F = *this;
  • while (F.Scope != L.Scope) {
  • assert (F.Scope != &GuardScope
  • && “L iterator is not reachable from F iterator.”);
  • D += F.Scope->Vars.rend() - F.VarIter;
  • F = F.Scope->Prev;
  • }
  • D += L.VarIter - F.VarIter;
  • return D;
    +}

+struct BlockScopePosPair {

  • BlockScopePosPair() {}
  • BlockScopePosPair(CFGBlock* B, LocalScope::const_iterator S)
  • : Block(B), ScopePos(S) {}
  • CFGBlock* Block;
  • LocalScope::const_iterator ScopePos;
    +};

/// CFGBuilder - This class implements CFG construction from an AST.
/// The builder is stateful: an instance of the builder should be used to only
/// construct a single CFG.
@@ -67,41 +174,59 @@
/// implicit fall-throughs without extra basic blocks.
///
class CFGBuilder {

  • typedef BlockScopePosPair JumpTarget;
  • typedef BlockScopePosPair JumpSource;

ASTContext *Context;
llvm::OwningPtr cfg;

CFGBlock* Block;
CFGBlock* Succ;

  • CFGBlock* ContinueTargetBlock;
  • CFGBlock* BreakTargetBlock;
  • JumpTarget ContinueJumpTarget;
  • JumpTarget BreakJumpTarget;
    CFGBlock* SwitchTerminatedBlock;
    CFGBlock* DefaultCaseBlock;
    CFGBlock* TryTerminatedBlock;
  • // LabelMap records the mapping from Label expressions to their blocks.
  • typedef llvm::DenseMap<LabelStmt*,CFGBlock*> LabelMapTy;
  • // Current position in local scope.
  • LocalScope::const_iterator ScopePos;
  • // LabelMap records the mapping from Label expressions to their jump targets.
  • typedef llvm::DenseMap<LabelStmt*, JumpTarget> LabelMapTy;
    LabelMapTy LabelMap;

// A list of blocks that end with a “goto” that must be backpatched to their
// resolved targets upon completion of CFG construction.

  • typedef std::vector<CFGBlock*> BackpatchBlocksTy;
  • typedef std::vector BackpatchBlocksTy;
    BackpatchBlocksTy BackpatchBlocks;

// A list of labels whose address has been taken (for indirect gotos).
typedef llvm::SmallPtrSet<LabelStmt*,5> LabelSetTy;
LabelSetTy AddressTakenLabels;

  • // A list of local scopes, which must be destroyed at the end of build
  • // process.
  • typedef std::vector<LocalScope*> LocalScopesTy;
  • LocalScopesTy LocalScopes;

Consider making this a SmallVector to reduce malloc() traffic.

public:
explicit CFGBuilder() : cfg(new CFG()), // crew a new CFG
Block(NULL), Succ(NULL),

  • ContinueTargetBlock(NULL), BreakTargetBlock(NULL),
  • ContinueJumpTarget(), BreakJumpTarget(),
    SwitchTerminatedBlock(NULL), DefaultCaseBlock(NULL),
  • TryTerminatedBlock(NULL) {}
  • TryTerminatedBlock(NULL), ScopePos() {}

  • ~CFGBuilder() {

  • // Clean up local scopes.

  • for (LocalScopesTy::iterator I = LocalScopes.begin(), E = LocalScopes.end()

  • ; I != E; ++I) {

  • delete *I;

Another possibility is to make the LocalScopes be BumpVectors, and BumpPtrAllocate them. Then you don’t need to free them explicitly, and it reduces malloc() traffic.

BumpVector with BumpPtrAllocator handled by CFG or CFGBuilder? I would assume that the first case would be slightly faster, but would leave some unused data after build process.

  • }
  • }

// buildCFG - Used by external clients to construct the CFG.
CFG* buildCFG(const Decl *D, Stmt *Statement, ASTContext *C,

  • bool pruneTriviallyFalseEdges, bool AddEHEdges,
  • bool AddScopes);
  • CFG::BuildOptions BO);

private:
// Visitors to walk an AST and construct the CFG.
@@ -150,37 +275,42 @@
return Block;
}

void autoCreateBlock() { if (!Block) Block = createBlock(); }
CFGBlock createBlock(bool add_successor = true);
bool FinishBlock(CFGBlock
B);
+
CFGBlock *addStmt(Stmt *S) {
return Visit(S, AddStmtChoice::AlwaysAdd);
}

  • CFGBlock addInitializer(CXXBaseOrMemberInitializer I);

  • CFGBlock *addAutomaticObjDtors(LocalScope::const_iterator B,

  • LocalScope::const_iterator E, Stmt* S);

  • // Local scopes creation.

  • LocalScope* createOrReuseLocalScope(LocalScope* Scope);

  • LocalScope* addLocalScopeForStmt(Stmt* S, LocalScope* Scope = NULL);

  • LocalScope* addLocalScopeForDeclStmt(DeclStmt* DS, LocalScope* Scope = NULL);

  • LocalScope* addLocalScopeForVarDecl(VarDecl* VD, LocalScope* Scope = NULL);

  • LocalScope* addLocalScopeAndDtors(Stmt* S, LocalScope* Scope = NULL);

  • // Interface to CFGBlock - adding CFGElements.
    void AppendStmt(CFGBlock *B, Stmt *S,
    AddStmtChoice asc = AddStmtChoice::AlwaysAdd) {
    B->appendStmt(S, cfg->getBumpVectorContext(), asc.asLValue());
    }

  • void appendInitializer(CFGBlock* B, CXXBaseOrMemberInitializer* I) {

  • B->appendInitializer(I, cfg->getBumpVectorContext());

  • }

  • void insertAutomaticObjDtors(CFGBlock* Blk, CFGBlock::iterator I,

  • LocalScope::const_iterator B, LocalScope::const_iterator E, Stmt* S);

  • void appendAutomaticObjDtors(CFGBlock* Blk, LocalScope::const_iterator B,

  • LocalScope::const_iterator E, Stmt* S);

  • void prependAutomaticObjDtorsWithTerminator(CFGBlock* Blk,

  • LocalScope::const_iterator B, LocalScope::const_iterator E);

void AddSuccessor(CFGBlock *B, CFGBlock *S) {
B->addSuccessor(S, cfg->getBumpVectorContext());
}
@@ -207,7 +337,7 @@
/// TryEvaluateBool - Try and evaluate the Stmt and return 0 or 1
/// if we can evaluate to a known value, otherwise return -1.
TryResult TryEvaluateBool(Expr *S) {

  • if (!PruneTriviallyFalseEdges)
  • if (!BuildOpts.PruneTriviallyFalseEdges)
    return TryResult();

Expr::EvalResult Result;
@@ -219,15 +349,7 @@
}

bool badCFG;
};

// FIXME: Add support for dependent-sized array types in C++?
@@ -250,19 +372,17 @@
/// transferred to the caller. If CFG construction fails, this method returns
/// NULL.
CFG* CFGBuilder::buildCFG(const Decl D, Stmt Statement, ASTContext* C,

  • bool pruneTriviallyFalseEdges,
  • bool addehedges, bool AddScopes) {
  • CFG::BuildOptions BO) {
  • AddEHEdges = addehedges;
  • PruneTriviallyFalseEdges = pruneTriviallyFalseEdges;

Context = C;
assert(cfg.get());
if (!Statement)
return NULL;

  • this->AddScopes = AddScopes;
    badCFG = false;
  • BuildOpts = BO;
  • if (!C->getLangOptions().CPlusPlus)
  • BuildOpts.AddImplicitDtors = false;

// Create an empty block that will serve as the exit block for the CFG. Since
// this is the first block added to the CFG, it will be implicitly registered
@@ -274,9 +394,14 @@
// Visit the statements and create the CFG.
CFGBlock* B = addStmt(Statement);

  • // For C++ constructor add initializers to CFG.
    if (const CXXConstructorDecl *CD = dyn_cast_or_null(D)) {
  • // FIXME: Add code for base initializers and member initializers.
  • (void)CD;
  • for (CXXConstructorDecl::init_const_reverse_iterator I = CD->init_rbegin()
  • , E = CD->init_rend(); I != E; ++I) {
  • B = addInitializer(*I);
  • if (badCFG)
  • return NULL;
  • }
    }
    if (!B)
    B = Succ;
    @@ -291,15 +416,17 @@
    for (BackpatchBlocksTy::iterator I = BackpatchBlocks.begin(),
    E = BackpatchBlocks.end(); I != E; ++I ) {
  • CFGBlock* B = *I;
  • CFGBlock* B = I->Block;
    GotoStmt* G = cast(B->getTerminator());
    LabelMapTy::iterator LI = LabelMap.find(G->getLabel());

// If there is no target for the goto, then we are looking at an
// incomplete AST. Handle this by not registering a successor.
if (LI == LabelMap.end()) continue;

  • JumpTarget JT = LI->second;
  • AddSuccessor(B, LI->second);
  • prependAutomaticObjDtorsWithTerminator(B, I->ScopePos, JT.ScopePos);
  • AddSuccessor(B, JT.Block);
    }

// Add successors to the Indirect Goto Dispatch block (if we have one).
@@ -314,7 +441,9 @@
// at an incomplete AST. Handle this by not registering a successor.
if (LI == LabelMap.end()) continue;

  • AddSuccessor(B, LI->second);
  • // FIXME: Are indirect gotos supported in C++? If yes we have to add
  • // additional block for each possible branch with destructors called.
  • AddSuccessor(B, LI->second.Block);

How horrible to consider. I don’t know the answer to this question.

}

Succ = B;
@@ -344,6 +473,148 @@
return true;
}

+/// addInitializer - Add C++ base or member initializer element to CFG.
+CFGBlock* CFGBuilder::addInitializer(CXXBaseOrMemberInitializer* I) {

  • if (!BuildOpts.AddInitializers)
  • return Block;
  • autoCreateBlock();
  • appendInitializer(Block, I);
  • if (!I->getInit())
  • // FIXME: Remove this check if is unnecessery.
  • return Block;
  • return addStmt(I->getInit());
    +}

Looks good.

+CFGBlock* CFGBuilder::addAutomaticObjDtors(LocalScope::const_iterator B,

  • LocalScope::const_iterator E, Stmt* S) {
  • if (!BuildOpts.AddImplicitDtors)
  • return Block;
  • if (B == E)
  • return Block;
  • autoCreateBlock();
  • appendAutomaticObjDtors(Block, B, E, S);
  • return Block;
    +}

Looks good.

+LocalScope* CFGBuilder::createOrReuseLocalScope(LocalScope* Scope) {

  • if (!Scope) {
  • Scope = new LocalScope(ScopePos);
  • LocalScopes.push_back(Scope);
  • }
  • return Scope;
    +}

Consider BumpPtrAllocating LocalScope, since we may create a lot of them.

+LocalScope* CFGBuilder::addLocalScopeForStmt(Stmt* S, LocalScope* Scope) {

  • if (!BuildOpts.AddImplicitDtors)
  • return Scope;
  • // For compound statement we will be creating explicit scope.
  • if (CompoundStmt* CS = dyn_cast(S)) {
  • for (CompoundStmt::body_iterator BI = CS->body_begin(), BE = CS->body_end()
  • ; BI != BE; ++BI) {
  • if (DeclStmt* DS = dyn_cast(*BI))
  • Scope = addLocalScopeForDeclStmt(DS, Scope);
  • }
  • // For any other statement scope will be implicit and as such will be
  • // interesting only for DeclStmt.
  • } else if (DeclStmt* DS = dyn_cast(S)) {
  • Scope = addLocalScopeForDeclStmt(DS, Scope);
  • }
  • return Scope;
    +}

Looks great.

+LocalScope* CFGBuilder::addLocalScopeForDeclStmt(DeclStmt* DS,

  • LocalScope* Scope) {
  • if (!BuildOpts.AddImplicitDtors)
  • return Scope;
  • for (DeclStmt::decl_iterator DI = DS->decl_begin(), DE = DS->decl_end()
  • ; DI != DE; ++DI) {
  • if (VarDecl* VD = dyn_cast(*DI))
  • Scope = addLocalScopeForVarDecl(VD, Scope);
  • }
  • return Scope;
    +}

Looks great. Does each VarDecl get it’s own scope?

Scope variable passed and returned from addLocalScopeForVarDecl is for lazy creating new scope when needed. So when created it will be used for more variables, but we won’t create scope if no variables will be added to it.