CFG: scopes and destructors

Hi,

I’m trying to figure out how to add scopes and destructors to CFG. As I understand, what we need is:

For scopes:

  • CFGElement kinds for start/end of scopes that would allow for tracking entering and leaving scopes.
  • If there are no variable declarations in scope, scope start/end elements should be omited.
  • Scope end element should be omited if there are no variable declarations before it in the scope.

For destructors:

  • CFGElement kind for call to destructor.

The main problem with implementing this in CFGBuilder is that the builder is processing AST backwards. When the builder have to decide if it should add a scope end element or a destructor it doesn’t know of variables declared earlier in the scope. To solve this there have to be an additional step of walking e.g. a block of statements (compound statement) and creating a list of all statements declaring variables.

Solution that I would like to propose is to make those lists a part of CFG instead of just temporary data used while building the graph. It would look something like this:

class CFGScope {
Stmt* stmt; // Statement that creates the scope (e.g. CompoundStmt)
vector<Stmt*> decls; // Statements that declare variables in same order as in AST
CFGScopeIterator parent; // Link to position in parent scope where this scope started
};

class CFGScopeIterator {
CFGScope* scope; // Scope pointed by iterator
vector<Stmt*>::iterator pos; // Position in declarations list of the scope.
};

Now the scope start element would point to CFGScope object. The scope end element would point to pair (range) of CFGScopeIterators: first would point to position in current scope the scope end occured, second would point to position in some scope where the control flow did jump to. Element for destructors wouldn’t be needed, because destructors could be easly deduced from range in scope end element.

This design would allow for simple implementation of jumps out of nested scopes and simple modeling of destructors. Similar design could be used to model destructors for temporary objects.

What do you think about this design?

Cheers
Sfider

Hi Marcin,

It is not clear where the CFGScope will fit in the original
CFG-CFGBlock-CFGElement hierarchy. It looks like CFGScope is
equivalent to the CFGBlock. Can we just extend CFGBlock? Whenever a
new scope begins, we create a new CFGBlock for it. We add a 'Parent'
member to CFGBlock and let it point to the parent block and the
position where the scope begins.

Currently we don't distinguish Decls and Stmts in the CFGBlock (Decls
are DeclStmts), because Decls may have Stmts as their initializers and
we need to visit them along with other Stmts. But it is desired to
improve this representation.

Destructors could be implicit in the CFG. The client can maintain a
list of Decls which need to be destructed. An explicit list of Decls
in each scope-beginning CFGBlock may help in this case.

It is not clear where the CFGScope will fit in the original
CFG-CFGBlock-CFGElement hierarchy. It looks like CFGScope is
equivalent to the CFGBlock. Can we just extend CFGBlock? Whenever a
new scope begins, we create a new CFGBlock for it. We add a 'Parent'
member to CFGBlock and let it point to the parent block and the
position where the scope begins.

I agree that the CFGScope doesn't fit in the CFG-CFGBlock-CFGElement hierarchy. I actually think this scope representation has nothing to do with CFGs at all, and can be done using an entirely separate data structure. I'll elaborate further in a direct response to Marcin's email.

Moreover, I don't think there is a one-to-one mapping between CFGBlocks and scopes. A CFGBlock can actually contain multiple scopes. Consider:

  {
     int x;
     {
        int y;
        {
           int z;
        }
     }
  }

All three DeclStmts go into the same CFGBlock.

Currently we don't distinguish Decls and Stmts in the CFGBlock (Decls
are DeclStmts), because Decls may have Stmts as their initializers and
we need to visit them along with other Stmts. But it is desired to
improve this representation.

I'm not certain I understand what would need to be improved. The Stmts represent what actually gets executed; DeclStmts provide a clear mapping from Decls to DeclStmts where we need to represent the "evaluation" of a Decl.

The main thing the CFG cannot represent right now (aside from destructors) are the initializers in a C++ constructor, since they aren't statements. I think this can be done by extending CFGElement.

Destructors could be implicit in the CFG. The client can maintain a
list of Decls which need to be destructed. An explicit list of Decls
in each scope-beginning CFGBlock may help in this case.

I think they could be represented implicitly (from a storage perspective) in the CFG, but clients of the CFG would prefer a "view" of the CFG that makes them *look* like they are explicitly represented. We don't want analyses having to do anything special to understand when a destructor gets called relative to all the other statements and function calls.

Hi Marcin,

First, I want to thank you for working on this. Adding support for destructors, and possibly scopes, to the CFG is something that is sorely needed for analyzing C++ code.

At the risk of sounding pedantic, I think when evaluating our possible approaches it's really important to keep in mind both *how* the information in the CFG will be used to service clients and *what* information the CFG is meant to represent. Second, I think we shouldn't focus on the implementation details (e.g., "the builder is processing AST backwards") of how we would construct the information before we have an idea of what the right API is for accessing the information in its final form. The whole point of the CFG is to make it easier to write analyses, so having a good API that represents useful information is the primary goal.

That's all well and good, but what does that mean? First let's step back and examine what purpose the CFG serves. The CFG is meant to represent control-flow in a function; nothing more and nothing less. Representing the control-flow is important for any analyses we want to build on top of the CFG that want to know the ordering between statements, the change of control caused by jumps and loops, etc. With this in mind, let's consider both the problem of scopes and destructors.

From your proposal of CFGScope, if you take a look at the definition it actually has nothing to do with control-flow at all. In C and its derivatives, scope is purely a lexical concept. This is evident since Sema and the Parser reason about scopes but don't actually reason about control-flow. If you renamed 'CFGScope' to 'ASTScope' then it functionally would be the same thing. Thus in my mind tying it to CFGs really has no value, and conflates our mental model of what the CFG does. I think it's reasonable to consider building ASTScopes (or whatever they are called) as part of the process of building a CFG (sort of a by-product), but unless the scope information is valuable to model in the CFG as something a client of the CFG would want to know about *when reasoning about control-flow* then I think it is an orthogonal topic. Before I believe I argued that this information could be represented in the CFG, but I'm honestly not certain what benefit that would provide other than modeling when we cross scope boundaries. That information, however, can likely be constructed by analysis using a side map from statements to scope, so it's not clear if that needs to be in the CFG.

Now let's consider destructor calls. Destructor calls are really just hidden function calls inserted by the compiler. If we looked at the LLVM IR for a function that created/destroyed C++ objects, the calls to the destructors would be explicitly represented. This is crucial for building analyses in the LLVM backend. Source-level analyses really need this information as well, since when destructors get called is non-trivial, and we don't want individual analyses needing to reconstruct this information. This is actually far more complicated then when we cross scope boundaries.

To illustrate, consider:

   MyClass A;
   ...
   if (...)
     return; // Destructor for A called
   ...
   MyClass B;
   ...
   if (...)
     return; // Destructor for B called, followed by destructor for A
   ...
   if (...) {
     MyClass C;
     return; // Destructor C called, followed by destructor for B, followed by destructor for A
   }
   ...
   return; // Destructor B called, followed by destructor for A

In the comments, I have listed distinct and well-ordered function calls. This example is fairly simple, but if you consider adding things like exceptions, C++ temporaries, etc., we have an abundance of destructor calls that can be called at different times. We certainly do not want individual analyses built on the CFG to have to reconstruct this information from the raw scope information. Also, we want to be able to model individual destructor calls, particularly if any of them can throw exceptions or do something else crazy that impacts control-flow (e.g., call a no-return function). For the purpose of diagnostics from analyses (think of the static analyzer), we want to know (relatively speaking) where a destructor was called.

This is why, fundamentally, I think destructors need to be explicitly represented in the CFG. Dataflow analyses need to know when destructors are called and in what order, and they need to model this precisely and easily. Scopes are obviously and important part of reasoning about constructors, and for the CFGBuilder to model destructors in the CFG it will need to reason about scopes. That, however, is different than having the scope information you described as 'CFGScope' being associated with the CFG itself.

Now let's consider implementation. Indeed there is an interesting challenge in doing this in CFGBuilder because we process the AST backwards. Processing it backwards, however, will actually allow us to readily determine the right destructor calls *if* we can readily know the current set of objects in a given scope that need their destructors called. I'm just throwing an idea out here, but this can possibly be done by doing a mixture of "backwards" and "forward" processing of an AST.

To illustrate, consider the case of a CompoundStmt. Right now we iterate through the statements in the CompoundStmt backwards, and build CFGBlocks as needed. A CompoundStmt represents a distinct scope, and the objects in that scope are destroyed in reverse order that they are declared. One possibility is that when we process a CompoundStmt in CFGBuilder we first do a forward pass to build up a list of objects (Decls) that would need their destructors called. We then process the CompoundStmt in reverse order. When we see a point where we would exit the scope of the CompoundStmt, we consult our list of objects (Decls) that would need to be destroyed and add the needed CFGElements for the destructor calls. Later in our processing of CompoundStmt, when we see one of the DeclStmts for one of the objects, we just remove that object from the list of objects that needed to be destroyed and keep processing. Then when we see another point where we would exit the scope, we consult the current list of objects that need to be destroyed (which will only include the objects created up to that point) and add the appropriate CFGElements for the destructor calls. While this approach requires a little extra processing, algorithmically it's not all that complicated and it is precise.

Of course we would also need to model nested scopes that also create objects. We would thus have a chain of "lists of objects" (one list for each nested scope), popping off lists as we process the first statement in that scope. We then create the destructors for all objects not in the scope we are going to.

To conclude, I'm not saying that my idea is the right (and only approach). I just wanted to point out more of the design parameters here, and that it actually isn't as hard as it seems to explicitly model the destructors in the CFG. I'm also mixed on what value there is modeling scopes directly in the CFG itself, since it seems like a complementary concept.

2010/8/11 Ted Kremenek <kremenek@apple.com>

Hi Marcin,

First, I want to thank you for working on this. Adding support for destructors, and possibly scopes, to the CFG is something that is sorely needed for analyzing C++ code.

At the risk of sounding pedantic, I think when evaluating our possible approaches it’s really important to keep in mind both how the information in the CFG will be used to service clients and what information the CFG is meant to represent. Second, I think we shouldn’t focus on the implementation details (e.g., “the builder is processing AST backwards”) of how we would construct the information before we have an idea of what the right API is for accessing the information in its final form. The whole point of the CFG is to make it easier to write analyses, so having a good API that represents useful information is the primary goal.

That’s all well and good, but what does that mean? First let’s step back and examine what purpose the CFG serves. The CFG is meant to represent control-flow in a function; nothing more and nothing less. Representing the control-flow is important for any analyses we want to build on top of the CFG that want to know the ordering between statements, the change of control caused by jumps and loops, etc. With this in mind, let’s consider both the problem of scopes and destructors.

From your proposal of CFGScope, if you take a look at the definition it actually has nothing to do with control-flow at all. In C and its derivatives, scope is purely a lexical concept. This is evident since Sema and the Parser reason about scopes but don’t actually reason about control-flow. If you renamed ‘CFGScope’ to ‘ASTScope’ then it functionally would be the same thing. Thus in my mind tying it to CFGs really has no value, and conflates our mental model of what the CFG does. I think it’s reasonable to consider building ASTScopes (or whatever they are called) as part of the process of building a CFG (sort of a by-product), but unless the scope information is valuable to model in the CFG as something a client of the CFG would want to know about when reasoning about control-flow then I think it is an orthogonal topic. Before I believe I argued that this information could be represented in the CFG, but I’m honestly not certain what benefit that would provide other than modeling when we cross scope boundaries. That information, however, can likely be constructed by analysis using a side map from statements to scope, so it’s not clear if that needs to be in the CFG.

It seems that gcc’s gimple cfg has the same idea.

int main() {
int x;
{
int y;
{
int z;
}
}
}

gcc’s tree cfg :

;; Function main (main)

Scope blocks:

{ Scope block #0
int x; (unused)

{ Scope block #0
int y; (unused)

{ Scope block #0
int z; (unused)

}

}

}

main ()
{
int z;
int y;
int x;

<bb 2>:
return;

}

Hi,

I've created these two patches which add support for the llvm::dwarf::DW_TAG_friend for class types and flags for the access modifier (public/private/protected) for methods to the debug info. The llvm_*.diff patch needs to be applied to the llvm, the other one to clang.

Having this information is interesting when analyzing assembler while using the dbg info to recover information.

I'm not sure this is the proper way to submit patches, pls tell me if I should open a request in some bug tracking tool or what ever.

Also it would be good to know if the patch is accepted or not.

Thx,
Alex

clang_dbg_friend_method.diff (4.17 KB)

llvm_dbg_friend.diff (442 Bytes)

Hi Zhongxing,

W dniu 11 sierpnia 2010 04:58 użytkownik Zhongxing Xu <xuzhongxing@gmail.com> napisał:

Hi Marcin,

It is not clear where the CFGScope will fit in the original
CFG-CFGBlock-CFGElement hierarchy. It looks like CFGScope is
equivalent to the CFGBlock. Can we just extend CFGBlock? Whenever a
new scope begins, we create a new CFGBlock for it. We add a ‘Parent’
member to CFGBlock and let it point to the parent block and the
position where the scope begins.

Yes, I think it could be done. Because scope can have many CFGBlocks
in it we could use the ‘Parent’ member to:

  • point to enclosing scope in CFGBlock that begins a scope,
  • point to previous CFGBlock in the same scope for other CFGBlocks.
  • point to nothing if the CFGBlock is not relevant for the scope (has
    no declarations)
    This way it would be easier to traverse through CFGBlocks in search
    of declarations. The downside would be that we would have more blocks
    in CFG then it would be needed.

Currently we don’t distinguish Decls and Stmts in the CFGBlock (Decls
are DeclStmts), because Decls may have Stmts as their initializers and
we need to visit them along with other Stmts. But it is desired to
improve this representation.

If we go through statements in a block we can spot declarations by type.
Do we need more information than that?

Destructors could be implicit in the CFG. The client can maintain a
list of Decls which need to be destructed. An explicit list of Decls
in each scope-beginning CFGBlock may help in this case.

CFGScope is in large part just that: an explicit list of declarations.

Hi Ted,

Thank you for this very detailed response.

CFGScope serves two purposes in what I thought of:

  1. It describes the scope and it’s relation with enclosing scope.
  2. It allows for easy iteration through declarations (DeclStmts) we need to call destructors for.

Combining those two things is maybe not the best idea, so further on I will concetrate on destructors only.

What you wrote about forward traversing e.g. CompondStmt to spot all declarations we need to call destructors for was exactly what I thought of. However adding one CFGElement for each destructor call is unnecessery I think. What I thought of was using data stored in CFGScope to provide list of all declarations that needs destructors to be called for them. Now I think that it would be better to use some other structure just for destructors that could be even hidden from the client.

So I think that we could create CFGElement that would be placed at point, where destructors are called. It would provide, through iterators interface, list of declarations that needs destructors to be called for them.

Pros:

  1. Less memory used (in most cases I think), because each destructor will have one entry in some list, then we will have just one CFGElement for each point where destructors are called.
  2. Should be more robust, because less work will be done.
  3. Easier implementation for goto statements.

Cons:

  1. Maybe a little less intuitive interface, because having one CFGElement for each destructor call is somehow more readable.

So do you think that I can go on with my idea (for destructors only) or the one-CFGElement-for-one-desctructor-call is more desired?

W dniu 11 sierpnia 2010 08:21 użytkownik Ted Kremenek <kremenek@apple.com> napisał:

Hi Ted,

Thank you for this very detailed response.

CFGScope serves two purposes in what I thought of:
1. It describes the scope and it's relation with enclosing scope.
2. It allows for easy iteration through declarations (DeclStmts) we need to call destructors for.

I think this is a useful concept, but as I mentioned in my email I don't think this bears any relation to the CFG data structure itself.

Combining those two things is maybe not the best idea, so further on I will concetrate on destructors only.

What you wrote about forward traversing e.g. CompondStmt to spot all declarations we need to call destructors for was exactly what I thought of. However adding one CFGElement for each destructor call is unnecessery I think.

The advantage of explicit CFGElements is that it clearly captures where the destructor was called. This is useful for diagnostics. Since multiple destructors are likely to be called at once, we could collapse that information in the CFG, but that is a space optimization, and not something that should impact the CFG's interface.

What I thought of was using data stored in CFGScope to provide list of all declarations that needs destructors to be called for them. Now I think that it would be better to use some other structure just for destructors that could be even hidden from the client.

Right. I also don't think that using the CFGScope for this purpose would work. The set of destructors that need to be called for a given scope varies over the scope itself as additional objects are declared. You could possibly model this with "pseudo-scopes" layer additional scopes within a single lexical scope, each one capturing another destructor that needs to be called.

So I think that we could create CFGElement that would be placed at point, where destructors are called. It would provide, through iterators interface, list of declarations that needs destructors to be called for them.

This is reasonable, but I'd go one step further and say the the iterator interface presents distinct CFGElements for each destructor call. How we represent it in the CFG is an implementation detail that clients should not know about.

Pros:
1. Less memory used (in most cases I think), because each destructor will have one entry in some list, then we will have just one CFGElement for each point where destructors are called.
2. Should be more robust, because less work will be done.
3. Easier implementation for goto statements.

Cons:
1. Maybe a little less intuitive interface, because having one CFGElement for each destructor call is somehow more readable.

So do you think that I can go on with my idea (for destructors only) or the one-CFGElement-for-one-desctructor-call is more desired?

There is a difference between how we represent it in the CFG itself and what interface to provide for clients. We should not conflate how we represent the control-flow information in the CFG with how it is presented to a client of the CFG.

Clients will want an explicit view of destructor calls. Forcing them to reconstruct the control-flow for destructors from an implicit list is highly undesirable. There are a variety of reasons for this, but destructor calls need to compose well with the other control-flow elements in a CFG. The job of articulating control-flow should solely be in the CFG; no client of the CFG should need to reproduce this work.

I see implicitly representing the set of destructors called at a particular point as an implementation detail of the CFG. It could be a big efficiency win, or it may not matter. I prefer that we do the simplest thing first, getting it working for real clients of the CFG, and then looking at how we can make it more efficient. In the process we will likely see something that we left out of the design, and if we prematurely optimize that could just make things more painful.

I'm less familiar with the CFG than with its clients, but FWIW I agree
with Ted. The notion of "scope" does not have a one-to-one mapping onto any
/control flow/ structures.

Remember, the CFG is only used for analysis, not code generation. We don't
need to know about "scope" in and of itself -- it's only important for
cleanup functions. (And with the attribute((cleanup(xyz))) extension, it's
not limited to destructors...but we can worry about that later.)

"Scopes" also aren't so useful when you consider branches that don't
rejoin. Consider this function:

int PerformComputation (int x) {
  if (DEBUG)
    return 0;

  while (!Satisfactory(x)) {
    if (x == 0)
      break;
    Computation C = Compute(x);
    if (!C)
      break;
    x = C.getResult();
  }

  return x;
}

Clearly a little contrived, but the point stands -- ~C is called at two
different points in the CFG ("during" the break, and at the end of the
loop, and not on every path in its scope. So to properly represent
destructors, we're probably going to have duplicated destructor elements.

This isn't inherently a bad thing, but it does show that "scope" is not as
useful as it seems at first.

Why would we want distinct CFGElements for destructors? Because the way
our path sensitive analysis works is by going from CFGElement to
CFGElement. If we can determine where a destructor is invoked, we could
probably even just treat it as a simple CXXMethodCall, since we don't have
to worry about the deletion aspects.

If we had the DestroyThese element (or LeavingScope, or whatever) you
suggest, we could stick several object destructors into a single
CFGElement. But when we simulate the execution of these, we'd basically
unwrap the destructors into several conceptual elements anyway. And since
one destructor can affect the behavior of the next (consider a double-free
of shared memory), there's clearly a point "between" two destructors.

Finally, the fact that temporaries have destructors too suggests that it'd
be useful to allow destructors in the middle of CFGBlocks as well.

I (also) am not insisting on separate elements for separate destructors,
but it seems that even if the CFG is more complicated, clients will have an
easier time with separate elements.

This whole time I've been dancing around the issue of how to generate
destructors. I think we're going to need some variation of the
forward-sweep-then-backwards-CFG, though I haven't thought about it much,
just because a purely backwards traversal would have a hard time dealing
with the branching example I gave above. (Or so it seems to me.) A
forward-first traversal could build a list of Decls, which could be used
something like this: "any time you leave this scope you have to run the
destructors for all elements in the list; any time you see a Decl, it
should be removed from the list". A first-pass approximation of a rule, of
course.

I also wish we could re-use some of the main compiler's logic for where to
insert destructors. I kind of doubt this is possible, but it'd be nice if
we didn't have to duplicate this in the CFG-building code, possibly
inserting mistakes along the way.

Oh, and I agree about putting constructor initializers in as new
CFGElements. For the most part they behave just like Decls anyway.

Hi Ted,

Thank you for this very detailed response.

CFGScope serves two purposes in what I thought of:
1. It describes the scope and it's relation with enclosing scope.
2. It allows for easy iteration through declarations (DeclStmts) we need

to

call destructors for.

Combining those two things is maybe not the best idea, so further on I

will

concetrate on destructors only.

What you wrote about forward traversing e.g. CompondStmt to spot all
declarations we need to call destructors for was exactly what I thought

of.

However adding one CFGElement for each destructor call is unnecessery I
think. What I thought of was using data stored in CFGScope to provide

list

of all declarations that needs destructors to be called for them. Now I
think that it would be better to use some other structure just for
destructors that could be even hidden from the client.

So I think that we could create CFGElement that would be placed at

point,

where destructors are called. It would provide, through iterators
interface,
list of declarations that needs destructors to be called for them.

Pros:
1. Less memory used (in most cases I think), because each destructor

will

have one entry in some list, then we will have just one CFGElement for

each

point where destructors are called.
2. Should be more robust, because less work will be done.
3. Easier implementation for goto statements.

Cons:
1. Maybe a little less intuitive interface, because having one

CFGElement

for each destructor call is somehow more readable.

So do you think that I can go on with my idea (for destructors only) or

the

one-CFGElement-for-one-desctructor-call is more desired?

W dniu 11 sierpnia 2010 08:21 użytkownik Ted Kremenek
<kremenek@apple.com>napisał:

Hi Marcin,

First, I want to thank you for working on this. Adding support for
destructors, and possibly scopes, to the CFG is something that is

sorely

needed for analyzing C++ code.

At the risk of sounding pedantic, I think when evaluating our possible
approaches it's really important to keep in mind both *how* the
information
in the CFG will be used to service clients and *what* information the
CFG is
meant to represent. Second, I think we shouldn't focus on the
implementation details (e.g., "the builder is processing AST

backwards")

of
how we would construct the information before we have an idea of what

the

right API is for accessing the information in its final form. The

whole

point of the CFG is to make it easier to write analyses, so having a

good

API that represents useful information is the primary goal.

That's all well and good, but what does that mean? First let's step

back

and examine what purpose the CFG serves. The CFG is meant to represent
control-flow in a function; nothing more and nothing less.

Representing

the
control-flow is important for any analyses we want to build on top of

the

CFG that want to know the ordering between statements, the change of
control
caused by jumps and loops, etc. With this in mind, let's consider both
the
problem of scopes and destructors.

From your proposal of CFGScope, if you take a look at the definition it
actually has nothing to do with control-flow at all. In C and its
derivatives, scope is purely a lexical concept. This is evident since
Sema
and the Parser reason about scopes but don't actually reason about
control-flow. If you renamed 'CFGScope' to 'ASTScope' then it
functionally
would be the same thing. Thus in my mind tying it to CFGs really has

no

value, and conflates our mental model of what the CFG does. I think

it's

reasonable to consider building ASTScopes (or whatever they are called)
as
part of the process of building a CFG (sort of a by-product), but

unless

the
scope information is valuable to model in the CFG as something a client
of
the CFG would want to know about *when reasoning about control-flow*
then I
think it is an orthogonal topic. Before I believe I argued that this
information could be represented in the CFG, but I'm honestly not

certain

what benefit that would provide other than modeling when we cross scope
boundaries. That information, however, can likely be constructed by
analysis using a side map from statements to scope, so it's not clear

if

that needs to be in the CFG.

Now let's consider destructor calls. Destructor calls are really just
hidden function calls inserted by the compiler. If we looked at the
LLVM IR
for a function that created/destroyed C++ objects, the calls to the
destructors would be explicitly represented. This is crucial for
building
analyses in the LLVM backend. Source-level analyses really need this
information as well, since when destructors get called is non-trivial,
and
we don't want individual analyses needing to reconstruct this
information.
This is actually far more complicated then when we cross scope
boundaries.

To illustrate, consider:

  MyClass A;
  ...
  if (...)
    return; // Destructor for A called
  ...
  MyClass B;
  ...
  if (...)
    return; // Destructor for B called, followed by destructor for A
  ...
  if (...) {
    MyClass C;
    return; // Destructor C called, followed by destructor for B,
    followed
by destructor for A
  }
  ...
  return; // Destructor B called, followed by destructor for A

In the comments, I have listed distinct and well-ordered function

calls.

This example is fairly simple, but if you consider adding things like
exceptions, C++ temporaries, etc., we have an abundance of destructor
calls
that can be called at different times. We certainly do not want
individual
analyses built on the CFG to have to reconstruct this information from
the
raw scope information. Also, we want to be able to model individual
destructor calls, particularly if any of them can throw exceptions or

do

something else crazy that impacts control-flow (e.g., call a no-return
function). For the purpose of diagnostics from analyses (think of the
static analyzer), we want to know (relatively speaking) where a
destructor
was called.

This is why, fundamentally, I think destructors need to be explicitly
represented in the CFG. Dataflow analyses need to know when

destructors

are
called and in what order, and they need to model this precisely and
easily.
Scopes are obviously and important part of reasoning about

constructors,

and for the CFGBuilder to model destructors in the CFG it will need to
reason about scopes. That, however, is different than having the scope
information you described as 'CFGScope' being associated with the CFG
itself.

Now let's consider implementation. Indeed there is an interesting
challenge in doing this in CFGBuilder because we process the AST
backwards.
Processing it backwards, however, will actually allow us to readily
determine the right destructor calls *if* we can readily know the

current

set of objects in a given scope that need their destructors called.

I'm

just throwing an idea out here, but this can possibly be done by doing

a

mixture of "backwards" and "forward" processing of an AST.

To illustrate, consider the case of a CompoundStmt. Right now we

iterate

through the statements in the CompoundStmt backwards, and build
CFGBlocks as
needed. A CompoundStmt represents a distinct scope, and the objects in
that
scope are destroyed in reverse order that they are declared. One
possibility is that when we process a CompoundStmt in CFGBuilder we
first do
a forward pass to build up a list of objects (Decls) that would need
their
destructors called. We then process the CompoundStmt in reverse order.
When we see a point where we would exit the scope of the CompoundStmt,
we
consult our list of objects (Decls) that would need to be destroyed and
add
the needed CFGElements for the destructor calls. Later in our
processing of
CompoundStmt, when we see one of the DeclStmts for one of the objects,

we

just remove that object from the list of objects that needed to be
destroyed
and keep processing. Then when we see another point where we would

exit

the
scope, we consult the current list of objects that need to be destroyed
(which will only include the objects created up to that point) and add
the
appropriate CFGElements for the destructor calls. While this approach
requires a little extra processing, algorithmically it's not all that
complicated and it is precise.

Of course we would also need to model nested scopes that also create
objects. We would thus have a chain of "lists of objects" (one list

for

each nested scope), popping off lists as we process the first statement
in
that scope. We then create the destructors for all objects not in the
scope
we are going to.

To conclude, I'm not saying that my idea is the right (and only
approach).
I just wanted to point out more of the design parameters here, and

that

it
actually isn't as hard as it seems to explicitly model the destructors

in

the CFG. I'm also mixed on what value there is modeling scopes

directly

in
the CFG itself, since it seems like a complementary concept.

> Hi,
>
> I'm trying to figure out how to add scopes and destructors to CFG. As

I

understand, what we need is:
>
> For scopes:
> - CFGElement kinds for start/end of scopes that would allow for
> tracking
entering and leaving scopes.
> - If there are no variable declarations in scope, scope start/end
elements should be omited.
> - Scope end element should be omited if there are no variable
declarations before it in the scope.
>
> For destructors:
> - CFGElement kind for call to destructor.
>
> The main problem with implementing this in CFGBuilder is that the
> builder
is processing AST backwards. When the builder have to decide if it

should

add a scope end element or a destructor it doesn't know of variables
declared earlier in the scope. To solve this there have to be an
additional
step of walking e.g. a block of statements (compound statement) and
creating
a list of all statements declaring variables.
>
> Solution that I would like to propose is to make those lists a part

of

CFG instead of just temporary data used while building the graph. It
would
look something like this:
>
> class CFGScope {
> Stmt* stmt; // Statement that creates the scope (e.g. CompoundStmt)
> vector<Stmt*> decls; // Statements that declare variables in same
> order
as in AST
> CFGScopeIterator parent; // Link to position in parent scope where
> this
scope started
> };
>
> class CFGScopeIterator {
> CFGScope* scope; // Scope pointed by iterator
> vector<Stmt*>::iterator pos; // Position in declarations list of

the

scope.
> };
>
> Now the scope start element would point to CFGScope object. The scope
> end
element would point to pair (range) of CFGScopeIterators: first would
point
to position in current scope the scope end occured, second would point

to

Hi Alex,

I’ll apply these patches once following issues are resolved. I see few tabs sneaked into diffs. clang and llvm code is tabs free. Few other comments…

  • // Static methods do not need “this” pointer argument.
  • if (Method->isStatic())
  • return FnTy;

Why are you removing this check ? All the changes to getOrCreateMethodType() function are unnecessary and not related to this patch.

+/// CollectCXXFriends - A helper function to collect debug info for
+/// C++ base classes. This is used while creating debug info entry for
+/// a Record.
+void CGDebugInfo::
+CollectCXXFriends(const CXXRecordDecl *RD, llvm::DIFile Unit,

  • llvm::SmallVectorImplllvm::DIDescriptor &EltTys,
  • llvm::DICompositeType &RecordTy) {
  • for (CXXRecordDecl::friend_iterator BI = RD->friend_begin(),
  • BE = RD->friend_end(); BI != BE; ++BI) {
  • unsigned BFlags = 0;

This BFlags is never assigned any other value. Intentional ?

llvm diffs looks fine.
Thanks for working on this.

Any update on whether the corrected version of the patch will be accepted?

Thx.
Alex

Hi,

I was not on vacation hence delay in response.