I'm less familiar with the CFG than with its clients, but FWIW I agree
with Ted. The notion of "scope" does not have a one-to-one mapping onto any
/control flow/ structures.
Remember, the CFG is only used for analysis, not code generation. We don't
need to know about "scope" in and of itself -- it's only important for
cleanup functions. (And with the attribute((cleanup(xyz))) extension, it's
not limited to destructors...but we can worry about that later.)
"Scopes" also aren't so useful when you consider branches that don't
rejoin. Consider this function:
int PerformComputation (int x) {
if (DEBUG)
return 0;
while (!Satisfactory(x)) {
if (x == 0)
break;
Computation C = Compute(x);
if (!C)
break;
x = C.getResult();
}
return x;
}
Clearly a little contrived, but the point stands -- ~C is called at two
different points in the CFG ("during" the break, and at the end of the
loop, and not on every path in its scope. So to properly represent
destructors, we're probably going to have duplicated destructor elements.
This isn't inherently a bad thing, but it does show that "scope" is not as
useful as it seems at first.
Why would we want distinct CFGElements for destructors? Because the way
our path sensitive analysis works is by going from CFGElement to
CFGElement. If we can determine where a destructor is invoked, we could
probably even just treat it as a simple CXXMethodCall, since we don't have
to worry about the deletion aspects.
If we had the DestroyThese element (or LeavingScope, or whatever) you
suggest, we could stick several object destructors into a single
CFGElement. But when we simulate the execution of these, we'd basically
unwrap the destructors into several conceptual elements anyway. And since
one destructor can affect the behavior of the next (consider a double-free
of shared memory), there's clearly a point "between" two destructors.
Finally, the fact that temporaries have destructors too suggests that it'd
be useful to allow destructors in the middle of CFGBlocks as well.
I (also) am not insisting on separate elements for separate destructors,
but it seems that even if the CFG is more complicated, clients will have an
easier time with separate elements.
This whole time I've been dancing around the issue of how to generate
destructors. I think we're going to need some variation of the
forward-sweep-then-backwards-CFG, though I haven't thought about it much,
just because a purely backwards traversal would have a hard time dealing
with the branching example I gave above. (Or so it seems to me.) A
forward-first traversal could build a list of Decls, which could be used
something like this: "any time you leave this scope you have to run the
destructors for all elements in the list; any time you see a Decl, it
should be removed from the list". A first-pass approximation of a rule, of
course.
I also wish we could re-use some of the main compiler's logic for where to
insert destructors. I kind of doubt this is possible, but it'd be nice if
we didn't have to duplicate this in the CFG-building code, possibly
inserting mistakes along the way.
Oh, and I agree about putting constructor initializers in as new
CFGElements. For the most part they behave just like Decls anyway.
Hi Ted,
Thank you for this very detailed response.
CFGScope serves two purposes in what I thought of:
1. It describes the scope and it's relation with enclosing scope.
2. It allows for easy iteration through declarations (DeclStmts) we need
to
call destructors for.
Combining those two things is maybe not the best idea, so further on I
will
concetrate on destructors only.
What you wrote about forward traversing e.g. CompondStmt to spot all
declarations we need to call destructors for was exactly what I thought
of.
However adding one CFGElement for each destructor call is unnecessery I
think. What I thought of was using data stored in CFGScope to provide
list
of all declarations that needs destructors to be called for them. Now I
think that it would be better to use some other structure just for
destructors that could be even hidden from the client.
So I think that we could create CFGElement that would be placed at
point,
where destructors are called. It would provide, through iterators
interface,
list of declarations that needs destructors to be called for them.
Pros:
1. Less memory used (in most cases I think), because each destructor
will
have one entry in some list, then we will have just one CFGElement for
each
point where destructors are called.
2. Should be more robust, because less work will be done.
3. Easier implementation for goto statements.
Cons:
1. Maybe a little less intuitive interface, because having one
CFGElement
for each destructor call is somehow more readable.
So do you think that I can go on with my idea (for destructors only) or
the
one-CFGElement-for-one-desctructor-call is more desired?
W dniu 11 sierpnia 2010 08:21 użytkownik Ted Kremenek
<kremenek@apple.com>napisał:
Hi Marcin,
First, I want to thank you for working on this. Adding support for
destructors, and possibly scopes, to the CFG is something that is
sorely
needed for analyzing C++ code.
At the risk of sounding pedantic, I think when evaluating our possible
approaches it's really important to keep in mind both *how* the
information
in the CFG will be used to service clients and *what* information the
CFG is
meant to represent. Second, I think we shouldn't focus on the
implementation details (e.g., "the builder is processing AST
backwards")
of
how we would construct the information before we have an idea of what
the
right API is for accessing the information in its final form. The
whole
point of the CFG is to make it easier to write analyses, so having a
good
API that represents useful information is the primary goal.
That's all well and good, but what does that mean? First let's step
back
and examine what purpose the CFG serves. The CFG is meant to represent
control-flow in a function; nothing more and nothing less.
Representing
the
control-flow is important for any analyses we want to build on top of
the
CFG that want to know the ordering between statements, the change of
control
caused by jumps and loops, etc. With this in mind, let's consider both
the
problem of scopes and destructors.
From your proposal of CFGScope, if you take a look at the definition it
actually has nothing to do with control-flow at all. In C and its
derivatives, scope is purely a lexical concept. This is evident since
Sema
and the Parser reason about scopes but don't actually reason about
control-flow. If you renamed 'CFGScope' to 'ASTScope' then it
functionally
would be the same thing. Thus in my mind tying it to CFGs really has
no
value, and conflates our mental model of what the CFG does. I think
it's
reasonable to consider building ASTScopes (or whatever they are called)
as
part of the process of building a CFG (sort of a by-product), but
unless
the
scope information is valuable to model in the CFG as something a client
of
the CFG would want to know about *when reasoning about control-flow*
then I
think it is an orthogonal topic. Before I believe I argued that this
information could be represented in the CFG, but I'm honestly not
certain
what benefit that would provide other than modeling when we cross scope
boundaries. That information, however, can likely be constructed by
analysis using a side map from statements to scope, so it's not clear
if
that needs to be in the CFG.
Now let's consider destructor calls. Destructor calls are really just
hidden function calls inserted by the compiler. If we looked at the
LLVM IR
for a function that created/destroyed C++ objects, the calls to the
destructors would be explicitly represented. This is crucial for
building
analyses in the LLVM backend. Source-level analyses really need this
information as well, since when destructors get called is non-trivial,
and
we don't want individual analyses needing to reconstruct this
information.
This is actually far more complicated then when we cross scope
boundaries.
To illustrate, consider:
MyClass A;
...
if (...)
return; // Destructor for A called
...
MyClass B;
...
if (...)
return; // Destructor for B called, followed by destructor for A
...
if (...) {
MyClass C;
return; // Destructor C called, followed by destructor for B,
followed
by destructor for A
}
...
return; // Destructor B called, followed by destructor for A
In the comments, I have listed distinct and well-ordered function
calls.
This example is fairly simple, but if you consider adding things like
exceptions, C++ temporaries, etc., we have an abundance of destructor
calls
that can be called at different times. We certainly do not want
individual
analyses built on the CFG to have to reconstruct this information from
the
raw scope information. Also, we want to be able to model individual
destructor calls, particularly if any of them can throw exceptions or
do
something else crazy that impacts control-flow (e.g., call a no-return
function). For the purpose of diagnostics from analyses (think of the
static analyzer), we want to know (relatively speaking) where a
destructor
was called.
This is why, fundamentally, I think destructors need to be explicitly
represented in the CFG. Dataflow analyses need to know when
destructors
are
called and in what order, and they need to model this precisely and
easily.
Scopes are obviously and important part of reasoning about
constructors,
and for the CFGBuilder to model destructors in the CFG it will need to
reason about scopes. That, however, is different than having the scope
information you described as 'CFGScope' being associated with the CFG
itself.
Now let's consider implementation. Indeed there is an interesting
challenge in doing this in CFGBuilder because we process the AST
backwards.
Processing it backwards, however, will actually allow us to readily
determine the right destructor calls *if* we can readily know the
current
set of objects in a given scope that need their destructors called.
I'm
just throwing an idea out here, but this can possibly be done by doing
a
mixture of "backwards" and "forward" processing of an AST.
To illustrate, consider the case of a CompoundStmt. Right now we
iterate
through the statements in the CompoundStmt backwards, and build
CFGBlocks as
needed. A CompoundStmt represents a distinct scope, and the objects in
that
scope are destroyed in reverse order that they are declared. One
possibility is that when we process a CompoundStmt in CFGBuilder we
first do
a forward pass to build up a list of objects (Decls) that would need
their
destructors called. We then process the CompoundStmt in reverse order.
When we see a point where we would exit the scope of the CompoundStmt,
we
consult our list of objects (Decls) that would need to be destroyed and
add
the needed CFGElements for the destructor calls. Later in our
processing of
CompoundStmt, when we see one of the DeclStmts for one of the objects,
we
just remove that object from the list of objects that needed to be
destroyed
and keep processing. Then when we see another point where we would
exit
the
scope, we consult the current list of objects that need to be destroyed
(which will only include the objects created up to that point) and add
the
appropriate CFGElements for the destructor calls. While this approach
requires a little extra processing, algorithmically it's not all that
complicated and it is precise.
Of course we would also need to model nested scopes that also create
objects. We would thus have a chain of "lists of objects" (one list
for
each nested scope), popping off lists as we process the first statement
in
that scope. We then create the destructors for all objects not in the
scope
we are going to.
To conclude, I'm not saying that my idea is the right (and only
approach).
I just wanted to point out more of the design parameters here, and
that
it
actually isn't as hard as it seems to explicitly model the destructors
in
the CFG. I'm also mixed on what value there is modeling scopes
directly
in
the CFG itself, since it seems like a complementary concept.
> Hi,
>
> I'm trying to figure out how to add scopes and destructors to CFG. As
I
understand, what we need is:
>
> For scopes:
> - CFGElement kinds for start/end of scopes that would allow for
> tracking
entering and leaving scopes.
> - If there are no variable declarations in scope, scope start/end
elements should be omited.
> - Scope end element should be omited if there are no variable
declarations before it in the scope.
>
> For destructors:
> - CFGElement kind for call to destructor.
>
> The main problem with implementing this in CFGBuilder is that the
> builder
is processing AST backwards. When the builder have to decide if it
should
add a scope end element or a destructor it doesn't know of variables
declared earlier in the scope. To solve this there have to be an
additional
step of walking e.g. a block of statements (compound statement) and
creating
a list of all statements declaring variables.
>
> Solution that I would like to propose is to make those lists a part
of
CFG instead of just temporary data used while building the graph. It
would
look something like this:
>
> class CFGScope {
> Stmt* stmt; // Statement that creates the scope (e.g. CompoundStmt)
> vector<Stmt*> decls; // Statements that declare variables in same
> order
as in AST
> CFGScopeIterator parent; // Link to position in parent scope where
> this
scope started
> };
>
> class CFGScopeIterator {
> CFGScope* scope; // Scope pointed by iterator
> vector<Stmt*>::iterator pos; // Position in declarations list of
the
scope.
> };
>
> Now the scope start element would point to CFGScope object. The scope
> end
element would point to pair (range) of CFGScopeIterators: first would
point
to position in current scope the scope end occured, second would point
to