CFGElement changes and initializers addition (with patch)

Hi,

I’ve done some work on CFG. Biggest change is that CFGElement can no longer be converted to Stmt* (through static_cast nor dyn_cast). This is needed to properly handle all kinds of CFGElement: Statement, Initializer, ImplicitDtor and Scope. I’ve fixed some of the code that was assuming the CFGElement to represent Stmt* and left rest of it with appropriate comments.

I’ve also added Initializer CFGElements creation in CFG. Initializer CFGElement returns CXXBaseOrMemberInitializer object instead of Stmt.

For ImplicitDtor CFGElements I’m planning to add methods for getting:

  • CXXDestructorDecl for destructor to call,
  • Location of place destructor was called,
  • Exact statement that created the object for which destructor is called (Expr for temporary object and VarDecl for automatic object).

I think that it would also be useful to add methods for transparently getting:

  • Start location of the CFGElement,
  • End location of the CFGElement.

Now a few questions:

  • Do you have any suggestions on what more should be added to CFGElement interface?
  • In valid AST, can the CXXBaseOrMemberInitializer::getInit() method return null pointer?

Cheers,
Marcin

initializers.patch (27.2 KB)

Hi Marcin,

Awesome work. I'll comment on your specific points in your email, and then respond with comments to the patch inline below.

Hi,

I've done some work on CFG. Biggest change is that CFGElement can no longer be converted to Stmt* (through static_cast nor dyn_cast).

I think dyn_cast<> and cast<> is reasonable). dyn_cast is useful for clients wanting to check if the CFGElement wraps a Stmt* and then using the Stmt* directly. e.g.:

  CFGElement E = ...
  if (Stmt *S = dyn_cast<Stmt>(E))
    ...

Although I guess with that approach we end up doing two checks; one to do the downcast itself and one to check the pointer. With your API we will only do one check in practice.

This is needed to properly handle all kinds of CFGElement: Statement, Initializer, ImplicitDtor and Scope. I've fixed some of the code that was assuming the CFGElement to represent Stmt* and left rest of it with appropriate comments.

Great!

I've also added Initializer CFGElements creation in CFG. Initializer CFGElement returns CXXBaseOrMemberInitializer object instead of Stmt.

Great!

For ImplicitDtor CFGElements I'm planning to add methods for getting:
- CXXDestructorDecl for destructor to call,
- Location of place destructor was called,
- Exact statement that created the object for which destructor is called (Expr for temporary object and VarDecl for automatic object).

Excellent.

I think that it would also be useful to add methods for transparently getting:
- Start location of the CFGElement,
- End location of the CFGElement.

This would be potentially useful, although I don't think we have any clients of this yet. I'd rather we add this later if the need presents itself. Note that the location may not even be defined, as a base or member initializer may not even be explicitly written. Until the need presents itself, we might want clients to deliberately reason about such things, and let them decide their own policy on what locations they want to use.

Now a few questions:
- Do you have any suggestions on what more should be added to CFGElement interface?

I think this is a good start for now. We can always extend CFGElement later.

- In valid AST, can the CXXBaseOrMemberInitializer::getInit() method return null pointer?

I don't think so. It looks like it is always a ParenListExpr, but I'm not 100% certain.

Comments on the patch itself. Most of it is stylistic. I'm very happy with most of it:

Index: include/clang/Analysis/FlowSensitive/DataflowSolver.h

Wow, good work.

I agree with most of what Ted said. A few additional comments:

- I like the single enum rather than kind and subkind, but there are 7
elements, not 6. So it won't fit in a PointerIntPair anyway, though having
a Stmt* field (or void*) and an enum field would be preferable to two
PointerIntPairs.
I'm still not totally convinced we need scope markers in the CFG, though.

- Why don't we make this a static type hierarchy like SVal or MemRegion?
We'd have CFGStmt, CFGInitializer, CFGDestructor, and CFGScope(Marker)
inheriting from CFGElement. That way we can have, among other things, a
CFGStmt that simplifies to a Stmt. Instead of asserts, we can just have the
type-specific methods on the subclasses.

- In UnreachableCodeChecker::getUnreachableStmt(), isn't the block
guaranteed to have an initial statement for the same reason as in
ReachableCode.cpp?

- Very tiny comment: most of the LLVM codebase has no indentation on blank
lines.

Great to get more C++ support in the analyzer!
Jordy

Hi Jordy, Ted

Thanks for the appreciation and for comments. Some comments inline:

W dniu 18 sierpnia 2010 07:47 użytkownik Jordy Rose <jediknil@belkadan.com> napisał:

Wow, good work.

I agree with most of what Ted said. A few additional comments:

  • I like the single enum rather than kind and subkind, but there are 7
    elements, not 6. So it won’t fit in a PointerIntPair anyway, though having
    a Stmt* field (or void*) and an enum field would be preferable to two
    PointerIntPairs.
    I’m still not totally convinced we need scope markers in the CFG, though.

For both destructor kinds I will need space for two pointers. For automatic object I need it’s declaration and statement that cause destructor to be called. For temporary object I need expression bound to the temporary and the full expression that contains the temporary.

I could allocate separate struct for those and point to it from CFGElement. If we drop the scopes and move information about kind of destructor to this struct, then CFGElement will need only one PointerIntPair.

If we would want scopes modeled we could use same struct as for destructors. So we would end up with something like this:

  • Statement and LValStatement with pointer to Stmt*,
  • Initializer with pointer to CXXBaseOrMemberInitializer*,
  • Extendend CFGElement with pointer to struct with information about destructor or scope.

Our kind enum would look like this:
enum Kind {
Statement,
LvalStatment,
BEGIN_STATEMENT = Statement,
END_STATEMENT = LvalStatement,
Initializer,

ExtendedElement, // This would be internal, client would never get this
// with getKind() method.

AutomaticObjectDtor,
TemporaryObjectDtor,
BEGIN_DTOR = AutomaticObjectDtor,
END_DTOR = TemporaryObjectDtor,
StartScope,
EndScope,
BEGIN_SCOPE = StartScope,
END_SCOPE = EndScope
};

This way we will have space efficient CFG for Obj-C and C languages. For C++ CFG will occupy less memory if there will be more statement then destructor CFGElements.

  • Why don’t we make this a static type hierarchy like SVal or MemRegion?
    We’d have CFGStmt, CFGInitializer, CFGDestructor, and CFGScope(Marker)
    inheriting from CFGElement. That way we can have, among other things, a
    CFGStmt that simplifies to a Stmt. Instead of asserts, we can just have the
    type-specific methods on the subclasses.

Yes, this will be much better approach. I will change this.

  • In UnreachableCodeChecker::getUnreachableStmt(), isn’t the block
    guaranteed to have an initial statement for the same reason as in
    ReachableCode.cpp?

Do you mean that if block is unreachable it will start with a statement and not with initializer, destructor or scope?

  • Very tiny comment: most of the LLVM codebase has no indentation on blank
    lines.

I will try to follow this convention then.

Great to get more C++ support in the analyzer!
Jordy

Cheers,
Marcin

- I like the single enum rather than kind and subkind, but there are 7
elements, not 6. So it won't fit in a PointerIntPair anyway, though having
a Stmt* field (or void*) and an enum field would be preferable to two
PointerIntPairs.
I'm still not totally convinced we need scope markers in the CFG, though.

I don't think we need the scope markers either, and would actually prefer we took them out.

- Why don't we make this a static type hierarchy like SVal or MemRegion?
We'd have CFGStmt, CFGInitializer, CFGDestructor, and CFGScope(Marker)
inheriting from CFGElement. That way we can have, among other things, a
CFGStmt that simplifies to a Stmt. Instead of asserts, we can just have the
type-specific methods on the subclasses.

I had this thought as well. The approach would be closer to SVal than MemRegion; I'd prefer we didn't have virtual functions in these classes, and keep the size of CFGElement object constant. The nice thing about this approach is that it (a) uses strong-typing to enforce the APIs and (b) we can associate accessor (e.g., 'isLValue()') with only the appropriate CFGElement sub-class (e.g., CFGStmt).

W dniu 18 sierpnia 2010 07:47 użytkownik Jordy Rose
<jediknil@belkadan.com>napisał:

- I like the single enum rather than kind and subkind, but there are 7
elements, not 6. So it won't fit in a PointerIntPair anyway, though
having
a Stmt* field (or void*) and an enum field would be preferable to two
PointerIntPairs.
I'm still not totally convinced we need scope markers in the CFG,

though.

For both destructor kinds I will need space for two pointers. For
automatic
object I need it's declaration and statement that cause destructor to be
called. For temporary object I need expression bound to the temporary

and

the full expression that contains the temporary.

Hm. This might be true, but I'm not sure. I think the element's position
in the CFG is good enough that we don't need "the statement that causes the
destructor to be called". (Pseudo-destructors already count as statements
so we don't have to worry about those.)

The temporary one is a little more worrisome, but again you might be able
to get away with just the expression whose value is the temporary, and
leave out the context. We certainly need to keep track of both parts when
/building/ the CFG, but once it's complete we should be okay.

- In UnreachableCodeChecker::getUnreachableStmt(), isn't the block
guaranteed to have an initial statement for the same reason as in
ReachableCode.cpp?

Do you mean that if block is unreachable it will start with a statement
and not with initializer, destructor or scope?

Hm, again I forget about scope. An unreachable block certainly can't start
with an initializer or destructor, but it could very well start a scope.
But this means the changes in ReachableCode.cpp have to handle this case as
well.

I also don't think we need to hyper optimize here. We can burn two pointers; we just want it as svelt as possible. The number of CFGElements is on order of the number of statements, not expressions, in a function. If you think about the amount of data we keep track in the ASTs themsevles, burning a few extra bytes for a short-lived CFG is not really a big deal. We just don't want it to be too bloated.

For destructors, I would like to retain enough information so that we can precisely emit good diagnostics in the static analyzer. This includes being able to reference where the object was created and roughly where the object was destroyed. I agree that we could possible not include the statement that causes the destructor be called *as long* as it is easy to recreate this information and vend it through a clean API to the client.

Hi Jordy, Ted

Thanks for the appreciation and for comments. Some comments inline:

W dniu 18 sierpnia 2010 07:47 użytkownik Jordy Rose <jediknil@belkadan.com> napisał:

Wow, good work.

I agree with most of what Ted said. A few additional comments:

  • I like the single enum rather than kind and subkind, but there are 7
    elements, not 6. So it won’t fit in a PointerIntPair anyway, though having
    a Stmt* field (or void*) and an enum field would be preferable to two
    PointerIntPairs.
    I’m still not totally convinced we need scope markers in the CFG, though.

For both destructor kinds I will need space for two pointers. For automatic object I need it’s declaration and statement that cause destructor to be called. For temporary object I need expression bound to the temporary and the full expression that contains the temporary.

Repeating what I said in my reply to Jordy, I think it’s fine to burn an extra pointer if it gives us flexibility to track more information that would be useful for diagnostics.

I could allocate separate struct for those and point to it from CFGElement. If we drop the scopes and move information about kind of destructor to this struct, then CFGElement will need only one PointerIntPair.

That is also a possibility. It could be allocated from the BumpPtrAllocator.

If we would want scopes modeled we could use same struct as for destructors. So we would end up with something like this:

  • Statement and LValStatement with pointer to Stmt*,
  • Initializer with pointer to CXXBaseOrMemberInitializer*,
  • Extendend CFGElement with pointer to struct with information about destructor or scope.

Our kind enum would look like this:
enum Kind {
Statement,
LvalStatment,
BEGIN_STATEMENT = Statement,
END_STATEMENT = LvalStatement,
Initializer,

ExtendedElement, // This would be internal, client would never get this
// with getKind() method.

AutomaticObjectDtor,
TemporaryObjectDtor,
BEGIN_DTOR = AutomaticObjectDtor,
END_DTOR = TemporaryObjectDtor,
StartScope,
EndScope,
BEGIN_SCOPE = StartScope,
END_SCOPE = EndScope
};

What is the purpose of the ‘ExtendedElement’ kind?

This way we will have space efficient CFG for Obj-C and C languages. For C++ CFG will occupy less memory if there will be more statement then destructor CFGElements.

  • Why don’t we make this a static type hierarchy like SVal or MemRegion?
    We’d have CFGStmt, CFGInitializer, CFGDestructor, and CFGScope(Marker)
    inheriting from CFGElement. That way we can have, among other things, a
    CFGStmt that simplifies to a Stmt. Instead of asserts, we can just have the
    type-specific methods on the subclasses.

Yes, this will be much better approach. I will change this.

Sounds great.

  • In UnreachableCodeChecker::getUnreachableStmt(), isn’t the block
    guaranteed to have an initial statement for the same reason as in
    ReachableCode.cpp?

Do you mean that if block is unreachable it will start with a statement and not with initializer, destructor or scope?

Yes, an unreachable block could certainly start with an initializer if it had no initializer argument and the previous initializer called a no-return function. Gross, but I guess possible.

Moreover, a call to builtin_unreachable is not guaranteed to be the first CFGElement in a CFGBlock. I think the checker’s logic is likely wrong here. It should be iterating over the block, looking for this specific CallExpr.

Sorry, I still forget to choose “Replay all” option…

W dniu 20 sierpnia 2010 05:35 użytkownik Ted Kremenek <kremenek@apple.com> napisał:

Hi Jordy, Ted

Thanks for the appreciation and for comments. Some comments inline:

W dniu 18 sierpnia 2010 07:47 użytkownik Jordy Rose <jediknil@belkadan.com> napisał:

Wow, good work.

I agree with most of what Ted said. A few additional comments:

  • I like the single enum rather than kind and subkind, but there are 7
    elements, not 6. So it won’t fit in a PointerIntPair anyway, though having
    a Stmt* field (or void*) and an enum field would be preferable to two
    PointerIntPairs.
    I’m still not totally convinced we need scope markers in the CFG, though.

For both destructor kinds I will need space for two pointers. For automatic object I need it’s declaration and statement that cause destructor to be called. For temporary object I need expression bound to the temporary and the full expression that contains the temporary.

Repeating what I said in my reply to Jordy, I think it’s fine to burn an extra pointer if it gives us flexibility to track more information that would be useful for diagnostics.

So I have three acceptable solutions.

  1. Two pointers for each element.
  2. One pointer for each element, but in case of destructor it points to allocated pair of pointers.
  3. One pointer for each element, and location of call to destructor is calculated on demand.

1 is wasteful, because only destructors need two pointers. 3 will give as possibly unclean API for clients. 2 seams to be the best and I will implement it.

I could allocate separate struct for those and point to it from CFGElement. If we drop the scopes and move information about kind of destructor to this struct, then CFGElement will need only one PointerIntPair.

That is also a possibility. It could be allocated from the BumpPtrAllocator.

When will be the memory allocated with BumpPtrAllocator freed? If I will allocate memory for use in one instance of CFG, can I forget about it (not call Deallocate)?

If we would want scopes modeled we could use same struct as for destructors. So we would end up with something like this:

  • Statement and LValStatement with pointer to Stmt*,
  • Initializer with pointer to CXXBaseOrMemberInitializer*,
  • Extendend CFGElement with pointer to struct with information about destructor or scope.

Our kind enum would look like this:
enum Kind {
Statement,
LvalStatment,
BEGIN_STATEMENT = Statement,
END_STATEMENT = LvalStatement,
Initializer,

ExtendedElement, // This would be internal, client would never get this
// with getKind() method.

AutomaticObjectDtor,
TemporaryObjectDtor,
BEGIN_DTOR = AutomaticObjectDtor,
END_DTOR = TemporaryObjectDtor,
StartScope,
EndScope,
BEGIN_SCOPE = StartScope,
END_SCOPE = EndScope
};

What is the purpose of the ‘ExtendedElement’ kind?

Int part of PointerIntPair where pointer part points to pointer can have only 2 bits. ExtendedElement kind marks that the kind (- int(ExtendedElement)) is stored in int part of first PointerIntPair in pointed struct. It is an implementation detail realy.

This way we will have space efficient CFG for Obj-C and C languages. For C++ CFG will occupy less memory if there will be more statement then destructor CFGElements.

  • Why don’t we make this a static type hierarchy like SVal or MemRegion?
    We’d have CFGStmt, CFGInitializer, CFGDestructor, and CFGScope(Marker)
    inheriting from CFGElement. That way we can have, among other things, a
    CFGStmt that simplifies to a Stmt. Instead of asserts, we can just have the
    type-specific methods on the subclasses.

Yes, this will be much better approach. I will change this.

Sounds great.

The hierarchy is almost ready. I’ll send a patch with this and both destructors implemented after weekend.

Sorry, I still forget to choose “Replay all” option.

W dniu 19 sierpnia 2010 22:00 użytkownik Jordy Rose <jediknil@belkadan.com> napisał:

W dniu 18 sierpnia 2010 07:47 użytkownik Jordy Rose
<jediknil@belkadan.com>napisał:

  • I like the single enum rather than kind and subkind, but there are 7
    elements, not 6. So it won’t fit in a PointerIntPair anyway, though
    having
    a Stmt* field (or void*) and an enum field would be preferable to two
    PointerIntPairs.
    I’m still not totally convinced we need scope markers in the CFG,
    though.

For both destructor kinds I will need space for two pointers. For
automatic
object I need it’s declaration and statement that cause destructor to be
called. For temporary object I need expression bound to the temporary
and
the full expression that contains the temporary.

Hm. This might be true, but I’m not sure. I think the element’s position
in the CFG is good enough that we don’t need “the statement that causes the
destructor to be called”. (Pseudo-destructors already count as statements
so we don’t have to worry about those.)

The temporary one is a little more worrisome, but again you might be able
to get away with just the expression whose value is the temporary, and
leave out the context. We certainly need to keep track of both parts when
/building/ the CFG, but once it’s complete we should be okay.

Location where the destructor was called won’t be needed much I suppose. Instead of explicitly giving this position we could provide means of calculating it. If this would be 'kay I think that I can squeeze everything into one PointerIntPair.

One thing that would be needed and could be questionable (I think) would be end of scope element. This element wouldn’t have start of scope element counterpart, and it would be used only if sequance of destructors for automatic objects would not end with block termination statement. CFG clients would have to ignore it always as it would be of no use to them.

  • In UnreachableCodeChecker::getUnreachableStmt(), isn’t the block
    guaranteed to have an initial statement for the same reason as in
    ReachableCode.cpp?

Do you mean that if block is unreachable it will start with a statement
and not with initializer, destructor or scope?

Hm, again I forget about scope. An unreachable block certainly can’t start
with an initializer or destructor, but it could very well start a scope.
But this means the changes in ReachableCode.cpp have to handle this case as
well.

We could write code like this:

while (true) {
std::string a;
break;
break;
}

From CFGBuilder point of view each break will cause ‘a’ to be destroyed, so the block with second break as terminator will start with destructor of ‘a’. As for initializers I don’t see any case that would produce block with it as first element, but I think that it could be handled just to be on the safe side.

  • Marcin

Location where the destructor was called won't be needed much I suppose. Instead of explicitly giving this position we could provide means of calculating it. If this would be 'kay I think that I can squeeze everything into one PointerIntPair.

It will be needed by analyzer diagnostics. As long as we can reconstruct it when necessary that's fine by me.

One thing that would be needed and could be questionable (I think) would be end of scope element. This element wouldn't have start of scope element counterpart, and it would be used only if sequance of destructors for automatic objects would not end with block termination statement. CFG clients would have to ignore it always as it would be of no use to them.

I'm not quite certain what this means. We can have a block that contains a bunch destructor calls with a single successor to the block in the outer scope. An explicit "end scope element" isn't strictly needed.

Sorry for the late reply. There is a case where we need explicit scope
delimiters. We need them to detect resource leaks accurately. Consider
code:

void f() {
  {

  }

}

Sorry, I mistakenly pressed 'send' button.

void f() {
  {
      int *p = malloc(4);
  }
  ...
}

Without a scope delimiter, we can only detect that the block pointed
to by 'p' is leaked when getting to the end of 'f'. That's far from
where it happened.

It seems a little cumbersome for users to have to check if it is a Stmt* before getting the value. We could instead do:

return isStmt() ? static_cast<Stmt*>(Data.getPointer()) : NULL

and have the client just use 'getStmt()' instead of 'isStmt()' + 'getStmt()'. The first approach has one less check, while the second is simpler for clients.

@Jordy, Zhongxing: What do you prefer?

I prefer the latter and rename it to getAsStmt(). This pattern is used
many times in the front end.

One thing that would be needed and could be questionable (I think)
would be end of scope element. This element wouldn't have start of
scope element counterpart, and it would be used only if sequance of
destructors for automatic objects would not end with block

termination

statement. CFG clients would have to ignore it always as it would be

of

no use to them.

I'm not quite certain what this means. We can have a block that
contains a bunch destructor calls with a single successor to the block
in the outer scope. An explicit "end scope element" isn't strictly
needed.

Sorry for the late reply. There is a case where we need explicit scope
delimiters. We need them to detect resource leaks accurately. Consider
code:

void f() {
  {
      int *p = malloc(4);
  }
  ...
}

Without a scope delimiter, we can only detect that the block pointed
to by 'p' is leaked when getting to the end of 'f'. That's far from
where it happened.

Unless we have destructor nodes for /all/ objects, whether the destructor
is trivial or not.

(Not trying to push either side, just presenting another possibility.)

You mean creating a destructor node for 'p' in this case?

+/// addInitializer - Add C++ base or member initializer element to CFG.
+CFGBlock* CFGBuilder::addInitializer(CXXBaseOrMemberInitializer* I) {
+ if (!AddInitializers)
+ return Block;
+
+ autoCreateBlock();
+ AppendInitializer(Block, I);
+
+ if (!I->getInit())
+ // FIXME: Remove this check if is unnecessery.
+ return Block;
+
+ return Visit(I->getInit());

This should probably be addStmt(). We probably want the initializer values to be added to the CFGBlock as block-level expressions. The reason is that we want:

a(a)

to be properly sequenced in the CFG, just like we do for DeclStmts, e.g.:

int x = x

There is some subtleties here. Currently we don't lift initializers of
CXXConstructExpr to block-level exprs. The benefits is that by seeing
the VarDecl first we can evaluate the constructor directly into the
object being constructed. This is done with the help of
AggExprVisitor. A object pointer is passed to AggExprVisitor as
'DestPtr'.

Note that now only initializers of CallExpr is lifted to block-level
expr, because we need to place CallEnter/CallExit nodes around them.

One thing that would be needed and could be questionable (I think)
would be end of scope element. This element wouldn't have start of
scope element counterpart, and it would be used only if sequance of
destructors for automatic objects would not end with block

termination

statement. CFG clients would have to ignore it always as it would

be

of

no use to them.

I'm not quite certain what this means. We can have a block that
contains a bunch destructor calls with a single successor to the

block

in the outer scope. An explicit "end scope element" isn't strictly
needed.

Sorry for the late reply. There is a case where we need explicit

scope

delimiters. We need them to detect resource leaks accurately.

Consider

code:

void f() {
{
int *p = malloc(4);
}
...
}

Without a scope delimiter, we can only detect that the block pointed
to by 'p' is leaked when getting to the end of 'f'. That's far from
where it happened.

Unless we have destructor nodes for /all/ objects, whether the

destructor

is trivial or not.

(Not trying to push either side, just presenting another possibility.)

You mean creating a destructor node for 'p' in this case?

Yeah. I guess I don't mean "all objects" but "all variables plus all
temporaries with non-trivial destructors". I figure checkers are going to
be able to visit destructor nodes as well, so this would be almost as good
as a VisitEndScope. (Which itself is more powerful than VisitEndAnalysis,
as you pointed out.)

2010/8/24 Jordy Rose <jediknil@belkadan.com>

2010/8/20 Ted Kremenek <kremenek@apple.com>:

One thing that would be needed and could be questionable (I think)
would be end of scope element. This element wouldn’t have start of
scope element counterpart, and it would be used only if sequance of
destructors for automatic objects would not end with block
termination
statement. CFG clients would have to ignore it always as it would
be
of
no use to them.

I’m not quite certain what this means. We can have a block that
contains a bunch destructor calls with a single successor to the
block
in the outer scope. An explicit “end scope element” isn’t strictly
needed.

Sorry for the late reply. There is a case where we need explicit
scope
delimiters. We need them to detect resource leaks accurately.
Consider
code:

void f() {
{
int *p = malloc(4);
}

}

Without a scope delimiter, we can only detect that the block pointed
to by ‘p’ is leaked when getting to the end of ‘f’. That’s far from
where it happened.

Unless we have destructor nodes for /all/ objects, whether the
destructor
is trivial or not.

(Not trying to push either side, just presenting another possibility.)

You mean creating a destructor node for ‘p’ in this case?

Yeah. I guess I don’t mean “all objects” but “all variables plus all
temporaries with non-trivial destructors”. I figure checkers are going to
be able to visit destructor nodes as well, so this would be almost as good
as a VisitEndScope. (Which itself is more powerful than VisitEndAnalysis,
as you pointed out.)

Implicit call to C++ destructor is part of control flow of a program and it should have representation in CFG. However variable going out of scope is not IMO, so I don’t think that this would be a good idea. And for C we can jump over variable initialization. So should we create destructor node for ‘p’ in this case also?:

void f() {
goto label;
{
int* p = malloc(4);
label:

}

}

As I understand for delimiter approach we need StartScope and EndScope delimiters to know where in CFG we enter and leave scopes but without jump. In case of jumps it would be for client to deduce what scopes will be left and entered? I mean we don’t need any more information in CFG for this?

If that is the case I understand that that what is to be done is:

  • Adding implicit scopes for selection and iteration statements in case the body is not a compound statement but declares some variables,
  • Not adding EndScope delimiter if scope ends with jump as it would produce unnecessary unreachable CFGBlock.

I think that while I’m working on C++ initializers/destructors I could also add this.

I think 1 is affordable, and it would provide a clean implementation.
Suppose we analyze 1 million lines of code, roughly equivalent to 1
million stmts. Each CFGElement takes a pointer of 8 bytes. It only
consumes 7 MB extra memory.