CFG support for temporary object desctructors

Greetings,

I would be interested in working on bug <http://llvm.org/bugs/show_bug.cgi?id=15599>, dealing with noreturn destructors. From the bug I’ve learned that this happens because the CFG does not have proper support for temporary destructors and that adding this is a “hard problem”.

I’ve done some experiments with enabling cfg-temporary-dtors analyzer option and to me the resulting CFG looks exactly like I would expect it to be. I also tried some examples with lifetime extension of temporaries and those looked ok too.

So, my question is: What am I missing? Is there some known problem with the CFG when this option is enabled? If not, why is this option not enabled by default? When turning this option on I get a test failure in two other tests, but if this is not a result of some CFG bug, I can take a look and try to fix these issues instead.

I would appreciate any thoughts or directions where to look.
regards,
pl

Hi, Pavel. It’s great to hear that you’re interested in this.

As I remember from last time I looked at this, there are three things holding us back from turning on support for temporary destructors in the analyzer. The first is that the analyzer is not wired up to handle destructor CFG nodes for arbitrary regions. I don’t think this is a major blocking issue, but it might take a bit of threading through all the code.

The second is a lot more difficult. When you pass temporaries to a function, what happens when they are destroyed?

void byRef(const Object &ref);
void byVal(Object val);

byRef(Object())
byVal(Object())

The destructor here has to run at the end of the full-expression containing the call (roughly, at the outermost expression). However, the function call may have changed some of the object’s contents (even ‘byRef’, if the object has fields marked ‘mutable’). We need to make sure that’s reflected when the chosen destructor is inlined.

The inlining problem is compounded by the fact that we decay structs to a collection of values (nonloc::LazyCompoundVal) whenever we need to treat them as rvalues. This is usually the right thing to do, but has very confusing results for temporaries being copied in and out of functions. According to the standard, the copy constructor happens in the caller (and that’s how it appears in the AST), but the region it’s being copied into is based on a ParmVarDecl that’s part of a StackFrameContext for inlining the function…which we may not decide to inline after all. Ignoring temporary destructors our current behavior is indistinguishable from the standard, but as soon as we start claiming to support temporary destructors we’re going to hit this problem.

The third problem is that we simply haven’t put time into qualifying and validating the temporary destructor logic. Marcin implemented it a long time ago in the CFG, then Chandler and Doug made sure it was in good enough condition to use for the analysis-based warnings, but we haven’t actually tested it in the analyzer.

Now, all of that said, you’re only interested in making ‘noreturn’ work right now, so for that it seems reasonable to treat the destructors for temporary objects as opaque in the analyzer. It may turn out that we were properly conservative in everything we’ve done so far that turning it on will just work, but I’d like to see a fair number of test cases before we start shipping that.

‘noreturn’ is actually the simplest thing here: that’s just a change in the shape of the CFG, not necessarily in its use. It’s all the other destructors that come along for the ride that worry me. (And the setup of the call to the noreturn destructor as well.)

What do you think?
Jordan

P.S. I asked “what do you think?”, but I’m going to be on vacation for the next week, so please don’t expect an immediate response. Sorry about the time for this response.

Hello,

thank you for the response. It makes things much clearer. However, I still have some questions.

Hello,

thank you for the response. It makes things much clearer. However, I still have some questions.

I don’t think so. “implicit” destructors are really more like “scoped” destructors: they fire at the end of the scope. In this case, the temporary is materialized, but there is still no lifetime extension—it’s still destroyed at the end of the full-expression. (Which makes the code really really bad, of course, since you’re returning a reference to stack memory. Do we not warn about this?)

Without seeing the body of foo() or the contents of the various …, I can’t really offer any advice.

The poor AST representation. For this code:

struct Object { ~Object(); };

void test() {
extern void byRef(const Object &ref);
byRef(Object());
}

Clang generates this AST:

-FunctionDecl 0x7fb85b830b60 <line:3:1, line:6:1> test 'void (void)' -CompoundStmt 0x7fb85b8619c0 <line:3:13, line:6:1>

-DeclStmt 0x7fb85b8615b8 <line:4:2, col:35>
-FunctionDecl 0x7fb85b861510 <col:2, col:34> byRef 'void (const struct Object &)' extern -ParmVarDecl 0x7fb85b861450 <col:20, col:34> ‘const struct Object &’
-ExprWithCleanups 0x7fb85b8619a8 <line:5:2, col:16> 'void' -CallExpr 0x7fb85b861940 <col:2, col:16> ‘void’
-ImplicitCastExpr 0x7fb85b861928 col:2 ‘void (*)(const struct Object &)’
-DeclRefExpr 0x7fb85b8615d0 <col:2> 'void (const struct Object &)' lvalue Function 0x7fb85b861510 'byRef' 'void (const struct Object &)' -MaterializeTemporaryExpr 0x7fb85b861988 <col:8, col:15> ‘const struct Object’ lvalue
-ImplicitCastExpr 0x7fb85b861970 <col:8, col:15> 'const struct Object' <NoOp> -CXXBindTemporaryExpr 0x7fb85b8618d8 <col:8, col:15> ‘struct Object’ (CXXTemporary 0x7fb85b8618d0)
`-CXXTemporaryObjectExpr 0x7fb85b861890 <col:8, col:15> ‘struct Object’ ‘void (void)’ zeroing

Notice how the CXXTemporaryObjectExpr is a prvalue (it’s not marked “lvalue” or “xvalue”). In the analyzer, that means it’s not representing a region but rather values that could be stored in a region. The MaterializeTemporaryExpr is then an lvalue, which will cause the analyzer to create a region to store the temporary value in.

That’s actually all fine, as long as we destroy the newly-created region rather than what we originally got from the CXXTemporaryObjectExpr. But that’s not what the CXXTemporary will lead us to do.

The byVal case is interesting in a different way:

-FunctionDecl 0x7f8be1030b60 <line:3:1, line:6:1> test 'void (void)' -CompoundStmt 0x7f8be1061ab0 <line:3:13, line:6:1>

-DeclStmt 0x7f8be1061588 <line:4:2, col:20>
-FunctionDecl 0x7f8be10614e0 <col:2, col:19> byVal 'void (struct Object)' -ParmVarDecl 0x7f8be1061420 <col:13, col:19> ‘struct Object’
-ExprWithCleanups 0x7f8be1061a98 <line:5:2, col:16> 'void' -CallExpr 0x7f8be1061980 <col:2, col:16> ‘void’
-ImplicitCastExpr 0x7f8be1061968 col:2 ‘void (*)(struct Object)’
-DeclRefExpr 0x7f8be10615a0 <col:2> 'void (struct Object)' lvalue Function 0x7f8be10614e0 'byVal' 'void (struct Object)' -CXXBindTemporaryExpr 0x7f8be1061a78 <col:8, col:15> ‘struct Object’ (CXXTemporary 0x7f8be1061a70)
-CXXConstructExpr 0x7f8be1061a38 <col:8, col:15> 'struct Object' 'void (const struct Object &) throw()' elidable -MaterializeTemporaryExpr 0x7f8be10619c8 <col:8, col:15> ‘const struct Object’ lvalue
-ImplicitCastExpr 0x7f8be10619b0 <col:8, col:15> 'const struct Object' <NoOp> -CXXBindTemporaryExpr 0x7f8be1061918 <col:8, col:15> ‘struct Object’ (CXXTemporary 0x7f8be1061910)
`-CXXTemporaryObjectExpr 0x7f8be10618d0 <col:8, col:15> ‘struct Object’ ‘void (void)’ zeroing

Here we get an extra CXXConstructExpr and CXXBindTemporaryExpr, which is copying the temporary into the argument. However, at the time we evaluate the CXXConstructExpr, we don’t yet know if we’re going to be able to inline the whole call. If we do, we need to make sure the VarRegion for the by-value parameter contains the constructed object…once we’ve built the stack frame it’s in. We probably want to run the destructor within the inlined function as well. If we don’t inline the function, we either need to run the destructor outside the function, or elide the copy-constructor altogether and just destroy the original temporary.

Certainly you could say our representation of rvalue structs is what’s causing the problem, or at least part of the problem. Certainly knowing which region to destroy is a big part of this. But we haven’t thought about what it would take to solve the problem, or if changing the representation would actually help solve everything.

I would cheat a lot by asking if the destructor’s region is a CXXTempObjectRegion, or perhaps “stack memory space but not a VarRegion”. :wink:

On the other hand, your patch to NoReturnFunctionChecker is giving me more hope that it may be possible to solve the noreturn problem without fully implementing this feature.

Jordan

I don't think so. "implicit" destructors are really more like "scoped"
destructors: they fire at the end of the scope. In this case, the temporary
is materialized, but there is still no lifetime extension—it's still
destroyed at the end of the full-expression. (Which makes the code really
really bad, of course, since you're returning a reference to stack memory.
Do we not warn about this?)

The warning is there and is working ok. I was thrown off, because I was
expecting a different warning from the destructor call. You are right, this
does not need to be changed.

The second issue happens with code patterns like:

Object *o;
...
foo(&o);
...
o->bar();

Here, I get a "dereferencing undefined pointer" warning on the last line,
but it only happens for some values of "...". I am still investigating how
is this related to the destructor problem and what exactly is going on
there.

Without seeing the body of foo() or the contents of the various ..., I
can't really offer any advice.

This turned out to be related to the noreturn destructor problem in the
last patch, so this is covered also.

The second is a lot more difficult. When you pass temporaries to a

function, what happens when they are destroyed?

void byRef(const Object &ref);
void byVal(Object val);

byRef(Object())
byVal(Object())

The destructor here has to run at the end of the full-expression
containing the call (roughly, at the outermost expression). However, the
function call may have changed some of the object's contents (even 'byRef',
if the object has fields marked 'mutable'). We need to make sure that's
reflected when the chosen destructor is inlined.

The inlining problem is compounded by the fact that we decay structs to a
collection of values (nonloc::LazyCompoundVal) whenever we need to treat
them as rvalues. This is usually the right thing to do, but has very
confusing results for temporaries being copied in and out of functions.
According to the standard, the copy constructor happens in the caller (and
that's how it appears in the AST), but the region it's being copied into is
based on a ParmVarDecl that's part of a StackFrameContext for inlining the
function...which we may not decide to inline after all. Ignoring temporary
destructors our current behavior is indistinguishable from the standard,
but as soon as we start claiming to support temporary destructors we're
going to hit this problem.

I find this a bit strange. I can't say I know all the quirks of the c++,
but to me the first case looks very similar to:
{ Object foo; byRef(foo); }
You still need to construct the object, pass a reference to the function
and then call the destructor. Obviously, the AST for the two fragments
looks very differently, and we need to take care when parsing it to
construct a CFG. However, once we already have a CFG, processing it should
not depend (too much) on whether we are working with a temporary or a
normal object. Specifically, I do not see why internal implementation
details (LazyCompoundVal, et al.) should cause problems in one case but not
the other. Even in case of normal objects, you still need to pass the
reference to a function and then reflect those changes when you inline the
destructor.

What am I missing?

The poor AST representation. For this code:

struct Object { ~Object(); };

void test() {
extern void byRef(const Object &ref);
byRef(Object());
}

Clang generates this AST:

`-FunctionDecl 0x7fb85b830b60 <line:3:1, line:6:1> test 'void (void)'
  `-CompoundStmt 0x7fb85b8619c0 <line:3:13, line:6:1>
    >-DeclStmt 0x7fb85b8615b8 <line:4:2, col:35>
    > `-FunctionDecl 0x7fb85b861510 <col:2, col:34> byRef 'void (const
struct Object &)' extern
    > `-ParmVarDecl 0x7fb85b861450 <col:20, col:34> 'const struct Object
&'
    `-ExprWithCleanups 0x7fb85b8619a8 <line:5:2, col:16> 'void'
      `-CallExpr 0x7fb85b861940 <col:2, col:16> 'void'
        >-ImplicitCastExpr 0x7fb85b861928 <col:2> 'void (*)(const struct
Object &)' <FunctionToPointerDecay>
        > `-DeclRefExpr 0x7fb85b8615d0 <col:2> 'void (const struct Object
&)' lvalue Function 0x7fb85b861510 'byRef' 'void (const struct Object &)'
        `-MaterializeTemporaryExpr 0x7fb85b861988 <col:8, col:15> 'const
struct Object' lvalue
          `-ImplicitCastExpr 0x7fb85b861970 <col:8, col:15> 'const struct
Object' <NoOp>
            `-CXXBindTemporaryExpr 0x7fb85b8618d8 <col:8, col:15> 'struct
Object' (CXXTemporary 0x7fb85b8618d0)
              `-CXXTemporaryObjectExpr 0x7fb85b861890 <col:8, col:15>
'struct Object' 'void (void)' zeroing

Notice how the CXXTemporaryObjectExpr is a prvalue (it's not marked
"lvalue" or "xvalue"). In the analyzer, that means it's not representing a
region but rather values that could be stored in a region. The
MaterializeTemporaryExpr is then an lvalue, which will cause the analyzer
to create a region to store the temporary value in.

That's actually all fine, as long as we destroy the newly-created region
rather than what we originally got from the CXXTemporaryObjectExpr. But
that's not what the CXXTemporary will lead us to do.

The byVal case is interesting in a different way:

`-FunctionDecl 0x7f8be1030b60 <line:3:1, line:6:1> test 'void (void)'
  `-CompoundStmt 0x7f8be1061ab0 <line:3:13, line:6:1>
    >-DeclStmt 0x7f8be1061588 <line:4:2, col:20>
    > `-FunctionDecl 0x7f8be10614e0 <col:2, col:19> byVal 'void (struct
Object)'
    > `-ParmVarDecl 0x7f8be1061420 <col:13, col:19> 'struct Object'
    `-ExprWithCleanups 0x7f8be1061a98 <line:5:2, col:16> 'void'
      `-CallExpr 0x7f8be1061980 <col:2, col:16> 'void'
        >-ImplicitCastExpr 0x7f8be1061968 <col:2> 'void (*)(struct
Object)' <FunctionToPointerDecay>
        > `-DeclRefExpr 0x7f8be10615a0 <col:2> 'void (struct Object)'
lvalue Function 0x7f8be10614e0 'byVal' 'void (struct Object)'
        `-CXXBindTemporaryExpr 0x7f8be1061a78 <col:8, col:15> 'struct
Object' (CXXTemporary 0x7f8be1061a70)
          `-CXXConstructExpr 0x7f8be1061a38 <col:8, col:15> 'struct
Object' 'void (const struct Object &) throw()' elidable
            `-MaterializeTemporaryExpr 0x7f8be10619c8 <col:8, col:15>
'const struct Object' lvalue
              `-ImplicitCastExpr 0x7f8be10619b0 <col:8, col:15> 'const
struct Object' <NoOp>
                `-CXXBindTemporaryExpr 0x7f8be1061918 <col:8, col:15>
'struct Object' (CXXTemporary 0x7f8be1061910)
                  `-CXXTemporaryObjectExpr 0x7f8be10618d0 <col:8, col:15>
'struct Object' 'void (void)' zeroing

Here we get an extra CXXConstructExpr and CXXBindTemporaryExpr, which is
copying the temporary into the argument. However, at the time we evaluate
the CXXConstructExpr, we don't yet know if we're going to be able to inline
the whole call. If we do, we need to make sure the VarRegion for the
by-value parameter contains the constructed object...once we've built the
stack frame it's in. We probably want to run the destructor within the
inlined function as well. If we *don't* inline the function, we either
need to run the destructor *outside* the function, or elide the
copy-constructor altogether and just destroy the original temporary.

Certainly you could say our representation of rvalue structs is what's
causing the problem, or at least part of the problem. Certainly *knowing
which region to destroy* is a big part of this. But we haven't thought
about what it would take to solve the problem, or if changing the
representation would actually help solve everything.

The third problem is that we simply haven't put time into qualifying and
validating the temporary destructor logic. Marcin implemented it a long
time ago in the CFG, then Chandler and Doug made sure it was in good enough
condition to use for the analysis-based warnings, but we haven't actually
tested it in the analyzer.

Now, all of that said, you're only interested in making 'noreturn' work
right now, so for that it seems reasonable to treat the destructors for
temporary objects as opaque in the analyzer. It may turn out that we were
properly conservative in everything we've done so far that turning it on
will just work, but I'd like to see a fair number of test cases before we
start shipping that.

'noreturn' is actually the simplest thing here: that's just a change in
the shape of the CFG, not necessarily in its use. It's all the other
destructors that come along for the ride that worry me. (And the setup of
the call to the noreturn destructor as well.)

What do you think?
Jordan

P.S. I asked "what do you think?", but I'm going to be on vacation for
the next week, so please don't expect an immediate response. Sorry about
the time for *this* response.

After reading this email, I took a look if it is possible to make the
analyzer not inline temporary destructors. But I couldn't find a _nice_ way
to do that. At the point where the inlining decision is made, I don't have
access to the type of the destructor call. Obviously, I could push
information through somehow, but instead I decided to take another look at
what happens when I actually do inline the destructors, since it already
seems to be mostly working. I will try to fix the two issues I mentioned
above (which means the CFG will get more testing and validation) and then
you can decide whether the code is stable enough to be switched on. If I
fail at that, I can always make the destructors opaque, as you suggested.

I would cheat a lot by asking if the destructor's region is a
CXXTempObjectRegion, or perhaps "stack memory space but not a VarRegion".
:wink:

On the other hand, your patch to NoReturnFunctionChecker is giving me more
hope that it may be possible to solve the noreturn problem without fully
implementing this feature.

Jordan

After a couple of days of plowing through the code I'm not much smarter,
only more convinced that this is really not easy, so I'm starting to agree
with you that we should just not inline these calls (for now). I have tried
adding a inlining test like:
  if (const CXXDestructorCall *DC = dyn_cast<CXXDestructorCall>(&Call)) {
    if (isa<CXXTempObjectRegion>(DC->getCXXThisVal().getAsRegion())) return
false;
  }

but this also prevents inlining of lifetime-extended temporaries
destructors, even though they have an Implicit destructor call in the CFG.
Since they seem to work fine, it would be bad to regress here. I'm not sure
what you meant by "stack memory space but not a VarRegion". Would that
filter out cases like these? And it seems a bit hacky in any case. :slight_smile:
I guess the best solution would be to actually store the destructor call
type in the CXXDestructorCall object (it already stores whether it is a
base destructor call anyway). Would you accept a patch like that?

regards,
Pavel

I have also noticed a different problem. A pattern like
void foo() {
  static const Object &ref = Object();
}
will be flagged as an error (dangling reference after function return),
when in fact this code is ok. I guess this is also related to the problem
of not knowing which region to construct the temporary object into, since
this object shouldn't be constructed on the stack.

Actually, hang on…wouldn’t the case we care about go through ExprEngine::ProcessTemporaryDtor? It’d be a real hack, but you could check the current CFGElement and see if it’s a CFGTemporaryDtor. (The current block is inside currBldrCtx and the index is currStmtIdx.)

Jordan

Hm…point. Not sure what we’ll do about this one—maybe check if we’re on a static initialization branch. (The analyzer puts a fake branch in the CFG to decide whether we’re initializing a static variable.) Please file a bug so that we can track it, at least.