Consumed analysis resets its state in between CFG blocks

Heya,

I am currently playing around with how the CFG looks in the face of temporary destructors; one interesting tidbit is that temporary destructors insert CFG blocks in places where one might not have expected them - for example, before the call to an arbitrary destructor at the end of a full expression.

Now the consumed analysis kills it state unconditionally at the end of each block:
http://reviews.llvm.org/diffusion/L/browse/cfe/trunk/lib/Analysis/Consumed.cpp;207914$1400

This leads to missed warnings when we leave a CFG block in the middle of an expression, and in general feels very brittle regarding future changes to the CFG’s layout.

I was wondering whether there are already mechanisms in the CFG to deal with issues like this (I’d have expected a consumed analysis to work on the scope of a variable instead of arbitrary block boundaries - for example, could we use REGISTER_MAP_WITH_PROGRAMSTATE?).

Thoughts?
/Manuel

Btw, I see a similar problem with the ReachableCode analysis:
lib/Analysis/ReachableCode.cpp:61 static bool isDeadReturn(…)
seems to rely on the invariant:
“A new block may only be started if there is control flow”
Is the analyzer actually trying to enforce this invariant?

Thanks!
/Manuel

Neither of these are part of the analyzer; they’re part of the analysis-based warnings in Clang. libClangAnalysis is the common base between the two, which is why we have to be careful about the CFG, but the full analyzer infrastructure isn’t going to be the answer.

“A new block may only be started if there is control flow” probably is not enforced, but I’m not sure why we wouldn’t do that. (“control flow” doesn’t have to mean a C/C++ notion of control flow; it can be something like the static variable DeclStmt that clients of the CFG can use to decide where execution goes next.)

Jordan

Neither of these are part of the analyzer; they're part of the
analysis-based warnings in Clang. libClangAnalysis is the common base
between the two, which is why we have to be careful about the CFG, but the
full analyzer infrastructure isn't going to be the answer.

What would be "full analyzer infrastructure" here? The
REGISTER_MAP_WITH_PROGRAMSTATE
part?

"A new block may only be started if there is control flow" probably is not
enforced, but I'm not sure why we wouldn't do that. ("control flow" doesn't
have to mean a C/C++ notion of control flow; it can be something like the
static variable DeclStmt that clients of the CFG can use to decide where
execution goes next.)

I'm aware control flow doesn't necessarily mean C/C++ control flow.
Here's my reasons why we might not want to have that invariant.
1. if I remember correctly I have already seen blocks that don't have
terminators that just flow into a single other block (I'll try to fish one
out if you're curious); so I was surprised to see other places basically
assert in a comment that this is an assumed invariant
2. it seems that sometimes having the CFG be more uniform regarding
different related patterns might be nice (although I see that it is a very
very weak argument)
3. it might just be too hard to assert the invariant, in which case I'd say
it's better to explicitly say it's *not* an invariant than to let checks or
cfg-based compiler checks rely on it

Thoughts?
/Manuel

ProgramState at all. The analysis-based warnings track their own state in more lightweight structures, and aren’t usually following paths anyway.

I guess it’s not. After an if-statement, both branches have to transition to the same block, of course. I’m not sure what I was thinking, nor am I sure what the comment was getting at.

Jordan

Neither of these are part of the analyzer; they're part of the
analysis-based warnings in Clang. libClangAnalysis is the common base
between the two, which is why we have to be careful about the CFG, but the
full analyzer infrastructure isn't going to be the answer.

What would be "full analyzer infrastructure" here? The REGISTER_MAP_WITH_PROGRAMSTATE
part?

ProgramState at all. The analysis-based warnings track their own state in
more lightweight structures, and aren't usually following paths anyway.

The consumed analysis seems to have to? Otherwise, how would it figure out
that something was not consumed in all paths?

"A new block may only be started if there is control flow" probably is not

enforced, but I'm not sure why we wouldn't do that. ("control flow" doesn't
have to mean a C/C++ notion of control flow; it can be something like the
static variable DeclStmt that clients of the CFG can use to decide where
execution goes next.)

I'm aware control flow doesn't necessarily mean C/C++ control flow.
Here's my reasons why we might not want to have that invariant.
1. if I remember correctly I have already seen blocks that don't have
terminators that just flow into a single other block (I'll try to fish one
out if you're curious); so I was surprised to see other places basically
assert in a comment that this is an assumed invariant
2. it seems that sometimes having the CFG be more uniform regarding
different related patterns might be nice (although I see that it is a very
very weak argument)
3. it might just be too hard to assert the invariant, in which case I'd
say it's better to explicitly say it's *not* an invariant than to let
checks or cfg-based compiler checks rely on it

I guess it's not. After an if-statement, both branches have to transition
to the same block, of course. I'm not sure what I was thinking, nor am I
sure what the comment was getting at.

Well, I thought "incoming" control flow (at the end of an if) is just part
of control flow.

And I think I caused some confusion, as the invariant proposal I made was
not actually a citation of the code (but might have looked like I implied
that).
The comment is:
// Note also that we are restricting
// to looking at return statements in the same CFGBlock,
// so this will intentionally not catch cases where the
// return statement contains nested control-flow.

Which seems to imply that invariant, though?

Cheers,
/Manuel

Both thread safety analysis and consumed analysis just use the CFG;
they are not part of the static analyzer, and they maintain their own
notion of "state". Moreover, the line you're looking at is not
"killing state". The initial state of a block is determined from the
end state of its immediate predecessors, and the first step when
starting to analyze a new block is to calculate the initial state for
that block. Blocks are traversed in topological order, so the
previously processed block has no relationship to the one we're just
starting to process.

I don't understand what you mean by "brittle" here. This algorithm
works for any "layout" in which the CFG captures control flow, which
is, after all, the whole point of the CFG.

  -DeLesley

Both thread safety analysis and consumed analysis just use the CFG;
they are not part of the static analyzer, and they maintain their own
notion of "state". Moreover, the line you're looking at is not
"killing state". The initial state of a block is determined from the
end state of its immediate predecessors, and the first step when
starting to analyze a new block is to calculate the initial state for
that block. Blocks are traversed in topological order, so the
previously processed block has no relationship to the one we're just
starting to process.

I don't understand what you mean by "brittle" here. This algorithm
works for any "layout" in which the CFG captures control flow, which
is, after all, the whole point of the CFG.

It seems like it has problems with temporary destructors in the face of
control flow:
If 'Status' is the class taken from
test/SemaCXX/warn-consumed-analysis.cpp, but I add an "operator bool()
const;" to it (which is not annotated in any way), the following happens:
Status g();
void f() {
  bool b = g();
}
Everything works as expected:
/tmp/t7.cc:39:12: warning: invalid invocation of method '~Status' on a
temporary object while it is in the 'unconsumed' state
  bool b = g();

Now if I change the code to:
Status g();
bool coin();
void f() {
  bool b = coin() && g();
}
I don't get any errors, but did not change the consumed state of the
returned temporary, if I'm not mistaken.

The main difference is that with temporary destructors, we have control
flow, and thus multiple CFG blocks in between the generation of the
temporary and the destructor call.

I debugged into this and found the Visitor.reset(CurrStates) to be the
point where the information that the status is not consumed yet is lost. I
apparently jumped to the wrong conclusion (which is kinda obvious when I
think about patterns like "{ Status s = g(); if (...) {} }", which of
course works). Sorry about that.

Any insight in how this could be fixed (or where my mistake is) would be
highly appreciated.

Cheers,
/Manuel

Both thread safety analysis and consumed analysis just use the CFG;
they are not part of the static analyzer, and they maintain their own
notion of "state". Moreover, the line you're looking at is not
"killing state". The initial state of a block is determined from the
end state of its immediate predecessors, and the first step when
starting to analyze a new block is to calculate the initial state for
that block. Blocks are traversed in topological order, so the
previously processed block has no relationship to the one we're just
starting to process.

I don't understand what you mean by "brittle" here. This algorithm
works for any "layout" in which the CFG captures control flow, which
is, after all, the whole point of the CFG.

It seems like it has problems with temporary destructors in the face of
control flow:
If 'Status' is the class taken from
test/SemaCXX/warn-consumed-analysis.cpp, but I add an "operator bool()
const;" to it (which is not annotated in any way), the following happens:
Status g();
void f() {
  bool b = g();
}
Everything works as expected:
/tmp/t7.cc:39:12: warning: invalid invocation of method '~Status' on a
temporary object while it is in the 'unconsumed' state
  bool b = g();

Now if I change the code to:
Status g();
bool coin();
void f() {
  bool b = coin() && g();
}
I don't get any errors, but did not change the consumed state of the
returned temporary, if I'm not mistaken.

The main difference is that with temporary destructors, we have control
flow, and thus multiple CFG blocks in between the generation of the
temporary and the destructor call.

I debugged into this and found the Visitor.reset(CurrStates) to be the
point where the information that the status is not consumed yet is lost. I
apparently jumped to the wrong conclusion (which is kinda obvious when I
think about patterns like "{ Status s = g(); if (...) {} }", which of
course works). Sorry about that.

Any insight in how this could be fixed (or where my mistake is) would be
highly appreciated.

Ok, so the problem seems to be the unconditional call
to CurrStates->clearTemporaries(); in line 1439. Could we instead remove
the temps from the map just above that call when we visit the TemporaryDtor?
Also, where are other elements removed? ConsumedStateMap::remove doesn't
seem to have any references (I'd guess we can safely remove stuff when the
dtor is called?).

Just removing line 1439 (the clearTemporaries() call) "fixes" all problems
I can identify, but I assume it's not the right solution :slight_smile:

Thanks,
/Manuel

It doesn't surprise me that temporaries are creating a problem; that
code was always a bit flaky. I think Chris assumed that a temporary
would always be local to a single block, so he wanted to kill them
eagerly as an optimization. In general, having multiple CFG blocks
between the generation of a value and its destructor should not be an
issue, but temporaries are not handled like ordinary values. That
should probably be changed.

Let me take a look, and see what I can do. What's your time frame on
this? If this is blocking you, then just kill the clearTemporaries()
and put a FIXME comment next to it. There's another memory leak and a
few other fixes to consumed analysis that I have to do anyway, so I'll
sit down and see if I can restore sanity to the code.

Thanks!

  -DeLesley

It doesn't surprise me that temporaries are creating a problem; that
code was always a bit flaky. I think Chris assumed that a temporary
would always be local to a single block, so he wanted to kill them
eagerly as an optimization. In general, having multiple CFG blocks
between the generation of a value and its destructor should not be an
issue, but temporaries are not handled like ordinary values. That
should probably be changed.

Let me take a look, and see what I can do. What's your time frame on
this? If this is blocking you, then just kill the clearTemporaries()
and put a FIXME comment next to it. There's another memory leak and a
few other fixes to consumed analysis that I have to do anyway, so I'll
sit down and see if I can restore sanity to the code.

Cool, I'll do that if it blocks us. For now, I wanted to make sure I
understand it correctly. This breaks with some in-progress patch I have to
fix temp dtor handling in the CFG, and I wanted to make sure I'm not
designing this patch into a direction that violates invariants we need to
have in the CFG. (My current patch unconditionally inserts a new block for
the temp dtor, which breaks existing tests in the consumed analysis).

Thanks for you help!
/Manuel

The important thing is that you don't break thread safety analysis,
because the build cops will hate you, but the unit tests for that are
pretty extensive. Don't worry about consumed analysis; I'll work
around any changes to the CFG.

  -DeLesley

The important thing is that you don't break thread safety analysis,
because the build cops will hate you, but the unit tests for that are
pretty extensive. Don't worry about consumed analysis; I'll work
around any changes to the CFG.

I've not run into any problems with thread-safety analysis in my
experiments so far - do you think it has a hidden corner-case dependency on
how temporary destructors are modeled that we might run into later?

No. Thread safety analysis generally ignores temporaries. You can
trust the unit tests. :slight_smile:

  -DeLesley