Static Analyzer: immortal objects and getting the symbol for 'this'

Hi,

I’m using the Clang Static Analyzer from Clang 3.3. I want to check object’s state when they die. I tried using checkDeadSymbols, but according to the SymbolReaper, the objects of interest never die. This is my test code:

struct Foo
{

int* x;

Foo() { x = new int(10); }

};

int main(int argc, const char** argv)
{

Foo f;

return 0;

}

The int* does die, but f does not. (I cannot check this directly: I just made the SymbolReaper print all symbols are regions that died, using the dead_begin(), etc. and region_begin() etc. methods). Why is this, and how can I work around it?

Even when there is a user-made destructor, which I can get using checkEndFunction, I’m unable to get back the symbol that represents ‘f’. I spent like an hour digging through clang::ento’s reference, and this was my best tip:

const CXXThisRegion* thisRegion = context.getStoreManager().getRegionManager().getCXXThisRegion(
dtor->getThisType(context.getASTContext()),
context.getLocationContext());

Unfortunately, this gives something that doesn’t seem to have anything to do with ‘f’. How can I get the symbol representing ‘f’, using only the CheckerContext and the const CXXDestructorDecl* I can obtain from the LocationContext?

I’m really stuck, and any help would be greatly appreciated. Thank you!

Gabor

Hi, Gabor. Objects that live on the stack don’t have symbols; the properties of a stack object’s region is entirely known and doesn’t need to be symbolic. It’s a VarRegion, not a SymbolicRegion.

The good news is that regions are also uniqued, and so if you’re just looking to identify an object, using the region as the key is often reasonable as well, possibly with a call to stripCasts().

Additionally, symbol death is not quite the same as local variable death. If the value of a local variable is symbolic, and that value gets copied somewhere else, the symbol won’t die when the local variable does. (Symbol information is basically garbage collected—each cleanup starts by crawling through all the live regions and values to see what’s still accessible.) Local variables have very well-defined scope rules, so all of their control is in the CFG.

Rather than using the ‘this’ region and checkEndFunction, why not use checkPostCall and CXXDestructorCall’s getCXXThisVal? (What are you actually trying to do?)

Jordan

Hi Jordan,

Thanks for the answer!

Objects that live on the stack don’t have symbols; the properties of a stack object’s region is entirely known and doesn’t need to be symbolic. It’s a VarRegion, not a SymbolicRegion.

I’m kind of confused about the difference between SVal and SymExpr. I understand that a MemRegion models a region of the memory, i.e. this contains a value that can be obtained using context.getState()->getSVal(memRegion). My impression is that SVal is just a bridge between SymExpr and MemRegion, where SymExpr actually represents the symbolic values assigned during analysis. Is this about right?

Additionally, symbol death is not quite the same as local variable death. If the value of a local variable is symbolic, and that value gets copied somewhere else, the symbol won’t die when the local variable does.

This is the kind of code I’m talking about:

void foo()
{

Foo f;

}

As far as I can tell, ‘f’ here is actually a MemRegion, which seems counter-intuitive to me. I want to be notified when the ‘f’ object dies. checkDeadSymbols doesn’t seem to work, I dump everything every time it’s called and I don’t get anything resembling what I’m looking for. I also tried checkRegionChanges, but that doesn’t seem to tell me either.

What I’m left doing is looking when a dtor is called. checkPostCall is actually a good idea on that, although I’ve also figured out how to get the ‘this’ region from only a CheckerContext:

const CXXThisRegion* thisRegion = context.getStoreManager().getRegionManager().getCXXThisRegion(
dtor->getThisType(context.getASTContext()),
context.getLocationContext());

SVal thisVal = context.getState()->getSVal(thisRegion);

const MemRegion* objectRegion = thisVal.getAsRegion(); // this is ‘f’

What I’m trying to do here is to check what resources the object destructor deallocates. checkDeadSymbols does tell me when a member pointer is freed using delete, but this unfortunately happens after both checkEndFunction and checkPostCall for the destructor, so it is not useful for me. Or it would be, if I had been able to get the MemRegion that contains the SymExpr that died - this would get me the FieldRegion, in which the pointer was stored, and that would be enough.
Unfortunately, I get UnknownSpaceRegion if I try to do this: context.getStoreManager().getRegionManager().getSymbolicRegion(deadSymbol)->getSuperRegion(). This kind of makes sense, since the symbol is dead, so the information about where it has been is probably already lost.

I also tried to do checkPreStmt for CXXDeleteExpr, but here I have no clue as to how to get the SVal/SymExpr/MemRegion that is being deleted.

In the meanwhile, I figured out how to get the FieldRegion that is the argument of the CXXDeleteExpr:

ProgramStateRef state = context.getState();
ASTContext& astContext = context.getASTContext();
MemRegionManager& memRegionManager = context.getStoreManager().getRegionManager();

if(const MemberExpr* memberExpr = llvm::dyn_cast(delExpr->getArgument()->IgnoreImpCasts()))
{
const FieldDecl* fieldDecl = llvm::dyn_cast(memberExpr->getMemberDecl());
const CXXThisRegion* thisRegion = context.getStoreManager().getRegionManager().getCXXThisRegion(
astContext.getPointerType(astContext.getRecordType(fieldDecl->getParent())),
context.getLocationContext());

MemRegionRef thisObjectRegion = context.getState()->getSVal(thisRegion).getAsRegion();

const FieldRegion* fieldRegion = memRegionManager.getFieldRegion(fieldDecl, thisObjectRegion); // this is what I needed
}

(Hope this helps someone, someday.)

This MemRegionManager class is really useful, but this code still has been a headache to put together. (e.g. I got numerous assertion failures until I realized getCXXThisRegion wants a QualType that is a pointer type.) What I was trying to do is working now, but I’m still interested in how to do it more easily…

I also just noticed that implicit destructors (i.e. compiler-generated or defaulted) are ignored by the Static Analyzer. Is this intentional? This makes it impossible to figure out when an object without an explicit destructor is destroyed. Or am I missing something?

I’m a bit late answering these questions now that you’ve had your whole discussion with Anna and Ted, but I’ll try to fill out some info just for future reference.

Hi Jordan,

Thanks for the answer!

Objects that live on the stack don’t have symbols; the properties of a stack object’s region is entirely known and doesn’t need to be symbolic. It’s a VarRegion, not a SymbolicRegion.

I’m kind of confused about the difference between SVal and SymExpr. I understand that a MemRegion models a region of the memory, i.e. this contains a value that can be obtained using context.getState()->getSVal(memRegion). My impression is that SVal is just a bridge between SymExpr and MemRegion, where SymExpr actually represents the symbolic values assigned during analysis. Is this about right?

Well…no. MemRegion models a region of memory, yes, though you may not always be able to get a value. (It doesn’t make sense to ask for the value of a function pointer, for example, which is represented as a FunctionTextRegion.) SVals represent values, symbolic or otherwise; these values can be region addresses (loc::MemRegionVal) or just regular values (e.g. nonloc::ConcreteInt).

Symbols represent values the analyzer does not have immediate information about. As you know, the most common use of symbols is the return value of a function call that the analyzer can’t model, but they’re used in a couple of other places. A key thing to note is that *non-*symbolic values are not represented as symbols—such as stack region addresses, constant literals (not Objective-C runtime literals), function values, and nullptr.

Only symbols can accrue constraints because you have perfect information about everything else.* Symbols and regions are persistent and unique, but values are not, so you can’t ever attach information to an SVal. This has been a source of pain at times (“where did this null come from?”) but in practice it keeps memory usage down and allows us to short-circuit operations involving constants.

  • in theory, anyway. In practice, there are a couple of places this breaks down. Weakly-linked globals are one example.

Additionally, symbol death is not quite the same as local variable death. If the value of a local variable is symbolic, and that value gets copied somewhere else, the symbol won’t die when the local variable does.

This is the kind of code I’m talking about:

void foo()
{

Foo f;

}

As far as I can tell, ‘f’ here is actually a MemRegion, which seems counter-intuitive to me. I want to be notified when the ‘f’ object dies. checkDeadSymbols doesn’t seem to work, I dump everything every time it’s called and I don’t get anything resembling what I’m looking for. I also tried checkRegionChanges, but that doesn’t seem to tell me either.

Right; we don’t consider that a “death” because it’s not an analyzer construct. (A long time ago there was preliminary support for tracking scopes in the analyzer, but that’s been turned off for years and doesn’t actually work now. We could probably make it work, which would help in catching use-after-scope bugs.)

We will actually consider the VarRegion as dead as soon as it is no longer referenced, but that’s not exactly the same thing for types without destructors:

void foo()
{
Foo f;
f.doSomething();
// f is dead here because no one else refers to it.
doSomethingElse()
}

What I’m left doing is looking when a dtor is called. checkPostCall is actually a good idea on that, although I’ve also figured out how to get the ‘this’ region from only a CheckerContext:

const CXXThisRegion* thisRegion = context.getStoreManager().getRegionManager().getCXXThisRegion(
dtor->getThisType(context.getASTContext()),
context.getLocationContext());

SVal thisVal = context.getState()->getSVal(thisRegion);

const MemRegion* objectRegion = thisVal.getAsRegion(); // this is ‘f’

You can do this much more easily by casting the CallEvent down to a CXXDestructorCall.

What I’m trying to do here is to check what resources the object destructor deallocates. checkDeadSymbols does tell me when a member pointer is freed using delete, but this unfortunately happens after both checkEndFunction and checkPostCall for the destructor, so it is not useful for me. Or it would be, if I had been able to get the MemRegion that contains the SymExpr that died - this would get me the FieldRegion, in which the pointer was stored, and that would be enough.

checkPostCall (or checkPreCall) is the way to go here, because that will catch all kinds of destructor invocations: local variables going out of scope, explicit calls using ~Foo(), base destructor calls, and some day ‘delete’. Note that there’s currently no real support for ‘delete’—it just hasn’t been a priority since ‘new’ hasn’t been finished. (See PR12014.)

Unfortunately, I get UnknownSpaceRegion if I try to do this: context.getStoreManager().getRegionManager().getSymbolicRegion(deadSymbol)->getSuperRegion(). This kind of makes sense, since the symbol is dead, so the information about where it has been is probably already lost.

It’s probably clear by now, but since symbols are only used for values the analyzer can’t model, you’ll never have a symbol for a field or variable. (The former will be a FieldRegion within a larger region, the latter a VarRegion.) And for the record, the get*Region() methods are essentially factory methods, not lookup methods, so you can’t use them in general to recreate a region that should already exist.

In the meanwhile, I figured out how to get the FieldRegion that is the argument of the CXXDeleteExpr:

ProgramStateRef state = context.getState();
ASTContext& astContext = context.getASTContext();
MemRegionManager& memRegionManager = context.getStoreManager().getRegionManager();

if(const MemberExpr* memberExpr = llvm::dyn_cast(delExpr->getArgument()->IgnoreImpCasts()))
{
const FieldDecl* fieldDecl = llvm::dyn_cast(memberExpr->getMemberDecl());
const CXXThisRegion* thisRegion = context.getStoreManager().getRegionManager().getCXXThisRegion(
astContext.getPointerType(astContext.getRecordType(fieldDecl->getParent())),
context.getLocationContext());

MemRegionRef thisObjectRegion = context.getState()->getSVal(thisRegion).getAsRegion();

const FieldRegion* fieldRegion = memRegionManager.getFieldRegion(fieldDecl, thisObjectRegion); // this is what I needed
}

Again, if you’re doing this in a preCall/postCall check, getCXXThisVal() is the way to go. If you’re doing this in a pre-CXXDeleteExpr check, you probably just want getSVal on the argument.

I also just noticed that implicit destructors (i.e. compiler-generated or defaulted) are ignored by the Static Analyzer. Is this intentional? This makes it impossible to figure out when an object without an explicit destructor is destroyed. Or am I missing something?

Compiler-generated destructors aren’t ignored, but trivial destructors are. This is intentional, since it simplifies the CFG and general analysis and makes the analyzer do less work, but we wouldn’t have to keep it that way. Still, if the destructor is trivial, what cleanup is it supposed to be doing? (I guess if it’s trivial when it’s supposed to be doing cleanup it’d be a problem. An AST-traversal check might be the way to go for that, though, rather than spending the full efforts of a path-sensitive engine to tell you you didn’t implement something.)

In all of this I’ve kind of lost track of what you’re actually trying to do. I think checkPreCall (not postCall) and filtering for interesting destructors is probably what you want if you’re going to check resource management and cleanup, and you probably want to be thinking about regions rather than symbols, which are just a certain kind of value.

Hope this is helpful, sorry for the delay, please do continue asking questions?
Jordan

Hi Jordan,

Thanks for the reply, a lot of useful information!

Compiler-generated destructors aren't ignored, but trivial destructors
are. This is intentional, since it simplifies the CFG and general analysis
and makes the analyzer do less work, but we wouldn't have to keep it that
way. Still, if the destructor is trivial, what cleanup is it supposed to be
doing? (I guess if it's trivial when it's *supposed* to be doing cleanup
it'd be a problem. An AST-traversal check might be the way to go for that,
though, rather than spending the full efforts of a path-sensitive engine to
tell you you didn't implement something.)

This is a significant issue, as it makes it impossible for me to
efficiently clean up resources used for tracking objects. checkDeadSymbols
only tells me about very-very few deaths, and I haven't yet seen it tell me
that a region has died. Many objects have trivial dtors. For example, I'm
now writing a checker that aims to detect invalidated iterator usages.
Obviously, when an iterator object dies, I can stop tracking it. Granted,
the tracking information is not too significant in this case (2 bools), but
you see the problem.

Point taken. I don’t think we want to be explicitly tracking lifetimes of everything, though. Perhaps you could try using SymbolReaper::isLiveRegion for every time through checkDeadSymbols? Does that have any performance issues?

(It does mean that you might get values lingering in the state when a region dies but no symbols do, but it probably won’t hurt that much.)

New checkers have been patches on cfe-commits (or Phabricator) in the past. There’s usually more back-and-forth than a normal patch to try to make the checker general and conservative, but no formal proposal process or anything. On the other hand, a well-written description of a new checker and its motivation is definitely appreciated.)

It’s not so much a deliberate absence as much as an expensive proposition. Everything anybody does is accessing a value, so if you want to essentially set a watchpoint on that value you’re going to be impacting everything the analyzer is doing. And what about derived values? (e.g. $sym + 1.)

The question is definitely different between tracking variables and fields vs. tracking values. For an iterator-checker, for example, the metadata would be associated with a region, but the important thing to track is the value – a temporary iterator is just as much associated with a container as one with an associated variable, even when it’s an rvalue.

It does seem expensive to track every lvalue-to-rvalue conversion along with every function call (for operator= and such). I don’t think your hypothetical checkVariable (checkValue?) would be any better, though. What do you actually need this for?

Jordan