MallocChecker/RetainCountChecker bind symbol invalidation logic

Hi all,

In MallocChecker::checkBind and RetainCountChecker::checkBind we have logic to determine if a bind causes a symbol to escape. I was wondering why we require a symbol to have local storage, as it seems a bit conservative to me. Hopefully I understand this correctly...

For example: if we are binding to a global, we stop tracking immediately. It seems to me that we could still track the global symbol on the current path until we make a function call, at which point we can no longer assume we know the value.
Another example, and the one that brought this up: If we have a class field, then the symbol could be tracked until the field or the class escapes in a call. There could be smart ways to track this, e.g field visibility, but the conservative way would be to assume that any call may result in access to the class. Most of the smart ways I can think of would require accurate inter-procedural analysis.

Perhaps we can allow for non-local storage to be continue to be tracked (subject to the other escaping rules) and then purge all non-local symbols when a call is made? Is there any other case where non-local storage could escape before a call?

class->foo = malloc(42); // continue tracking after bind
...
bar(class->foo); // escapes - trivial
bar(class); // escapes - bar might modify foo
class->method() //; escapes, unless method is const?
bar(); // escapes - conservative - bar may have class pointer

Tom

Tom,

Another issue is that the escape code is spread out and copied in several places and it would be nice to have it unified in form of an analyzer callback.

Hi all,

In MallocChecker::checkBind and RetainCountChecker::checkBind we have logic to determine if a bind causes a symbol to escape. I was wondering why we require a symbol to have local storage, as it seems a bit conservative to me. Hopefully I understand this correctly…

For example: if we are binding to a global, we stop tracking immediately. It seems to me that we could still track the global symbol on the current path until we make a function call, at which point we can no longer assume we know the value.

This sounds like the right thing to do; especially now as we are inlining many of the calls and it takes longer to reach the next opaque function call. I am not sure why we specially handle this case - the global variables should be invalidated as part of a function call. You’d also need to allow all the global variables escape when the function exits.

I’ve just ran through our regression tests with the change in MallocChecker and the only failing tests are due to us not handling the escape on return, which shouldn’t be hard to add.

escapes = false;//!regionLoc->getRegion()->hasStackStorage();

Another example, and the one that brought this up: If we have a class field, then the symbol could be tracked until the field or the class escapes in a call.

Malloc checker (possibly RetainCount as well) are not even this smart. We give up when a pointer is assigned to a field in a struct/class. We should associate the pointer with the struct/class object and assume that the pointer escapes when either the object or the pointer itself escape. We ran out of time implementing this one.

There could be smart ways to track this, e.g field visibility, but the conservative way would be to assume that any call may result in access to the class. Most of the smart ways I can think of would require accurate inter-procedural analysis.

Perhaps we can allow for non-local storage to be continue to be tracked (subject to the other escaping rules) and then purge all non-local symbols when a call is made? Is there any other case where non-local storage could escape before a call?

I cannot think of any. (There is a program path on which the global value would escape even before a call (another thread modifies it), but the path we are exploring(with no interleaving) is also a valid one.)

class->foo = malloc(42); // continue tracking after bind

bar(class->foo); // escapes - trivial
bar(class); // escapes - bar might modify foo
class->method() //; escapes, unless method is const?

bar(); // escapes - conservative - bar may have class pointer

I am not following this one. If ‘bar()’ is just any function, how does it have ‘class’ pointer? Is ‘class’ global?

Hi Ana,

Thanks for the feedback - responses inline.

Tom,

Another issue is that the escape code is spread out and copied in several places and it would be nice to have it unified in form of an analyzer callback.

Good point - if I have a decent go at the struct/class/field support then I might try refactoring it out too.

Hi all,

In MallocChecker::checkBind and RetainCountChecker::checkBind we have logic to determine if a bind causes a symbol to escape. I was wondering why we require a symbol to have local storage, as it seems a bit conservative to me. Hopefully I understand this correctly…

For example: if we are binding to a global, we stop tracking immediately. It seems to me that we could still track the global symbol on the current path until we make a function call, at which point we can no longer assume we know the value.

This sounds like the right thing to do; especially now as we are inlining many of the calls and it takes longer to reach the next opaque function call. I am not sure why we specially handle this case - the global variables should be invalidated as part of a function call. You’d also need to allow all the global variables escape when the function exits.

I hadn’t thought about the inlining aspect. I imagine this will be more important as we evolve inlining.

I’ve just ran through our regression tests with the change in MallocChecker and the only failing tests are due to us not handling the escape on return, which shouldn’t be hard to add.

escapes = false;//!regionLoc->getRegion()->hasStackStorage();

Another example, and the one that brought this up: If we have a class field, then the symbol could be tracked until the field or the class escapes in a call.

Malloc checker (possibly RetainCount as well) are not even this smart. We give up when a pointer is assigned to a field in a struct/class. We should associate the pointer with the struct/class object and assume that the pointer escapes when either the object or the pointer itself escape. We ran out of time implementing this one.

There could be smart ways to track this, e.g field visibility, but the conservative way would be to assume that any call may result in access to the class. Most of the smart ways I can think of would require accurate inter-procedural analysis.

Perhaps we can allow for non-local storage to be continue to be tracked (subject to the other escaping rules) and then purge all non-local symbols when a call is made? Is there any other case where non-local storage could escape before a call?

I cannot think of any. (There is a program path on which the global value would escape even before a call (another thread modifies it), but the path we are exploring(with no interleaving) is also a valid one.)

class->foo = malloc(42); // continue tracking after bind

bar(class->foo); // escapes - trivial
bar(class); // escapes - bar might modify foo
class->method() //; escapes, unless method is const?

bar(); // escapes - conservative - bar may have class pointer

I am not following this one. If ‘bar()’ is just any function, how does it have ‘class’ pointer? Is ‘class’ global?

I was a bit vague here. My point was that by taking the conservative view, ‘bar’ may have had access to ‘class’ at some point and retained the pointer. It may not even respect constness (due to a const cast) or field visibility (friend class/function) etc. Without doing more interprocedural analysis or having a complete view of the code, it would be hard to make assumptions without leading to false positives.

class->method() //; escapes, unless method is const?

bar(); // escapes - conservative - bar may have class pointer

I am not following this one. If ‘bar()’ is just any function, how does it have ‘class’ pointer? Is ‘class’ global?

I was a bit vague here. My point was that by taking the conservative view, ‘bar’ may have had access to ‘class’ at some point and retained the pointer.

Are you assuming class is a global here?

If ‘class’ is a local, I don’t see how ‘bar’ can modify it.
If ‘class’ is a global and we don’t know anything about bar(), we should assume that the pointer escapes.

Adding to Anna’s comment, the logic here is overly conservative until we do something smarter for modeling escaping memory in the core analyzer engine.

Essentially, this is a localized hack to avoid a potentially large source of false positives, but it is overly conservative. The problem is that the right solution requires us having a general mechanism of “escaping” memory (as handled by ExprEngine) and then having checker callbacks to respond to such cases, just like we do for invalidating regions, etc. Since that problem hasn’t been tackled yet, this localized hack remains our safest option. It absolutely doesn’t belong in the checker in this form.

This is what I was trying to get to, though the example in my head had some false assumptions. I was thinking of something like:

Object* bar() {
static Object *obj = new Object();
...
obj->var = 0;
...
return obj;
}

void test(Object *obj) {
obj->var = malloc ... // track obj->var
...
bar(); // don't know anything about bar
}

However my example was wrong because any pointer passed in as a parameter would have to be invalidated when we call bar().

Tom