[PATCH] expand a symbolic pointer

Conditionally expand a symbolic pointer into a pointer to a real struct. In the following code,

struct x {
int partno;
};
int foo(struct x *pt)
{
int found;
if(pt->partno != 1) {
found = 0;
}
if(pt->partno == 1 || !found) {
return 1;
}
return 0;
}

when we visit pt->partno, an AnonPointeeRegion is created lazily for pt. Moreover, only when pt is feasible not null, the region is created. This fixes PR2304.

expand.patch (9.06 KB)

Hi Zhongxing,

I like the main ideas about the patch, but I'm a little concerned about placing this logic in StoreManager::getLValueField.

Do you feel that StoreManager::getLValueField should do the work for both 'x.f' and 'x->f'. That is, the latter can be rewritten as: (*x).f. It seems to me that the (*x) portion is what should lazily create the AnonPointeeRegion, and at that point do the null check. Then, getLValueField just reasons (and only reasons) about "SOMETHING.f", where SOMETHING is an SVal.

The nice thing about decoupling things in this way is that:

1) StoreManager::getLValueField and GRStateManager::getLValueField stay simple; they only need to reason about dot-accesses.

2) GRExprEngine can do the heavy lifting of reasoning about the syntax. As I said, x->f is the same as (*x).f. There is no reason that StoreManager or GRStateManager should be reasoning about syntax in this way. Let's keep the complexity of reasoning about syntax in one place (i.e, GRExprEngine), and have the other components just reason about pure semantic operations (if possible).

3) Having GRExprEngine handle the syntax allows different StoreManagers to automatically "get it right" with regards to both (*x).f and x->f if they decide to reason about fields. If we need an interface to conjure up new regions (AnonPointeeRegion), then we should add it to StoreManager.

I also found the following FIXME interesting:

+ // FIXME: Should also update the value of the base expr. It's no longer a
+ // symbolic pointer.

In what ways is it no longer symbolic? What does it mean for a region to be symbolic? A symbol is just an abstract name. If we rename the region, what would its new name be?

We should probably put some more effort into defining the concept of a symbolic region. Also, if we were to alpha-rename a region, we would need to do it throughout the entire state (environment, store, and gdm). Renaming/replacing regions in a GRState once we start using them is something that will be challenging to do, and we should think about whether or not it is necessary.

New patch.

  • A new method CheckStructBase is taking care of the creation of AnonPointeeRegion.
  • A lightweight lvalue evaluation method is used to get the location of the base expr
    whose store may be changed.
  • AnonPointeeRegion is now identified the the memregion of the pointer pointing to it.

Hi Zhongxing,

I like the main ideas about the patch, but I’m a little concerned about placing this logic in StoreManager::getLValueField.

Do you feel that StoreManager::getLValueField should do the work for both ‘x.f’ and ‘x->f’. That is, the latter can be rewritten as: (*x).f. It seems to me that the (*x) portion is what should lazily create the AnonPointeeRegion, and at that point do the null check. Then, getLValueField just reasons (and only reasons) about “SOMETHING.f”, where SOMETHING is an SVal.

The nice thing about decoupling things in this way is that:

  1. StoreManager::getLValueField and GRStateManager::getLValueField stay simple; they only need to reason about dot-accesses.

  2. GRExprEngine can do the heavy lifting of reasoning about the syntax. As I said, x->f is the same as (*x).f. There is no reason that StoreManager or GRStateManager should be reasoning about syntax in this way. Let’s keep the complexity of reasoning about syntax in one place (i.e, GRExprEngine), and have the other components just reason about pure semantic operations (if possible).

  3. Having GRExprEngine handle the syntax allows different StoreManagers to automatically “get it right” with regards to both (*x).f and x->f if they decide to reason about fields. If we need an interface to conjure up new regions (AnonPointeeRegion), then we should add it to StoreManager.

I also found the following FIXME interesting:

  • // FIXME: Should also update the value of the base expr. It’s no longer a
  • // symbolic pointer.

In what ways is it no longer symbolic? What does it mean for a region to be symbolic? A symbol is just an abstract name. If we rename the region, what would its new name be?

This fixme is no longer relevant. GRStateManager::CheckStructBase takes care of updating the base expr value.

By ‘symbolic pointer’ I simply means loc::SymbolVal. If we create an AnonPointeeRegion, we should replace it with the loc::MemRegionVal of the AnonPointeeRegion.

I think SymbolicRegion is not involved in this problem. All regions we deal with here are concrete. Only their bindings may be loc::SymbolVal.

expand2.patch (9.17 KB)

Nice work! This is definitely going in the right direction! Comments inline:

Index: include/clang/Analysis/PathSensitive/Store.h

Thanks for your comments. They were helpful. Now the interfaces get simplified a lot. Variables and functions are commented well.

Here I explain the main idea.

Some struct base is not mutable, e.g., struct s a; a.f = 3; Here the struct base ‘a’ is not mutable. But some can, e.g.,

void foo(struct s* p) {
p->f = 3
}

When we getInitialStore(), p is initialized to have value loc::SymbolVal, which I call a symbolic pointer. (Note: here no symbolic region is involved).

In this patch we try to concretize the struct pointer p. We have several preconditions for this concretization:

  • the struct base is a pointer
  • the pointer is symbolic
  • the pointer is an lvalue, so that it can be modified.

When concretizing the struct base pointer, we pass the pointer’s location to StoreManager. Therefore the pointer should be an lvalue and has a location. We get this location by calling
GRExprEngine::getBasePtrLoc(). This method only handles a few cases. In other cases, BasePtrLoc is left Undefined intentionally.

These are the main ideas. Other issues were commented in the patch

concretize.patch (6.68 KB)

Hi Zhongxing,

The patch is looking great, but I still have some concerns about the overall approach. Consider your example with one modification:

void foo(struct s* p) {
   struct s* q = p;
   p->f = 3
}

After evaluating 'p->f' p will bind to an AnonPointeeRegion, but 'q' will bind to the original symbol. They are aliases. It gets arbitrarily worse if the value of 'p' is stored in other places, passed to functions, etc.

This overall approach feels a little "destructive." We're losing information here. After evaluating 'p->f' we've lost the fact that 'p' *still* binds to the value at the entry to the function.

I see this as being fundamental problematic for four reasons:

(1) Different analyses may attach information (perhaps using the GDM) to a symbolic value. Actually replacing a symbolic value with something "concrete" loses that information.

(2) Consider function summaries. We don't have them yet, but basically we would build up a set of constraints and effects in terms of the input to the function. That input is the symbolic values. Even if we "concretize" them we need to keep them around.

(3) Aliasing (i.e., the example I gave). Concretization via replacing loses this critical information.

(4) The BugReporter "module" needs to reconstruct information from the ExplodedGraph, including when aliasing relationships come into effect, etc. Blowing away the original value of 'p' makes this difficult at best.

The only way to replace a symbolic value with a concrete value is to alpha-rename it throughout the entire state. This isn't trivial, especially when one considers the GDM.

I honestly think we need to take a "layered" approach. Keep a map on the side from "symbols" to concrete values. That's basically what we do in ConstraintManager when it comes to equality relationships with integers. Equality relationships with integers is a form of concretizations; that's all were basically doing here, except we're reasoning about the concretization of the heap.

What do you think? I have specific comments about your patch (which is looking great!) but I will defer them until we iterate more on this design point.

Ted

Thanks for bringing up these issues. I agree that we should keep the original symbolic value around, and do the conretization with a side mapping. I’ll try to implement the idea and see what problems would come up.

Now we use a GDM to map a symbol to its concretized region. We no longer need the base struct pointer to be an lvalue. It only need to be a symbolic value.

Known problem: we need a way to iterate over live symbols, otherwise the values of AnonPointeeRegions will be swept.

c2.patch (7.58 KB)

Now we use a GDM to map a symbol to its concretized region. We no longer need the base struct pointer to be an lvalue. It only need to be a symbolic value.

Great! Many of my comments are still of design, and I'm not convinced we need an AnonPointeeRegion. We also may not need to have a separate GDM entry; just use the ConstraintManager to determine if a region is concrete or not if its symbol is not null.

Comments inline!

Known problem: we need a way to iterate over live symbols, otherwise the values of AnonPointeeRegions will be swept.

I don't think this is a problem if we just use SymbolicRegions instead of AnonPointeeRegions. More comments below.

Now we use a GDM to map a symbol to its concretized region. We no longer need the base struct pointer to be an lvalue. It only need to be a symbolic value.

Great! Many of my comments are still of design, and I'm not convinced we need an AnonPointeeRegion. We also may not need to have a separate GDM entry; just use the ConstraintManager to determine if a region is concrete or not if its symbol is not null.

Comments inline!

Known problem: we need a way to iterate over live symbols, otherwise the values of AnonPointeeRegions will be swept.

I don't think this is a problem if we just use SymbolicRegions instead of AnonPointeeRegions. More comments below.

Hi Ted,

Your comments are inspiring. I feel that there are some more fundamental problems that should be solved before this one. Or, this one is a special case of how we treat assumptions on symbolic values (including pointer and non-pointers) and lazy binding of locations.

For me things got clearer as we dove into the design. Modeling the heap symbolically is a tricky concept, but I think we’re making good progress towards a clean formulation and implementation.