[Patch] add KillStruct() to RegionStore

This patch adds KillStruct() to region store. When we assign UnknownVal to a struct, we set the region’s default value to Unknown, and remove bindings for all subregions of the struct region.

-Zhongxing Xu

killstruct.patch (2.1 KB)

Hi Zhongxing,

This is nice. Could you include a test case that would trigger KillStruct? More comments below:

Index: lib/Analysis/RegionStore.cpp

This patch adds KillStruct() to region store. When we assign UnknownVal to a struct, we set the region’s default value to Unknown, and remove bindings for all subregions of the struct region.

Hi Zhongxing,

This is nice. Could you include a test case that would trigger KillStruct? More comments below:

Index: lib/Analysis/RegionStore.cpp

— lib/Analysis/RegionStore.cpp (版本 61900)
+++ lib/Analysis/RegionStore.cpp (工作副本)
@@ -234,6 +234,9 @@

const GRState* BindStruct(const GRState* St, const TypedRegion* R, SVal V);

  • /// KillStruct - Set the entire struct to unknown.
  • const GRState* KillStruct(const GRState* St, const TypedRegion* R);

// Utility methods.
BasicValueFactory& getBasicVals() { return StateMgr.getBasicVals(); }
ASTContext& getContext() { return StateMgr.getContext(); }
@@ -910,6 +913,9 @@
RecordDecl* RD = RT->getDecl();
assert(RD->isDefinition());

  • if (V.isUnknown())
  • return KillStruct(St, R);

When precisely can V.isUnknown() be true when the value is a struct? (test case please).

We already have that test case. It is the f5() in test/Analysis/array-struct.c.

+const GRState* RegionStoreManager::KillStruct(const GRState* St,

  • const TypedRegion* R){
  • GRStateRef state(St, StateMgr);
  • // Set the default value of the struct region to UnknownVal.
  • St = state.set(R, UnknownVal());

Interesting. Do we need the killset anymore, or are we going to model “killed values” in this way?

I think we still need killset. The regions in killset are ones we query its binding directly. The regions in RegionDefaultValue are ones whose subregions we query bindings for. I haven’t had a way to combine them.

  • Store store = St->getStore();
  • RegionBindingsTy B = GetRegionBindings(store);
  • std::vector<const MemRegion*> ToBeKilled;

In general, please use a llvm::SmallVector<> so that we can optimize for the case when the number of items fits on the stack. Using std::vector forces us to malloc() memory every time this method is called (which is slow). My comment below, however, illustrates why we don’t need to use either in this case.

OK.

  • // Remove all bindings for the subregions of the struct.
  • for (RegionBindingsTy::iterator I = B.begin(), E = B.end(); I != E; ++I) {
  • const MemRegion* r = I.getKey();
  • if (const SubRegion* sr = dyn_cast(r))
  • if (sr->isSubRegionOf(R))
  • ToBeKilled.push_back(R);

I think the region to be killed is ‘r’, not ‘R’ (since we just bound a value to it up above).

Yes, my mistake (typo).

  • }
  • for (std::vector<const MemRegion*>::iterator I = ToBeKilled.begin(),
  • E = ToBeKilled.end(); I != E; ++I)
  • store = Remove(store, Loc::MakeVal(*I));

You don’t need to build a vector of killed regions and then iterate through vector because ‘B’ is a purely functional data structure. That is, removing entries from store just creates a new store, and ‘B’ will continue have the same values and the iterator won’t be invalidated. The following will do just fine:

for (RegionBindingsTy::iterator I = B.begin(), E = B.end(); I != E; ++I) {
const MemRegion* r = I.getKey();
if (const SubRegion* sr = dyn_cast(r))
if (sr->isSubRegionOf(R))
store = Remove(store, Loc::MakeVal(r));

  • return StateMgr.MakeStateWithStore(St, store);

Ah, I forgot that point. Is this also known as persistent data structure?

Do we also need to remove region views for the regions that are killed, or should we dispense with that GDM entry entirely?

I think region views are orthogonal to this one. Region views are used to cast region back and forth between typed and untyped ones. Here we only handle the region bindings.

Index: lib/Analysis/RegionStore.cpp

--- lib/Analysis/RegionStore.cpp (版本 61900)

+ if (V.isUnknown())
+ return KillStruct(St, R);
+

When precisely can V.isUnknown() be true when the value is a struct? (test case please).

We already have that test case. It is the f5() in test/Analysis/array-struct.c.

Great! We should add a comment to that test case the says what part of the analyzer/region store is getting tested.

+const GRState* RegionStoreManager::KillStruct(const GRState* St,
+ const TypedRegion* R){
+ GRStateRef state(St, StateMgr);
+ // Set the default value of the struct region to UnknownVal.
+ St = state.set<RegionDefaultValue>(R, UnknownVal());

Interesting. Do we need the killset anymore, or are we going to model "killed values" in this way?

I think we still need killset. The regions in killset are ones we query its binding directly. The regions in RegionDefaultValue are ones whose subregions we query bindings for. I haven't had a way to combine them.

I believe I might be confused about some basic concepts. Here is how I interpreted things.

The "killset" is part of the state, and represents those regions whose values have been changed (via direct assignment) after the entry to the analyzed function. This way we don't need to explicitly bind initial values to regions and instead lazily infer their bindings.

According to comments in RegionStore, the "default value" map (RegionDefaultValue) is used to track "what regions have a default value of 0 if they have no bound value and have not been killed."

From the patch I see 'KillStruct' being used in the following way:

const GRState*
RegionStoreManager::BindStruct(const GRState* St, const TypedRegion* R, SVal V){
   // FIXME: Verify that we should use getRValueType or getLValueType.
   QualType T = R->getRValueType(getContext());
   assert(T->isStructureType());

   RecordType* RT = cast<RecordType>(T.getTypePtr());
   RecordDecl* RD = RT->getDecl();
   assert(RD->isDefinition());

   if (V.isUnknown())
     return KillStruct(St, R)
   ...

Two things occur to me:

(1) We should always be adding any region directly assigned to the killset. Doesn't that include structs?

(2) After assigning the value "unknown" to the struct, the values of its fields should not be 0 (as indicated by the comments for RegionDefaultValue). That doesn't make sense to me. Shouldn't they just be "unknown"?

In other words, regardless of whether V.isUnknown() == true (in the above code snippet) we should always do (1) and then just remove any bindings to struct and it's fields. Adding entries to RegionDefaultValue (i.e., (2)) doesn't make sense to me, so I think I'm just missing something and don't really understand the overall design.

Ah, I forgot that point. Is this also known as persistent data structure?

Yes it is. The term "purely functional data structures" comes from the functional programming mindset. Okasaki has an excellent book on that topic:

http://books.google.com/books?id=SxPzSTcTalAC&dq=purely+functional+data+structures&printsec=frontcover&source=bn&hl=en&sa=X&oi=book_result&resnum=4&ct=result

Do we also need to remove region views for the regions that are killed, or should we dispense with that GDM entry entirely?

I think region views are orthogonal to this one. Region views are used to cast region back and forth between typed and untyped ones. Here we only handle the region bindings.

"Views" are just a specific type of subregions (whose bindings are all getting nuked by KillStruct). The GDM entry "RegionViews" is just a back-mapping from a region to its subregions. If we remove all of those bindings why do we need to keep that backmapping around since none of them bind to any value? Removing stale data from the state allows better caching.

As a meta comment, it would probably be good to add comments to the top of RegionStore that documents the overall design. More specifically, we should mention how the different GDM pieces are used, what's the overall abstraction of how regions bind to values, how we model "views" of regions (i.e., what they are and what purpose they serve) and so forth. I feel that some of my confusion is stemming from not completely understanding the design as well as I thought I did.

Index: lib/Analysis/RegionStore.cpp

— lib/Analysis/RegionStore.cpp (版本 61900)

  • if (V.isUnknown())
  • return KillStruct(St, R);

When precisely can V.isUnknown() be true when the value is a struct? (test case please).

We already have that test case. It is the f5() in test/Analysis/array-struct.c.

Great! We should add a comment to that test case the says what part of the analyzer/region store is getting tested.

Done.

+const GRState* RegionStoreManager::KillStruct(const GRState* St,

  • const TypedRegion* R){
  • GRStateRef state(St, StateMgr);
  • // Set the default value of the struct region to UnknownVal.
  • St = state.set(R, UnknownVal());

Interesting. Do we need the killset anymore, or are we going to model “killed values” in this way?

I think we still need killset. The regions in killset are ones we query its binding directly. The regions in RegionDefaultValue are ones whose subregions we query bindings for. I haven’t had a way to combine them.

I believe I might be confused about some basic concepts. Here is how I interpreted things.

The “killset” is part of the state, and represents those regions whose values have been changed (via direct assignment) after the entry to the analyzed function. This way we don’t need to explicitly bind initial values to regions and instead lazily infer their bindings.

According to comments in RegionStore, the “default value” map (RegionDefaultValue) is used to track “what regions have a default value of 0 if they have no bound value and have not been killed.”

From the patch I see ‘KillStruct’ being used in the following way:

const GRState*
RegionStoreManager::BindStruct(const GRState* St, const TypedRegion* R, SVal V){
// FIXME: Verify that we should use getRValueType or getLValueType.
QualType T = R->getRValueType(getContext());
assert(T->isStructureType());

RecordType* RT = cast(T.getTypePtr());

RecordDecl* RD = RT->getDecl();
assert(RD->isDefinition());

if (V.isUnknown())
return KillStruct(St, R)

Two things occur to me:

(1) We should always be adding any region directly assigned to the killset. Doesn’t that include structs?

Yes. I thought we would never query a struct region’s binding like we never do it to an array region. But later I realized I was wrong. I’ve done this step in the new patch.

And, I think we should add a region to the killset only when it is assigned “unknown” directly. Because if it is assigned other value, we would have its binding in the region bindings. It is redundant to add it to the killset.

(2) After assigning the value “unknown” to the struct, the values of its fields should not be 0 (as indicated by the comments for RegionDefaultValue). That doesn’t make sense to me. Shouldn’t they just be “unknown”?

In other words, regardless of whether V.isUnknown() == true (in the above code snippet) we should always do (1) and then just remove any bindings to struct and it’s fields. Adding entries to RegionDefaultValue (i.e., (2)) doesn’t make sense to me, so I think I’m just missing something and don’t really understand the overall design.

RegionDefaultValue was originally designed to include regions which have default values of 0. But later I find we could have default values other than 0. This is just a case that we need it, since we would query the bindings of the subregions of the struct region that is killed (assigned “unknown”). If we don’t set the struct region’s default value, we would have to add all subregions of the struct region to the killset. Now that we set the struct region’s default value, we can just remove the subregions of the struct region from RegionBindings.

Ah, I forgot that point. Is this also known as persistent data structure?

Yes it is. The term “purely functional data structures” comes from the functional programming mindset. Okasaki has an excellent book on that topic:

http://books.google.com/books?id=SxPzSTcTalAC&dq=purely+functional+data+structures&printsec=frontcover&source=bn&hl=en&sa=X&oi=book_result&resnum=4&ct=result

Thanks.

Do we also need to remove region views for the regions that are killed, or should we dispense with that GDM entry entirely?

I think region views are orthogonal to this one. Region views are used to cast region back and forth between typed and untyped ones. Here we only handle the region bindings.

“Views” are just a specific type of subregions (whose bindings are all getting nuked by KillStruct). The GDM entry “RegionViews” is just a back-mapping from a region to its subregions. If we remove all of those bindings why do we need to keep that backmapping around since none of them bind to any value? Removing stale data from the state allows better caching.

Yes, I’ve done this in the new patch.

As a meta comment, it would probably be good to add comments to the top of RegionStore that documents the overall design. More specifically, we should mention how the different GDM pieces are used, what’s the overall abstraction of how regions bind to values, how we model “views” of regions (i.e., what they are and what purpose they serve) and so forth. I feel that some of my confusion is stemming from not completely understanding the design as well as I thought I did.

Yes. I should have done that. But I didn’t do it because I feel some designs are not fixed.

killstruct2.patch (5.64 KB)

This looks great! Two comments:

+const GRState* RegionStoreManager::KillStruct(const GRState* St,

  • const TypedRegion* R){
  • GRStateRef state(St, StateMgr);

This looks great! Two comments:

+const GRState* RegionStoreManager::KillStruct(const GRState* St,

  • const TypedRegion* R){

  • GRStateRef state(St, StateMgr);

  • // Kill the struct region because it is assigned “unknown”.

  • St = state.add(R);

  • // Set the default value of the struct region to “unknown”.

  • St = state.set(R, UnknownVal());

  • // Remove the struct region from the RegionViews. It could only be a “view” of

  • // its super region.

  • St = RemoveRegionView(St, R, R->getSuperRegion());

Hmm. I’m looking at this again, I guess removing the view of ‘R’ from its super region doesn’t make sense. It’s just the values bound to ‘R’ that change, not the fact that ‘R’ is a view of its super region.

Yes, that was my initial thought. :slight_smile:

  • Store store = St->getStore();

  • RegionBindingsTy B = GetRegionBindings(store);

  • // Remove all bindings for the subregions of the struct.

  • for (RegionBindingsTy::iterator I = B.begin(), E = B.end(); I != E; ++I) {

  • const MemRegion* r = I.getKey();

  • if (const SubRegion* sr = dyn_cast(r))

  • if (sr->isSubRegionOf(R))

  • store = Remove(store, Loc::MakeVal(sr));

I am wondering if we also need to remove bindings for views of ‘sr’. That is, iterate over the views of ‘sr’ and remove any bindings from state. What do you think? I guess it depends on how we decide to bind values in weird cases where people abuse the type system.

I am afraid the cost would be too high, and the logic be too complex for an initial implementation. After all we are doing an approximate simulation, not an exhaustive one. I suggest we leave a FIXME here and fix it when we see a real bug caused by this simplification.

This looks great! Two comments:

+const GRState* RegionStoreManager::KillStruct(const GRState* St,

  • const TypedRegion* R){

  • GRStateRef state(St, StateMgr);

  • // Kill the struct region because it is assigned “unknown”.

  • St = state.add(R);

  • // Set the default value of the struct region to “unknown”.

  • St = state.set(R, UnknownVal());

  • // Remove the struct region from the RegionViews. It could only be a “view” of

  • // its super region.

  • St = RemoveRegionView(St, R, R->getSuperRegion());

Hmm. I’m looking at this again, I guess removing the view of ‘R’ from its super region doesn’t make sense. It’s just the values bound to ‘R’ that change, not the fact that ‘R’ is a view of its super region.

Yes, that was my initial thought. :slight_smile:

Indeed it was. :wink:

  • Store store = St->getStore();

  • RegionBindingsTy B = GetRegionBindings(store);

  • // Remove all bindings for the subregions of the struct.

  • for (RegionBindingsTy::iterator I = B.begin(), E = B.end(); I != E; ++I) {

  • const MemRegion* r = I.getKey();

  • if (const SubRegion* sr = dyn_cast(r))

  • if (sr->isSubRegionOf(R))

  • store = Remove(store, Loc::MakeVal(sr));

I am wondering if we also need to remove bindings for views of ‘sr’. That is, iterate over the views of ‘sr’ and remove any bindings from state. What do you think? I guess it depends on how we decide to bind values in weird cases where people abuse the type system.

I am afraid the cost would be too high, and the logic be too complex for an initial implementation. After all we are doing an approximate simulation, not an exhaustive one. I suggest we leave a FIXME here and fix it when we see a real bug caused by this simplification.

Makes sense. Please take out the RemoveRegionView call (but go ahead and add the method itself) and commit the change.