Question about checkRegionChanges in static analyzer

I am trying to make the checker work for pointer escape. Since I track the status of objects using MemRegion, so I am using checkRegionChanges. I’ve observed something weird.

Suppose I have free functions whose definitions are not seen by current TU: void f(const S*); void g(S*);

  1. Suppose I have
    S s;
    Then in checkRegionChanges, s’s region will appear in explicit_regions but not regions.

  2. Suppose I have
    S s;
    Then in checkRegionChanges, s’s region will appear in both explicit_regions and regions.

Based on the above observations, if I want to remove s from GDM in case 2 but not 1, then I need to iterate over all entries in GDM and remove the ones that are sub-regions of any one in region variable, not explicit_regions variable. However, this doesn’t work for the following case

  1. If I have struct T { S s; D d; }; and U has some non-const member void D::h(); for the following call:
    T t;
    Then in checkRegionChanges, t.d is in explicit_regions and t is in regions. So t.s will be removed from GDM if I check ‘regions’ variable, which is incorrect.

My question is, is case 1 a bug or a feature that s appears in explicit_regions? If it is by design, how what is the blessed way to distinguish the above cases? Thanks.

BTW: the code that causes the behavior is
if (const MemRegion *R = V.getAsRegion()) {
if (TopLevelRegions)
All regions are added to TopLevelRegions, without checking RegionAndSymbolInvalidationTraits::TK_PreserveContents.

Weird, I was super sure I answered that back then. Probably misclicked something :confused: Sorries.

Normally you shouldn't be using explicit_regions at all. They are tweaked for the use-case of RetainCountChecker which behaves very differently from other checkers, because invariants imposed by Apple memory management policy for Foundation and CoreFoundation are about maintaining a clear ownership of the object, not simply making sure it's released at some point - which is not how most checkers work.

For the purposes of invalidation you should not really care if the region was explicitly passed into the function or is implicitly retrievable from within the call by exploring the heap pointed to by other arguments or globals. In either case the object's state is not known anymore.

And the non-explicit regions array seems to work just fine: when the object is passed by a const pointer or reference, its contents are not invalidated, because, well, you cannot (or rather should not) mutate an object through a const pointer.

Now, speaking of 't' and 't.d' - this behavior is also by design (documented in docs/analyzer/RegionStore.txt), because it is possible to mutate t.s through a non-const pointer to t.d by using completely safe pointer arithmetic: (this - offsetof(T, s) + offsetof(T, d)) and such. This is definitely a language feature that some projects use (but definitely not many). I think you should be able to partially suppress it on the checker side, probably by looking for sub-regions of the region in the explicit_regions list (and see what happens). We've been thinking about investigating possibilities to make this feature less aggressive but it requires a very careful investigation because false positives are more scary than false negatives, and most of the time extra escaping causes more false negatives than false positives.