[analyzer] Toning down invalidations?

So far we've been preferring aggressive invalidation of large chunks of the program state whenever we encounter anything we cannot model, be it a function without source code or a language construct we don't support. As I've mentioned recently, this both eliminates certain kinds of false positives (when we're uncertain, we assume fewer incorrect things) and introduces other false positives (when we're uncertain, we may think that something is possible when in fact it isn't). Hence invalidation is a trade-off, which raises the question: does everybody like the current trade-off? I'm particularly thinking about two parts of it:

* Invalidating the base region when a field is invalidated (as the whole base region is reachable from the field through safe pointer arithmetic); see the first sketch below.
* Invalidating heap regions passed to us through const pointers when the heap is invalidated (as a non-const pointer to the same region may exist elsewhere); see the second sketch below.
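
To make the two cases concrete, here is a minimal sketch of both as I understand them; update and unrelated_call are hypothetical functions whose source code is not available to the analyzer, and the exact conditions that trigger heap invalidation are glossed over:

struct S { int a; int b; };

void update(int *p);        // defined in another translation unit
void unrelated_call(void);  // defined in another translation unit

// Case 1: invalidating the base region when a field is invalidated.
int test_field(void) {
  struct S s;
  s.b = 1;
  update(&s.a);  // only '&s.a' escapes, but the callee could recover
                 // '&s' through pointer arithmetic, so the analyzer
                 // also forgets the value of 's.b'
  return s.b;    // no longer known to be 1
}

// Case 2: invalidating regions received through const pointers.
int test_const(const int *p) {
  int before = *p;
  unrelated_call();      // 'p' is not passed to the call, but a
                         // non-const alias of '*p' may exist elsewhere,
                         // so '*p' is invalidated along with the rest
                         // of the heap
  return before == *p;   // the analyzer can no longer prove this is true
}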

Please let us know if you have other cases you'd like to think about :)


* Invalidating the base region when a field is invalidated (as the whole base region is reachable from the field through safe pointer arithmetic).

I think that even if we want to be overly conservative, this can be mitigated once the code is modular, since this should only be possible if the definition of the class is available in the translation unit where the called function is defined. But since most functions are well behaved, in the sense that they won't touch anything other than the field, I think a good default would be not to invalidate the whole region, and to introduce an annotation for marking functions that do such pointer arithmetic, in order to suppress the false positives resulting from the lack of invalidation. I would expect those cases to be very rare, although this is only based on intuition; I have not done any research yet.
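
For illustration, here is one possible spelling of such an annotation, built on Clang's existing annotate attribute; the macro name and annotation string are made up, and the analyzer does not recognize anything like this today:

#include <stddef.h>

// Hypothetical marker for functions that reach the enclosing object
// from a field pointer; calls to marked functions would still
// invalidate the whole base region under the proposed default.
#define ESCAPES_BASE_REGION __attribute__((annotate("escapes_base_region")))

struct Node { int key; int payload; };

ESCAPES_BASE_REGION
void reset_key(int *payload_ptr) {
  // container_of-style arithmetic: recover the enclosing Node.
  struct Node *n =
      (struct Node *)((char *)payload_ptr - offsetof(struct Node, payload));
  n->key = 0;
}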

* Invalidating heap regions passed to us through const pointers when the heap is invalidated (as a non-const pointer to the same region may exist elsewhere).

What events trigger heap invalidation? Can this effect be mitigated by a conservative points-to analysis?

Hi!

What is the status of this?

I was distracted and never got to actually do this, but I still think it's a good idea to try out. Your results look very promising, yay. I totally agree that systems of mutually-canceling bugs are worth untangling even if the number of false positives temporarily increases.

P.S. A related issue: if I go for this, I'd probably start by relaxing the C++ container inlining heuristic, i.e. replacing it with visitor-based suppressions, so that we still enjoy the benefits of inlining.

P.P.S. Mildly related: I noticed that it shouldn't be all that hard to model extents of bindings within RegionStore, so that bindings to sub-structures don't overwrite bindings to super-structures simply because they have the same base region and the same offset. The only problem here is modeling extents of integers, because we don't represent casts as part of SymbolRefs. All other sorts of SVals have well-defined extents (including, say, lazy compound values).
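
If I read the RegionStore part right, the following is the kind of situation it is about (a minimal sketch with made-up types):

struct Inner { int x; };
struct Outer { struct Inner in; int y; };

void example(struct Outer *o) {
  struct Outer whole = { { 1 }, 2 };
  *o = whole;    // binding for the whole 'Outer' at offset 0

  struct Inner part = { 3 };
  o->in = part;  // binding for the sub-structure: same base region,
                 // same offset 0, but a smaller extent

  // Without extents on bindings, the second store can shadow the
  // first even though 'o->y' was never touched; ideally the analyzer
  // would still know that o->y == 2 here.
  (void)o->y;
}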

P.P.P.S. Not really related: just wanted to share an example of a curious false positive due to lack of invalidation that I've seen recently:

void invalidate(int **x);  // opaque: source code is not available

int test(int **x) {
  int *y = *x;

  if (*y == 0)
    invalidate(x);  // '*y' is reachable through 'x', so the call may
                    // overwrite it; the '*y == 0' assumption is stale

  // should not warn: without invalidating '*y' above, the analyzer
  // keeps the stale assumption and reports a division by zero
  return 1 / *y;
}