[analyzer] Speaking about reaching definitions

This is very loosely related to Kristof's GSoC and this is my favorite


void foo1() {
  int *a = bar();
  int *b = a;
  if (b) { /* ... */ }
  *a = 1;
}

This is a valid null dereference bug. Like, 'b' is probably null (otherwise why check?), therefore 'a', which is equal to 'b', may also be null.

Now consider:

void foo2() {
  int *a = bar();
  int *b = nullptr;
  if (coin()) {
    b = a;
  }
  if (b) { /* ... */ }
  *a = 1;
}

In foo2 we will report a null dereference as well; however, the null check for 'b' is well-justified even if bar() never returns null, therefore it's a false positive.

How 'bout we suppress the null dereference warning when the reaching-definitions analysis for 'b' that starts at 'if (b)' - i.e. at the collapse point - yields multiple definitions and some of them are a plain null?

Note that the plain-null definition would never be a part of the bug path because it would not have a corresponding collapse point (it's already a concrete null).

I am not sure I follow why we think that the second example is a false positive.

I think it depends on the user intent. If the user wanted to check whether b was reassigned (i.e. checking for the source of the value), and bar never returns a null pointer, then it is definitely a false positive. But we cannot be sure what the intent of the check was. What if the user just wanted to check the value regardless of its source?

Yeah, I mean, we cannot be sure, therefore we want to be conservative and not bother the user with a warning. It might be a true positive, but it’s very wonky, and there’s nothing in the code that indicates that bar() may return null; the code makes perfect sense even if bar() doesn’t ever return null.

Would it make sense to allow this sort of behaviour to be configurable?

For example, much of the time I might not want to be nagged with “this may be a problem” and would like a pragmatic approach, but if I’m writing some critical code I would like to know “this cannot be proven to be correct” and would like the check to be pessimistic.

These different use cases also map onto checking legacy code (pragmatic) versus new code (pessimistic).

For the pessimistic case, there is still the chance to use information about bar() to drop the warning if it can be shown never to yield a null pointer.

We do occasionally make things configurable, but this particular situation strikes me as something that I’d find hard to explain to the users if I added such an option. I guess we can eventually add a global “optimism level” option that would tweak a lot of such behaviors.

But even when we do that, we’ll be pretty far away from a verification machine. Even in the most optimistic mode we won’t give any guarantees that we’ll prevent all the bugs of the given kind, so for a seriously critical piece of code you’ll have to use other tools.

I talked with Gábor, and am shamelessly stealing his comment, but I’m doing this for the greater good so it’s not forgotten :^)

I don’t immediately see anything wrong with this heuristic – I mean, we have to see how it behaves, but let’s presume it works. It’s obvious, though, that we would sometimes suppress true positives. How many similar suppression tricks do we have? Does this mean that a small change in my code in order to solve a bug found by static analysis could trigger some suppressions and “untrigger” others? Should I expect seemingly random results after minimal changes? I guess there is an argument to be made for also being conservative about how many suppression techniques we have and how they interact.

Just to be clear, I am not against having more heuristics to suppress false positives; I just have some concerns around some scenarios:

1.) Debugging: it will get harder and harder to actually write examples that trigger the desired parts of the analyzer unless we turn some of the heuristics off, but this is of course only a minor concern.

2.) Probing: when someone evaluates the analyzer she is likely to do it using little code examples. While I do agree that the analyzer should be optimized for real-world code, I think it is good if it works for silly little examples as well, to give a good impression to the users. When we write such examples we often do not notice subtle properties of the code like the one you wrote about before. And the user might get the impression that the analyzer is unable to catch those simple examples.

3.) Fixing reports: how can a user be sure whether she really fixed a problem or only triggered yet another suppression heuristic?

4.) Introducing reports: changing seemingly unrelated parts of the code can “untrigger” suppressions, introducing new reports, and the user might wonder what change triggered the new warnings and why.

Of course, 3) and 4) are already kind of inherent properties of symbolic execution, as the analysis coverage can change pretty easily by slightly changing the code, but I am still a bit uncomfortable with making this worse without being able to give a good explanation of what is going on.

All in all, I think a heuristic like this is a good idea if we can give the users a good and intuitive description of why the bug was suppressed in a code snippet. And that description should not require knowledge of reaching definitions or other similar concepts. So I think if we can easily extend the FAQ with such information, I am on board with this change. For example, inline defensive check suppression is easy to explain: we can tell the users that the analyzer ignores some checks in the callees since they might only be applicable to other callers. If we can have a similarly clear description for this one, let’s move ahead. What do you think?

Taking another look at the example: it is only a false positive if bar never returns null. Letting the analyzer know that this is the case, either with an assertion or an annotation, makes the issue disappear. I guess this is a question of how static-analysis-friendly one is. But my preferred solution in this case would be to let the analyzer know the invariants of the code rather than suppressing the report. But of course, to truly know the usefulness of the heuristic we need to try it on real-world code :slight_smile: