[Static Analyzer Query] Why is suppress-null-return-paths enabled by default?

Hi,
I was running the following code through clang SA -

#include <stdlib.h>
int* myAlloca(int i,int maxCount) {
if (i >= maxCount)
return 0;
int* k = (int*) malloc(sizeof(int));
return k;
}

int main() {
int max = 1;
for(int i =0;i< 2;i++) {
int* k = myAlloca(i,max);
*k = 1;
}
return 0;
}

This code will result in Null Deference in the second iteration of for loop.
When i debugged i found that the reason for it is by default null return paths are suppressed by clang SA.

Running the above code with suppress-null-return-paths=false gives the desired result.

Any particular reason why this flag is enabled by default in clang SA?

Isn’t it common in code to return null from a function in case we have a failure and hence can result in deref if used further?

Shouldn’t we be disabling this by default? or am i missing something?

Thanks
Karthik Bhat

Hi, Karthik. It’s true that the suppress-null-return-paths option suppresses many real errors. However, it also handles keeps us from reporting a huge number of false positives. Consider this silly example involving DenseMap:

llvm::DenseMap<const void*, const MemRegion *> map;
const MemRegion *getBaseRegion(const void *key) {
return map[key]->getBaseRegion();
}

In this case, “key” is known to be in the map, so we just look up the value and immediately dereference it. However, the analyzer isn’t smart enough to model a DenseMap, so it considers the case where the key is not in the map, which means a default-constructed value is inserted and returned (in this case, a null pointer). And we get a warning. Using an assert to silence the warning requires rewriting the code in terms of find(), which is not as clean.

Another instance is a case where some API wraps a dyn_cast (or equivalent):

RValue *getOperand(int i) {
return dyn_cast(getRawOperand(i));
}

In the general case, you don’t know if the operand is actually an RValue, but in some particular case (such as dealing with the arguments of a known builtin), you might have that information already. But again, the analyzer doesn’t know that, so if you try to use the result of getOperand() immediately, we get a warning.

if (getOperand(0)->isConstant()) { … }

Silencing this warning requires introducing a new temporary variable. It’s not impossible, but it’s the sort of hassle that makes people give up on the static analyzer.

These two examples are based on a ridiculous number of false positives reported in LLVM before we added this suppression. I think Anna and Ted and I all agree that it could be much better, to catch cases where the use really is plausible and accidental. But it’s hard to differentiate those. So, patches and suggestions welcome.

Hope that answers your question. (This should probably be put in some advanced analyzer FAQ somewhere…)
Jordan

Hi,
I was running the following code through clang SA -

#include <stdlib.h>
int* myAlloca(int i,int maxCount) {
  if (i >= maxCount)
    return 0;
  int* k = (int*) malloc(sizeof(int));
  return k;
}

int main() {
  int max = 1;
  for(int i =0;i< 2;i++) {
    int* k = myAlloca(i,max);
    *k = 1;
  }
  return 0;
}

This code will result in Null Deference in the second iteration of for loop.
When i debugged i found that the reason for it is by default null return paths are suppressed by clang SA.

Running the above code with suppress-null-return-paths=false gives the desired result.

Any particular reason why this flag is enabled by default in clang SA?

This is a part of our rude false positive suppression heuristic. If a NULL pointer is assumed to be NULL in a called function or returned from a called function, we suppress the report.

The idea is that if a pointer is assumed to be NULL in a callee "if (p == 0)", it's often the case that the callee is overly defensive and does not know that the pointer cannot be NULL (inlined defensive checks is the term we use internally to describe it). Some of the reports will not be false positives, but currently, we have no way of telling the difference. Having this suppression gets rid of numerous false positive reports.

Similarly, we suppress the reports where NULL is returned. The justification here is a bit more murky. The main idea is that NULL is returned when some other condition is false, like "i >= maxCount" in your case. However, again, it is often the case of the callee being overly defensive and there are callers that know that the condition is never false. Currently, we do not have a more refined heuristic to tell the difference and we always go for less false positives. Jordan has ran some experiments on llvm codebase and, I believe, he found that this suppression is worth it.

Cheers, Anna.

Hi Anna, Jordan,
Thanks for the response. Yes it does seem reasonable to suppress warning than emit false alarms. I will dig more into core and see if it is possible to differentiate plausible cases as mentioned by jordan.
Thanks!