Clang Analyzer false positives with relations between variables

I'm working my way through a full Clang Analyzer report of FreeBSD source code: http://scan.freebsd.your.org/freebsd-head/, fixing bugs in FreeBSD and reporting false positives.

First of all, someone at LLVM just shaved 1000 reports off the list since Sunday. Thanks!

There's a large group of false positives that fall into the same category: the analyzer doesn't sufficiently account for relations between variables. Here's an example from http://llvm.org/bugs/show_bug.cgi?id=13426:

int foo(int y, int z) {
    int x;
    if (y == z) {
        x = 0;
    } else {
        if (y != z) {
            x = 1;
        }
    }
    return x;
}

which warns that x may be uninitialized, even though y == z and y != z together cover every case. Here's a more real-world example in FreeBSD: http://scan.freebsd.your.org/freebsd-head/usr.sbin.mtree/2012-09-30-amd64/report-KuXNHJ.html#EndPath

But according to the implementation of parsekey() (called at line 177) in http://svn.freebsd.org/base/head/usr.sbin/mtree/misc.c, "value" is always 1 when "type" is F_FLAGS, so val is always initialized at line 178 and the reported situation can never occur.

My question is: how hard would it be to teach the analyzer about such relations, at least starting with the simple example? Where would the code go in the LLVM tree?

Kind regards,
Erik

The first step toward fixing the simple example would be to add alpha-renaming support to the analyzer's constraint manager. While performing symbolic execution of the program, we cannot currently record the fact that two symbols are equal (here, y == z), so even this simplified example will not work:

int foo(int y, int z, int *p) {
  int *x;
  if (y == z)
    x = 0;
  if (y == z)
    x = p;
  return *x; // False positive: null pointer dereference reported.
}
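
To give a rough idea of what "recording that two symbols are equal" means, here is a toy model in C. It is only a sketch under my own assumptions and is not how the analyzer's constraint manager is written: symbols are plain integers, and every name here (sym_classes, sym_assume_equal, sym_known_equal) is made up for illustration. It tracks equalities only, so it covers the repeated y == z check above but not the y != z case from the original example.

```c
#include <assert.h>

/* Toy model only: each symbol is a small integer, and learning "a == b"
 * merges the two symbols into one equivalence class (union-find). */
enum { MAX_SYMBOLS = 16 };

struct sym_classes {
    int parent[MAX_SYMBOLS];
};

/* Start with every symbol in its own class. */
static void sym_classes_init(struct sym_classes *sc) {
    for (int i = 0; i < MAX_SYMBOLS; i++)
        sc->parent[i] = i;
}

/* Find the representative of s's class, shortening the chain as we go. */
static int sym_find(struct sym_classes *sc, int s) {
    while (sc->parent[s] != s) {
        sc->parent[s] = sc->parent[sc->parent[s]];
        s = sc->parent[s];
    }
    return s;
}

/* Record the assumption "a == b" made on the current path. */
static void sym_assume_equal(struct sym_classes *sc, int a, int b) {
    int root_a = sym_find(sc, a);
    int root_b = sym_find(sc, b);
    sc->parent[root_a] = root_b;
}

/* Returns 1 if "a == b" is already known on this path, 0 if unknown. */
static int sym_known_equal(struct sym_classes *sc, int a, int b) {
    return sym_find(sc, a) == sym_find(sc, b);
}
```

With something like this in the path state, taking the true branch of the first `if (y == z)` would call sym_assume_equal for y and z, and the second `if (y == z)` could then be answered from sym_known_equal instead of being treated as a fresh, unrelated condition.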

Alpha-renaming support alone would not guarantee that the second example is solved. For example, it looks like the 'parsekey()' function is in a separate translation unit, and the analyzer is not yet capable of reasoning across translation unit boundaries.

One could argue that the dependency between parsekey's return values has to be recorded by the programmer. Without a better mechanism, an assert could be helpful.
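
A minimal sketch of the assert idea, with hypothetical names throughout (parse_key, KEY_FLAGS, flags_value); this is not the mtree source. The stand-in parse_key is defined here only so the snippet compiles on its own. In the real case it would live in another translation unit and the analyzer would see just its declaration. Since a failed assert normally ends in a call that does not return, the analyzer should stop exploring the path where the dependency does not hold (as long as the code is analyzed without NDEBUG).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key types, for illustration only. */
enum key_type { KEY_FLAGS, KEY_OTHER };

/* Stand-in for a function defined in another translation unit.
 * Contract: *needs_value is set to 1 whenever KEY_FLAGS is returned. */
enum key_type parse_key(const char *name, int *needs_value) {
    if (name[0] == 'f') {
        *needs_value = 1;
        return KEY_FLAGS;
    }
    *needs_value = 0;
    return KEY_OTHER;
}

/* Returns the value token for a flags key, or NULL for any other key. */
const char *flags_value(const char *name, const char *next_token) {
    const char *val; /* deliberately uninitialized, as in the report */
    int needs_value;
    enum key_type type = parse_key(name, &needs_value);

    if (needs_value)
        val = next_token;

    if (type == KEY_FLAGS) {
        /* Record the dependency the analyzer cannot see across files:
         * a flags key always comes with a value. */
        assert(needs_value == 1);
        return val;
    }
    return NULL;
}
```

The assert also documents the contract for human readers, and it will fire in debug builds if parse_key ever changes in a way that breaks it.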

Cheers,
Anna.

Thanks for the explanation. It's a bit over my head to implement, but it's nice to know what's going on.

I'll have a look at it again.

Erik