RFC clang analyzer false positives?

Hello!

We try to avoid false positives in the static analyzer. In my opinion we should try pretty hard. When there is reason for doubt, I would say it's better with false negatives.

I wonder what you think about these.

1. Dead store

...
     default: // should not happen
          x = 0;
          ASSERT(FALSE); // <- dead store, x is assigned a value that is not read
      }
      return x;
}

In certain configurations the execution is terminated and then the dead store is correct. But in other configurations the "x" value will be returned.

Do you think that it would be acceptable to skip the warning if the noreturn function is in a macro call? And ideally we would warn if the value is not used after the macro but that might be quite hard to achieve (I did not look but it's my assumption).

2. Garbage value.

     #define INVALID ~0
     struct XY { int x; int y; };
     
     inline struct XY getXY(int index) {
       struct XY xy;
       if (index < 12) {
         xy.x = 1;
         xy.y = 2;
       } else {
         xy.x = INVALID;
       }
       return xy;
     }
     
     void f(int index) {
       struct XY xy;
       xy = getXY(index);
       if (xy.y == 0) {} // <- garbage value xy.y
     }

In the real code the "getXY" function is implemented in a header and can be called from many locations.

Clang assumes that the condition "index < 12" in getXY() can be false. Should such assumption really be made since it's a subfunction declared in a header that might be called from many places? If the condition was moved to f() I would agree that we can assume that it can be false - otherwise it would be useless to have it.

I believe we could solve this FP in our code by adding an assertion in f() that makes sure index is less than 12. However such assertion would be misplaced. We want to validate input values as soon as possible. The "index" is validated long before the function f() is called.

Best regards,
Daniel Marjamäki

..................................................................................................................
Daniel Marjamäki Senior Engineer
Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)709 12 42 62
E-mail: Daniel.Marjamaki@evidente.se

www.evidente.se

Some inline answers:

Hello!

We try to avoid false positives in the static analyzer. In my opinion we should try pretty hard. When there is reason for doubt, I would say it's better with false negatives.

I wonder what you think about these.

  1. Dead store

  ...
      default: // should not happen
           x = 0;
           ASSERT(FALSE); // <- dead store, x is assigned a value that is not read
       }
       return x;
  }

In certain configurations the execution is terminated and then the dead store is correct. But in other configurations the "x" value will be returned.

Do you think that it would be acceptable to skip the warning if the noreturn function is in a macro call? And ideally we would warn if the value is not used after the macro but that might be quite hard to achieve (I did not look but it's my assumption).

I think that's a fundamental issue with all deadcode checkers, and i don't think i see a good solution. Unfortunately, the analyzer deals with preprocessed code most of the time, and figuring out what was there before the preprocessing is hard - after all, when the code is preprocessed out, it isn't even compiled; often, it cannot be. So yeah, whenever macros are used for turning code on and off, our deadcode checkers (or other all-path-based checkers, which aren't many i think) are having problems.

For all other checkers, we certainly should not drop asserts. Because they sew away the paths that are very unlikely and will be definitely treated as false positives.

  2. Garbage value.

      #define INVALID ~0
      struct XY { int x; int y; };
            inline struct XY getXY(int index) {
        struct XY xy;
        if (index < 12) {
          xy.x = 1;
          xy.y = 2;
        } else {
          xy.x = INVALID;
        }
        return xy;
      }
            void f(int index) {
        struct XY xy;
        xy = getXY(index);
        if (xy.y == 0) {} // <- garbage value xy.y
      }

In the real code the "getXY" function is implemented in a header and can be called from many locations.

Clang assumes that the condition "index < 12" in getXY() can be false. Should such assumption really be made since it's a subfunction declared in a header that might be called from many places? If the condition was moved to f() I would agree that we can assume that it can be false - otherwise it would be useless to have it.

I believe we could solve this FP in our code by adding an assertion in f() that makes sure index is less than 12. However such assertion would be misplaced. We want to validate input values as soon as possible. The "index" is validated long before the function f() is called.

I agree that this is a problem. Symbolic execution is very much based on psychological assumption that if there's a branch in the program, then the author put it here for a reason. Which means, both paths must be possible.

For inlined calls, that's certainly not true. The author might have never seen this branch.

In my head, the completely untested idea for such cases would be to compute a "realism score" of every path, and give the user a lever for setting realism threshold. For example, branches in the top-level function would score towards "this branch is realistic", branches inside callees would score towards "we're not sure this path is possible", branches with UnknownVal as condition would have a significantly negative realism score (both true and false branch), branches with tainted condition are hugely realistic (the attacker may forge any value here, so we badly need to scan both true and false branch), loop condition branches would probably have lower realism, and so on.

I suspect there'd be a lot of problems with this approach, but that's something i wish to try eventually.