New division by zero check

Hi,

There is a division by zero check where the denominator value is already known. I would like some feedback on a new static check for division by zero I intend to write where the value isn’t known yet.

It shall catch scenarios where the division is made before checking it against a value. To start with it will have to be directly after the division.

e.g

x = 100 / y;

if (y == 0)

I plan to do it by:

  1. When a division is reached where the denominator is a variable with unknown value, “mark” the denominator variable.

  2. When reaching a condition that checks if a variable is 0, check if the variable in the condition is “marked”.

  3. Report bug

I guess RecursiveASTVisitor is best for this?

Is there perhaps a better way to do this that would allow me to follow the variable from division to check so that it doesn’t have to be directly after?

//Anders

Hi, Anders. RecursiveASTVisitor isn't going to handle what you want here: you need something at least flow-sensitive (to know that the if-check actually follows the division) if not path-sensitive (to know that the value hasn't changed since you divided). The thing is, the analyzer's not actually good at answering questions of the form "does X occur on all paths through a function?". There are two reasons for this: first, the analyzer's model of C/C++ is known to be incomplete (e.g. we don't model catching exceptions), and second, we have a performance optimization that says if a function is called and we decide to inline it, we won't analyze it again as top level, which means we're usually not seeing the fully general case.

The next problem is that this can contribute to state explosion: where before we might have been able to merge two paths where a division only happens on one, we'd now be forced to treat them as separate because the denominator is marked along one path.

Ted, Anna, and I have actually talked about this kind of check, not so much for divide-by-zero as for null dereferences. (I forget what we were calling this; something like an "inverted null check", because the check and the dereference are backwards.) This is actually a real security issue a lot of the time; see John Regehr's blog post at http://blog.regehr.org/archives/970.

(It's fun to see Xi Wang cited; he was an LLVM contributor for a while several years ago.)

One idea we came up with is that while doing this as a fully general check is hard, it's probably worth checking this in the very simple case of straight-line code, where the division or the dereference happens in the same CFG block as the test. I still don't know if you want to do this as a CFG walk (with lots of care taken for the variable changing in between the use and the check) or as a full analyzer checker, and we haven't really worked out any of the details, but it does seem to be more feasible than the general check while still being immediately useful.

Let me know if you have any more questions!
Jordan

Thanks for your comments Jordan. Additional questions below.

Hi, Anders. RecursiveASTVisitor isn't going to handle what you want here: you need something at least flow-sensitive (to know that the if-check actually follows the division) if not path-sensitive (to know that the value hasn't changed since you divided). The thing is, the analyzer's not actually good at answering questions of the form "does X occur on all paths through a function?". There are two reasons for this: first, the analyzer's model of C/C++ is known to be incomplete (e.g. we don't model catching exceptions), and second, we have a performance optimization that says if a function is called and we decide to inline it, we won't analyze it again as top level, which means we're usually not seeing the fully general case.

The next problem is that this can contribute to state explosion: where before we might have been able to merge two paths where a division only happens on one, we'd now be forced to treat them as separate because the denominator is marked along one path.

Ted, Anna, and I have actually talked about this kind of check, not so much for divide-by-zero as for null dereferences. (I forget what we were calling this; something like an "inverted null check", because the check and the dereference are backwards.) This is actually > a real security issue a lot of the time; see John Regehr's blog post at http://blog.regehr.org/archives/970.

Ok, i think the null dereference version is on our todo list after this one.

(It's fun to see Xi Wang cited; he was an LLVM contributor for a while several years ago.)

One idea we came up with is that while doing this as a fully general check is hard, it's probably worth checking this in the very simple case of straight-line code, where the division or the dereference happens in the same CFG block as the test. I still don't know if you > want to do this as a CFG walk (with lots of care taken for the variable changing in between the use and the check) or as a full analyzer checker, and we haven't really worked out any of the details, but it does seem to be more feasible than the general check while still > being immediately useful.

How do i restrict my checker to the same CFG block? No i don't intend to do a CFG walk, the variable has to be untouched between division and test. Will i have to check on each assignment that my variable isn't modified or is there an easier way? Is it SymbolRef that should be stored and matched against to follow changes on the variable?

Let me know if you have any more questions!
Jordan

Anders

Hi Jordan can you help me with my questions below, or someone else perhaps?

//Anders

Sorry, this did slip my queue. Thanks for the ping.

One idea we came up with is that while doing this as a fully general check is hard, it’s probably worth checking this in the very simple case of straight-line code, where the division or the dereference happens in the same CFG block as the test. I still don’t know if you > want to do this as a CFG walk (with lots of care taken for the variable changing in between the use and the check) or as a full analyzer checker, and we haven’t really worked out any of the details, but it does seem to be more feasible than the general check while still > being immediately useful.

How do i restrict my checker to the same CFG block? No i don’t intend to do a CFG walk, the variable has to be untouched between division and test. Will i have to check on each assignment that my variable isn’t modified or is there an easier way? Is it SymbolRef that should be stored and matched against to follow changes on the variable?

Oh right. I didn’t reply because I didn’t have a good answer. :slight_smile: The best idea I can come up with is to walk backwards through the ExplodedGraph (from the predecessor node) and see what happens first: the variable is assumed non-zero because it was used in a division, or we hit a BlockEntrance program point. There are a few problems with that:

  • The backwards walk is unbounded. It probably won’t be that large, but it’s not actually limited by anything.
  • We unique nodes in the ExplodedGraph (which is why it’s a graph rather than a tree). This may not be an issue in practice: if two nodes are joined, they have the same state and the same program point, and that doesn’t usually happen along two different paths except when you’re joining up after a branch anyway.

The “obvious” thing to do would be to store the CFG block where a variable was assumed to be non-zero inside the state. The problem there is that then you’ve lost any chance of paths joining up, which means more work for the analyzer.

I’m not quite sure what the right thing is to do. It would be possible to have some kind of “per-CFG-block cache” in the ProgramState that gets cleared when we hit a transition point, either by having a special flag in the ProgramStateTrait or by adding a checker callback for BlockExit (or BlockEdge, or BlockEntrance…whatever’s easiest). I haven’t thought through either of these approaches, but neither of them strikes me as particularly clean. Perhaps there are other ways to solve the problem?

Anyway, I’m happy to answer specific questions, but as far as a more concrete design goes I’m not really any further along than you are.

Jordan

One specific question, in the following example.

if (x > 3)
  var = 77 / x;
if (x == 0){}

When i reach the division in my check, the ProgramState knows that x is greater than 3. I can see this in the ProgramState dump. But how do i access the range in my check?

//Anders
.......................................................................................................................
Anders Rönnholm Senior Engineer
Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)70 912 42 54
E-mail: Anders.Ronnholm@evidente.se

www.evidente.se

Checkers don't have direct access to the ranges, at least not currently, but you can evaluate 'x' using ProgramState::assume and see if the false-state is invalid. That means we already know x is non-zero.

Jordan