PR16833 - Analyzer hangs trying to analyze large case ranges

All-

I was looking over 16833 and had a few questions.

The issue is the analyzer tries to concretize all the values in a case
statement with a range value:

    switch (a) {
        case 1 ... 0xfffff: // hang's trying each value in the range
           return 1;
        default:
           return 0;
    }

Is there any reason we can't symbolically evaluate 1 <= a <= 0xfffff,
rather than trying each 'a == v for v in 1..0xfffff'? I've attached a
patch that attempts to do this and it seems to work (it no longer
hangs anyway). Is there a technical/historical reason for doing it
the current way?

The FIXME above the block mentions using '"ranges" for NonLVals'. I'm
not sure what this means, or if any work has been done toward this.
If not, would there be any value in pursing a patch like this in the
meantime to resolve the performance issue?

Thanks,
Zach

16833_hang_on_large_case_range.patch (3.71 KB)

That FIXME, and indeed the code to support case ranges, actually predates RangeConstraintManager, which implements most of what the FIXME is referring to. (That is, the technical/historical reason we couldn’t do it symbolically is “it wasn’t implemented yet”.) So your change is in fact on the right track.

However, what you’ve done is construct the expression “$x >= LOWER && $x <= UPPER”. This is correct, but it’s not a form that RangeConstraintManager will actually handle right now:

void clang_analyzer_eval(int);
void test(int i) {
switch (i) {
case 1 … 5:
return;
}
clang_analyzer_eval(x == 2); // should print FALSE, but will print UNKNOWN with your patch.
}

(clang_analyzer_eval is a special function used to probe the analyzer state, handled by the debug.ExprInspection checker.)

The easiest way to fix this would be to use assumeDual on each expression individually. Then you end up with three cases: $x < LOWER, $x is in range, and $x > UPPER. Unfortunately, you’d then have to restructure the whole function, which currently assumes that you end up with exactly one state that can end up in the “default” branch. That wouldn’t be too hard.

The harder way to solve this is to actually teach [Range]ConstraintManager to recognize constraints like this, and do the right thing. The implementation is already set up to handle disjoint sets of ranges, so you could actually do this without changing everything. It’s probably more messy than you’d like, though.

The sneaky way to do this is to realize that “$x >= LOWER && $x <= UPPER” is actually equivalent to “$x - (LOWER - MIN) <= UPPER - (LOWER - MIN)”, where MIN is the minimum value for the integer type, for integers with wraparound semantics. (RangeConstraintManager actually takes advantage of this in reverse to translate “$x - OFFSET <= VALUE” back into a pair of range disjunctions.) The only trouble with doing something like this is that wraparound is Hard™ and you/we will want to be very careful that this actually handles everything correctly, for all the integer types we care about. A definite benefit of this is that we’re back to only having one state, which prevents potential exponential explosion as analysis continues.

What do you think we should do here?
Jordan

Let me see if I understand what you are saying:

We are able to symbolically evaluate (correctly) the "x >= lower && x
<= upper" expression with SVal's. However, we can't 'remember' this
result because we don't construct a Constraint on the expression, and
even if we did the existing RangeConstraint manager doesn't support
it.

I feel like the RangeConstraint manager is really the best way to fix
this, even if it's not easy. It doesn't seem to be a huge issue
anyway, if someone _really_ wants to analyse a file with such an
expression, they can use the patch i supplied in the mean time.

Unfortunately, looking through the Constraint manager(s) code, I have
no idea where to start.

Zach

Hm, okay. Here’s some basic structure for what happens now:

  1. SimpleConstraintManager::canReasonAbout decides whether or not a symbolic expression is even worth processing.
  2. SimpleConstraintManager::assumeAux (the one that takes a NonLoc) does some preprocessing and canonicalizing so that it can identify the parts of a constraint.
  3. SimpleConstraintManager::assumeSymRel makes sure all the numbers involved are compatible, and then dispatches to the various RangeConstraintManager methods
  4. RangeConstraintManager::assumeSym* handle the six possible relational operators and the minute tricky differences between them, applying the new constraints as intersections with the existing range sets.

If you really want to put this into the constraint manager, I think I would still do this the “sneaky way”, by canonicalizing your “LOW < $sym < HIGH” into “$sym - LOW < HIGH - low” in SimpleConstraintManager::assumeAux. That achieves maximal code reuse. If you weren’t to do it this way, I think you’d have to introduce a new assumeInRange method that could be overridden; this would probably be easier to get right but I’m not sure it would actually add value.

As a side note, I would consider writing your disjunction with & rather than &&. Because of &&'s short-circuiting, you’d never actually see that expression show up in assume() otherwise, and handling & would make it possible to handle similar cases in user code in the future.

Please continue to e-mail with any questions you may have!
Jordan

Looks like Aleksei’s taken some of these ideas and run with them in http://reviews.llvm.org/D5102. Aleksei, I haven’t had time to look at your implementation yet, but if you haven’t seen it yet here’s some of the issues Zach and I were discussing.

Jordan

Thanks Jordan.

I have participated in this conversation and it was the thing that pointed me to the source of analyzer misbehavior I face on some files. This problem was critical for me and, considering your ideas, I have tried to implement a solution that fixes a problem (and it seems to work in my case).

The source of the problem we faced was that we can correctly handle "LOW <= $sym <= HIGH" assumptions but cannot handle their opposite in the _one_ ProgramState as it needed to get correct assumptions for 'default' case. Using given assumes, we can handle "$sym < LOW" or "$sym > HIGH" but not both "$sym < LOW || $sym > HIGH" because there is a non-constant symbolic expression in both parts of an expression so we cannot reason about it now. I have implemented a helper method that returns a pair of states: "LOW <= $sym <= HIGH" in the first state and its opposite in the second one. The first state is used for the current case, second state is used to check another cases until 'default' case.

Zach, sorry for not CC'ing you in review message.

I’m glad you have had more success than me. The road I’ve been going down has caused much head-scratching and little progress.

Zach