Handling of loops in the Clang Static Analyzer

Hi,

I am re-sending the question I asked under a different thread so that the subject is more relevant to the topic.

I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that. In the checker I have written, I get the message “Block count exceeded” and then state exploration stops. As a result, my checker give false positives and does not achieve what it sets out to do.

I understand that handling loops is a difficult problem, but is there a work-around available, perhaps, even some source annotation from the user?

Also, would it not be possible to re-start exploration after the loop if there is some code independent of the loop computation. I understand that this independent flow would eventually merge with the computation in the loop in some manner (otherwise the loop computation would probably have been “dead”), but, even then, it may be possible to so some useful analysis with the “independent” code.

Finally, I understand that the checkers have been used with “realistic” test cases. I am curious, how did they work if the test cases had loops in them?

Thanks.

Regards,
Venugopal Raghavan.

Hello!

I did not quite realize it earlier but it seems that the static analyzer unrolls a loop up to a certain number of times and then stops exploring paths beyond that.

Yes that is how it is working. It is not perfect.

I understand that handling loops is a difficult problem, but is there a work-around available, perhaps, even some source annotation from the user?

Perhaps there are work-arounds, I know that people have looked at this.

Maybe -analyzer-max-loop is interesting. It tells Clang analyzer how many times you want to go through loops. The bigger value the better analysis, but slower analysis.

clang -cc1 -analyze -analyzer-checker=alpha,core -analyzer-max-loop 10000 test1.c

Best regards,
Daniel Marjamäki

Daniel Marjamäki Senior Engineer

Evidente ES East AB Warfvinges väg 34 SE-112 51 Stockholm Sweden

Mobile: +46 (0)709 12 42 62

E-mail: Daniel.Marjamaki@evidente.se

www.evidente.se

Hi Venugopal,

Hi Venugopal,

Sean Eveson (cc’d) did some initial work on loop widening to mitigate this problem.

I started to work on this, but have unfortunately not had time to take the next steps. There is a mode which does ‘loop widening’ which is off by default as the state after the loop is completely cleared which can lead to false positives. The idea is to improve on the widening, by changing it to only clear state that might be affected by the loop.

To enable the loop widening pass: “-analyzer-config widen-loops=true”.

Maybe -analyzer-max-loop is interesting. It tells Clang analyzer how many times you want to go through loops. The bigger value the better analysis, but slower analysis.

An issue you will run in to when changing analyzer-max-loop is that it also affects variable-bound loops. If you set the max loop higher to work around your problem you will see some slow down for the concrete-bound loops, but you will see a lot more for the variable-bound ones.

Example:
// Where the value of j is unknown.

for (i = 0; i < j; ++i) {
// …
}
foo(i);

As far as I remember, with a max loop of 1000 and no other limits on the analyzer, foo will be called for all the values between 0…~1000. This is because the analyzer will branch each time it reaches the loop condition.

Regards,
Sean Eveson

Hi,

Many thanks for all your replies. I am trying to see how best I can change my checker/or the way I run my checker taking into account your suggestions. However, I am afraid, I have one more question. Consider the following test case:

test(int q, int z) {
int b;
int x;
int p;

if (z > 0) { // S1
p = q + 1;
x = p + 1;
}

if (x <= (q + 1)) { // S2
b = 0; // S3
}
else {
b = 1;
}
printf("%d\n", b);
}

If (z > 0) is true in statement S1 above, x gets the value q + 2 (propagating through p). In the exploded graph, I do see this being the case. Hence, shouldn’t the constaint solver recognize the fact that the condition (x <= (q + 1)) in statement S2 is always false? However, in the exploded graph, I do not see this condition and the assignment of 0 to b in S3 getting pruned away when I follow the path in which (z > 0) is true. Can’t the constraint solver handle this case? Or have I goofed up in my reasoning with the example (hopefully, my reasoning about the mutual exclusivity of the statements holds true irrespective of the signedness etc.)

Thanks.

Regards,
Venu.

Hi,

Many thanks for all your replies. I am trying to see how best I can change my checker/or the way I run my checker taking into account your suggestions.

I'd add to this that ideally, the current approach to loops shouldn't lead to false positives, because all paths explored this way are still valid, assuming everything else works fine. When something else goes wrong as well, such as loop condition being improperly modeled (eg. we loop from 0 to collection.size() and don't realize that the size is the same in two subsequent loops), then it gets worse. But it'd get worse without loops for the same reason.

If you're trying to solve an all-path problem and rely on full coverage, then you'd have more luck with regular data flow analysis (see DeadStoresChecker), or if you still want to use symbolic execution for that, you'd need to detect lost coverage in the checkEndAnalysis callback (eg. if ExprEngine has work remaining) and suppress all warnings obtained during this analysis if the coverage was lost (i don't think i have examples, this trick wasn't explored much, but i suspect it could work).

However, I am afraid, I have one more question. Consider the following test case:

test(int q, int z) {
   int b;
   int x;
   int p;

   if (z > 0) { // S1
     p = q + 1;
     x = p + 1;
   }

   if (x <= (q + 1)) { // S2
     b = 0; // S3
   }
   else {
     b = 1;
   }
   printf("%d\n", b);
}

If (z > 0) is true in statement S1 above, x gets the value q + 2 (propagating through p). In the exploded graph, I do see this being the case. Hence, shouldn't the constaint solver recognize the fact that the condition (x <= (q + 1)) in statement S2 is always false?

Well, technically not, because ((q + 2) <= (q + 1)) is true when q == INT_MAX - 1. However, i also suspect that the constraint solver may fail to recognize both your and my point: our constraint solver is too simple (for the sake of being fast) to simplify even that much (see also the recent work on adding Z3 to us - the reviews around https://reviews.llvm.org/D28952 ).