Loop handling improvement plans

@NoQ Thanks for the feedback, I think I mostly understand your positions. However, I still think that the widening/invalidation approach is at least worth a try, because:

  • implementing a prototype for my plans is significantly easier than the work needed for a framework that post-processes the set of bug reports (even cleaning up trackExpressionValue is more work than my plan);
  • the results of Péter Szécsi’s loop widening feature are promising (even if that simple implementation is not sufficient by itself, it’s not terribly bad).

Based on this I’ll start working on a prototype for my plans (by refining the code of Péter Szécsi) and try to get some experimental results as soon as possible.

I think we will be able to continue this discussion more productively once I have some measurements (or perhaps once I find something that blocks my plans).


Yeah I think that’s a very good way to look at this. Our current loop unrolling is like function inlining, whereas a naive widening machine (“let’s drop all state that could have been touched, considering all variables accessed by the loop”) is very similar to conservative evaluation of a function (“let’s drop all state that could have been touched, considering all variables passed as parameters”), and a clever widening machine would be akin to collecting a “summary” of a function […]

There is one fundamental difference between inlining a function and executing some iterations of a loop: if we see a function call foo() we know that it is equivalent to one execution of its body – while a loop like while (condition()) { /*body*/ } may mean zero or one or several executions of the loop body.

We still believe that inlining typically yields better results, […]

If the iteration count can be determined (not necessarily before the loop – it’s enough if the condition is not ambiguous whenever we reach it), I obviously support unrolling the loop (= “inlining its body”). (With the obvious exception that we don’t want to unroll 1000 iterations.)

However, if the analyzer cannot determine the iteration count, then we don’t know what to “inline”, and we get false positives if we inline the wrong number of iterations.
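For example, consider a minimal sketch like this (`keep_going()` is a hypothetical function that is opaque to the analyzer):

```c
extern int keep_going(void);    /* hypothetical, opaque to the analyzer */

void fill(int *data, int n) {   /* suppose the caller guarantees n >= 4 */
  int i = 0;
  while (keep_going())          /* iteration count cannot be determined */
    data[i++] = 0;
  /* If we "inline" exactly two iterations, the engine continues here with
     i == 2, even if the surrounding code guarantees that the loop always
     runs at least n times; any report built on that arbitrary assumption
     from this point on risks being a false positive. */
}
```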

[…] simply because the state [that it doesn’t discard] is typically much better than nothing.

Discarding real knowledge is bad, but discarding (or preferably not even making) unfounded assumptions is often good.


Which of those is more popular probably depends on the codebase. E.g., in codebases with “unity builds” such as sqlite, most functions will be inlined (or at least attempted). If you make all loops evaluated conservatively, it could be a big regression.

Fair point, but at worst we would regress to the level of other projects with a different build system (which is still a usable level).

Also note that my experiments with Péter Szécsi’s loop widening (which is rougher than what I propose) only increased the number of results by 2–3%, and it is a feature with the explicit goal of increasing coverage, so we expect those extra results to be findings from previously un-analyzed code.

Based on this I think it’s safe to say that we won’t get a big regression.


Another thing that makes the loop situation a bit unusual is the user’s “expectation” that the code is evaluated literally because the code is “right there”.

That’s a good observation, I’ll pay attention to it!

However, note that even according to my plans we do evaluate a few iterations (to catch errors within the loop), and we do unroll and evaluate the loop if we can resolve the loop condition.

The loop widening is only applied when the loop condition is opaque, and in that case the user usually won’t expect a certain concrete iteration count – in fact, they may be confused or annoyed if the analyzer assumes one without a reason to do so.

So I’m somewhat worried that every time we fail to account for an effect of the loop (or lack of such effect) we’d be rightfully called out and the false positive will be less “defensible” - depending on how obvious was the invariant we’ve missed.

The invalidation that I’m planning will be conservative enough to invalidate everything that is obviously visible for the user.
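As a rough sketch of the behavior I have in mind (`opaque_condition()` and `compute()` are hypothetical placeholders):

```c
extern int opaque_condition(void);   /* hypothetical placeholders */
extern int compute(int);

void example(void) {
  int a = 1, b = 2, i = 0;
  while (opaque_condition()) {       /* iteration count is unknown */
    a = compute(i);                  /* 'a' and 'i' are written in the body */
    ++i;
  }
  /* After widening, the knowledge about 'a' and 'i' is dropped, because the
     user can see that the loop modifies them; the fact 'b == 2' is kept,
     because the loop visibly never touches 'b'. */
}
```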


More precisely (let’s call this Problem B), I think it is usually impossible for the analyzer to distinguish

(1) “the programmer assumed that all four paths through the two if statements are valid, but made a bug that breaks one of them” and
(2) “the programmer knew that only three of the four paths can occur and created an implementation that doesn’t handle the impossible variant”

because it’s common that the difference between these two is only present in the mind of the programmer (or it’s encoded in opaque functions that we can’t investigate).
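A hypothetical illustration (the struct and its fields are made up for this example):

```c
#include <string.h>

struct Conn { int has_payload; const char *payload; unsigned payload_len; };

void handle(struct Conn *c, char *out) {
  const char *buf = NULL;
  if (c->has_payload)                   /* first if: two branches */
    buf = c->payload;
  /* ... */
  if (c->payload_len > 0)               /* second if: two branches */
    memcpy(out, buf, c->payload_len);   /* on the "no payload, but
                                           payload_len > 0" path this
                                           dereferences NULL */
}
```

If the programmer intended all four path combinations to be reachable, the NULL dereference on the “no payload, but positive length” path is a true positive (case 1); if the protocol guarantees that payload_len > 0 implies has_payload, the very same path is a false positive (case 2) – and nothing in this code tells the analyzer which case applies.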

I think Problem B applies equally to loops.

I had highlighted Problem B as a point against general thoughts like “it [postprocessing the set of results] is actually capable of reaching zero false positives (because all insufficiently reliable warnings are refuted)”.

We can and should take the conservative (a.k.a. “justified” a.k.a. “cowardly”) approach in some specific situations to limit the amount of false positives (it’s worth losing some TPs to get rid of a comparable amount of FPs).

However, I claim that we cannot always take the conservative approach, because that would amount to losing practically all TPs (to get rid of practically all FPs) – and that scaled-up trade is a bad one.

I’m saying that we shouldn’t aim for a general conservative solution for the “correlated ifs” problem because it would discard a significant majority of TPs (I feel that the majority of bug paths contain several if assumptions). We may introduce heuristics for certain suspicious if-pairs (e.g. if a variable was invalidated between them), but in general we should act as if seemingly unrelated ifs were uncorrelated (and we can take any combination of feasible assumptions).

On the other hand, I’m suggesting that we should not consider the N=0 assumption after the loop because that’s one specific situation and it can discard only a small percentage of bug reports (<1% according to my rough measurements, even if we count the reports that are replaced by a logically equivalent, but slightly different report).
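For example, a sketch of the kind of situation I mean (`item_count()` is a hypothetical opaque function):

```c
extern int item_count(void);   /* hypothetical, opaque to the analyzer */

int last_item(const int *items) {
  int last;                                /* deliberately uninitialized */
  for (int i = 0; i < item_count(); ++i)
    last = items[i];
  return last;   /* on the "zero iterations" branch this returns an
                    uninitialized value; if the caller guarantees at least
                    one item, that report is a false positive */
}
```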

However, I would support an off-by-default analyzer flag (~equivalent to an opt-in checker) which enables the separate modeling of the N=0 branch (unless the analyzer rules it out). That would be helpful for users who want to specifically look for this corner case and can stomach some additional FPs.


Yeah but, we’re still allowed to make unjustified assumptions. It’s just that we need to demonstrate that the unjustified ones didn’t matter in hindsight. This doesn’t mean that we give up as soon as the program has more than one if-statement.

A widening pass eagerly destroys previous assumptions, no matter how justified, because it absolutely needs to reach a unified state at the end in order to continue path exploration.

My approach doesn’t require that. It’s more relaxed. It’s more like, maybe let’s see if we can justify some of them later, and maybe a lot of them never needed to be justified? If we can’t, we can always give up on them anyway.

I see your point that hindsight has some advantages, but I fear that approximating the real behavior by post-processing a combination of several slightly incorrect execution paths is too complicated to be practical.

I suspect that a solution that tries to filter and tweak the “raw” results will be a bigger, more complex version of trackExpressionValue(), which will end up as an unmaintainable nightmare full of whack-a-mole patches that address certain corner cases.

I’m not claiming that there is nobody who could implement this postprocessing in a sustainable manner, but personally I wouldn’t dare to work on it.


Are you sure this comparison remains favorable when you compare it to the number of core.Dereference false positives divided by the number of all pointer dereference operations? (Including, well, a solid portion of array subscript operations that aren’t performed on immediate arrays?) What about the number of core.uninitialized.* warnings divided by the total number of reads from arbitrary variables?

Fair point, you’re right that those checkers produce far fewer FPs than ArrayBoundV2.

But I still think that the argument “improved loop handling is particularly important for the array bounds checker” holds true.

You’re probably right, but I’d rather say “improved loop handling is particularly important for having good ranges for numerical symbols”. IIRC I have also seen some analogous issues when I was working with the BitwiseShift checker.

And that’s why you want to improve loop modeling too, right?

Yes, I’m focusing on loop modeling because I see that a significant part of the ArrayBoundV2 false positives are caused by errors (unjustified information in the State) introduced by bad loop modeling.

I also considered introducing ArrayBoundV2-specific workarounds, but there I ran into technical difficulties (concrete values don’t have an identity, and trackExpressionValue has several issues that make me reluctant to use it), so I decided to aim for a more general approach.


Even if we have to instantiate an entirely separate analysis that cannot co-exist with other checkers in the same ExplodedGraph, we should seriously consider automatically invoking the engine twice, with different loop unrolling rules, when the array bounds checker is enabled.

I don’t think that we’ll need an entirely separate analysis. IIRC Péter Szécsi’s loop widening was not too far from being on-by-default, and my plans are more refined than that.


On that note, have you explored my old ideas about “emitting the warning before entering the loop”?

I conceptually dislike this non-path-sensitive approach because it “reinvents the wheel”, and I fear that it would need to duplicate lots of logic to cover more than a different set of “toy examples” (especially if we want to be conservative and discard all the loops where we don’t understand something).

However, my long-term plans do include a step that runs the last iteration of the loop after the loop widening, which would be suitable for finding those problems that you’re speaking about.
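For example, modeling the last iteration (with a widened state) could catch off-by-one mistakes like this sketch:

```c
void zero_fill(int *buf, unsigned n) {   /* buf points to n elements */
  for (unsigned i = 0; i <= n; ++i)      /* note: <= instead of < */
    buf[i] = 0;                          /* the last iteration writes buf[n],
                                            one element past the end */
}
```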

Right now I’m not thinking too much about these questions; I just wish to finally reach a state where ArrayBoundV2 can be brought out of alpha.