Hello everyone,
During the summer I was working on improving the loop modeling in the Clang Static Analyzer.
NOTE: The full length report, which also describes various implementation details and results of quantitative evaluation of the implemented features, can be found here.
- Motivation and aims:
In the current state of the analyzer, it handles loops quite simple. It unrolls them 4 times by default and then cut the analysis of that path where the loop would have been unrolled more times. This behavior is reached by having a counter for every CFG blocks which determines that how many times that block was already visited. The analyzer checks it for every encountered block if the number of visits is more than 4 on the simulated path. If yes, then it aborts the simulation on that path.
The loss in code coverage is one of the problems with this approach to loop modeling. Specifically, in cases where the loop bound is statically known to be greater than 4, the analyzer often did not analyze the code following the loop. Thus, the naive loop handling (described above) could lead to unchecked code. Here is a small example for that:
void f() {
int a[6];
for (int i = 0; i < 6; i++){
a[i] = i+2;
}
//complex body
}
In the above example, because the loop bound is a statically known value, the analysis will be stopped at the 4th iteration of the loop and the “complex body” after the loop will not be analyzed.
- The working process
My work on improving the loop modeling primarily can be divided into 2 tasks:
-
Loop Unrolling - Work up heuristics and AST patterns in order to find specific loops which are worth to be completely unrolled. All patches for this work have been reviewed and committed to the clang repository which are the following:
-
The initial implementation: D34260
-
Changed the CFG of the clang compiler to represent loops more directly: D35668 and D35670
-
Update the Loop Unrolling feature to use the newly added CFG affordances: D35684
-
Evaluated the option on different C/C++ open source projects, and measured performance as well as coverage to choose the best defaults.
-
New heuristics and fixes added in: D36962, D37103, and D37181
-
Loop Widening - The Loop Unrolling work laid the infrastructure for future improvements in loop modeling. In the last part of the summer, I started investigating approaches to Loop Widening, where the analyzer simulates the execution of an arbitrary number of iterations. There is already a solution which reaches this behavior by discarding all of the known information. My aim was to give a more precise solution/approach for widening. I submitted the following patches for review:
-
The initial patch which only invalidates the possibly changed variables but does not handle every possible case. In this scenario if we encounter a not handled statement we decide not to widen the loop. D36690
-
In order to handle nested loops better, the analyzer tries to reanalyze the loops with a widened state which execution was aborted due to visiting the same block too many times on a given path. D37108
-
Usage
Currently, both of the features are hidden behind a flag. In order to use the loop unrolling feature, the user has to pass the ‘unroll-loops=true’ config option to the analyzer.
For widening you have to download and apply the patches from the Phabricator and build the clang with these changes.(D36690 and D37108) It is behind the flag ‘widen-loops’ so you have to pass the config option ‘widen-loops=true’ to the analyzer in order to use this feature.
- Results
The loop unrolling results were summarized and sent to the cfe-dev mailing list (link). The feature increased the coverage of the analysis by 2.3% in average on the investigated projects while the analysis time remained the same (or even went down). Based on these results I would suggest using the loop unrolling feature by default.
The widening results and the question about changing the functionality of the flag ‘loop-widening’ was sent to the cfe-dev as well: link. Although has a small observable coverage loss (2.3% in average comparing to Widen) it’s offset by the number of the false positives not discovered. In conclusion it would be beneficial to replace the current implementation of the ‘widen-loops’ option.
Regards,
Peter Szecsi