[StaticAnalyzer] LoopUnrolling measurements

Hello everyone,

I am working on improving the loop modeling strategy of the Clang Static Analyzer.

In its current state the analyzer handles loops quite simply: it unrolls each loop 4 times by default and then cuts the analysis of any path on which the loop would be unrolled more times.
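(This default bound corresponds to the analyzer's -analyzer-max-loop option.)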

One motivation, if nothing else, is that this behaviour has already prompted questions on the cfe-dev mailing list; it was even thought to be a bug.

If we want a motivating goal that can be expressed in a more objective way, then the coverage of the analysis is the metric that comes closest to showing the impact of the loop modeling. Why?

Because the naive loop handling described above can easily leave code lines unsimulated/unchecked. A small example of that:

int f() {
  int a[6];

  for (int i = 0; i < 6; i++)
    a[i] = i + 2;

  // complex body
  return a[0];
}

In this example the analysis stops at the 4th step of the loop and is not continued after the loop, which causes a lot of coverage loss in this case. To address this problem, I experimented with identifying specific known-bound loops that would be worth unrolling completely (e.g. the above loop in f()).

At the moment a loop has to fulfill the following requirements to be worth unrolling (illustrated by the sketch after this list):

  • Currently only forStmts are considered.
  • The bound has to be an integer literal (e.g. i < 5, i >= MACRO_TO_LITERAL).
  • The counter variable (i) must not escape before or in the body of the loop
    and may be changed only in the increment statement belonging to the loop. It
    also has to be initialized by a literal in the corresponding initStmt.
  • The loop must not contain a goto, a switch, or a returnStmt.

This version is marked as Unroll in the statistics.
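
For illustration, a hypothetical pair of loops (the function and variable names are made up, not taken from the measured projects):

int known_bound(int *out) {
  // Satisfies every requirement above (forStmt, literal init and bound,
  // counter changed only in the increment), so it would be fully unrolled.
  for (int i = 0; i < 6; i++)
    out[i] = i;
  return out[0];
}

int unknown_bound(int *out, int n) {
  // The bound 'n' is not an integer literal, so this loop would be
  // handled the default way: 4 iterations, then the path is cut.
  for (int i = 0; i < n; i++)
    out[i] = i;
  return out[0];
}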

In addition to that, I ran the measurements with a twist: whenever an unrolled loop creates new branches, we treat it as a normal (non-unrolled) loop for the rest of the analysis. This is done because of the exponential behaviour of the ExplodedGraph (symbolic execution): complete unrolling could create a lot of new paths which would all be analyzed but differ only slightly from the paths analyzed one step earlier. This is marked as Unroll_v2 in the statistics.
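
As a made-up illustration: the loop below satisfies the unrolling requirements, but the if in its body may split the state on every iteration, so complete unrolling would multiply the number of analyzed paths; under Unroll_v2 such a loop falls back to the default handling.

int sum_smaller(int x) {
  int sum = 0;
  for (int i = 0; i < 100; i++) {
    if (x > i)  // the analyzer may split the state here on each iteration
      sum += i;
  }
  return sum;
}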

Time measurements (average of 5 runs):


|           | curl   | libpng | vim     | bitcoin  | ffmpeg   | xerces   |
| --------- | ------ | ------ | ------- | -------- | -------- | -------- |
| Normal    | 50.168 | 62.128 | 180.404 | 197.385  | 501.14   | 227.6075 |
| Unroll    | 54.562 | 64.186 | 205.468 | 1767.475 | 1127.076 | 248.5375 |
| Unroll v2 | 52.892 | 63.866 | 203.476 | 198.025  | 693.486  | 237.87   |

Coverage measurements (% of reachable basic blocks):


|           | curl  | libpng | vim   | bitcoin | ffmpeg | xerces |
| --------- | ----- | ------ | ----- | ------- | ------ | ------ |
| Normal    | 58.05 | 55.91  | 51.13 | 68.58   | 49.78  | 74.86  |
| Unroll    | 69.1  | 56.04  | 51.41 | 68.7    | 62.18  | 74.79  |
| Unroll v2 | 69.19 | 56.04  | 51.61 | 68.83   | 52.41  | 74.8   |

Summary table containing the deviation (in percent) relative to the normal run.
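The deviation is computed as (measured - normal) / normal * 100; for example, the Unroll time deviation on bitcoin is (1767.475 - 197.385) / 197.385 * 100 ≈ 795.445.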


|                      | curl  | libpng | vim   | bitcoin | ffmpeg | xerces |
| -------------------- | ----- | ------ | ----- | ------- | ------ | ------ |
| Unroll - time        | 8.76  | 3.31   | 13.89 | 795.445 | 124.9  | 9.19   |
| Unroll - coverage    | 11.05 | 0.13   | 0.28  | 0.12    | 12.4   | -0.07  |
| Unroll v2 - time     | 5.43  | 2.8    | 12.79 | 0.324   | 38.38  | 4.5    |
| Unroll v2 - coverage | 11.14 | 0.13   | 0.48  | 0.25    | 2.63   | -0.06  |

The most outstanding (negative) difference is that the analysis time of the bitcoin project increased to 8x that of the normal case. This was caused single-handedly by one benchmark file that contains a lot of for loops taking ~10,000,000 steps.
(An improvement could be to disallow unrolling loops that would take N or more steps, where N is a well-chosen number; I am currently running measurements with N = 128.)
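
A minimal sketch of what such a guard could look like, assuming the loop's initial value, bound, and step are the literals the heuristic already requires (the helper name and the exact formula are illustrative, not the actual implementation):

// Hypothetical helper: estimate the trip count of a qualifying loop
// and refuse to unroll when it would take MaxSteps or more iterations.
bool worthUnrolling(long Init, long Bound, long Step, long MaxSteps) {
  if (Step <= 0)
    return false;                                // only count-up loops here
  long Trips = (Bound - Init + Step - 1) / Step; // ceiling division for '<'
  return Trips > 0 && Trips < MaxSteps;          // e.g. MaxSteps = 128
}

With N = 128, the ~10,000,000-step loops in the bitcoin benchmark file would simply fall back to the default 4-step handling.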

The number of findings:


|           | curl | libpng | vim | bitcoin | ffmpeg | xerces |
| --------- | ---- | ------ | --- | ------- | ------ | ------ |
| Normal    | 35   | 32     | 81  | 10      | 375    | 52     |
| Unroll    | 27   | 33     | 81  | 10      | 367    | 48     |
| Unroll v2 | 27   | 32     | 81  | 10      | 363    | 52     |

Most of the time the set of found bugs has not changed. On FFmpeg there are new findings which come from the fact that we are able to unroll loops (and analyze the code lines after them). However, the unrolling resulted in some lost findings as well. Most of the time (curl, ffmpeg) these were lost because the bugs had been found in complex functions on long paths whose analysis exhausted the maximum number of nodes in the ExplodedGraph even in the normal analysis. With the unroll feature this limit was hit “faster”, so these paths were not analyzed and the bugs were not found.

In conclusion, unrolling the specific loops defined above can have a positive impact on the coverage (% of reachable basic blocks); however, it comes with the price of increased running time. In most projects (all except curl) the time increase in % was higher than the coverage increase in %.

So, all in all, I think this is a feature which can be useful for a more thorough analysis, but it should not be enabled by default for the time being.

What do you think?

Cheers,

Peter

Note: if you are interested in the detailed results they are available on the following site: http://cc.elte.hu/szepet/loop_modeling/unrolling/


This sounds good to me to get enabled by default. You've overcome a huge bunch of troubles to get this far, and these results are quite impressive. The analyzer no longer gives up on straightforward paths that take just a bit longer than expected, and even if looping through the same loop more times doesn't bring immediate benefit, getting past the loop to cover the blocks behind it seems to pay off very well indeed.

Some further measurements with the limit of 128 resulted in better statistics!

Time measurements (average of 4 or 5 runs, depending on the project):


|                 | curl   | libpng | vim      | bitcoin | FFmpeg   | xerces   | LLVM    |
| --------------- | ------ | ------ | -------- | ------- | -------- | -------- | ------- |
| Normal          | 52.63  | 62.872 | 191.9225 | 208.25  | 526.215  | 229.9425 | 5044.62 |
| Unroll_limit    | 54.402 | 59.946 | 198.5075 | 212.94  | 646.3325 | 234.045  | 5125.96 |
| Unroll_limit_v2 | 51.952 | 58.908 | 193.2825 | 200.792 | 543.86   | 225.8675 | 4838.26 |

What mode would you recommend turning on by default? Would you suggest setting the loop iteration limit to 128 and using Unroll v2 (or something else)?

How much data are you storing in the state (info about each loop, or just the cases where the bound is known)? Storing data in the state might affect memory consumption, so it would be important to measure that it does not regress, especially if you store additional information in most nodes.

You’ve made a lot of progress during the summer. Good work!
Anna