Clang Static Analyzer execution path for loop

Hi all,

Now I am focusing on the execution path of Clang Static Analyzer. Firstly, I did some tests on code_1.c.

//code_1.c

void func(int arg) {

int v;

v = 0;

int i;

for(i = 0; i < arg; i++) {

v = v + 1;

}

int a, b;

a = v;

b = a;

}

The CFG of code_1.c is http://ww3.sinaimg.cn/large/a74e55b4jw1e3h14gvh8vj.jpg.

Then I tried to get the corresponding execution path sequence. Through test I got its execution path sequence, which is B5-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2. (In code_1.c, the variable arg, which determines the loop times, is a symbolic value. And I used the default max_loop value in Clang Static Anaylzer, which equals 4.)

I tried to do the same tests on code_2.c.

//code_2.c

void func(int arg) {

int v;

v = arg + 1;

if(arg < 314) {

for(int i = 0; i < arg; i++) {

v = arg + 33;

}

} else {

v = arg + 11;

}

int a, b;

a = 62951413;

b = v;

}

The CFG of code_2.c is http://ww2.sinaimg.cn/large/a74eed94jw1e3h13zu3raj.jpg. And its execution path sequence is B7-B2-B1-B6-B5-B1-B4-B3-B5-B1-B4-B3-B5-B1-B4-B3-B5-B1-B4-B3. The rule of tracing path in code_2.c is consistent with the rule in code_1.c.

However, when I tried to do the same tests on code_3.c.

//code_3.c

void func(int arg) {

int v;

v = arg + 1;

if(arg < 314) {

v = arg + 11;

} else {

for(int i = 0; i < arg; i++) {

v = arg + 33;

}

}

int a, b;

a = 62951413;

b = v;

}

The CFG of code_3.c is http://ww4.sinaimg.cn/large/a74ecc4cjw1e3h13ae2xtj.jpg. It seems that some strange things happened. Through test I got its execution path sequence, which is B7-B5-B4-B3-B2-B4-B3-B2-B4-B3-B2-B4-B3-B2-B6-B1. In fact, I expected the execution sequence should be B7-B5-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B6-B1.

So, is there anybody can help me explain the execution path of Clang Static Analyzer in code_3.c?

Am I doing something stupid here? Any help will be greatly appreciated.

P.S. The version number of LLVM and Clang in my tests is 3.3.

Hi all,

Now I am focusing on the execution path of Clang Static Analyzer. Firstly, I did some tests on code_1.c.

//code_1.c

void func(int arg) {

int v;

v = 0;

int i;

for(i = 0; i < arg; i++) {

v = v + 1;

}

int a, b;

a = v;

b = a;

}

The CFG of code_1.c is http://ww3.sinaimg.cn/large/a74e55b4jw1e3h14gvh8vj.jpg.

Then I tried to get the corresponding execution path sequence. Through test I got its execution path sequence, which is B5-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2. (In code_1.c, the variable arg, which determines the loop times, is a symbolic value. And I used the default max_loop value in Clang Static Anaylzer, which equals 4.)

I tried to do the same tests on code_2.c.

//code_2.c

void func(int arg) {

int v;

v = arg + 1;

if(arg < 314) {

for(int i = 0; i < arg; i++) {

v = arg + 33;

}

} else {

v = arg + 11;

}

int a, b;

a = 62951413;

b = v;

}

The CFG of code_2.c is http://ww2.sinaimg.cn/large/a74eed94jw1e3h13zu3raj.jpg. And its execution path sequence is B7-B2-B1-B6-B5-B1-B4-B3-B5-B1-B4-B3-B5-B1-B4-B3-B5-B1-B4-B3. The rule of tracing path in code_2.c is consistent with the rule in code_1.c.

However, when I tried to do the same tests on code_3.c.

//code_3.c

void func(int arg) {

int v;

v = arg + 1;

if(arg < 314) {

v = arg + 11;

} else {

for(int i = 0; i < arg; i++) {

v = arg + 33;

}

}

int a, b;

a = 62951413;

b = v;

}

The CFG of code_3.c is http://ww4.sinaimg.cn/large/a74ecc4cjw1e3h13ae2xtj.jpg. It seems that some strange things happened. Through test I got its execution path sequence, which is B7-B5-B4-B3-B2-B4-B3-B2-B4-B3-B2-B4-B3-B2-B6-B1. In fact, I expected the execution sequence should be B7-B5-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B6-B1.

Are you saying that ‘v = arg + 11;’ is executed twice on the path you are observing?
How are you observing the execution path sequences? You cannot see the paths by observing the callbacks from the checkers since the analyzer does not make any guarantees on the order in which statements from different paths are processed. In this example, there are more than 2 paths through this program. You might be observing parts of different paths.

Cheers,
Anna.

Hi Anna,

I modified the CoreEngine.cpp and output the CFG-BlockID in the HandleBlockExit() method. So whenever a CFG Block is going to exit, it will output its BlockID. In this way, I could observe the execution path sequences.

I am confused on the CFG execution path of code_3.c. In my tests, I used the Clang Static Analyzer default loop time value, which equals four (defined in CompilerInvocation.cpp:221). So the analyzer will execute a loop for four times. In the execution paths of code_1.c and code_2.c, the four loop executions are separated. For example, in code_2.c, the execution path sequence is B7-B2-B1-B6-B5-B1-B4-B3-B5-B1-B4-B3-B5-B1-B4-B3-B5-B1-B4-B3. B5 is a CFG block which stands for loop condition. It can also be regarded as a branch start point. In this execution sequence, after the execution of B5, the Analyzer traces B1, which can be regarded as the else branch of B5. After tracing the else branch path, the path traces back, and begins to trace the then branch path of B5, including B4 and B3. B4 and B3 are the real body of the for loop. In this way, the subsequence(B5-B1-B4-B3) is repeated for 4 times. The execution path of code_1.c also follows the same rule.

However, the execution path of code_3.c seems to break this rule. The execution path sequence of code_3.c is B7-B5-B4-B3-B2-B4-B3-B2-B4-B3-B2-B4-B3-B2-B6-B1. B4 is a CFG block which stands for this loop condition. B2 and B3 consists of the then path of B4, which can also be regarded as the real body of the for loop. The most strange thing here is that, after tracing the B4, the analyzer begins to trace the then path(B3 and B2), instead of tracing the else path. And the analyzer repeats four times to tracing the then path of the for loop(B4-B3-B2-B4-B3-B2-B4-B3-B2-B4-B3-B2). If we use the same tracing rule to trace code_3.c, I think the execution path should be B7-B5-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B4-B1-B3-B2-B6-B1. That is, after tracing B4, analyzer should trace the else path(B1), and then begins to trace the then path(B3-B2).

So, can you help me explain the execution path produced by the Analyzer in code_3.c? Thank you.

Arthur,

The analyzer explores the exploded graph (the graph that contains all path) in SOME order (currently DFS, but could be anything). It does NOT trace paths one after another. You are not tracing the paths but just look at the order in which the nodes of the graph are explored.

Anna.

Jordan has just pointed out that in the third example, the analyzer knows that the loop will execute more than 4 times (we know that arg is more or equal to 314). Because of this, the analyzer is not going to explore the path on which you analyze the loop and then get to the end of the function. That is probably the unusual issue with ex3 you are seeing.

Cheers,
Anna.

Hi Anna,

Thank you and Jordan for your reply to answer my questions. Now I can understand the execution path sequence of ex3. However, it seems that I met another problem with loop execution path.

Here is my test code ex4:

1 void func(int arg) {

2 int v;

3 v = 0;

4 int i;

5 for(int i = 0; i < arg; i++) {

6 v = v + 1;

7 }

8

9 if(arg < 0) {

10 v = 5566;

11 }

12

13 int c;

14 c = v;

15 }

The corresponding CFG of ex4 is http://ww3.sinaimg.cn/large/a74ecc4cjw1e3kuhrthqij.jpg

With analyzer, I get its execution path sequence: B7-B6-B3-B1-B2-B1-B5-B4-B6-B3-B1-B5-B4-B6-B3-B1-B5-B4-B6-B3-B1-B5-B4. It shows that the analyzer traces all possible paths in its CFG since arg(in line 5) is a symbolic value. Analyzer has no idea about whether arg is greater than zero.

Then I replaced the arg in line 5 with a concrete number(say 64, actually any number which is greater than three).

for(int i = 0; i < arg; i++) → for(int i = 0; i < 64; i++)

Then the analyzer gave me the following execution path sequence: B7-B6-B5-B4-B6-B5-B4-B6-B5-B4-B6-B5-B4.

It is right that the loop has been executed for four time, but it seems that the analyzer didn’t

cover all possible paths. In other words, it means that the analyzer didn’t analyze the range from line 9 to line 14 in ex4’s source code. I can’t find the corresponding CFG blocks which represent the part of source code from line 9 to line 14 of ex4 in this execution path sequence.

In addition, I did another four tests. In these four tests, I replaced the loop condition with 0, 1, 2 and 3 for each time. The corresponding execution path sequences are below.

for(int i = 0; i < 3; i++): B7-B6-B5-B4-B6-B5-B4-B6-B5-B4-B6-B3-B1-B2-B1

for(int i = 0; i < 2; i++): B7-B6-B5-B4-B6-B5-B4-B6-B3-B1-B2-B1

for(int i = 0; i < 1; i++): B7-B6-B5-B4-B6-B3-B1-B2-B1

for(int i = 0; i < 0; i++): B7-B6-B3-B1-B2-B1

With these four execution path sequences above, I can get the conclusion that for these four cases, the analyzer analyzed the whole ex4 source code and traced all possible paths in ex4.

So I don’t know why the analyzer didn’t analyze the range from line 9 to line 14 in ex4’s source code when the loop time is more or equal to 4?

Arthur,

You are seeing the behavior I’ve described in the previous reply. When the analyzer knows that the loop will be executed more than 4 times (or whatever the loop unrolling bound is), it is NOT going to continue exploring the path after unrolling the loop 4 times. In order to continue exploring, it needs so simulate/approximate the remaining iterations. Simulating the remaining iterations is a feature that the analyzer does not currently have.

When we don’t know that the loop has to execute at least 4 times, the rest of the function will be explored. In particular, we would explore a path on which exactly one iteration of the loop is executed following by the lines 8-15.

Anna.