Static Analyzer symbolic execution path

Hi all,

Recently I am working on developing a checker of Clang Static Analyzer. Because Clang Static Analyzer has a powerful symbolic execution engine, I choose Clang Static Analyzer as the platform for my work. My Checker mainly does some analysis work for C source code files. The checker will make use of the program states during its analysis execution and it will generate a report for each analysis.

In my understanding, symbolic execution(Clang static analyzer) will cover all possible paths. For example, here is an sample source code which will be analyzed by static analyzer:

1 void func(int arg) {
2 int v;
3 if(arg > 0)
4 v = arg + 1;
5 else
6 v = arg + 999;
7 int a, b;
8 a = 99;
9 b = a;
10 }

The symbolic execution path can be roughly represented as follow(based on ExplodedGraph):
http://ww2.sinaimg.cn/large/a74ecc4cjw1e1fipmgpy5j.jpg

And the execution sequence is 1-2-3-(1)-4-5. However, is there any possible means to change the execution sequence to be linear? That is to say, is there any possible means to change the execution sequence to be 1-2-(1)-4-5?
Thanks for the help.

Hi, Arthur. The static analyzer is very much built on finding all possible paths through a program – right now there’s no way to tell it that two paths “merge” together in this way. However, in this case it may actually be able to figure this out: because ‘arg’ and ‘v’ are not referenced after the full if-statement, they could get cleaned out of the state being tracked, which means that the second time through the analyzer will say “oh, I’ve seen this state before”.

We would like to do more things like this (“widening” to encompass/join two paths), but it’s a hard problem to decide that two states are “the same”, and if they aren’t the same how to merge them with minimal loss of precision.

Apart from that, though, the analyzer’s not even guaranteed to traverse paths in a depth-first search. Currently it does, but that decision has gone back and forth, and we might try BFS or some mixed search order again in the future. This currently isn’t convenient to change: you have to tweak the constructor of CoreEngine in CoreEngine.h and then recompile.

Finally, though, if you are trying to generate notes along the path of execution, you should do this after the fact using a BugReporterVisitor. This is sort of like a checker for a bug path—you get to see all the nodes on one path that led to a particular bug report.

Hope that helps,
Jordan

Hi all,

Recently I am working on developing a checker of Clang Static Analyzer. Because Clang Static Analyzer has a powerful symbolic execution engine, I choose Clang Static Analyzer as the platform for my work. My Checker mainly does some analysis work for C source code files. The checker will make use of the program states during its analysis execution and it will generate a report for each analysis.

In my understanding, symbolic execution(Clang static analyzer) will cover all possible paths. For example, here is an sample source code which will be analyzed by static analyzer:

1 void func(int arg) {
2 int v;
3 if(arg > 0)
4 v = arg + 1;
5 else
6 v = arg + 999;
7 int a, b;
8 a = 99;
9 b = a;
10 }

The symbolic execution path can be roughly represented as follow(based on ExplodedGraph):
http://ww2.sinaimg.cn/large/a74ecc4cjw1e1fipmgpy5j.jpg

And the execution sequence is 1-2-3-(1)-4-5. However, is there any possible means to change the execution sequence to be linear? That is to say, is there any possible means to change the execution sequence to be 1-2-(1)-4-5?

The analyzer is designed in away such that the checkers should not assume the order in which the exploded graph is built/processed. The checkers can store the information required for processing a successor inside the ExplodedNode instead of relying on the order. The order is decided when CoreEngine is constructed and it should not matter which order is chosen (apart from performance and coverage metrics). Why do you need the order to be different?

Thank you for your reply, Jordan.

As you said, what I want to do is to merge the two paths together. You're
right that in my previous source code sample, 'arg' and 'v' are not
referenced after the full if-statement. And as you said, they could get
cleaned out of the state being tracked. So I've made a little modification
on the code this time and it may be better:

* 1 void func(int arg) {*

* 2 int v;*

* 3 if(arg > 0) *

* 4 v = arg + 1;*

* 5 else*

* 6 v = arg + 999;*

* 7 int a, b;*

* 8 a = 99; *

* 9 b = v;*

*10 }*

Thus, the symbolic execution path can be roughly represented as
follow(based on ExplodedGraph):

http://ww1.sinaimg.cn/large/a74ecc4cjw1e203yily40j.jpg

As you said, merging paths may lead to the loss of precision. I agree with
it. However, in my work, the precision requirement is not very high. For
example, for the same variable 'v', I don't care whether the values of 'v'
conflict after merging. In my work, the performance requirement is
relatively high. So if the if-else blocks are not merged, the analyzer's
original traversing will cost too much time, especially for complex source
code files.

In this way, is there any way to tell the analyzer that "after finish
traversing the block2, you should go to the block4 rather than the block3"?
If this idea is not feasible, is there any other way to realize my goal
even if I have to modify the CoreEngine. And if there was, which part
should I modify? That is, if I want to implement 'merge', is there any
other part of Clang needed to be modified in addition to CoreEngine? CFG,
AST or any other possible part?

What' more, any explanation on the mechanism of CoreEngine and the
construction of ExplodedGraph/ExplodedNode will help me.

Am I doing something stupid here? Thank you for the help. :slight_smile:

Best regards,
Arthur Yoo

Sorry for not responding there… Not stupid, just uncharted territory for this particular program.

For a very high-level view of the analyzer, you can check out our talk from last year’s LLVM Developer Meeting at http://llvm.org/devmtg/2012-11/. It doesn’t have so much of the internals, though.

Right now we have a very strong sense of whether or not a given node has been seen before: if its hash exactly matches a previous node. The hash is built up of the Profile methods you see in the various analyzer classes: the ProgramPoint, to say where we are in execution, and the ProgramState, made up of the Environment, Store, and “Generic Data Map” used by checkers. Each of these is essentially a map.

In order to merge paths, you’d have to find places where you voluntarily throw away information, by going through the state and taking it out. (Block edges do make sense for this.) One of the problems here is that the analyzer has poor reasoning about other paths, e.g. there’s currently no way to ask for “all nodes in this location context at the entrance to this block”. So the bad news is there’s not a great way to tell what information is necessary to throw away. The good news is that the CFG and AST are probably perfectly good for your needs, so you’d just be modifying the analyzer core.

I have to say that while we’ll accept patches that make this sort of thing more possible, it’s not in our near-term plans and we probably can’t help that much. We haven’t even started to scope this work in general ourselves. However, it’s definitely possible that you’ll be able to make faster progress than us for the cases you’re interested in.

Good luck, and feel free to ask us specific questions about how the parts of the analyzer core work. It should probably be better documented anyway. (Watch the presentation first, though. :slight_smile: )

Jordan