Extend Static Analyzer with Abstract Interpretation

Hi Team,

    I'm a research assistant in IIS, AS, Taiwan. Our lab is working on
topics in abstract interpretation,

and we are looking for a framework to implement and test our ideas. Then, I
noticed this project.

I've read the developer's manual on website, the slides in 2008, and the
discussion thread below.

http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-January/003996.html

Also I found this paper, "A Memory Model for Static Analysis of C Programs".

The Front-end and the memory model are quite attractive, so I want to build
our work based on them.

The website says the engine do symbolic execution. I think it can be
modified for abstract interpretation.

Hence, I need some advice on where should I hack to achieve followings.

(i) Apply another algorithm to traverse CFG for finding "fixed-point".

(ii) Specify "abstract domain" which to my knowledge is similar to symbolic
value and constraints.

(iii) Define state and transfer function based on the domain.

(iv) Other things I didn't consider, e.g. Someone has already done this for
me, and I should use his.

Thank you.

Hsieh, Chiao
bridge@iis.sinica.edu.tw

Hi Team,

I’m a research assistant in IIS, AS, Taiwan. Our lab is working on
topics in abstract interpretation,

and we are looking for a framework to implement and test our ideas. Then, I
noticed this project.

I’ve read the developer’s manual on website, the slides in 2008, and the
discussion thread below.

http://lists.cs.uiuc.edu/pipermail/cfe-dev/2009-January/003996.html

Also I found this paper, “A Memory Model for Static Analysis of C Programs”.

We’ve recently gave another talk on the static analyzer. See “Building a checker in 24 hours” at http://llvm.org/devmtg/2012-11/ The talk is targeted at someone writing a new check extension to the analyzer so it only has a very high level description of the internals; but I am sure you will find it useful.

The Front-end and the memory model are quite attractive, so I want to build
our work based on them.

The website says the engine do symbolic execution. I think it can be
modified for abstract interpretation.

Hence, I need some advice on where should I hack to achieve followings.

(i) Apply another algorithm to traverse CFG for finding “fixed-point”.

The ExprEngine/CoreEngine are responsible for walking the CFG. We unroll loops several times by default, but that is configurable. We stop when reaching the fixed point, by caching out on a previously seen state.

Take a look at AnalysesConsumer.cpp, which initiates the traversal.

(ii) Specify “abstract domain” which to my knowledge is similar to symbolic
value and constraints.

ConstraintManager/SimpleConstraintManager/RangeConstraintManager is the module responsible for keeping track of constraints on the symbolic values. I would suggest taking a look at it to see if you could implement an abstract domain by replacing/creating your own the ConstraintManager. (It’s hard to say if that would be enough/relevant without the details about the abstract domain.)

(iii) Define state and transfer function based on the domain.

If you are only interested in domain specific transfer functions, you should be able to implement them by writing a “checker”. (See the talk, which I’ve referenced above.) That would be the least intrusive way of using/interacting with the analyzer. I’d recommend writing a checker to get a feeling of how the analyzer works as well.

While elements of the symbolic execution employ concepts from abstract interpretation, the analyzer engine itself does not do a fix point computation to compute the set of possible abstract states at a given program point. Building that kind of solver is completely doable, but a very different approach than the one we tool. If you are interested in just local analysis on a single CFG, the elements of the CFG can serve as program points, and a worklist algorithm can be used to do the fix point analysis. Of course you will need to define widening operations and the like, but that depends completely on the domain that you want to define.

Thanks a lot for the advice, Anna and Ted.

I must agree that the core concept of the Analyzer is quite different from
abstract interpretation,

I've disucssed with my professors, and we will check if our idea can be
modified to fit the Analyzer.

Hence, I would like to know if there are some documents describing
theoretical basis of the project.

Also I found some papers below in the mentioned thread, but I'm not sure if
they are outdated.

"Symbolic Path Simulation in PathSensitive Dataflow Analysis", and its
previous work

"ESP: Path-sensitive program verification in polynomial time"

Thanks again for the help.

Bridge

These papers are still relevant, although they there exact methodology is slightly different from what the analyzer does. The idea of path-sensitive dataflow analysis is a fairly generic one, and it can be tackled with different specific techniques, but they all boil down to trying to compute a set of reachable program states.

Another good paper is:

A System and Language for Building System-Specific, Static Analyses (Hallem et al)

and the more theoretical paper that talks about path-sensitive analysis as a graph reachability problem (not specific to finding bugs) is:

Precise interprocedural dataflow analysis via graph reachability (Reps et al)

There are tons of other papers in this area, and the analyzer is inspired by many of them.