New automated decision procedure for path-sensitive analysis

Dear LLVMers,

This email is intended for those interested in path-sensitive analysis,
integer overflow analysis, static analysis, and (perhaps) loop invariant
computation.

Traditionally, such analyses have been considered too expensive to be
practical, and were mostly an academic curiosity. The core of the
problem is the lack of adequate automated decision procedures which
could quickly determine whether a set of constraints is satisfiable or
not, and if it is satisfiable, find a solution.

Recently, I've released Spear -- automated modular arithmetic theorem
prover, which has proven to be very scalable in my setting.

A nice feature of Spear is that it supports all LLVM integral
instructions, including SDIV/UDIV/MUL/..., which makes it really easy to
use in combination with LLVM. However, Spear itself is not LLVM-based
because many people that are interested in such theorem provers do not
use LLVM.

Here I provide two simple examples to give you a flavour of Spear:

------------ Example 1 ---------------

Assume that you want to generate an instance that corresponds to the
following C-like sequential code:

int f(int a, int b) {
int a1;
if (a%2) { a1 = a + 1; }
else { a1 = a; }
// a1 is even
int c = a1 * a1;
// Square of an even number is divisible by 4
assert(c % 4 == 0);
}

You could check the validity of the assertion by checking the
unsatisfiability of the negated formula (corresponds to checking that
the assertion can never be FALSE):

# Checking a simple assertion
v 1.0
d a:i32 a1:i32 c:i32 inca:i32 tmp1:i32 tmp2:i1 tmp3:i32 assert:i1
c inca + a 1:i32
c tmp1 %s a 2:i32
c tmp2 trun tmp1
c a1 ite tmp2 inca a
c c * a1 a1
c tmp3 %s c 4:i32
c assert = tmp3 0:i32
p = assert 0:i1

Spear proves this query to be unsatisfiable in 0.02 sec on AMD 64 X2
4600+ for 32-bit integers, and in 0.08 sec for 64-bit integers with the
default heuristics.

------------ Example 2 ---------------

The last Fermat's theorem says that for integer n>2 the equation
a^n+b^n=c^n has no solutions for non-zero integers a, b, and c.

However, if a,b,c are bounded integers, the equation can have
non-trivial (non-zero) solutions. Let's try to find one such example:

# Finding a solution of a^4 + b^4 = c^4
v 1.0
d a:i64 b:i64 c:i64 a2:i64 b2:i64 c2:i64 a4:i64 b4:i64 c4:i64 sum:i64
c a2 * a a
c b2 * b b
c c2 * c c
c a4 * a2 a2
c b4 * b2 b2
c c4 * c2 c2
c sum + a4 b4
p = sum c4

Spear finds a non-trivial solution of this query in 24.08 sec on AMD
64 X2 4600+ with the default heuristics and in 3.18 sec with
the fh_1_1 heuristic.

I think the real difficult thing in path sensitive program analysis (or symbolic execution) is not the lack of decision procedures, but the translation of arbitrary pointer operations and library function calls in C/C++ program into the mathematics supported by the automated theorem prover.
Almost every concept in the computer program except memory address has a counterpart in mathematics. I have tried to simulate memory by arrays in symbolic execution. But I found it is inadequate.

Hi Zhongxing,

I think the real difficult thing in path sensitive program analysis (or
symbolic execution) is not the lack of decision procedures, but the
translation of arbitrary pointer operations and library function calls in
C/C++ program into the mathematics supported by the automated theorem
prover.
Almost every concept in the computer program except memory address has a
counterpart in mathematics. I have tried to simulate memory by arrays in
symbolic execution. But I found it is inadequate.

There has been some progress lately (see Rustan Leino's work on the weakest
precondition transformer) on handling arrays. Some automated theorem provers
even support the theories of arrays (like CVC, Simplify, Yices,...).
However, those
thm provers do not have a very good support for modular arithmetic. In fact,
they most often approximate bounded integers with rationals (reals), and that's
one of the reasons why they can't handle operators like MUL ( as they rely
on linear arithmetic solvers, like simplex ).

Spear takes a different approach - it is bit-precise, handles all operators, but
currently doesn't handle arrays directly. However, the other mentioned
thm provers
handle arrays by encoding them as UIFs + several axioms. As UIFs can be encoded
to SAT, I think that the theory of arrays could be as well. So, with a
bit of effort,
you should be able to use Spear for reasoning about arrays.

Cheers,
Domagoj Babic

Spear takes a different approach - it is bit-precise, handles all operators, but
currently doesn’t handle arrays directly. However, the other mentioned
thm provers
handle arrays by encoding them as UIFs + several axioms. As UIFs can be encoded
to SAT, I think that the theory of arrays could be as well. So, with a
bit of effort,
you should be able to use Spear for reasoning about arrays.

Thank you. I will try it.
But what I really mean is that array simulation is not an adequate solution to pointer problems.

Hi,

Thank you. I will try it.

Np.

But what I really mean is that array simulation is not an adequate solution
to pointer problems.

If you are only interested in a single abstract memory location per allocation
site, that can be handled efficiently. For structured graphs, symbolic
execution can be done very efficiently. I'd give you a reference, but
it's in submission
. Once computed, definitions of abstract memory locations are bit-precise,
and you can pass such queries to Spear.

Proving properties for some more complex data structures than arrays can be
done, but I'm not an expert in that field, so can't give you good
references. The
best I can tell you is to google for shape analysis and separation logic.

Anyways, this is off-topic. If you have further questions/comments on
handling arrays/pointers/memory, please email me directly.

Domagoj Babic