I’ve been trying to track down a false positive with symbolic execution of an array assignment within a loop, but I’m having some trouble figuring out the root cause. In the attached loop.c file, the analyzer identifies the LHS of the assignment equal operation on line 11 to be a garbage value, but this is actually initialized on line 6.
Initially, I thought that this was an issue with dead symbols being incorrectly purged, because the trimmed ExplodedGraph shows that the analyzer identifies the array “c” to be an ElementRegion “c: &element{c, 0 S64b,int}” in an earlier node before PreStmtPurgeDeadSymbols. But this is unchanged even after eliminating calls to collectNode(), so I’m confused if the graph is representing the actual symbolic execution exploration state, or just simply the ordering of explored states from the worklist?
The other place that I’ve been looking at is the branch condition handling inside ExprEngine::processBranch(). When the symbolic execution engine assumes either of the false/true branches, does that also entail the body of the loop?
This happens because "nc = na + nb - 1" is too complicated for our extremely simple and fast solver (RangeConstraintManager) to handle. Because of that, we need to *conjure* a completely new symbol for "nc", and then it assumes that "nc" may be equal to 2 even when "na" and "nb" are equal to 1. Essentially, we can construct a correct SymSymExpr for "nc", but we'd need to do some bigger work to let the solver solve the equations.
I'm all for producing more SymSymExpr objects and simplifying them (through an alpha-renaming-like procedure) whenever their components get simplified out, just need time to implement this.
Ah, so it's an issue with the constraint manager. I couldn't find much
documentation on the internals of the static analyzer, so it wasn't
clear how the components interacted with one another.
More generally, why does the static analyzer implement its own
constraint solver? I noticed that there also used to be a
BasicConstraintManager, but was removed quite a while back for not being
useful in practice. Would replacing the internal constraint solver with
a wrapper around e.g. CVC4 or Z3 significantly increase the precision of
the analysis?
Yep, we're using an extremely simple solver; a powerful solver should significantly increase precision, and probably significantly hurt performance. The analyzer often explores thousands of paths through the same point of the program, and many bugs are found in thousands of copies (on different paths on which they occur, and they get deduplicated by end-of-path point), which would put the solver under heavy stress.
Thanks for writing the workbook! I wish I had seen it earlier. Do you
know why the STP solver backend was never merged in? Looking through the
code, it seems that the status was fairly experimental?
Thanks for writing the workbook! I wish I had seen it earlier. Do you
know why the STP solver backend was never merged in? Looking through the
code, it seems that the status was fairly experimental?
The STP backend was experimental quality but could serve as a good starting point to explore adding more powerful solvers. (If choosing now, I’d prefer CVC4.) The main reason it was not merged is that the community asked not to introduce a dependency on the STP library in the build system and the build system integration has not been finished. Feel free to resurrect the patches if you are interested. Someone tried to do that a couple of months ago…
As Artem has mentioned, just replacing the range-based solver with SMT will make the analyzer much slower. One way of addressing the issue would be to run with the faster solver and use SMT on the second slower pass to refute false positives. This would be a much more challenging project than what the STP patch was trying to achieve.
Let me add a concrete data point regarding performance of SMT solvers for static analysis. See slide 45 here: http://web.ist.utl.pt/nuno.lopes/pres/smt_school16.pptx
Basically it shows that SMT solvers are many orders of magnitude slower than specialized abstract domains. Even simplex implemented in Prolog (CLPQ) is faster than Z3 for simple queries.
STP is faster for smaller & simpler queries than Z3, but you'll still see this big gap.
LLVM itself has multiple abstract domains (e.g., ConstantRange) that can be used by clang static analyzer. That would probably require implementing a combined constraint solver in clang, so that it can use a pipeline of solvers.