[StaticAnalysis] Determine dereference values

Hello

We are looking into using the clang front-end for static analysis.

The goal is to find memory accesses at the source-code level whose addresses can be statically determined or constrained. This should work across functions and even across translation units.

Example:
main.c:
     void access(int* p);

     int main() {
       for (int i = 0; i < 4; i++)
         access(((int*)0x1234) + i); // pass 0x1234, 0x1238, 0x123c, 0x1240 (4-byte int)
       access(*(int**)0x4444); // pass statically unknown value
     }
other.c:
     void access(int* p) {
       // Want output: read at addr (0x1634|0x1638|0x163c|0x1640|unknown) from clang::Expr*.
       ((volatile int*)p)[0x100];
     }
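To double-check the expected addresses in the example, the hypothetical helpers below redo the pointer-arithmetic scaling with plain integers. This is only a sketch that assumes a 4-byte int, as the comments in the example do; the helper names are made up for illustration.

```cpp
#include <cstdint>

// Assumption: sizeof(int) == 4 on the target, as the example comments imply.
constexpr std::uint64_t kIntSize = 4;

// Address passed to access() in loop iteration i: ((int*)0x1234) + i,
// i.e. the index is scaled by sizeof(int).
constexpr std::uint64_t passed_address(std::uint64_t i) {
  return 0x1234 + i * kIntSize;
}

// Address read by ((volatile int*)p)[0x100]: the subscript is also scaled,
// so the byte offset is 0x100 * 4 = 0x400.
constexpr std::uint64_t read_address(std::uint64_t p) {
  return p + 0x100 * kIntSize;
}
```

For example, read_address(passed_address(3)) evaluates to 0x1640, the last concrete address in the wanted output.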

The clang StaticAnalysis library already does much of the work we are interested in: its symbolic execution engine models stores and loads and determines which values an expression is constrained to.

How scalable is this approach? Even though we would need inter-TU analysis, the problem could be reduced by only looking at accesses with the volatile qualifier, since we are looking for hardware accesses in a bare-metal program. Some retries without inlining are fine, because we assume the accesses are not separated from the constant by significant complexity.

Will this be reasonably reliable? We are interested in cases where a constant is carried through a couple of loops with small bounds and a bit of arithmetic. What are typical cases where the engine gives up because of exploding complexity? I have found that loops are only explored to a very limited depth. Is there an easy way to relax these limits, at the cost of much higher execution time?
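To make the complexity concern concrete, here is a toy value-set abstraction (an analogy, not how the Clang analyzer works internally; as far as I understand, the analyzer explores individual paths and limits how often a loop is visited per path). Each bounded loop that adds k * stride multiplies the number of possible addresses by its bound, so a few nested loops quickly exceed a budget, at which point the value collapses to "unknown". The names ValueSet, join, add_loop_offsets and the budget kMaxValues are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>

// Toy abstract value: a finite set of possible addresses, or std::nullopt
// meaning "unknown". kMaxValues is a hypothetical budget for illustration only.
using ValueSet = std::optional<std::set<std::uint64_t>>;
constexpr std::size_t kMaxValues = 16;

// Union of two abstract values; collapses to unknown once the budget is exceeded.
ValueSet join(const ValueSet& a, const ValueSet& b) {
  if (!a || !b) return std::nullopt;
  std::set<std::uint64_t> out = *a;
  out.insert(b->begin(), b->end());
  if (out.size() > kMaxValues) return std::nullopt;
  return out;
}

// Models a bounded loop that produces v + k * stride for each k in [0, bound),
// like the access(((int*)0x1234) + i) loop with stride sizeof(int).
ValueSet add_loop_offsets(const ValueSet& in, std::uint64_t bound,
                          std::uint64_t stride) {
  if (!in) return std::nullopt;
  ValueSet acc = std::set<std::uint64_t>{};
  for (std::uint64_t v : *in) {
    for (std::uint64_t k = 0; k < bound; ++k) {
      acc = join(acc, std::set<std::uint64_t>{v + k * stride});
      if (!acc) return std::nullopt;  // gave up: too many possible values
    }
  }
  return acc;
}
```

With a budget of 16, two nested four-iteration loops starting from 0x1234 still give an exact set of 16 addresses, but a third loop with only two iterations collapses the result to unknown.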

I noticed that the engine does not take into account the value of a file-scope constant pointer ("T* const"). Is there a technical limitation that prevents this?
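For reference, this is the kind of declaration meant here; the name kRegBlock, the register type and the address are made up for illustration. The pointer itself can never be reassigned, so every access through it can only target addresses derived from its initializer.

```cpp
#include <cstdint>

// Hypothetical file-scope constant pointer ("T* const") to a memory-mapped
// register block. Only the pointee is volatile; the pointer value is fixed.
static volatile std::uint32_t* const kRegBlock =
    reinterpret_cast<volatile std::uint32_t*>(0x40001000);

// Address that kRegBlock[index] would access, computed without dereferencing.
std::uintptr_t reg_address(std::uintptr_t index) {
  return reinterpret_cast<std::uintptr_t>(kRegBlock) +
         index * sizeof(std::uint32_t);
}
```

Ideally the engine would resolve an access such as kRegBlock[2] to the single address 0x40001008, just as reg_address(2) computes it here.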

I also tried hacking on the DereferenceChecker and DivZeroChecker to get the symbolic, or even concrete, value of a Loc, but I only got its initial value, not the value it should have at the dereference. When plotting a graph for source code that does basic arithmetic on a pointer, the expression's value never changes. It seems to me that the symbolic values of Locs are not fully tracked. Is this true, and is there a way to track them fully?

A backward data-flow analysis at the IR level is probably a more reasonable approach in general, but getting the exact clang::Expr that performs the access is valuable to us.

Overall, is this problem reasonably solvable with clang static analysis? Any feedback is greatly appreciated!

Best Regards
Rafael

Thank you for the reply. I will check those out.

The FixedAddress checker is definitely a good start too, but we are more interested in the actual dereferences, and in comparing the deduced SVals against known address ranges.
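Once an SVal has been reduced to one or more concrete addresses, the comparison against known ranges could be as simple as the sketch below. The memory map (AddressRange, kKnownRanges, and the range names) is entirely hypothetical and would in practice come from the target's documentation.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical memory map entry: [begin, end) plus a descriptive name.
struct AddressRange {
  std::uint64_t begin;  // inclusive
  std::uint64_t end;    // exclusive
  std::string_view name;
};

constexpr AddressRange kKnownRanges[] = {
    {0x1000, 0x2000, "peripheral_a"},
    {0x40000000, 0x40010000, "peripheral_b"},
};

// Returns the name of the range containing addr, or std::nullopt if none does.
constexpr std::optional<std::string_view> classify(std::uint64_t addr) {
  for (const AddressRange& r : kKnownRanges)
    if (addr >= r.begin && addr < r.end) return r.name;
  return std::nullopt;
}
```

For instance, classify(0x1634) yields "peripheral_a", while an address outside every range yields std::nullopt and could be reported as a suspicious access.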

What I meant is that, for example, the analyzer is expected to report a null dereference in the following code:

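     int main()
     {
       int* p = (int*)sizeof(int); // p holds the address sizeof(int)
       p -= 1;                     // moves back by one int, so p is now null
       return *p;                  // null dereference expected here
     }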

I just finished debugging the issue and found that the implementation of pointer arithmetic in SimpleSValBuilder::evalBinOpLN was missing some logic: "Multiplicand" was constructed with only a bit width and never assigned, so it was always zero and the increment was scaled to zero. The following fix works for me. Should I send this to the commits list, or can someone take a look at it here?

diff --git a/SimpleSValBuilder.cpp b/SimpleSValBuilder_fix.cpp
index f09f969..da31fc0 100644
--- a/SimpleSValBuilder.cpp
+++ b/SimpleSValBuilder_fix.cpp
@@ -927,6 +927,8 @@ SVal SimpleSValBuilder::evalBinOpLN(ProgramStateRef state,

        // Offset the increment by the pointer size.
        llvm::APSInt Multiplicand(rightI.getBitWidth(), /* isUnsigned */ true);
+       QualType PteeTy = resultTy.getTypePtr()->castAs<PointerType>()->getPointeeType();
+       Multiplicand = getContext().getTypeSizeInChars(PteeTy).getQuantity();
        rightI *= Multiplicand;

        // Compute the adjusted pointer.
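To illustrate the effect of the missing scaling, here is a simplified integer model of the "ptr + n" / "ptr - n" offset computation for concrete operands. It is not the analyzer's actual code; offset_pointer and the constants are hypothetical, and subtraction is modeled as adding a negative n. With a multiplicand of zero, "p -= 1" leaves p at sizeof(int), so the null dereference in the example is never reported; with the pointee size as the multiplicand, p becomes 0.

```cpp
#include <cstdint>

// Simplified model: the increment n is scaled by the multiplicand (the role
// played by getTypeSizeInChars(PteeTy).getQuantity() in the fix) and then
// added to the address. Unsigned arithmetic wraps, so a negative n works.
constexpr std::uint64_t offset_pointer(std::uint64_t addr, std::int64_t n,
                                       std::uint64_t multiplicand) {
  return addr + static_cast<std::uint64_t>(n) * multiplicand;
}

// Before the fix: Multiplicand keeps its constructed value of zero.
constexpr std::uint64_t kBuggyMultiplicand = 0;
// After the fix: Multiplicand is the pointee size, e.g. 4 for int.
constexpr std::uint64_t kIntSize = 4;
```

For example, offset_pointer(4, -1, kBuggyMultiplicand) stays 4, while offset_pointer(4, -1, kIntSize) is 0, matching the expected null pointer.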