Hello

We are looking into using the clang front-end for static analysis.

The goal is to find memory accesses at the source level whose addresses can be statically determined or constrained. This should work across functions and even across translation units.

Example:

main.c:

    void access(int* p); // defined in other.c

    int main() {
        for (int i = 0; i < 4; i++)
            access(((int*)0x1234) + i); // pass 0x1234, 0x1238, 0x123c, 0x1240
        access(*(int**)0x4444); // pass statically unknown value
    }

other.c:

    void access(int* p) {
        // Want output: read at addr (0x1634|0x1638|0x163c|0x1640|unknown) from clang::Expr*.
        ((volatile int*)p)[0x100];
    }
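
To spell out the arithmetic behind the wanted output, here is a small sketch that reproduces the addresses with plain integers and never touches memory. It assumes a 4-byte int, as the addresses in the example imply; INT_SIZE and expected_access_addr are names made up for this illustration.

```c
#include <stdint.h>

/* Assumption: int is 4 bytes on the target, as the example addresses imply. */
enum { INT_SIZE = 4 };

/* Address read by ((volatile int*)p)[0x100] when main() passes
   ((int*)0x1234) + i. Hypothetical helper: arithmetic only, no dereference. */
static uint32_t expected_access_addr(uint32_t i) {
    uint32_t p = 0x1234u + i * INT_SIZE;  /* argument passed to access() */
    return p + 0x100u * INT_SIZE;         /* index 0x100 adds 0x400 bytes */
}
```

For i = 0..3 this gives 0x1634, 0x1638, 0x163c and 0x1640, which are the four concrete values listed in the comment in other.c.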

The clang StaticAnalysis library already does much of the work we are interested in: it determines which values an expression is constrained to, models stores and loads, and runs a symbolic execution engine.

How scalable is this approach? Although we would require inter-TU analysis, the problem could be reduced by looking only at accesses that have the volatile qualifier, since we are interested in the hardware accesses of a bare-metal program. Some retries without inlining are fine, because we assume that an access is not separated from its constant by code of significant complexity.

Will this be decently reliable? We are interested in cases where a constant is carried across a couple of loops with small bounds and a bit of arithmetic. What are typical cases where the engine gives up because of exploding complexity? I have found that loops are explored only to a very limited extent. Is there an easy way to relax these limits somewhat, at the cost of a much longer execution time?
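
As a concrete instance of the kind of code we mean, here is a minimal sketch in which a constant base is carried through a small bounded loop with a bit of arithmetic. The names REG_BASE, BANK_STRIDE, NUM_BANKS and last_bank_addr, as well as the addresses, are invented for illustration. In the real program the result would be cast to a volatile pointer and dereferenced.

```c
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
enum { REG_BASE = 0x4000, BANK_STRIDE = 0x40, NUM_BANKS = 3 };

/* The constant REG_BASE is carried through a loop with a small,
   statically known bound before it would be used as an address. */
static uint32_t last_bank_addr(void) {
    uint32_t addr = REG_BASE;
    for (int bank = 1; bank < NUM_BANKS; bank++)
        addr += BANK_STRIDE;  /* a bit of arithmetic per iteration */
    return addr + 8;          /* register offset within the bank */
}
```

The loop body runs twice here, so the value we would like the analysis to report for an access through this address is 0x4088.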

I noticed that the engine does not take the value of a file-scope constant pointer ("T* const") into account. Is there a technical limitation that prevents this?
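
To make clear which pattern we mean, here is a minimal sketch. STATUS_REG, its address and status_reg_addr are made up for this illustration; the helper only exposes the pointer value and does not read the register.

```c
#include <stdint.h>

/* Hypothetical device register declared as a file-scope "T* const".
   The address is made up. */
static volatile uint32_t* const STATUS_REG = (volatile uint32_t*)0x5000;

/* Address that a read of *STATUS_REG would touch, without reading it. */
static uintptr_t status_reg_addr(void) {
    return (uintptr_t)STATUS_REG;
}
```

Since the pointer itself is const and initialized with a constant, we would expect a dereference of STATUS_REG anywhere in the file to resolve to the address 0x5000.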

I also tried hacking on the DereferenceChecker and DivZeroChecker to get the symbolic or even concrete value of a Loc, but I only got the initial value and not the value it should have at the dereference. When I plot a graph for a source file that does basic arithmetic on a pointer, the value of the expression never changes. It seems to me that the symbolic values of Locs are not fully tracked. Is this true, and if so, is there a way to track them fully?

A backwards data-flow analysis at the IR level is probably a more reasonable approach in general, but getting the exact clang::Expr that performs the access is valuable to us.

Overall, is this problem reasonably solvable with clang static analysis? Any feedback is greatly appreciated!

Best Regards

Rafael