Need help implementing a custom static analysis


I am new to Clang development and would like your opinion on how to
accomplish a specific task.

I want to add a static analysis to the compilation of the C++ part of
Android applications (Clang is the default compiler).

During this analysis, I want to locate calls to specific functions
and then determine the type of the right-hand side of the last
assignment to each of their arguments.

For example, if I track functions f1 and f2 in the following snippet:

unsigned long x1 = 0;
unsigned int x2 = 0;
unsigned char x3 = 0;

x1 = malloc(…);
x2 = 42;
x3 = 'x';
x2 = x3;

f1(x1);
f2(x2);

The analysis should report “f1, void*” and “f2, unsigned char”.

Ideally, this analysis should generate a warning during compilation
(depending on other conditions not mentioned here). However, an
external tool is also fully acceptable.

I don’t know whether this kind of analysis already exists in Clang, but
I think it would be easier to implement over the CFG of LLVM IR
than over the Clang AST.

I have looked at the Clang and LLVM documentation, but the approaches
I have found do not seem to meet my requirements:

  • libclang or a Clang plugin: it seems that I can only access the AST.
  • LLVM pass: I won’t be able to generate a compiler warning.

Do you have any advice about which interface I should use? Do you know
of any project or tool that could serve as a good example or inspiration?

Thank you very much,

Pierre GRAUX


Such an analysis is trivial to perform with a custom Clang Static Analyzer checker. Just subscribe to checkPreCall and explore the symbolic values (SVals) of the function arguments on the possible execution paths. SVals capture a lot of information about where the value comes from, and you don’t need to track all re-assignments manually, as the analyzer does this for you, sometimes even across function calls. You can look up which classes of SVals it tracks and what kind of information they capture in our Doxygen documentation:

In your example, in the case of ‘f1(x1)’ the symbolic value will be a loc::MemRegionVal of a SymbolicRegion of a SymbolConjured of type void *, which you can extract from the SVal by calling V.getAsSymbol(true)->getType(), where V is your SVal. In the case of ‘f2(x2)’, you will only know that the value is equal to ‘x’; the type of the original literal will be erased. You can still ultimately recover it via trackExpressionValue(), but that’s not entirely convenient. That said, I’m not sure you really need the type as long as you have the value anyway. See also:

The only downside of the Static Analyzer is that it doesn’t explore all possible execution paths, only the ones it has time to investigate carefully (it intentionally gives up on some paths to cope with “path explosion”). If your purpose is to make a tool that finds bugs in existing code, this is perfect. If you really want to explore all execution paths no matter what, you’ll have to write your own analysis, and one of your options will be to use the Clang CFG:

The Clang CFG is different from LLVM IR: it consists of pointers to Clang AST nodes, so it still captures the information present in the original source code almost perfectly. Clang’s lib/Analysis contains a variety of existing analyses over the Clang CFG that you can use as examples or possibly even re-use. That’s much more work than writing a Static Analyzer checker, though, and you’ll have to deal with many more false positives due to the lack of path sensitivity. It will also be much harder to find bugs across function calls.