General query: Alpha security checkers and taint analysis

Hello,

Is the source code of the alpha checkers available? I see a GenericTaintChecker.cpp file in the list of StaticAnalyzer checkers, but not the code of specific checkers.

My main requirement is to write a checker myself: defining a set of taint sources from specific functions, propagating the taint, and gathering all the instructions, at either the source or the assembly level, where the tainted variables are accessed. I hope all of this is possible with the clang static analyzer?

Regards,
Ashwin

Source for pretty much everything is available. To see what code corresponds to what checker, consult the Checkers.td file; for the TaintPropagation checker, GenericTaintChecker.cpp is the correct code file.

With the clang static analyzer, you don't need to (though you may) implement taint propagation manually in every checker: the TaintPropagation checker already does a pretty good job. Just enable it and implement the parts that it doesn't support out of the box.

The static analyzer works only on C/C++/Objective-C source code, so it cannot report accesses at the assembly level.

You should have no problem dumping all accesses to tainted values, similarly to what TaintTester does. However, the current diagnostic engine isn't well suited to massive dumps of data for further analysis: it was designed to report a small number of actual bugs and provides useful facilities for that.
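To try this out, a small input file along these lines can be analyzed with both checkers enabled. This is only a sketch: the file name and the shift() helper are made up for illustration, and the checker names in the comment (alpha.security.taint.TaintPropagation and debug.TaintTest) are assumptions that may differ between clang versions, so check Checkers.td for the names in your tree.

```cpp
// taint_demo.cpp (hypothetical file name)
//
// Assumed invocation; checker names may differ between clang versions:
//   clang --analyze \
//     -Xclang -analyzer-checker=alpha.security.taint.TaintPropagation,debug.TaintTest \
//     taint_demo.cpp
#include <cstdio>

// Plain arithmetic on the input value. No taint-specific code is needed here.
char shift(int x) { return static_cast<char>(x + 1); }

char readShifted() {
    int x = std::getchar();  // getchar() is one of the built-in taint sources
    char y = shift(x);       // y is derived from x, so it should count as tainted too
    return y;
}
```

With the debug checker enabled, I would expect warnings on the expressions that involve x and y, but I have not listed the exact output here since it depends on the clang version.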

  1. I went through GenericTaintChecker.cpp and I am not sure which part of the code propagates the taint. For example,

x = getchar();
char y = x + 1;

Which part of the code taints y? Every part of the code I see seems to deal with functions.

  2. Secondly, how do I use the debugging checkers like TaintTesterChecker? I can't quite seem to locate this in the documentation.

Regards,
Ashwin

I figured out how to add debugging checkers. I just need some help regarding the first question. Thanks.

Regards,
Ashwin

> For example,
>
> x = getchar();
> char y = x + 1;
> Which part of the code taints y?

Propagation of taint through the symbol hierarchy is done by the core automatically. In fact, no propagation is done in any continuous manner - the core just looks at the symbol that you're interested in and finds tainted sub-symbols inside it. This mechanism is implemented in the ProgramState::isTainted() methods, and it relies on the assumption that symbols on which the taint originally appears are always atomic (of SymbolData class).

In your particular example, the following happens:
1. getchar() returns a SymbolConjured - an atomic symbol that represents the return value. Technically, it returns an SVal of nonloc::SymbolVal class, but it is a simple wrapper around the symbol, so there isn't much difference. If you dump() the program state, you'd see it as something like "conj_$0<int>".
2. The conjured symbol is stored in the memory region (VarRegion) that represents AST variable 'x' in the analyzer's memory model. If you dump() the program state, you'd see a binding in the Store: "(x, 0, direct): conj_$0<int>".
3. In order to compute 'x + 1', the conjured symbol "conj_$0<int>" is loaded as (r)value of expression 'x'.
4. A SymIntExpr - a symbolic expression 'conj_$0<int> + 1' is stored in the memory region of variable 'y'.
5. Suppose you then ask whether the value of 'y' is tainted. The symbol 'conj_$0<int> + 1' is taken to represent the value of 'y'.
6. In order to see if the value is tainted, ProgramState::isTainted(SymbolRef Sym, ...) iterates over all sub-symbols of the symbolic expression.
7. It finds 'conj_$0<int>' as one such sub-symbol (and the only one; '1' is not a symbol).
8. Seeing that 'conj_$0<int>' was marked as tainted by the TaintPropagation checker, it decides that the whole symbol is tainted. Therefore it reports that the value of the expression 'y' is tainted.
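The steps above can be mimicked with a toy model. To be clear, this is not the analyzer's code: Sym, atom(), symPlusInt() and TaintSet are made-up stand-ins for SymbolData, SymIntExpr and the taint map in the program state. It only illustrates the idea that taint is stored on atomic symbols and discovered by walking sub-symbols on demand.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for the analyzer's symbol classes; NOT the real clang API.
struct Sym {
    std::string name;                         // e.g. "conj_$0<int>"
    std::vector<std::shared_ptr<Sym>> parts;  // sub-symbols; empty for atomic symbols
};
using SymRef = std::shared_ptr<Sym>;

// Models an atomic symbol such as the conjured return value of getchar().
SymRef atom(const std::string &name) {
    return std::make_shared<Sym>(Sym{name, {}});
}

// Models a SymIntExpr such as 'conj_$0<int> + 1': one symbolic operand and
// one integer constant. The constant is not a symbol, so it is not a part.
SymRef symPlusInt(const SymRef &lhs, int rhs) {
    return std::make_shared<Sym>(
        Sym{"(" + lhs->name + ") + " + std::to_string(rhs), {lhs}});
}

// Taint is recorded on atomic symbols only; nothing is ever propagated.
using TaintSet = std::set<const Sym *>;

// A symbol is tainted if it, or any of its sub-symbols, is in the taint set.
bool isTainted(const SymRef &s, const TaintSet &tainted) {
    if (tainted.count(s.get()))
        return true;
    for (const auto &part : s->parts)
        if (isTainted(part, tainted))
            return true;
    return false;
}
```

In this model, marking atom("conj_$0<int>") as tainted and then asking about symPlusInt(x, 1) answers yes without anyone having touched the expression for 'y', which is the sense in which no propagation happens in a continuous manner.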

Great, thanks for the detailed explanation. I started out directly with this tutorial http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf and also read the basics in the clang static analyzer developer manual. But since I don't have any prior knowledge about clang, do I need to go through any other tutorials to completely understand the code of the various experimental checkers and also to write one of my own?

Another specific question I have: suppose I have a statement var = read_value(). Can I directly add the read_value function as one of the taint sources by adding a line in the addSourcesPost function of GenericTaintChecker? And after changing the file, do I necessarily need to run 'make clang' inside the build directory, or is there a simpler way to pick up the changes, since the former takes way too much time?

Regards,
Ashwin

> since I don't have any prior knowledge about clang,
> do I need to go through any other tutorials to
> completely understand the code of various experimental
> checkers and also to write one of my own?

Emm, no, we don't yet have a single good tutorial for everything. Some useful reading includes:

- lib/StaticAnalyzer/README.txt is a veeery brief introduction.

- The link [2] from lib/StaticAnalyzer/README.txt is a good detailed description of the memory model (MemRegion class hierarchy).

- See docs/analyzer/IPA.txt for a quick introduction to how inter-procedural analysis works.

- See docs/analyzer/RegionStore.txt for a shorter introduction to the memory model, with some implementation caveats.

You may want to get familiar with the clang abstract syntax tree, just to know how clang represents types etc.; there is a good video here: http://clang.llvm.org/docs/IntroductionToTheClangAST.html .

Also, checker code is usually relatively simple. And the API is also relatively easy and intuitive - well, in most places. Just dump things often - or read the exploded graphs - and try to understand what's going on. Learning by example is what everybody does, I guess, even though not all examples are as good as I wish they were.

> Another specific question I have: suppose I
> have a statement var = read_value(). Can I directly
> add the read_value function as one of the taint sources
> by adding a line in the addSourcesPost function of
> GenericTaintChecker?

It should work. Though if you want to share your work later, it'd probably be inconvenient to have very specific functions in the generic taint checker, and we'd have to think about how to separate them.
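Conceptually, "adding a line" amounts to extending a name-based table of source functions. The sketch below is a standalone toy, not the actual GenericTaintChecker code: TaintRule, taintSources() and lookupTaintRule() are hypothetical names, and the real addSourcesPost may be organized differently in your version of clang.

```cpp
#include <string>
#include <unordered_map>

// Toy model of a name-based taint-source table; NOT the real checker code.
enum class TaintRule {
    None,         // not a taint source
    ReturnValue,  // the call's return value becomes tainted
};

// Hypothetical table mapping callee names to what gets tainted.
const std::unordered_map<std::string, TaintRule> &taintSources() {
    static const std::unordered_map<std::string, TaintRule> table = {
        {"getchar", TaintRule::ReturnValue},
        {"read_value", TaintRule::ReturnValue},  // the one project-specific line
    };
    return table;
}

// Looks up the rule for a callee by name; unknown functions are not sources.
TaintRule lookupTaintRule(const std::string &calleeName) {
    const auto &table = taintSources();
    auto it = table.find(calleeName);
    return it == table.end() ? TaintRule::None : it->second;
}
```

Keeping the project-specific names in a separate table like this is also one way the very specific functions could later be split out of the generic checker.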

> And after changing the file, do I necessarily
> need to run 'make clang' inside the build directory,
> or is there a simpler way to pick up the changes,
> since the former takes way too much time?

You do. There are some usual tricks to speed up compilation - use the shared libraries option, use a faster compiler (clang?), use a faster linker (gold?), maybe use a release build if you don't want to have a debugger. Try to reduce the number of linkers running in parallel, otherwise they may eat up all the RAM and begin to swap.

For developing new analyzer checkers, there's one more option: load them as a clang plugin (e.g. 'clang -cc1 -load checker.so <...>'); see examples/analyzer-plugin/ for an example. In this case you don't need to rebuild clang, just the checker, but running becomes a bit more tricky - I'm not sure if, say, the scan-build script supports this method.

So probably it's a good idea for you to copy GenericTaintChecker, change it to a plugin, and go ahead extending it.