[analyzer] Problem tracking taint applied to regions

Hello cfe-dev,

I am currently trying to write a custom checker for the static analyzer.

The idea is to perform taint analysis on C++ code, with sources, sinks and filters being provided by configuration. A checker which does something similar exists already, e.g. [1]. However, in our use case, we want to mark data/values (rather than functions) as sources and sinks. Consider, for example, a piece of security-relevant configuration data which must not be writable by a potential attacker. For now, I use the following example for testing:

class Foo {
public: int x; // Foo::x marked as tainted (e.g. source)
};
static int y; // Marked as sink

int main() {
  Foo f;
  y = f.x;
  return y;
}
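
For illustration, the configuration for this example could be modelled roughly as below. This is only a sketch: `TaintRole`, `TaintConfig`, `classify()` and `makeExampleConfig()` are hypothetical names, and matching declarations by qualified name is an assumption made for the example, not a description of an existing checker.

```cpp
#include <set>
#include <string>

// Hypothetical roles a declaration can play in the taint configuration.
enum class TaintRole { None, Source, Sink, Filter };

// Hypothetical configuration: qualified declaration names, grouped by role.
struct TaintConfig {
  std::set<std::string> sources;
  std::set<std::string> sinks;
  std::set<std::string> filters;

  // Look up the role of a declaration; sources are checked first.
  TaintRole classify(const std::string &qualifiedName) const {
    if (sources.count(qualifiedName)) return TaintRole::Source;
    if (sinks.count(qualifiedName)) return TaintRole::Sink;
    if (filters.count(qualifiedName)) return TaintRole::Filter;
    return TaintRole::None;
  }
};

// Configuration matching the test program: Foo::x is a source, y is a sink.
TaintConfig makeExampleConfig() {
  TaintConfig cfg;
  cfg.sources.insert("Foo::x");
  cfg.sinks.insert("y");
  return cfg;
}
```

A checker callback would then resolve the declaration behind an expression to its qualified name and call `classify()` on it.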

Now, clang already provides some infrastructure for taint analysis, e.g. `ProgramState::addTaint()` and `ProgramState::isTainted()`. The internal taint map used in the store tracks symbolic expressions. I decided to use this existing infrastructure for obvious reasons, e.g. benefiting from existing and future taint propagation infrastructure. However, it bit me and now I'm stuck.
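
To make "the taint map tracks symbolic expressions" concrete, here is a toy model of the idea. It is not the analyzer's API: `ToyTaintState`, `SymbolId` and `Region` are made-up stand-ins, and the model assumes that taint is attached to a value (symbol) and that an assignment simply stores the same symbol in the destination.

```cpp
#include <map>
#include <set>
#include <string>

using SymbolId = int;        // stand-in for a symbolic value
using Region = std::string;  // stand-in for a memory region such as "f.x" or "y"

// Toy program state: a store plus a set of tainted symbols.
struct ToyTaintState {
  std::map<Region, SymbolId> store;  // region -> symbol currently stored there
  std::set<SymbolId> tainted;        // taint is keyed by symbol, not by region
  SymbolId nextSymbol = 0;

  // Reading a region with unknown contents creates a fresh symbol for it.
  SymbolId load(const Region &r) {
    auto it = store.find(r);
    if (it != store.end()) return it->second;
    SymbolId s = nextSymbol++;
    store[r] = s;
    return s;
  }

  // An assignment stores the source's symbol in the destination region.
  void assign(const Region &dst, const Region &src) { store[dst] = load(src); }

  void addTaint(SymbolId s) { tainted.insert(s); }

  // A region counts as tainted if the symbol it currently holds is tainted.
  bool isTainted(const Region &r) const {
    auto it = store.find(r);
    return it != store.end() && tainted.count(it->second) > 0;
  }
};
```

In this model, tainting the symbol loaded from `f.x` and then assigning `f.x` to `y` makes `y` tainted without any extra propagation step, because both regions hold the same symbol.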

The taint is introduced in a `checkLocation()` implementation. First, I unwrap the `MemberExpr` (e.g. remove casts) and decide whether to taint it based on the declaration it refers to. So far, this works as intended. However, it appears that the `SVal loc` passed to this handler does not refer to a `SymExpr`, nor does `loc.getAsRegion()` return a `SymbolicRegion`. In order to taint the piece of data, I create my own symbolic expression and an associated `SVal`, and bind it to the member expression `S`:

// The taint map can only be used for tracking symbolic expressions.
auto symExpr = loc.getAsSymExpr();

if (!symExpr) {
  // The current `loc` is not suitable for carrying taint. Construct a new one.
  if (const auto *region = dyn_cast_or_null<TypedValueRegion>(loc.getAsRegion())) {
    // Intentional assignment inside the condition.
    if ((symExpr = C.getSymbolManager().getRegionValueSymbol(region))) {
      loc = C.getSValBuilder().makeLoc(symExpr);
      state = state->BindExpr(S, C.getLocationContext(), loc);
    }
  }
}

// Add taint if possible
if (symExpr) {
  state = state->addTaint(symExpr);
  C.addTransition(state);
}

Invoking `dumpTaint()` on `state` shows that the expression did indeed make it into the taint map, `state->isTainted(S, C.getLocationContext())` returns `true`, and `state->dump()` reveals that `f.x` is now bound to a symbolic expression that looks just as I'd expect (`f.x : &SymRegion{reg_$1<int f->x>}SVal `).

Btw: I did not use a `check::PreStmt` specialized to a `MemberExpr` because I do not (yet) understand the exact semantics of the `SVal`s and `Loc`s which I can get from the state (partly because of the utter lack of higher-level documentation on these things). They appear to differ: the code above fails in a `checkPreStmt()`. One appears to refer to the location where the value originates, while another appears to refer to something completely different, although the method's name suggests it's just the same.

Using the information in subsequent post-statement checks is where I'm stuck. First, I noted that the taint is not propagated anywhere. I assumed that simple assignments were, for whatever reason, not yet considered (I also did not find anything in the generic taint checker), so I started implementing a `check::PostStmt<BinaryOperator>` checker. Since `isTainted()` on the statement behaved as expected in the previous handler, I assumed taint detection would be fairly easy. Sure, I have to strip casts and parentheses from both the LHS and the RHS, but detecting taint should be easy, I thought. It turns out `isTainted()` never returns true.

Now, dumping both the taint and the overall state using `ProgramState::dump()` and `ProgramState::dumpTaint()` reveals that the symbolic expression which was previously added to the taint map is still tainted, but the binding of this expression to `f.x` is gone (`f.x: Undefined`).

Does anybody on this list have an idea or an explanation why the binding vanishes? Or can anybody point me to more exhaustive resources on the exact semantics of the different kinds of `SVal`s turning up in different contexts? Help would be much appreciated.

Btw: I'm stuck with LLVM/clang 4.0 provided by the package manager at the moment. Self-compiled clang 6.0 refuses to load modules (cannot resolve the `CheckerBase` symbol for some reason) although I ran `cmake` with `-DCLANG_PLUGIN_SUPPORT=ON`, `-DLLVM_BUILD_LLVM_DYLIB=ON` and `-DLLVM_ENABLE_MODULES=ON`.

Greetings and thanks in advance,
Julian

[1] GitHub - franchiotta/taintchecker: Clang static checker that carries out tainting analysis.

PS: Sorry for the missing line-wrapping. I have not yet managed to make Outlook behave like a decent email client.

Hello cfe-dev,

It looks like the value is collected by the `SymbolReaper`, which I find strange since the statement containing the member expression is clearly still being processed.

By now I suspect that I should have used a `SymbolMetadata` and somehow made sure it lives long enough (e.g. using `SymbolReaper::markInUse()`).
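
A toy model of that idea is sketched below. It is not the analyzer's `SymbolReaper`: `ToyReaper`, `ToySweepState` and all of their members are invented names. It assumes, based on my reading of the `markInUse()` documentation comment, that a metadata symbol stays alive only while it is marked in use and its associated region is still live. The sweep step, including the removal of taint for dead symbols, is made up for illustration.

```cpp
#include <map>
#include <set>
#include <string>

using SymbolId = int;
using Region = std::string;

// Toy stand-in for a liveness pass.
struct ToyReaper {
  std::set<Region> liveRegions;  // regions still reachable
  std::set<SymbolId> inUse;      // symbols a checker declared important

  void markInUse(SymbolId s) { inUse.insert(s); }

  // Assumed rule: marked in use AND the owning region is still live.
  bool isLive(SymbolId s, const Region &owner) const {
    return inUse.count(s) > 0 && liveRegions.count(owner) > 0;
  }
};

// Toy checker state: metadata symbols and the taint attached to them.
struct ToySweepState {
  std::map<SymbolId, Region> metadata;  // metadata symbol -> region it describes
  std::set<SymbolId> tainted;

  // Hypothetical "live symbols" hook: ask to keep every tainted symbol.
  void markTaintedInUse(ToyReaper &reaper) const {
    for (SymbolId s : tainted) reaper.markInUse(s);
  }

  // Sweep: forget metadata symbols, and their taint, that are not live.
  void removeDead(const ToyReaper &reaper) {
    for (auto it = metadata.begin(); it != metadata.end();) {
      if (reaper.isLive(it->first, it->second)) {
        ++it;
      } else {
        tainted.erase(it->first);
        it = metadata.erase(it);
      }
    }
  }
};
```

In this model, a tainted symbol attached to a live region survives the sweep once it has been marked in use, while one attached to a dead region is dropped even though it was marked.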

Greetings,
Julian

Whoops +cfe-dev.