pointer values, taint propogation

I didn’t realize the screen captures would make things difficult. Here’s a version of the e-mail with text instead.

I would like to write a checker to make sure that pointers from untrusted sources are not dereferenced. So I am playing around with the alpha.security.taint checker to try to understand how taint propagation works. I put together some simple test cases:

#include <stdio.h>

#include <string.h>

void test(float f, char *sp1, char *sp2, char *sp3) {

char *sv1, *sv2, *sv3;

long long ll;

scanf(“%lld”, &ll); // ll is read from stdin, so is untrusted

if (ll) {

printf((char *)ll, f); // flag: ll is untrusted

sv1 = (char *)ll;

printf(sv1, f); // flag: sv1 = ll is untrusted

sp1 = (char *)ll;

printf(sp1, f); // flag: sp1 = ll is untrusted

printf((char *)NULL+ll, f); // flag: ll is untrusted

unsigned long u = 0;

u += ll;

printf((char *)u, f); // flag: u = 0+ll is untrusted

sv2 = NULL;

sv2 += ll;

printf(sv2, f); // flag: sv2 = NULL+ll is untrusted

sp2 = NULL;

sp2 += ll;

printf(sp2, f); // flag: sp2 = NULL+ll is untrusted

printf(sp3, f);

sv3 = sp3;

sv3 += ll;

printf(sv3, f); // flag: sv3 = sp3+ll is untrusted

sp3 += ll;

printf(sp3, f); // flag: sp3 = sp3+ll is untrusted

}

}

The indicated calls to printf should be flagged because an untrusted value is passed as the format string (the first argument). When I run the alpha.security.taint checker on this, it only flags the last two. Looking into this, I found that the static analyzer often assigns an unknown value to the result of casting from an integer to a pointer. Taint can’t be attached to an unknown value, so I need a known value for the pointer. Although I’m not sure if this is really what we want to do, for now I added a post-check call-back on cast expressions to my checker that casts the integer to unsigned and then assigns the result as the pointer value.

void UntrustedDerefChecker::checkPostStmt(const CastExpr *CE, CheckerContext &ChCtx) const {

ProgramStateRef St = ChCtx.getState();

const LocationContext *LCtx = ChCtx.getLocationContext();

if (CE->getCastKind() == CK_IntegralToPointer && St->getSVal(CE, LCtx).isUnknown()) {

// if cast to pointer resulted in an unknown value, try casting to unsigned instead

const Expr *SubExpr = CE->getSubExpr();

QualType CastTy = ChCtx.getASTContext().UnsignedLongTy;

QualType SubExprTy = SubExpr->getType();

SVal Val = St->getSVal(SubExpr, LCtx);

Val = St->getStateManager().getSValBuilder().evalIntegralCast(St, Val, CastTy, SubExprTy);

ChCtx.addTransition(St->BindExpr(CE, LCtx, Val));

}

}

When I run alpha.security.taint together with this on the test file, it flags the calls to printf where (char *)l and (char *)u are passed, but not the calls where sv1, sp1, (char *)NULL+l, sv2, or sp2 are passed, even though these are just different expressions with the same value. I tried looking into this using the debugger, and found that when the analyzer calls ExprEngine::evalStore for the assignment statement (say “sv1 = (char *)ll”), it gives the pointer the appropriate value, but when the analyzer calls GenericTaintChecker::checkUncontrolledFormatString from the pre-check call-back for the call to printf on the following line of the test file, the pointer has an UnknownVal. (See excepts from the debugging session below.)

Can anyone help me understand what is going on here?

Thanks,

Ray

(These debugger session excepts are inherently screen captures, but the gist of them is described above.)

Hello,

While the analyzer tries to cover all sides of the C's "everything is everything" attitude (pointers are integers, integers are pointers, pointer types do not matter), it seems that casting integers to pointers is the least covered part of it.

--= IntegerAsLoc values =--

First, there is currently no accepted way to represent a cast from a *symbolic* integer to a pointer in the SVal hierarchy. For the opposite cast (from a pointer to an integer) there's nonloc::LocAsInteger SVal sub-class to represent the result. However, nobody made a "loc::IntegerAsLoc" for us yet.

I'd probably suggest to represent those as SymbolicRegion of an integer-type symbol. In fact, the following patch fixes all of your cases except sv2/sp2:

```
diff --git a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
index 28b43dd..6e894f3 100644
--- a/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
+++ b/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
@@ -97,7 +97,9 @@ SVal SimpleSValBuilder::evalCastFromNonLoc(NonLoc val, QualType castTy) {

      if (!isLocType)
        return makeNonLoc(se, T, castTy);
- return UnknownVal();

Artem,

Thank you for your detailed answer -- it was illuminating and very helpful. Representing a pointer as an unsigned integer was causing problems, but creating a new symbolic region seems to be working. I am now wondering if we can do something similar for the sv2/sp2 cases: when we add a symbolic integer value to a null pointer, make the value a new symbolic region based on that integer value. I tried to write a pre-check/post-check pair to do this, but am running into trouble even with the pre-check to catch this situation. Here's what I wrote:

void UntrustedDerefChecker::checkPreStmt(const BinaryOperator *Bop, CheckerContext &ChCtx) const {
   ProgramStateRef St = ChCtx.getState();
   const LocationContext *LCtx = ChCtx.getLocationContext();
   BinaryOperator::Opcode Op = Bop->getOpcode();
   if (Op != BO_Add && Op != BO_AddAssign
       && Op != BO_Sub && p != BO_SubAssign)
      return;
   Expr *LHS = Bop->getLHS();
   if (LHS->getType()->isPointerType()) {
      SVal ValL = St->getSVal(LHS, LCtx);
      ProgramStateRef LIsNullSt, LIsNotNullSt;
      std::tie(LIsNotNullSt, LIsNullSt) = St->assume(ValL.castAs<DefinedOrUnknownSVal>());
      if (LIsNullSt & !LIsNotNullSt) {
         ChCtx.addTransition(LIsNullSt->set<NullPtrArith>((unsigned)LeftArg));
         return;
      }
   }
  if (Po != BO_Add && Op != BO_Sub)
      return;
   ... // handle pointer on right symmetrically to pointer on the left case above
}

But when this call-back is called for the sv2 += ll statement, LIsNuLLSt is null and LIsNotNullSt is a valid state, so it thinks sv2 must be non-null even though the state has it constrained to be null. I traced it in the debugger and it looks like the issue is in SimpleSValBuilder::evalCastFromLoc, where the non-symbolic case falls through to the GotoLabel case, making any ConcreteInt-as-Loc value (even NULL) evaluate to true.

Is this a bug, or am I misunderstanding what is going on? Any suggestions for a work-around?

Thanks,
Ray

Hello,

Sorry, it took me some time to get to reproducing this issue.

I think everything works correctly, because left-hand side of operator += (unlike operator +) is actually an lvalue - the variable sv2 (sp2) itself, not its contents. The relevant SVal is .dump()'ed as &sv2, and "means" a pointer to the variable sv2. Null pointer usually doesn't point to stack variables, so the null-state is modeled to be impossible.

To get the rvalue of the left-hand side expression, you'd need to perform an extra getSVal(), this time from the Store - which essentially dereferences the pointer. Something like this:

   if (LHS->isLValue()) // or just check the opcode
     if (auto ValLAsLoc = ValL.getAs<Loc>()) // auto resolves to Optional<Loc>
       ValL = St->getSVal(*ValLAsLoc);
     else
       // Handle what went wrong. Perhaps an UndefinedVal or an UnknownVal.

Generally, it's a very good idea to .dump() the SVal's you encounter, there's a lot of useful info. There's also this SValExplainer class which may help with understanding the SVal hierarchy.