[Analyzer] Obtain MemRegion corresponding to a pointer expression that has been cast to a different type

Hi All,

I’m analyzing something like the following code:

struct S {
  int a;
  char b;
  int c;
};

void foo() {
  struct S x;
  bar((uint8_t *)&x);
}

When I reach the CallEvent corresponding to the call to bar(), I would like to extract the MemRegion corresponding to x, i.e. by ignoring the (uint8_t *) cast. My code looks something like this:

const Expr *arg = Call.getArgExpr(0);

SVal addrVal = State->getSVal(arg, LCtx);
Optional<Loc> l = addrVal.getAs<Loc>();
if (!l) // not a location (e.g. an unknown or undefined value)
  return nullptr;

QualType T = getPointedToType(arg);
return State->getSVal(*l, T).getAsRegion();

where getPointedToType() is defined as

QualType getPointedToType(const Expr *E) {
  assert(E);
  if (!isPointer(E))
    return QualType();
  // Look through casts to the type of the underlying expression.
  if (const CastExpr *cast = dyn_cast<CastExpr>(E))
    return getPointedToType(cast->getSubExpr());

  const PointerType *Ty =
      dyn_cast<PointerType>(E->getType().getCanonicalType().getTypePtr());
  if (Ty)
    return Ty->getPointeeType();
  return QualType();
}

Everything seems to work just fine until the call to State->getSVal(*l, T), which returns a NonLoc. If I instead call State->getSVal(*l) without the pointed-to type, then I do get a MemRegion, but it’s an element region of type uint8_t, which is not what I want.

Am I doing something wrong? Is there a much easier way to do this?

~Scott Constable

Hi Scott,

I don’t actually see a reason why you even need to look at the structure of the AST here. The analyzer does a full symbolic execution, so there is a powerful separation between syntax and semantics right at your fingertips.

I would approach this from a different angle. Once you have the location, in this case ‘l’, its region should be an ElementRegion. That represents the cast from the original MemRegion (a VarRegion) to uint8_t*. Then just strip off the ElementRegion. The MemRegion design captures how the casts were used to change the interpretation of a piece of memory. It’s all right there in the MemRegion hierarchy.
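
To make the layering concrete, here is a small self-contained toy model of the idea. It is only a sketch and does not use the Clang API: the types Region, VarRegion and ElementRegion and the function stripElementCasts are hypothetical stand-ins written for illustration. The assumption is that a pointer cast shows up as an element layer at index 0 wrapped around the original region, so peeling those layers gets you back to the variable. In Clang itself, MemRegion::StripCasts() is worth checking for the same purpose.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy stand-ins for the analyzer's region classes. NOT the Clang API.
struct Region {
  enum Kind { Var, Element } kind;
  explicit Region(Kind k) : kind(k) {}
  virtual ~Region() = default;
};

// Memory belonging to a named variable, e.g. 'x'.
struct VarRegion : Region {
  std::string name;
  explicit VarRegion(std::string n) : Region(Var), name(std::move(n)) {}
};

// A view of 'super' as an array of 'elementType', positioned at 'index'.
struct ElementRegion : Region {
  const Region *super;
  std::string elementType;
  long index;
  ElementRegion(const Region *s, std::string t, long i)
      : Region(Element), super(s), elementType(std::move(t)), index(i) {}
};

// Peel off element layers at index 0, i.e. pure reinterpretations of the
// same address. A non-zero index is a real offset, so it is left alone.
const Region *stripElementCasts(const Region *r) {
  while (r && r->kind == Region::Element) {
    const auto *er = static_cast<const ElementRegion *>(r);
    if (er->index != 0)
      break;
    r = er->super;
  }
  return r;
}
```

With a VarRegion for x and an ElementRegion of type uint8_t at index 0 on top of it, stripElementCasts returns the VarRegion. It does not matter which statement in the source introduced the cast, because the function never looks at syntax.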

AST-based approaches like this are fundamentally very brittle. For example, you would need to do something different if the code was instead written like this:

  void foo() {
    struct S x;
    uint8_t *y = (uint8_t *)&x;
    bar(y);
  }

If you just use the MemRegions directly, these syntactic differences are irrelevant. The MemRegions capture the actual semantics of the value you are working with. In this case, the analyzer knows that the original memory address is for the VarRegion for ‘x’.

Typically, if you find yourself going to the AST itself to do this kind of operation, the approach is inherently wrong. Syntactic approaches work reasonably well for the compiler, where cheap local analysis is all you have. For the static analyzer, there is so much semantics captured in the ProgramState that you can go far beyond the reasoning power of syntactic checks like this.

Cheers,
Ted

Thanks Ted,

The solution was to write the “dereference” function like this:

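const MemRegion *
Util::getPointedToRegion(SVal addrVal, bool ignoreElemCast) {
  Optional<Loc> l = addrVal.getAs<Loc>();
  if (!l) // not a location (e.g. an unknown or undefined value)
    return nullptr;
  const MemRegion *MR = l->getAsRegion();
  if (!MR)
    return nullptr;
  // A pointer cast is modeled as an ElementRegion layered on the original
  // region, so the super region is the memory that was actually cast.
  const ElementRegion *ER = dyn_cast<ElementRegion>(MR);
  if (ER && ignoreElemCast)
    MR = ER->getSuperRegion();

  return MR;
}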

It essentially strips off the ElementRegion, as you suggested.

~Scott Constable