[analyzer] Binding address-of globals

Hi,

Continuing my effort to make the analyzer understand more constants, I took a look at the following case:

struct SubS {
  int *p;
};

struct S {
  struct SubS *sub;
};

struct SubS const gsubs = {
  .p = (int *)0x80008000
};
struct S const gs = {
  .sub = &gsubs
};

int main() {
  struct SubS subs = {
    .p = (int *)0x80008000
  };
  struct S s = {
    .sub = &subs
  };

  *s.sub->p;
  *gs.sub->p;
}

Here, the analyzer recognizes the dereference via s, but not the one via gs. This seems to be because region information is stored for subs, but not for gsubs.

I'm not sure how to solve this issue. Could we retroactively create the region information whenever we encounter constants like this? Or should we rather add something to the getBinding functions that resolves this case manually? For the latter, it seems the analyzer should already understand what is happening without many additions, but it's unclear to me how the pieces connect.

Best regards
Rafael

Hmm. It sounds as if we need to fix two things here, both of which you already know how to solve:

  1. Be able to constant-fold “gs.sub” to “&gsubs”,
  2. Be able to constant-fold “(&gsubs)->p” to “0x80008000”.

I guess the confusion arises because steps 1 and 2 are separated in time; they are in fact two independent loads. They interact through the Environment: we compute the sub-expression and put its value into the Environment, and later, when we need to perform the second load, we retrieve that value from the Environment. Once the first load has been performed correctly, it becomes irrelevant that such a load ever happened; ExprEngine, like checkers, is stateless. The problem becomes as easy as loading “gsubs.p”, because the analyzer knows, in a path-sensitive manner, that the sub-expression “gs.sub” has evaluated to “&gsubs”; that is already encoded in the MemRegion structure.
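A toy model may make the two-load picture above more concrete. This is only a sketch, not analyzer code: Value, Store, Environment, load and evalGsSubP are hypothetical names made up for illustration, and the real Environment is keyed by expression and location context rather than by spelling.

```cpp
// Toy model only: none of these types are the analyzer's real classes.
#include <cassert>
#include <map>
#include <string>

// A value is unknown, a concrete integer, or "the address of region X".
struct Value {
  enum Kind { Unknown, Int, Region } kind;
  unsigned long long num;  // meaningful when kind == Int
  std::string region;      // meaningful when kind == Region
};

Value unknown() { return Value{Value::Unknown, 0, ""}; }
Value concreteInt(unsigned long long n) { return Value{Value::Int, n, ""}; }
Value addressOf(const std::string &r) { return Value{Value::Region, 0, r}; }

// Store: "region.field" -> value bound there.
typedef std::map<std::string, Value> Store;
// Environment: spelling of an already evaluated expression -> its value.
typedef std::map<std::string, Value> Environment;

// A single load: read `field` from the region that `base` points to.
Value load(const Store &store, const Value &base, const std::string &field) {
  assert(!field.empty());
  if (base.kind != Value::Region)
    return unknown();  // we do not know which region to read from
  Store::const_iterator it = store.find(base.region + "." + field);
  return it == store.end() ? unknown() : it->second;
}

// Evaluate the operand of `*gs.sub->p` as two independent loads.
Value evalGsSubP(const Store &store) {
  Environment env;
  // Load 1: `gs.sub`. The result is recorded in the Environment.
  env["gs.sub"] = load(store, addressOf("gs"), "sub");
  // Load 2: `->p`. It takes its base from the Environment only and does
  // not care how that value was computed.
  return load(store, env["gs.sub"], "p");
}
```

With a store that binds gs.sub to the address of gsubs and gsubs.p to 0x80008000, evalGsSubP yields the concrete integer. If the first binding is missing, the second load has nothing to start from and the result is unknown.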

So I think we don’t need to retroactively create anything. Instead, we simply need to perform every step precisely, which is a good thing anyway because there’s always code that never gets to the second step.

Sorry if the answer is not spot-on; I’m not sure I fully understood the question.

Alright, thanks for the info. As I see it, number 2 should already be solved, but number 1 is still not clear to me.

The issue is that there is no direct binding available, as there is in the non-global case.

  • Non-global: getBindingForField returns the direct binding, which was created by the initialization earlier in main.

  • Global: getBindingForField does not find a direct binding and cannot resolve FieldInit to a constant.
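
The difference between the two cases can be sketched as a toy lookup. All names here (Value, Init, foldLikeCompiler, bindingForField) are hypothetical stand-ins and not the signatures of the real RegionStore functions; the sketch only assumes the lookup order described above.

```cpp
// Toy model only; not the analyzer's real classes or signatures.
#include <cassert>
#include <map>
#include <string>

// A value is unknown, a concrete integer, or "the address of region X".
struct Value {
  enum Kind { Unknown, Int, Region } kind;
  unsigned long long num;  // meaningful when kind == Int
  std::string region;      // meaningful when kind == Region
};

// A field initializer reduced to two shapes: an integer literal or `&var`.
struct Init {
  enum Kind { IntLiteral, AddrOfVar } kind;
  unsigned long long literal;  // meaningful for IntLiteral
  std::string var;             // meaningful for AddrOfVar
};

typedef std::map<std::string, Value> DirectBindings;  // "region.field" -> value
typedef std::map<std::string, Init> Initializers;     // "region.field" -> init

// Stand-in for compiler-style constant evaluation: only numbers fold.
Value foldLikeCompiler(const Init &init) {
  if (init.kind == Init::IntLiteral)
    return Value{Value::Int, init.literal, ""};
  return Value{Value::Unknown, 0, ""};  // `&var` has no number before run time
}

// Lookup order: a direct binding first, then the constant initializer.
Value bindingForField(const DirectBindings &direct, const Initializers &inits,
                      const std::string &key) {
  assert(!key.empty());
  DirectBindings::const_iterator d = direct.find(key);
  if (d != direct.end())
    return d->second;  // bound during analysis, e.g. by the code in main
  Initializers::const_iterator i = inits.find(key);
  if (i != inits.end())
    return foldLikeCompiler(i->second);
  return Value{Value::Unknown, 0, ""};
}
```

In this model, s.sub is found through its direct binding and gsubs.p folds to a number, while gs.sub comes back unknown because its initializer is an address rather than a number.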

Now, for the case where getConstantVal fails, I could add some code that looks at the FieldInit Expr and returns a new FieldRegion in a loc::MemRegionVal if it finds UnaryOp(&) → DeclRefExpr(FieldDecl). The problem is that this is tailored closely to the example and does not work in general. I feel like the SVal for the FieldInit Expr should be available somewhere, but I cannot figure out where.

There is ProgramState::getSVal(const Stmt*, const LocationContext*), but I'm not sure whether it is applicable here, also because the RegionStore doesn’t seem to have access to any ProgramState or LocationContext.

Rafael

Well, there is never a direct binding for anything unless it was put there during analysis, which is not the case for global initializers. That’s the exact problem you’re solving. I guess the difference here is that you can’t evaluate the initializer expression at compile time (because the actual numeric value of the global’s address is not known before the program is run), but during analysis we don’t care about the precise numeric value of the address. The SVal that represents the address of a global variable (a loc::MemRegionVal that wraps a VarRegion) says exactly that: “it’s the address of that global variable”, without specifying what this address is. So you’d have to step away from the constant folding methods used by the compiler (e.g. EvaluateAsInt) and implement analyzer-specific constant folding that works similarly but collapses the expression to a concrete value in the analyzer’s sense rather than to a compile-time constant. That way, DeclRefExpr(VarDecl) would collapse to a loc::MemRegionVal(VarRegion), which is State->getLValue(VarDecl, LCtx), where LCtx is obviously ignored for global variables.
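
As a rough illustration of what such analyzer-specific folding could look like, here is a toy evaluator. It is a sketch under assumptions, not the analyzer's API: Value, Expr and foldForAnalyzer are hypothetical names, and a region is represented simply by the variable's name instead of a real VarRegion.

```cpp
// Toy model only; `foldForAnalyzer` and these types are hypothetical.
#include <cassert>
#include <string>

// A value is unknown, a concrete integer, or "the address of region X".
struct Value {
  enum Kind { Unknown, Int, Region } kind;
  unsigned long long num;  // meaningful when kind == Int
  std::string region;      // meaningful when kind == Region
};

// A tiny initializer expression tree.
struct Expr {
  enum Kind { IntLiteral, DeclRefGlobal, AddrOf } kind;
  unsigned long long literal;  // meaningful for IntLiteral
  std::string name;            // meaningful for DeclRefGlobal
  const Expr *operand;         // meaningful for AddrOf
};

// Fold to a value "in the analyzer's sense": addresses stay symbolic.
Value foldForAnalyzer(const Expr &e) {
  switch (e.kind) {
  case Expr::IntLiteral:
    return Value{Value::Int, e.literal, ""};
  case Expr::DeclRefGlobal:
    // As an lvalue, a reference to a global denotes that global's region;
    // no numeric address is needed.
    return Value{Value::Region, 0, e.name};
  case Expr::AddrOf: {
    assert(e.operand != nullptr);
    // `&lvalue` yields the same location value as the lvalue itself.
    Value inner = foldForAnalyzer(*e.operand);
    if (inner.kind == Value::Region)
      return inner;
    break;
  }
  }
  return Value{Value::Unknown, 0, ""};
}
```

For the initializer of gs.sub, the tree AddrOf(DeclRefGlobal("gsubs")) folds to the region gsubs, which is all the second load needs; the numeric address never enters the picture.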