At block level the for-body contains binary assignment expressions only. Now I just want to know, if and where lhs_x appears anywhere in rhs_y (with x < y, let's ignore x > y for the moment). Then I could expand rhs_y by replacing lhs_x with rhs_x, perhaps recursively. This of course isn't that easy as it sounds: you have to resolve things like *&a <-> a or a->b <-> (*a).b.
Thus all I need is a unique identifier for lhs_x and a function which for a given arbitrary rvalue-expression finds the referred lhs_x identifier (if one exists). Given these my initial example could be solved quite elegant:
A* ptr = &temp; // var ptr gets ID1
temp.a = /*expr*/; // field var temp.a gets ID2
temp.b = /* another_expr */; // field var temp.b gets ID3
*sink = temp.a; // function detects ID2 for temp.a: expand to *sink = /*expr*/: done
*sink = ptr->b; // function detects ID1 for ptr: expand to *sink = (&temp)->b:
// function detects ID3 for (&temp)->b: expand to *sink = /* another_expr */: done
I think, that one feasible approach to do this is working on the AST level by unifiying expressions somehow (e.g. something like a unified string) and then just search for them in other unified expressions. However I'm afraid that this approach isn't very extensible. At least it means some work which possibly has been done.
OTOH I've seen the MemRegions which apparently are already designed so that they can act easily as these unique identifiers I mentioned above. Also the PostLoad node seems to designate sub-expressions for which a MemRegion exists. But the only clang part dealing with MemRegions so far is the GRExprEngine and as I said I'm note sure if it is what I need. Or it's me who is still not able to use GRExprEngine properly for my purpose.
Hi Olaf,
Sorry for the delay in my response.
MemRegions can be viewed as the "name" or "location" of a piece of memory. This is how GRExprEngine uses them, and it uses MemRegions to talk to StoreManager to reason about values bound to MemRegions.
MemRegion certainly can be repurposed by other clients, as MemRegionManager (which constructs MemRegion objects) doesn't rely on the rest of the path-sensitive engine. MemRegions conceptually represent abstract chunks of memory. For example, a VarRegion represents the memory associated with a given variable. Regions can be layered on top of each other to represent field and array accesses, casts, and so forth. MemRegions are constructing on demand by GRExprEngine when it evaluates casts, pointer decays, and so forth. MemRegions are uniqued by MemRegionManager. For example, when you request the MemRegion for a given VarDecl, you will always get the same MemRegion*.
In your example above, I'm not certain if ID* symbols are the values bound to locations or the locations themselves. If they are the locations, then you'd have:
ID1 => VarRegion(ptr)
ID2 => FieldRegion(VarRegion(temp), a)
ID3 => FieldRegion(VarRegion(temp), b)
Now if ID* represent values, you're going to need something else besides MemRegions to keep track of location -> value bindings. For example:
*sink = ptr->b; // function detects ID1 for ptr: expand to *sink = (&temp)->b:
To get this reasoning, you need to keep track of the binding:
VarRegion(ptr) => VarRegion(temp)
which would cause the l-value of 'ptr->b' to evaluate to:
FieldRegion(VarRegion(temp), b)
Then the r-value of 'ptr->b' expands to whatever value you track for the field 'b' of 'temp'.
In general, I'm not certain how much value flow logic you plan on implementing. MemRegions themselves don't represent the values of expressions, but rather memory locations (which some expressions may evaluate to). The path-sensitive engine uses SVals (short for "symbolic values") to represent what the symbolic result of evaluating an expression. As you'll see, SVals can represent "locations" and "non-locations", and "locations" include MemRegions.
I guess it's not clear to me how much symbolic evaluation you plan on doing. If your analysis isn't path-sensitive, how do plan on handling confluence points in the CFG (and loops)?
Ted