[StaticAnalyzer] Loc and NonLoc SVal

I understand that a SVAL is a kind of union wrapping a symbolic value , a mem region or a concrete value. However I don’t really understand the meaning of Loc and NonLoc sub classes.
Can someone explain their purposes?



Long story short: Loc ("location") is a single point in memory (normally a pointer, maybe a C++ reference value or a null pointer or a function pointer or even a goto label address that might be treated as a first-class value through a gcc language extension). Note that Loc represents a single pointer value; even though a loc::MemRegionVal value may wrap a whole memory region, which in turn is a segment in memory, Loc itself represents the start of that region. Everything else - such as numbers or bools, or maybe structure ("compound") values regardless of their contents - is a NonLoc. A compound value (nonloc::CompoundVal or nonloc::LazyCompoundVal) represents the whole contents of the structure regardless of where it is currently stored, hence it's a NonLoc.

NonLoc is always an rvalue; Loc may be either an lvalue or a pointer-type rvalue. For example, in

01 int x = 4;
02 int y = 5;
02 x += y;

on line 03 the left-hand side of oeprator+= is a Loc that represents a pointer to local variable x - that's where the result will be stored - and the right-hand side is a NonLoc that represents concrete integer 5. If you want to find the concrete number 4, you'd need to notice that left-hand side is an lvalue expression in the sense of Expr::isLValue(), and then do State::getSVal(Loc) from the left-hand side. Then you'd finally be able to add 4 and 5 and compute the result of the operator. It is not adviced, however, to determine if you need to do getSVal() by seeing if left-hand side is a Loc. It may be a Loc for a different reason, eg:

01 void *x = malloc(BUF);
02 int y = 5;
03 void *z = x + y;

^here the left-hand side of operator+ is a Loc rvalue that represents a pointer to the start of the buffer, not &x like in the example above. If it was operator+=, it would have been &x and you'd have to do the extra getSVal(). But here you may accidentally shoot yourself in the foot if you make a redundant getSVal().

Because in C pointers can be casted to integers and vice versa, some subtleties arize. Consider:

01 void *x = malloc(2 * sizeof(int));
02 *((int *) x) = 1;
03 size_t i = foo();
04 intptr_t y = &((int *) x)[i];

The analyzer doesn't know what value would be returned by malloc(). It construct a SymbolConjured `conj_$0<void *>' to represent the numeric value of the pointer. Being a SymExpr, it isn't Loc or NonLoc. Technically it's possible to construct nonloc::SymbolVal with `conj_$0<void *>' to represent such numeric value as a NonLoc, but the analyzer wouldn't normally do that for pointer symbols.

Instead, the analyzer constructs a SymbolicRegion, which represents the segment of memory allocated by malloc(). The analyzer even knows the extent (size or length, in bytes) of this region (which would in our example be 8 - let's assume a 32-bit architecture). Then the analyzer represents the return value of malloc() as a loc::MemRegionVal, which is a Loc, of form "&SymRegion{conj_$0<void *>}". It means the pointer that points to the beginning of the region that was allocated by malloc.

SymbolicRegion is essentially a bridge between symbols and Locs. If the pointer points to a known value, eg. &x, it'd not be a symbolic region, and the analyzer wouldn't construct any symbol at all to represent its numeric value. But when a pointer appears from elsewhere, we have to denote its value with a symbol and represent the pointer value as a symbolic region's start.

On the second line, the analyzer sees the cast of the pointer to "int*". The SVal of the expression would be "&element{SymRegion{conj_$0<void *>}, 0 S32b, int}". It means that the analyzer interpreted the aforementioned symbolic region as an array of ints. The analyzer isn't sure if there are actually ints in this array (which is why SymbolicRegion doesn't inherit from TypedValueRegion). Then he took the first element of this array, which is a 4-byte chunk of the symbolic region. The SVal is a loc::MemRegionVal represents the pointer to this first element. This SVal is technically *equal* to the old one: "&SymRegion{conj_$0<void *>}" - both represent the same pointer value. They wouldn't be equal in terms of SVal::operator==(), but they'd be equal if you use SValBuilder to evaluate BO_EQ on them (which is the right way to compare SVals). However, the new SVal carries additional type information that you may want to extract from it by querying its .getAsRegion().

On line 04 computation of the right-hand side of the expression starts with constructing a similar ElementRegion, just with symbolic offset: "&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int}". It represents a pointer to i'th element of the array. It's not necessarily within the malloc'ed region - may accidentally go out of bounds, but we still say it's a sub-region. However, the code is interested in the numeric value of the pointer. So the analyzer uses another bridge class - nonloc::LocAsInteger, which represents pointed-to addresses as non-locations. The analyzer can cast the LocAsInteger back to the initial pointer. The SVal stored in `y' may look like this: "&element{SymRegion{conj_$0<void *>}, conj_$1<size_t>, int} (as 32-bit integer)".

SVal hierarchy is a bit tricky, and it's generally very helpful to see hierarchies for SVal, SymExpr, MemRegion at clang's doxygen - they have examples of stuff that goes in each class. Also you can use SValExplainer to print out values in a human-friendly manner.

Thanks a lot for the very detailed answer.
I just recently saw the answer.
So to compare 2 SVal I have to use SValBuilder.
I think it is pretty clear how to do this if I want to assert equality with svalbuilder::evalEQ . How can I check if a given memory region whose symbolic expression is “decorated” with “Element” due to casting, is a sub-region of another one? Is SubRegionOf() is not working. How can I do this with SValBuilder? Any code example I can look at?

isSubRegionOf() should do the trick, it just walks up the chain of super-regions to find if it’s a child of the region. You can verify that this is true or not for the cases you’re looking at. I’ve hit issues where the super region I was looking at had a cast applied, so it was an ElementRegion{foo, 0, $cast_type} and was not a super-region of something like FieldRegion{foo, bar}. You might try to use SuperRegion->StripCasts()->isSubRegionOf() if that’s the case. I think this can especially be an issue with array references since you might have an ElementRegion{arr, 0} when in reality it’s a reference to the entire array.