Pointers as SVals

Hello,

I am trying to understand how to distinguish between the value of a pointer itself and the region it points to. However, I ran into some contradictions while testing. Consider the following piece of code:


const int* get_ptr();

void f() {
  const int *p = get_ptr();
  clang_analyzer_dump(p);
  clang_analyzer_explain(p);
}

The output of this code:


ptr_dump_explain.c:8:3: warning: &SymRegion{conj_$2{const int *, LC1, S715, #1}} [debug.ExprInspection]
  clang_analyzer_dump(p);
  ^~~~~~~~~~~~~~~~~~~~~~
ptr_dump_explain.c:9:3: warning: symbol of type 'const int *' conjured at statement 'get_ptr()' [debug.ExprInspection]
  clang_analyzer_explain(p);
  ^~~~~~~~~~~~~~~~~~~~~~~~~

Is p a region or a symbol? clang_analyzer_dump() says it is a region: a symbolic region, but still a region. However, clang_analyzer_explain() says it is a symbol, which I think is wrong. According to SValExplainer.h, it should print something like “object at…” or “pointee of …”, rather than explaining the raw symbol without mentioning the region.

I tried to change the code to the following:


void f() {
  const int *p = get_ptr();
  ++p;
  clang_analyzer_dump(p);
  clang_analyzer_explain(p);
}

The output changes:


ptr_dump_explain.c:9:3: warning: &Element{SymRegion{conj_$2{const int *, LC1, S715, #1}},1 S64b,int} [debug.ExprInspection]
  clang_analyzer_dump(p);
  ^~~~~~~~~~~~~~~~~~~~~~
ptr_dump_explain.c:10:3: warning: pointer to element of type 'int' with index 1 of pointee of symbol of type 'const int *' conjured at statement 'get_ptr()' [debug.ExprInspection]
  clang_analyzer_explain(p);
  ^~~~~~~~~~~~~~~~~~~~~~~~~

This is even stranger, because here clang_analyzer_dump() says it is an element region, i.e. the region of the array element itself. However, clang_analyzer_explain() says it is a pointer to the element, not the element itself. According to SValExplainer.h, the output for an element region should begin with “element of type…”. What is wrong here? Both functions take the same type of parameter:


void clang_analyzer_dump(const int*);
void clang_analyzer_explain(const int*);

What do I misunderstand here?

Regards,

Ádám

If a symbol (SymExpr object) $p is an unknown numeric value of a memory address, then a symbolic region (i.e., SymbolicRegion object) SymRegion{$p} represents the segment of memory that starts at address $p and ends at another unknown position, and a pointer value (loc::MemRegionVal object) &SymRegion{$p} represents, well, a value of a pointer to the beginning of symbolic region SymRegion{$p}.

All three are basically the same thing. SymRegion{$p} is slightly different because it implies the existence of the other end of the segment (even if it’s unknown) but &SymRegion{$p} is basically the same thing as $p, just represented as an object of a different type (SVal as opposed to SymExpr).

Think of SymbolicRegion and loc::MemRegionVal as adaptors; they don’t change the meaning behind the object, they only represent it in a different manner, like a different point of view on the same entity. The important technical difference between &SymRegion{$p} and $p is that the former is Loc and the latter is NonLoc.

There’s another such adaptor, nonloc::SymbolVal, that represents SymExprs as SVals directly. For any symbol $p of pointer type, nonloc::SymbolVal of $p is ill-formed; it is always going to be canonically represented as loc::MemRegionVal &SymRegion{$p} instead. So nonloc::SymbolVal can only be used on regular integers. This ensures that Loc values are always used for representing pointers (or references, or values of glvalue expressions) and NonLoc values are always used for representing integers and other prvalues of non-pointer type.
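To make the canonical-representation rule above a bit more concrete, here is a small self-contained toy model. It is only a sketch under loud assumptions: ToySymbol, ToySVal, makeToyLoc, makeToyNonLoc, makeToySVal and toyDump are hypothetical names invented for this illustration. They are not the analyzer's real classes or API; they only mimic the roles of SymExpr, loc::MemRegionVal and nonloc::SymbolVal as described above.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for SymExpr (hypothetical, not the Clang class).
struct ToySymbol {
  std::string name;     // e.g. "$p"
  bool isPointerTyped;  // true if the symbol has pointer type
};

enum class ToyKind { Loc, NonLoc };

// Toy stand-in for SVal, limited to the two adaptors discussed above.
struct ToySVal {
  ToyKind kind;
  ToySymbol sym;
};

// Plays the role of loc::MemRegionVal wrapping SymRegion{sym}.
ToySVal makeToyLoc(const ToySymbol &sym) {
  return ToySVal{ToyKind::Loc, sym};
}

// Plays the role of nonloc::SymbolVal; ill-formed for pointer-typed symbols.
ToySVal makeToyNonLoc(const ToySymbol &sym) {
  assert(!sym.isPointerTyped && "pointer-typed symbols must become a Loc");
  return ToySVal{ToyKind::NonLoc, sym};
}

// Picks the canonical representation: pointers are Loc, everything else NonLoc.
ToySVal makeToySVal(const ToySymbol &sym) {
  return sym.isPointerTyped ? makeToyLoc(sym) : makeToyNonLoc(sym);
}

// Dumps the value in a notation similar to the one used in this thread.
std::string toyDump(const ToySVal &v) {
  if (v.kind == ToyKind::Loc)
    return "&SymRegion{" + v.sym.name + "}";
  return v.sym.name;
}
```

In this model, calling makeToySVal on a pointer-typed symbol named $p gives a Loc that dumps as &SymRegion{$p}, while an integer-typed symbol named $n stays a NonLoc and dumps as plain $n. Calling makeToyNonLoc on a pointer-typed symbol trips the assertion, which is the toy equivalent of "ill-formed".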

This entire system of adaptors might seem unnecessarily complicated, and it probably is. Still, I can’t say we suffer too much from its existence, I don’t have anything better in mind, and I believe it adds a bit of type safety that helps us avoid introducing bugs in the code.

> clang_analyzer_dump() says it is an element region

It doesn’t. It says “&Element”, not “Element”. This should be read as “address of element” and indicates that the dumped value is a loc::MemRegionVal, i.e. a pointer value. That’s exactly how the explainer works as well, which is why it says “pointer to”.
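As a rough illustration of how the “&” in the dump lines up with “pointer to” in the explanation, the following toy model reproduces the four strings quoted earlier in this thread. Everything in it is hypothetical (ToyRegion, symbolicRegion, elementRegion, dumpPointerTo, explainPointerTo); it is not SValExplainer.h or any real analyzer code. In particular, the shortcut that explains a pointer to a symbolic region as the symbol itself is just an assumption chosen so that the first output in the thread comes out the same.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy region (hypothetical): a symbolic region or an element of a base region.
struct ToyRegion {
  bool isElement;
  std::string symDump;              // symbolic region: dump of its symbol
  std::string symExplain;           // symbolic region: explanation of its symbol
  std::shared_ptr<ToyRegion> base;  // element region: the region being indexed
  int index;                        // element region: element index
  std::string elemType;             // element region: element type name
};

ToyRegion symbolicRegion(const std::string &symDump,
                         const std::string &symExplain) {
  return ToyRegion{false, symDump, symExplain, nullptr, 0, ""};
}

ToyRegion elementRegion(const ToyRegion &base, int index,
                        const std::string &elemType) {
  return ToyRegion{true, "", "", std::make_shared<ToyRegion>(base), index,
                   elemType};
}

// Dump of the region itself (no leading '&').
std::string dumpRegion(const ToyRegion &r) {
  if (!r.isElement)
    return "SymRegion{" + r.symDump + "}";
  assert(r.base && "an element region needs a base region");
  // " S64b" (the index type in the quoted dump) is hard-coded in this toy.
  return "Element{" + dumpRegion(*r.base) + "," + std::to_string(r.index) +
         " S64b," + r.elemType + "}";
}

// Explanation of the region itself.
std::string explainRegion(const ToyRegion &r) {
  if (!r.isElement)
    return "pointee of " + r.symExplain;
  assert(r.base && "an element region needs a base region");
  return "element of type '" + r.elemType + "' with index " +
         std::to_string(r.index) + " of " + explainRegion(*r.base);
}

// A pointer value to region r: the '&' marks "address of".
std::string dumpPointerTo(const ToyRegion &r) { return "&" + dumpRegion(r); }

// Assumed shortcut: a pointer to a symbolic region is explained as the symbol
// itself, which avoids "pointer to pointee of ...".
std::string explainPointerTo(const ToyRegion &r) {
  if (!r.isElement)
    return r.symExplain;
  return "pointer to " + explainRegion(r);
}
```

Both functions describe the same pointer value here; dumpRegion and explainRegion are what you would get for the element region itself, without the “&” or the “pointer to” prefix.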

Hello,

Thank you for your very detailed answer. The main point that was not clear to me before is that a pointer is never stored as a bare symbolic value ($p), but as a memory region value that points to the symbolic region instead (&SymRegion{$p}).

Regards,

Ádám