Static Analyzer: pointer alias assignment

While analyzing some test code, the following two bindings happen (in checkBind), shown with their respective source lines:

(Bind: location <= value)

Bind: &fp <= &SymRegion{conj_$4{struct Foo *}}

Code: Foo* fp = getFooPtr();

Bind: &ap <= &SymRegion{conj_$4{struct Foo *}}

Code: Foo* ap = fp;

In the second line, I need to detect that ‘ap’ is in fact the alias of ‘fp’. Unfortunately, I cannot seem to find any way to get Clang SA to tell me that “&SymRegion{conj_$4{struct Foo *}}” is stored in “fp”, which seems weird, because the source code is very clear.

Some of the information I was able to extract, none of which is really useful to me:

  • original SVal: &SymRegion{conj_$4{struct Foo *}}

  • getAsRegion(): SymRegion{conj_$4{struct Foo *}}

  • state->getSVal(): &SymRegion{reg_$6<element{SymRegion{conj_$4{struct Foo *}},0 S32b,struct Foo *}>} – in fact, I have no idea what this is

  • getAsSymbol(): conj_$4{struct Foo *}

As a workaround, I can keep track of this information myself, but there must be a built-in way to do this.
Any help would be appreciated. Many thanks!

As you observe the two binds you see that the same value is stored in both.

The analyzer does not perform alias analysis, in the sense that it does not build sets of aliases. Since it models execution in the presence of aliases, we have not found a need for alias sets. Can you give a bit more background on why you need this info? Maybe your goal can be achieved differently?

Hi Anna,

I’m building a checker that detects inconsistent pointer usages, for example when a pointer is dereferenced, then along the same path is null-checked (without its value changing in between, obviously). Code example:

Foo* f = getFoo();

f->bar();

if(f) // warn
{ … }

I want to be able to do this with aliases as well, for example:

Foo* f = getFoo();

f->bar();

Foo* g = f;
if(g) // warn
{ … }

What I need is to be able to get the SVals representing ‘f’ and ‘g’ when checkBind is called on the Foo* g = f; line. Currently, instead of ‘f’, Clang gives me the value that was bound to ‘f’.

Thanks for your help!

> Hi Anna,

> I’m building a checker that detects inconsistent pointer usages, for example when a pointer is dereferenced, then along the same path is null-checked (without its value changing in between, obviously).

Can this be designed as an extension of the Dereference checker?

Also, the Dereference checker does work fine in presence of aliases… Is there a reason why it’s different in your setting?

int foo(int *p) {
  int *q = p;
  if (q)
    ;
  return *p;
}

zaks$ clang --analyze ~/tmp/ex.c -Xclang -analyzer-output=text
/Users/zaks/tmp/ex.c:5:10: warning: Dereference of null pointer (loaded from variable 'p')
  return *p;
         ^~
/Users/zaks/tmp/ex.c:3:7: note: Assuming 'q' is null
  if (q)
      ^
/Users/zaks/tmp/ex.c:3:3: note: Taking false branch
  if (q)
  ^
/Users/zaks/tmp/ex.c:5:10: note: Dereference of null pointer (loaded from variable 'p')
  return *p;
         ^
1 warning generated.

Hi Gabor,

I think the problem should be looked at in a different fashion. You should not need to reason about aliases at all. The analyzer engine handles this for you. It’s all symbolic execution.

One way to reason about this problem is that a pointer value has a notion of trust.

If it is trusted, it is safe to dereference. If it isn’t trusted, it isn’t safe to dereference. The trust level is a tri-state: { Trusted, NotTrusted, Unknown }. A pointer value needs to start in one of those states. It goes from { NotTrusted | Unknown } to { Trusted } by a null pointer check. The checker then fires if a { Trusted } pointer is checked for null.

There are details of course. First, this formalism isn’t necessary to model explicitly because the analyzer essentially already does this. “Trust” essentially means “constrained to be not-null”, “not trusted” means “known to be null” and “unknown” means unconstrained. Second, when does the check fire? A null check might not always be redundant. On some paths the null check may be redundant and others it won’t be. Thus there is a dominance relationship here that needs to be checked. Essentially, all paths need to show that the pointer is always non-null before the pointer is checked. Otherwise you’ll get false positives. Doing this correctly is hard, because not all paths are guaranteed to be traced. You’ll need to handle that too.

In this scheme, the pointer value is the symbolic region. You’ll essentially want to monitor the actual null checks of a symbolic region, and see if that region is constrained to always be non-null.

Ted

Anna,

> Can this be designed as an extension of the Dereference checker?

That was my original thought, but a comment in the Dereference checker source code says that it’s a built-in check in ExprEngine, which discouraged me from trying to extend it. I also have no idea how it works or what it does - I’ve been unable to have it produce any warnings, except for this trivial case:

Foo* fp = nullptr;
fp->baz();

And now I tried with your code example, and it also produces a warning. By the way, when analyzing a translation unit, which function(s) are treated as entry points? It appears to me that all of them are, but are there any restrictions, or ways to control this? I do not see any promising args with clang++ -cc1 --help.

Anyway, this code snippet still does not cause the Dereference checker to raise a warning, even though it is trivial:

Foo* fp = getFooPtr(); // based on runtime data, either returns nullptr, or an allocated object

if(!fp)
{

fp->baz();

}

To the best of my knowledge, the if(!fp) line creates two code paths: one on which the condition is true (i.e. fp == nullptr), and one on which the condition is false (i.e. fp != nullptr). The first statement on the former code path is the fp->baz() line; when it is analyzed, the Static Analyzer tells me that ‘fp’ is constrained to be null (using ProgramState::isNull). Therefore, my checker can easily find this issue.

Browsing through the source code of the DereferenceChecker, I can tell it does not use ProgramState::isNull, but instead ProgramState::assume. However, as far as I can tell, it does exactly nothing - I can see that it’s supposed to either raise a warning, or dispatch an ImplicitNullDerefEvent, but neither happens on the code example above (I tested the latter by running both checkers with -analyzer-checker, and in my checker, I subscribe to check::Event).

So why is ProgramState::assume used instead of ProgramState::isNull?

> Also, the Dereference checker does work fine in presence of aliases… Is there a reason why it’s different in your setting?

I’m using Clang 3.3, so unless significant changes have been made to the Static Analyzer or the Dereference checker, things should work the same. I have not modified the Clang source code, except for adding a bunch of custom checkers.

Gabor

Hi Ted,

> I think the problem should be looked at in a different fashion. You should not need to reason about aliases at all. The analyzer engine handles this for you. It’s all symbolic execution.

Yes, I realize the Static Analyzer will recognize that the line p = q means that ‘p’ and ‘q’ now have the same value, but it does not copy the user state (REGISTER_*_WITH_PROGRAMSTATE data) - which makes total sense, since it does not know how that data is structured. Therefore, I need to record the fact that the aliasing has happened using checkBind. This is where I encountered the problem described in my original e-mail.

Reading through the checker dev manual, I found the relevant section:
“When x is evaluated, we first construct an SVal that represents the lvalue of x, in this case it is an SVal that references the MemRegion for x. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new SVal, which references the value currently bound to x. That value is symbolic; it’s whatever x was bound to at the start of the function.”

I need the reverse: not the value currently bound to x, but the SVal that references the MemRegion for x itself. I searched the documentation, but all I could find was ProgramState::getLValue, none of whose overloads give me what I need. So how could I achieve this?

Back to the aliasing problem: I think it would be very useful if the Static Analyzer exposes functions to retrieve aliasing information. For example, I would like to ask whether a given SVal is an alias of another. My original thought was that this alias-analysis could be implemented as a checker itself, which would dispatch an event when it detects an aliasing, but I do not see how it could expose methods to the other checkers (e.g. for them to be able to ask “is x an alias of y in this ProgramState?”). Does this make sense?

> A null check might not always be redundant. On some paths the null check may be redundant and others it won’t be. Thus there is a dominance relationship here that needs to be checked. Essentially, all paths need to show that the pointer is always non-null before the pointer is checked. Otherwise you’ll get false positives. Doing this correctly is hard, because not all paths are guaranteed to be traced. You’ll need to handle that too.

Yes, this is a problem, but I have absolutely no clue as to how it could be solved. I would naturally want to keep the number of false positives to a minimum, even at the cost of some bugs going undetected.

> In this scheme, the pointer value is the symbolic region. You’ll essentially want to monitor the actual null checks of a symbolic region, and see if that region is constrained to always be non-null.

My only vague idea is to use the checkEndAnalysis callback to explore the ExplodedGraph, but I don’t even know how to get started on it, i.e. what I should look for. Any pointers (haha) you could give me would be extremely useful.

Thanks for your answer!

Gabor

I’ve been looking through the implementations of existing Clang Static Analyzer checkers, and I now understand that ProgramState::assume returns a nullptr for a state if that state is unfeasible because of existing constraints. Some experimentation also showed that when ProgramState::isNull says that a value is constrained to be null, then ProgramState::assume on the value will also return nullptr for the non-null state, and a valid state for the null case.
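The behaviour described above can be illustrated with a small self-contained model. This is only a sketch: `ToyState`, `ToyStateRef`, `toyAssume` and `isDefiniteNullDeref` are hypothetical names made up for illustration and are not part of Clang. The model assumes that a state is just a map from symbols to a nullness constraint, and that an infeasible state is represented by a null pointer, in the same spirit as an infeasible state coming back as nullptr from ProgramState::assume.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Toy stand-ins for illustration only; these are NOT the Clang classes.
using SymbolId = int;
enum class Nullness { Null, NonNull };

struct ToyState {
  // A missing key means the symbol is unconstrained.
  std::map<SymbolId, Nullness> constraints;
};
// A null ToyStateRef models an infeasible state.
using ToyStateRef = std::shared_ptr<const ToyState>;

// Splits `st` on the question "is `sym` non-null?".
// Returns {nonNullState, nullState}; an infeasible side is a null pointer.
std::pair<ToyStateRef, ToyStateRef> toyAssume(const ToyStateRef &st,
                                              SymbolId sym) {
  auto withConstraint = [&](Nullness n) -> ToyStateRef {
    auto it = st->constraints.find(sym);
    if (it != st->constraints.end() && it->second != n)
      return nullptr;  // contradicts an existing constraint
    auto next = std::make_shared<ToyState>(*st);
    next->constraints[sym] = n;
    return next;
  };
  return {withConstraint(Nullness::NonNull), withConstraint(Nullness::Null)};
}

// Mirrors the shape of the quoted check: a definite null dereference is
// reported only when the null side is feasible and the non-null side is not.
bool isDefiniteNullDeref(const ToyStateRef &st, SymbolId sym) {
  std::pair<ToyStateRef, ToyStateRef> split = toyAssume(st, sym);
  // In this model at least one side is always feasible.
  assert(split.first != nullptr || split.second != nullptr);
  return split.second != nullptr && split.first == nullptr;
}
```

In this model, an unconstrained symbol yields two feasible states (so nothing is reported), while a symbol already constrained to null yields only the null state, which is the case where a warning would be expected.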

And so looking at DereferenceChecker.cpp lines 206-210, a warning should be issued:

if (nullState) {
  if (!notNullState) {
    reportBug(nullState, S, C);
    return;
  }

I’ll try to figure out tomorrow why not (something in reportBug()?) - but please let me know if I’m missing/misunderstanding something.

> Anna,

> > Can this be designed as an extension of the Dereference checker?

> That was my original thought, but a comment in the Dereference checker source code says that it’s a built-in check in ExprEngine, which discouraged me from trying to extend it.

I am not sure which code comment you are referring to, but the Dereference checker is a very good checker to learn from. I highly recommend understanding how it works before extending it.

> I also have no idea how it works or what it does - I’ve been unable to have it produce any warnings, except for this trivial case:

> Foo* fp = nullptr;
> fp->baz();

> And now I tried with your code example, and it also produces a warning. By the way, when analyzing a translation unit, which function(s) are treated as entry points?

AnalysisConsumer.cpp contains code that decides on the order of the functions being analyzed, which ones are analyzed as top level, etc.

> It appears to me that all of them are, but are there any restrictions, or ways to control this? I do not see any promising args with clang++ -cc1 --help.

You might find this one helpful:

clang -cc1 -help | grep "function"
-analyze-function
Run analysis on specific function
Specify the function selection heuristic used during inlining
Force the static analyzer to analyze functions defined in header files
Analyze the definitions of blocks in addition to functions
Maximum depth of recursive constexpr function calls
Parse templated function definitions at the end of the translation unit

-analyzer-display-progress
Emit verbose output about the analyzer’s progress

Also, AnalyzerOptions.cpp is a good reference for analyzer options.

> Anyway, this code snippet still does not cause the Dereference checker to raise a warning, even though it is trivial:

> Foo* fp = getFooPtr(); // based on runtime data, either returns nullptr, or an allocated object
>
> if(!fp)
> {
>   fp->baz();
> }

This example definitely works for me:

zaks$ cat ~/tmp/ex.cpp
struct Foo {
void baz();
};
Foo* getFooPtr();
void f() {
Foo* fp = getFooPtr(); // based on runtime data, either returns nullptr, or an allocated object
if(!fp) {
fp->baz();
}
}

$ clang --analyze ~/tmp/ex.cpp
/Users/zaks/tmp/ex.cpp:8:7: warning: Called C++ object pointer is null
fp->baz();
^~~~~~~~~
1 warning generated.

> To the best of my knowledge, the if(!fp) line creates two code paths: one on which the condition is true (i.e. fp == nullptr), and one on which the condition is false (i.e. fp != nullptr). The first statement on the former code path is the fp->baz() line; when it is analyzed, the Static Analyzer tells me that ‘fp’ is constrained to be null (using ProgramState::isNull). Therefore, my checker can easily find this issue.

> Browsing through the source code of the DereferenceChecker, I can tell it does not use ProgramState::isNull, but instead ProgramState::assume. However, as far as I can tell, it does exactly nothing - I can see that it’s supposed to either raise a warning, or dispatch an ImplicitNullDerefEvent, but neither happens on the code example above (I tested the latter by running both checkers with -analyzer-checker, and in my checker, I subscribe to check::Event).

> So why is ProgramState::assume used instead of ProgramState::isNull?

If you drill down the implementation of isNull, you’ll see that it’s similar to assume, possibly more efficient and specialized for NULL.

> > Also, the Dereference checker does work fine in presence of aliases… Is there a reason why it’s different in your setting?

> I’m using Clang 3.3, so unless significant changes have been made to the Static Analyzer or the Dereference checker, things should work the same. I have not modified the Clang source code, except for adding a bunch of custom checkers.

We assume that you are developing against TOT unless stated otherwise. Is there a reason why you don’t?

Thanks for the info! I missed -analyze-function, because I grep-ed for “analyzer” (-analyzer-checker, -analyzer-checker-help, etc. - I assumed it’s a general prefix).

> This example definitely works for me:

Something weird is definitely going on here (see my last e-mail). I’ll try to investigate tomorrow.

> We assume that you are developing against TOT unless stated otherwise. Is there a reason why you don’t?

What do you mean by “TOT”?
If you’re asking why I’m not using the latest SVN version, it’s because my boss is concerned about stability. How stable is the current SVN build, in general? Is it reliable, in terms of existing features and functionality? Obviously I don’t expect work-in-progress features to work, but it’d be nice if they were disabled by default until they are stable - is this currently done, or what is your policy in this?

We definitely recommend developing new analyzer features on the latest SVN revisions, even though they might be less stable than a released version. We constantly test the compiler and the analyzer and revert any commits that cause regressions. The release compiler receives more testing, but we do not do any additional analyzer tests for the release. The experimental features usually do not get turned on until ready.

Anna.

I’m not working on analyzer features, just on new checkers for the analyzer. Anyhow, the info is definitely interesting! I forwarded your comments about this to my boss. Thank you!

I’m still stuck though with the original problem I wrote about, most recently in answer to Ted’s e-mail. (Sorry, I’m just worried it will get buried under all the other e-mails.)

> Yes, I realize the Static Analyzer will recognize that the line p = q means that ‘p’ and ‘q’ now have the same value, but it does not copy the user state (REGISTER_*_WITH_PROGRAMSTATE data) - which makes total sense, since it does not know how that data is structured. Therefore, I need to record the fact that the aliasing has happened using checkBind. This is where I encountered the problem described in my original e-mail.

> Reading through the checker dev manual, I found the relevant section:
> “When x is evaluated, we first construct an SVal that represents the lvalue of x, in this case it is an SVal that references the MemRegion for x. Afterwards, when we do the lvalue-to-rvalue conversion, we get a new SVal, which references the value currently bound to x. That value is symbolic; it’s whatever x was bound to at the start of the function.”

> I need the reverse: not the value currently bound to x, but the SVal that references the MemRegion for x itself. I searched the documentation, but all I could find was ProgramState::getLValue, none of whose overloads give me what I need. So how could I achieve this?

It can be done in some cases. Take a look at DereferenceChecker.cpp when it generates an error report. There it walks up the ExplodedGraph to determine which variable/array/field a null pointer value was loaded from. This is used by the diagnostic machinery.

> Back to the aliasing problem: I think it would be very useful if the Static Analyzer exposes functions to retrieve aliasing information. For example, I would like to ask whether a given SVal is an alias of another.

I think the terminology is confusing. SVals don’t alias each other. That concept doesn’t even apply here.

For example, suppose I had the expression:

x + y

and x and y were each bound to the value 1. When the analyzer evaluates “x” and “y” separately we get an SVal of 1 for each subexpression. Those SVals are the same value, but they don’t alias each other. Aliasing doesn’t even make sense here.

In the end, SVals are just values. They can wrap different kinds of values, be it references to symbolic pieces of memory, actual constants such as 1 and 2, addresses of goto labels, and so on.

What you want to know is if two pointer variables, say “p” and “q”, point to the same piece of memory. In the static analyzer they will be bound to a value which represents their respective pointer values. If “p” and “q” refer to MemRegions that are completely disjoint (say two separate VarRegions) then they cannot alias each other. If they both refer to two SymbolicRegions then they may alias each other.

Right now most of the analyzer assumes that two SymbolicRegions always refer to separate chunks of memory. That’s an optimistic assumption (that a compiler could never make for optimization), but it works well in practice. There are opportunities to improve the analyzer here. Suppose we saw something like:

int *p = foo();
int *q = bar();

if (p == q) { … }

Currently the analyzer doesn’t reason about the condition accurately because ‘p’ and ‘q’ are assumed to essentially not alias because they will refer to two different symbolic pieces of memory. To solve this problem we would need the ability to “unify” two memory regions along a path. That’s a complicated problem, and nobody has gotten to implementing it yet.

But I think you aren’t concerned about this problem. I think you are more thinking of the following scenario:

int *p = foo();

int *q = p;

In this case, ‘p’ and ‘q’ alias. Given two variables, it’s easy to determine if they currently alias: check whether they resolve to the same MemRegion. You can even chop off region offsets if you want to make things more coarse-grained.

But my point is that for your checker this isn’t relevant. The analyzer already reasons about all of this for you. Your checker doesn’t really care about ‘p’ or ‘q’, but rather what it points to. For example, suppose you had:

int *p = foo();

int *r = p;

int *q = r;

*r = 1;

if (q) { … }

In this case, we have ‘p’, ‘q’, and ‘r’ all aliasing each other. The analyzer engine tracks all of this for you without you doing anything. What’s important here is that the pointer value that is loaded from ‘q’ at the if statement is perfectly constrained to be non-null since it was already dereferenced earlier. That’s all that matters. The aliasing is irrelevant.
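To make this concrete, here is a minimal self-contained model of the idea, not of the analyzer itself. It assumes a store that maps variable names to a symbolic pointer value, and a constraint table keyed by that value rather than by variable. All names (`ToyState`, `bindFresh`, `bindCopy`, `dereference`, `nullCheckIsRedundant`, `demoAliasChain`) are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: a "symbol" stands for the pointer value returned by foo().
using SymbolId = int;

struct ToyState {
  std::map<std::string, SymbolId> store;   // variable -> value currently bound
  std::map<SymbolId, bool> knownNonNull;   // constraint attached to the VALUE
  SymbolId nextSymbol = 0;

  // Models `int *p = foo();`: the variable receives a fresh symbolic value.
  void bindFresh(const std::string &var) { store[var] = nextSymbol++; }

  // Models `int *q = p;`: the destination receives the same value.
  void bindCopy(const std::string &dst, const std::string &src) {
    SymbolId value = store.at(src);
    store[dst] = value;
  }

  // Models `*r = 1;`: past this point the value must have been non-null.
  void dereference(const std::string &var) { knownNonNull[store.at(var)] = true; }

  // Models `if (q)`: the check is redundant if the value is known non-null.
  bool nullCheckIsRedundant(const std::string &var) const {
    auto it = knownNonNull.find(store.at(var));
    return it != knownNonNull.end() && it->second;
  }
};

// Replays the p / r / q example from the text.
bool demoAliasChain() {
  ToyState st;
  st.bindFresh("p");      // int *p = foo();
  st.bindCopy("r", "p");  // int *r = p;
  st.bindCopy("q", "r");  // int *q = r;
  assert(st.store.at("p") == st.store.at("q"));  // same value, three names
  st.dereference("r");    // *r = 1;
  return st.nullCheckIsRedundant("q");  // if (q) { ... }
}
```

Because the constraint is attached to the value and not to any of the three variables, no alias bookkeeping is needed in this model: the check on ‘q’ sees the constraint that was recorded through ‘r’.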

> My original thought was that this alias-analysis could be implemented as a checker itself, which would dispatch an event when it detects an aliasing, but I do not see how it could expose methods to the other checkers (e.g. for them to be able to ask “is x an alias of y in this ProgramState?”). Does this make sense?

I really don’t see how aliasing is relevant.

> > A null check might not always be redundant. On some paths the null check may be redundant and others it won’t be. Thus there is a dominance relationship here that needs to be checked. Essentially, all paths need to show that the pointer is always non-null before the pointer is checked. Otherwise you’ll get false positives. Doing this correctly is hard, because not all paths are guaranteed to be traced. You’ll need to handle that too.

> Yes, this is a problem, but I have absolutely no clue as to how it could be solved. I would naturally want to keep the number of false positives to a minimum, even at the cost of some bugs going undetected.

It’s a real problem, and unless you have a solution it probably makes the checker unusable.

This is essentially an “all paths” problem, and we have a few checkers, such as the IdempotentOperations checker, which try and address this kind of problem.

Roughly speaking, the checker has to be implemented in two stages:

(1) Keep on the side a map from “condition checks” to a tri-state: { always null, may-be-null, no value }. The “no value” is the default, and it essentially means you have no data for a given condition.

For example, suppose you had:

if ( pointer value )

if ( pointer value )

You would have a side-map mapping from each of these IfStmts to the tri-state.

(2) Monitor checks of the pointer value using the analyzer visitor interface. If the pointer value is null and the current map value is not “may-be-null”, mark it “always null”. Otherwise, mark it “may-be-null”, which essentially means that the given condition statement is no longer in contention for the warning.

(3) After the analyzer finishes exploring paths, go over this map and find the entries that are marked “always null”. Those are the places to emit a warning.
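The bookkeeping in (1) to (3) can be sketched independently of the analyzer as follows. This is an illustration under stated assumptions, not checker code: `ConditionId` is a hypothetical stand-in for a pointer to the condition statement, `recordVisit` is assumed to be called once per explored path that reaches the condition, and `conditionsToWarnAbout` is assumed to run once after path exploration has finished.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Tri-state from step (1). NoValue is the default for unseen conditions.
enum class Verdict { NoValue = 0, AlwaysNull, MayBeNull };

// Hypothetical stand-in for "the IfStmt being checked".
using ConditionId = int;

class ConditionMap {
  std::map<ConditionId, Verdict> verdicts;

public:
  // Step (2): called each time some path evaluates the condition.
  void recordVisit(ConditionId cond, bool valueIsNull) {
    Verdict &v = verdicts[cond];  // value-initialized to NoValue on first use
    if (valueIsNull && v != Verdict::MayBeNull)
      v = Verdict::AlwaysNull;    // still consistent on every path seen so far
    else
      v = Verdict::MayBeNull;     // one counterexample removes it for good
  }

  // Step (3): after exploration, collect the conditions still in contention.
  std::vector<ConditionId> conditionsToWarnAbout() const {
    std::vector<ConditionId> result;
    for (const auto &entry : verdicts) {
      // Every stored entry has been visited at least once.
      assert(entry.second != Verdict::NoValue);
      if (entry.second == Verdict::AlwaysNull)
        result.push_back(entry.first);
    }
    return result;
  }
};
```

The key property is that “may-be-null” is sticky: once any path disagrees, later agreeing paths cannot bring the condition back into contention. A checker tracking a different property (for example "already known non-null") would use the same structure with the flag's meaning changed accordingly.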

> It can be done in some cases. Take a look at DereferenceChecker.cpp when it generates an error report. There it walks up the ExplodedGraph to determine which variable/array/field a null pointer value was loaded from. This is used by the diagnostic machinery.

I guess you mean bugreporter::trackNullOrUndefValue. I’ve been looking at its source code, but I don’t yet even understand what it’s meant to do, let alone how it works. I’ll work on this though, thanks!

> I think the terminology is confusing. SVals don’t alias each other. That concept doesn’t even apply here.

You’re correct, of course. I still get confused around what is an SVal and what is a MemRegion - even after e.g. Jordan’s explanation.
For example, when I have a Foo* fp, and ‘fp’ appears somewhere (e.g. in a null-checking condition), then:

  • I get an SVal that says ‘&fp’

  • I can .getAsRegion() to get a MemRegion that dumps to ‘fp’

  • I can then ProgramState::getSVal() on this MemRegion to get e.g. ‘&SymRegion{conj_$3{struct Foo *}}’

As far as I can tell, this last is the ‘value’ of the fp variable, as in, it is what was last bound to it; then ‘&fp’ is the memory region in which the value of the variable is stored, and finally ‘fp’ is a pointer value to the previous region. Is this roughly correct?

> Currently the analyzer doesn’t reason about the condition accurately because ‘p’ and ‘q’ are assumed to essentially not alias because they will refer to two different symbolic pieces of memory. To solve this problem we would need the ability to “unify” two memory regions along a path. That’s a complicated problem, and nobody has gotten to implementing it yet.

Out of curiosity, conceptually, what makes this complicated? I realize you must hate this question, and it will probably be evident once I start studying how the Static Analyzer is implemented, but I just couldn’t refrain from asking. You can ignore me here. :)

> This is essentially an “all paths” problem, and we have a few checkers, such as the IdempotentOperations checker, which try and address this kind of problem.
> Roughly speaking, the checker has to be implemented in two stages: […]

Thank you very much, I implemented your approach, and it worked! It was quite simple - following your idea -, and I’m actually quite embarrassed that I had to ask for help on this. I was thinking along the lines of traversing the ExplodedGraph in checkEndAnalysis, which I suppose would have worked as well, except it would have been way more complicated.

Once again, thank you very much for your help!

Gabor