[StaticAnalyzer] Questions about SymbolDerived class and checkDeadSymbol callback


I am developing a resource leak checker that is very similar to SimpleStreamChecker and I noticed an issue when tracking symbols that is SymbolDerived type.

The source code of my checker can be downloaded at this link: https://gist.github.com/zeroomega/2fcabf5979a0e42db1ccb8f30472d071

An example of resource leaks that my checker is trying to find is like this:

typedef uint32_t handle_t;
typedef uint32_t status_t;

status_t request(
uint32_t flags,
handle_t* out0,
handle_t* out1);

status_t release(handle_t handle);

void checkLeak1(int tag) {
handle_t sa, sb;
request(0, &sa, &sb);
if (tag > 0) {

Function call “request(0, &sa, &sb)” will allocate 2 resource descriptors (handles) to “sa” and “sb”. The resource will be leaked if someone forgot to call “release(sa)” on a resource descriptor (handle).

In my checker, I override “checkPostCall” callback to map the SymbolRef of arguments “sa” and “sb” in any “request(0, &sa, &sb)” call to “Allocated” state. And I use “checkPreCall” callback to map the SymbolRef of argument “sa” in any “release(sa)” call to “Released” state. If a tracked SymbolRef is dead when using “SymbolReaper.isDead()” API in “checkDeadSymbol” callback and its state is “Allocated”, this SymbolRef will be considered as leaked and will be reported.

In the example code “checkLeak1” , there is an early return “if (tag > 0) { return;}” that will not release any handles, which will cause leaks. My checker can successfully detect this leak. However, when I change the leak example to this:

void checkLeak2(int tag) {
handle_t ha[2];
request(0, ha, ha+1);
if (tag > 0) {

The checker will not report any leaks at all. Further debugging shows that the static analyzer allocates a SymbolConjured symbol for array “ha” and SymbolDerived symbols for item "ha[0]’ and “ha[1]”. These two SymbolDerived symbols has parent symbol points to SymbolConjured symbol of “ha”. In the path of “if (tag > 0) { return;}”, the SymbolDerived symbols of “ha[0]” and “ha[1]” are not considered dead when using “SymbolReaper.isDead()”. So no leak is reported in this path. These two SymbolDerived symbols are considered dead by “SymbolReaper.isDead()” at the end of path “release(ha[0]);release(ha[1]);”.

Another two examples can show this behavior more clearly:

void checkLeak3(int tag) {
handle_t ha[2];
request(0, ha, ha+1);

The checker will not report any leak on the code of “checkLeak3” example because the SymbolDerived symbols of “ha[0]” and “ha[1]” are not considered dead when “checkDeadSymbol” callback is invoked at “return;” But if I add two fake assignments like this:

void checkLeak4(int tag) {
handle_t ha[2];
request(0, ha, ha+1);
ha[0] = *ha;
ha[1] = *(ha+1);

The checker will report leaks on “ha[0]” and “ha[1]” because the SymbolDerived symbols of “ha[0]” and “ha[1]” are considered as dead this time.

My question is:

Is this the intended behavior of clang static analyzer when processing SymbolDerived symbols?
If so, what should I do to reliably detect the resource leaks like the example in “checkLeak2”?

I already tried the method that check the liveness of the parent SymbolConjured symbols of each SymbolDerived symbols. If a SymbolDerived symbol’s parent symbol is “dead”, it will considered as dead as well. However, this method does not work in the “checkLeak2” example, as the SymbolConjured symbol “ha” is dead after analyzing “release(ha[0])” and before “release(ha[1])”, the analyzer will report “ha[1]” as leak because “release(ha[1])” is not analyzed yet and parent symbol of “ha[1]” is already dead and “ha[1]” is not in “Release” state.

Thanks for any help,


Before anything - my blind guess is that you've found another instance of the "zombie symbols" bug, could you try if any of the following solutions helps:

(1) could you see if changing "SymReaper.isDead(Sym)" to "!SymReaper.isLive(Sym)" helps?
(2) or, instead of (1), could you implement checkLiveSymbols() callback in your checker and blindly say SymReaper.maybeDead(Sym) on all symbols your checker tracks?

Long story short, there's currently a counter-intuitive behavior in SymbolReaper, which we discovered recently but didn't come up with a good fix, due to which dead symbols are a different notion from non-live symbols. Generally, dead symbols are symbols that were explicitly released during dead symbol collection by one of the program state's entities (Environment, Store, Constraints, or one of the checkers through checkLiveSymbols() callback) - it's a plain finite set of symbol inside the SymbolReaper object, while non-live symbols are symbols that could not be proven live anymore by deep-scanning the current program state, regardless of whether someone has explicitly released them (and the set of such non-live symbols might essentially be infinite; however isLive() is a relatively fast operation since scan results are already pre-computed).

When you obtained derived symbols for ha[0] and ha[1], only your checker knows that these symbols have been constructed and are being tracked. Technically, these symbols were constructed by StoreManager, but he doesn't record them in the store, because he doesn't need to do that to provide them again. Similarly to SymbolRegionValue, SymbolDerived would be produced automatically as long as the default-bound conjured symbol is there, so the Store doesn't need to explicitly remember that SymbolDerived is "bound" to the array elements. So your checker is the only entity in the program state that is aware of these symbols. And therefore there's nobody who could add these symbols to the dead set for you to query. So you can either add them to the dead set by trick (2) (if they're proven live they won't be added to the dead set) or by trick (1) don't rely on the dead set.

SimpleStreamChecker is known to be broken in a similar manner. We noticed this before when we figured out that he doesn't find problems of form

   FILE *fp = fopen("foo.txt", "r");
   fp = 0;

where the tracked value is overwritten. The reason is similar: once the value is overwritten in the Store, it's no longer visible to any entitiy but the checker, and there's nobody to add it to the dead set.

Most of the checkers are unaffected though, because symbols they're interested in are also tracked in other parts of the program state. For instance, pointers returned by malloc() are known to have a known region-extent and default-bound to UndefinedVal, so they're tracked by store and constraint manager. Considering that we don't have an awesome short-term solution, it seems that we have to advice checker developers to take steps (1) or (2) as a workaround.

Additional info:
- http://lists.llvm.org/pipermail/cfe-dev/2016-March/047922.html
- https://reviews.llvm.org/D18860

Another affected user:
- https://reviews.llvm.org/D29419

Thanks for your detailed explanation. I tried the solution (1) and it works in my scenario.