Static Analyzer: LiveVariables and SymbolReaper

Hi,
(long read, sorry in advance)
i’m looking into the internals of SymbolReaper and LiveVariables and have
several questions on my mind. Since I have not built a consistent understanding of what’s going on there yet + assume I might be missing smth, I’d like to start with some very basic example to show how the (potential) issue manifests itself there. Any explanations / suggestions / comments would be greatly appreciated.

Example s.c:

typedef struct _IO_FILE FILE;
extern FILE *fopen(const char *path, const char *mode);

void g() {};
void h() {};

void f(int c) {
FILE *p = fopen(“foo.c”, “r”);
g();
h();
return;
}

clang -cc1 -analyze -analyzer-checker=debug.ViewCFG s.c

says that there 3 blocks: B2(entry) → B1 → B0(exit), B2 and B0 are essentially empty and all the interesting things are inside B1.
Now let’s try to run clang -c --analyze -Xanalyzer -analyzer-checker=alpha.unix.Stream s.c

s.c:8:9: warning: Value stored to ‘p’ during its initialization is never read
FILE *p = fopen(“foo.c”, “r”);
^ ~~~~~~~~~~~~~~~~~~~
s.c:9:3: warning: Opened File never closed. Potential Resource leakv
g();
^~~
2 warnings generated.

What looks suspicious here - the location of the warning

“s.c:9:3: warning: Opened File never closed. Potential Resource leak”
- it points to g() instead of the end of function or return statement.

Now let’s run the analyzer under lldb and set the breakpoint at
LiveVariables::isLive(const Stmt *Loc, const Stmt *S)

178 bool LiveVariables::isLive(const Stmt *Loc, const Stmt *S) {

(lldb) p D->dump()
VarDecl 0x11286c178 <s.c:8:3, col:31> col:9 p ‘FILE *’ cinit
`-CallExpr 0x11286c350 <col:13, col:31> ‘FILE *’

-ImplicitCastExpr 0x11286c338 col:13 ‘FILE ()(const char *, const char *)’
-DeclRefExpr 0x11286c1d8 <col:13> 'FILE *(const char *, const char *)' Function 0x11286bce0 'fopen' 'FILE *(const char *, const char *)' -ImplicitCastExpr 0x11286c3a0 <col:19> 'const char *' <BitCast> -ImplicitCastExpr 0x11286c388 col:19 ‘char *’
-StringLiteral 0x11286c238 <col:19> 'char [6]' lvalue "foo.c" -ImplicitCastExpr 0x11286c3d0 col:28 ‘const char *’
-ImplicitCastExpr 0x11286c3b8 <col:28> 'char *' <ArrayToPointerDecay> -StringLiteral 0x11286c2a8 col:28 ‘char [2]’ lvalue “r”

(lldb) p getImpl(impl).stmtsToLiveness[S].isLive(D)
(bool) $0 = false

(lldb) p S->dump()
CallExpr 0x11286c470 ‘void’
-ImplicitCastExpr 0x11286c458 'void (*)()' <FunctionToPointerDecay> -DeclRefExpr 0x11286c400 ‘void ()’ Function 0x11286be18 ‘g’ ‘void ()’

(lldb) fr sel 7
frame #7: 0x0000000106ed4397 clang-6.0`clang::ento::ExprEngine::ProcessStmt(this=0x00007fff5fbf97c0, S=const clang::CFGStmt @ 0x00007fff5fbf8f50, Pred=0x0000000114004a30) at ExprEngine.cpp:497

Here getImpl(impl).stmtsToLiveness[S].isLive(D) returns false and
Static Analyzer thinks that the variable “p” is dead and will remove the binding (from the store). Since this variable is marked as dead,
the corresponding checkers are called (in ExprEngine::removeDead getCheckerManager().runCheckersForDeadSymbols …) and the stream checker reports a leak.
In fact, if i’m not mistaken, at this point the variable “p” is not dead yet and the leak should be reported much later at the end of the function.

The problem does not appear to be specific to StreamChecker, after playing with several other simple examples, it looks like LiveVariables::isLive returns false in most cases
(and so does SymbolReaper::isLive(const VarRegion *VR, bool includeStoreBindings) ).
This results in loosing information about variables or emitting diagnostics at the wrong locations, for example, see FIXMEs at
llvm/tools/clang/test/Analysis/unions.cpp +96,
llvm/tools/clang/test/Analysis/self-assign.cpp +41,
llvm/tools/clang/test/Analysis/pr22954.c in “int f19(int i) {…}”, llvm/tools/clang/test/Analysis/pr22954.c in “int f31() {…}” etc.

So the question here - if CSA works as expected for this example
(in particular, I’m wondering if LiveVariables::computeLiveness
does the right thing or, alternatively, if it’s used properly).

P.S. After playing a little bit more with different toy code snippets and dumping the full state
of LiveVariables (everything: stmtsToLiveness, blocksEndToLiveness etc) it looks like in most cases it’s almost empty.
P.P.S. The following experiment might be of some interest: if one switches the method
SymbolReaper::isLive to
(VarContext == CurrentContext) || VarContext->isParentOf(CurrentContext) || isAlwaysAlive(…)
the behavior for some very simple tests (i was looking at the toy tests with 3-4 blocks in CFG) becomes more predictable (and a bunch of FIXMEs get corrected),
but this, of course, would not be the correct implementation in the general case.

Kind regards,
Alexander Shaposhnikov

ohh, typo above: i meant
bool isLive(const Stmt *S, const VarDecl *D)
(instead of LiveVariables::isLive(const Stmt *Loc, const Stmt *S))

The analyzer is trying to report the leak at the point. It uses live variable analysis for that:

I see,
many thanks for the explanation,
yeah, it makes a lot of sense.

Wanted to add to that, just in case, that our liveness analysis is sensitive to the current path. Like, the syntactic variable dies when it's no longer referenced anywhere below this point in the CFG (which is not path-sensitive), but the *symbol* that was representing the value of this variable (such as *the* file descriptor in your example) may still live in other variables on some paths, and it wouldn't be reported dead until all these variables die.

By definition, *symbols* die when they cannot be extracted from the current ProgramState through any events that happen on subsequent analysis. Most commonly, this means that dead symbols aren't mentioned in the ProgramState at all, but there may be exceptions when the information in the ProgramState is redundant (eg. LazyCompoundVal may mention a bunch of dead symbols which cannot be extracted from it through any program code, simply because we were too lazy to clean this up).

Additionally, dead symbol doesn't automatically mean a leak, because the checker needs to additionally track "escape" events, when a symbol disappears from the analyzer's point of view, but may still be tracked by other parts of the program that the analyzer doesn't know anything about (eg. passed as an argument to a function we know nothing about).