[analyzer] Zombie symbols finally getting buried!

Just wanted to give a heads up that i'm finally planning to land the ancient patch https://reviews.llvm.org/D18860 which makes sure that dead symbols are no longer accessed from "properly written" checkers (explained below) and that the respective leak warnings are issued.

This is an API breaking change, so downstream checkers are likely to be affected, so i guess i should explain how to update the checkers, because it might be hard to figure this out from the patch, as the original problem is quite complicated. The patch is already updating the existing checkers accordingly, so you can use it as an example.

The good news is, the API is simplified and the amount of boilerplate is reduced and now there are less ways of doing the same thing and/or shooting yourself in the foot :slight_smile:

Of course there may be unexpected things happening in your checker that make these advice not applicable, but in a nutshell, here's what changed and how checkers should be updated:

** SymReaper.isDead(Sym) is now hard-wired to mean !SymReaper.isLive(Sym) **

This is intuitive and not really API-breaking. But i guess it's important to realize that these two no longer mean different things. In particular, there can no longer be symbols that are neither Dead nor Live, and there can no longer be symbols that are both Dead and Live at the same time (the latter are explained in ⚙ D18860 [analyzer] Fix the "Zombie symbols" issue.).

** SymbolReaper::maybeDead() is removed **

It is no longer necessary to mark symbols that the checker is no longer interested in as "maybe dead" in checkLiveSymbols(). If your code actively marks symbols as dead with the help of this method, just remove that code. Marking live symbols as live is, of course, still as necessary as it used to be. Use SymbolReaper::isDead() when you previously used SymbolReaper::imaybeDead()'s return value to figure out whether the symbol is dead or not.

** SymbolReaper's dead set iterators (dead_begin(), dead_end()) are removed. **

This is the most important part, i guess. It is no longer possible to iterate through dead symbols like we usually did, eg.:

 REGISTER\_SET\_WITH\_PROGRAMSTATE\(Trait, SymbolRef\);
 \.\.\.
 for \(auto I = SymReaper\.dead\_begin\(\), E = SymReaper\.dead\_end\(\); I \!= E; \+\+I\) \{
   SymbolRef Sym = \*I;
   if \(State\->contains<Trait>\(Sym\)\) \{
     reportLeak\(Sym\);
   \}
 \}

This used to be a common idiom, but it never made sense in the first place because the number of dying symbols is potentially infinite and it's impossible to enumerate all of them.

Additionally, calling isLive() or maybeDead() within the outer loop over Sym might have invalidated iterator I, causing a crash, as explained in ⚙ D18860 [analyzer] Fix the "Zombie symbols" issue. - but i doubt anybody ever wrote such code.

The proposed approach, that is equivalent to the old approach up to zombie symbols and is as such more correct, would be to re-order the loops "inside out":

 for \(auto Sym : State\->get<Trait>\(\)\) \{
   if \(SymReaper\.isDead\(Sym\)\) \{
     reportLeak\(Sym\);
   \}
 \}

The performance of the new idiom is slightly different from what it used to be. It's faster when the checker tracks few symbols (with the noticeable example of checker not tracking anything at all, which is most checkers on most of the code), but it's slower when the checker tracks a lot of symbols. In practice, performance gains and losses turned out to be marginal on all real-world code i've tested this patch so far.

Usage of this new idiom is what i meant by "properly written" checkers in the beginning. If the checker always cleans up its traits in checkDeadSymbols() or keeps symbols alive in checkLiveSymbols() for as long as it needs to, it should no longer ever encounter dead symbols, and neither would the rest of the Static Analyzer.

** SymbolReaper::hasDeadSymbols() is removed **

It is impossible to figure out if there are dead symbols. So any code of hasDeadSymbols() should be optimized out into "true", with subsequent dead code elimination. I doubt it ever meant anything in the first place because the checkDeadSymbols() callback will anyway be skipped entirely if there are no dead symbols. And also hasDeadSymbols() never had any well-defined meaning in checkLiveSymbols() because dead symbols are in the middle of being computed.

** The checkDeadSymbols callback is never skipped **

It used to have been skipped when there were no dead symbols, i.e., when SymReaper.hasDeadSymbols() yielded false. Now it cannot yield false (it doesn't even exist, as explained above), so the callback is never skipped. This is presumed to be the main reason why doesn't this change give us performance gains.

While it sounds like a safe change, it did actually cause a slight change in behavior in MallocChecker (⚙ D54013 [analyzer] NFC: MallocChecker: Avoid redundant transitions.) and in NullabilityChecker (⚙ D54017 [analyzer] NullabilityChecker: Invariant violation should only be triggered for symbols.). Which means that if your checker does something more than state cleanup or leak report in checkDeadSymbols(), especially if it does things unconditionally regardless of whether any symbols are dead, you might need to audit the code to see if it behaves as desired and doesn't rely on the callback not firing under certain circumstances (which most likely indicates a bug anyway).

** I think that's it. **

Thanks to everybody who discussed and pushed this patch forward :slight_smile: