(@jvoung, lead for Google’s C++ Nullability project)
Chad, I’m very supportive of the general idea of getting flow-sensitive Nullability checking into Clang. However, at Google we’ve been developing a flow-sensitive checker for a few years now, and based on that experience we think that getting one into the compiler with acceptable performance is not a simple task. But it definitely seems worth a try.
For our part, we’ve developed a ClangTidy check as an alternative. It still gives good-quality diagnostics, but it runs separately from the compiler. You can find our code in the crubit/nullability directory of the google/crubit repository on GitHub. Next year, one of the project’s major goals is to upstream the ClangTidy check into the clang/llvm repo so that it is generally available. We would love your help if you’re interested in working with us! It’s about 10k lines of check code and 10k lines of tests.
A few more points:
- From our analysis, Nullable is the wrong default. Across the hundreds of millions of lines of code we’ve analysed, Nonnull pointers are far more prevalent than Nullable ones: in the third-party libraries we import, the ratio is 4:1, and in our own code (for historical reasons) it is 6:1. So Nonnull is the preferable default from the perspective of reducing syntactic noise, since it minimizes the number of annotations developers have to write.
- To support incremental adoption in legacy code, you also need an “unknown” state, which marks a pointer as neither nullable nor nonnull and is treated optimistically. This avoids overwhelming developers with false positives for pointers that are “really” nonnull but have not been annotated yet. For us, this is the right default to start with. Once a file has been annotated, it should be marked (say, with a pragma) as “nonnull default”, so that going forward its pointers are in only one of two states. For an excellent discussion of the need for an explicit third, unknown state (and the inspiration for our design), see Gradual Program Analysis for Null Pointers (arXiv:2105.06081).
- We’ve found that a simple boolean model results in too much noise for legacy code, because developers use complex reasoning about their pointers (for example, storing `p != nullptr` in a `bool` and branching on that `bool` later). We use a relational model for our booleans, modeling state as formulas and using a SAT solver to reason about Nullability. This is costly, but it also reduces false positives by a factor of about 6. It’s unclear, though, whether it’s strictly necessary: for some codebases, the higher false-positive rate (or, for new code, the stricter constraints) will be worth paying for if it means the feature can be integrated directly into the compiler.
I discussed our system at a little more length in my C++ Now talk earlier this year: https://www.youtube.com/watch?v=3zQ4zw4GNV0.
I’ll leave it to @jvoung to continue the discussion. But overall, we’d love to see this become a standard Clang feature!