Hi!
Our stance has long been that despite being visible to the user, core checkers shouldn’t be enabled/disabled by hand, because they do important modeling that any pathsensitive checker may depend on, potentially causing incorrect reports and crashes. However, since then, a series of patches gave us the ability to express dependencies and hide modeling checkers.
The core doesn’t only do modeling, however, it emits diagnostics as well, and these diagnostics may contain false positives, sometimes to the degree where getting rid of them is desirable, yet we explicitly state that it shouldn’t be disabled. And this problem doesn’t only affect the core itself: disabling any checker that emits diagnostics and also is a dependency of some another checker will disable dependent checkers, which isn’t always the intent.
When I originally implemented the checker dependency system, my immediate goal was to fix a bug causing inconsistent checker names, but I firmly believe its time to make the it even more rigid:
- Don’t allow dependency checkers to emit diagnostics. Since the list of these checkers can easily be retrieved through Checkers.td, assertions could be used to enforce this. This would solve the issue of forgetting to create CheckerProgramPointTags for subcheckers for good, and a helpful assertion message could guide checker developers about it.
- Make all dependency checkers hidden. If no dependent checkers are enabled, let the analyzer disable them. Disabling a non-hidden checker should only mean that the diagnostics from it are undesired.
Back when we were just discussing checker dependencies, i wasn’t sure we need them. My point was that if we instead split all checkers into checkers that emit actual warnings but don’t do modeling (and therefore don’t need to depend on each other) and checkers that only do modeling but never do checking (and therefore never need to be turned off), our hierarchy of checkers becomes fairly flat and there’s no need to write down dependencies at all.
Sounds about right! In this case, just as the const keyword in a disciplined codebase, dependencies wouldn’t be needed. However, we could use it to enforce this, among other things, like the use of checker tags. And now that I think about it, we do really need some sort of dependency system to keep the checker naming bug in the abyss, though I can’t confidently say this is the only solution. I’ll keep this in the back of my mind and either try to prove that we need it or mention alternatives.
With these first two bullets we get closer than ever to that solution, right?
Yup. We would only be able to enforce it with asserts, but I suspect we have at least a single testcase for each checker that emits a report, so theoretically, its not even “getting closer”, but rather actually nailing it! Theoretically.
Even if we ever need to turn off modeling-checkers (eg., they have horrible bugs in them and we suggest disabling them as a workaround), in most cases it’s fine to keep their respective checking-checkers on (they’ll simply not find any bugs: say, MallocChecker will be unable to find bugs if the memory is not ever marked as allocated or freed). If it’s not fine to keep them on - well, just turn them off manually, given that you are already smart enough to turn off the modeling-checker.
I vaguely remember reading in your workbook that checkers are relatively lightweight compared to what the core is doing (I also remember something like a double-digit percentage of time being spent in SymbolReaper alone), but why keep it enabled when we know it’ll do nothing? Also, if we know it wouldn’t do anything, -analyzer-list-enabled-checkers would be more precise by not displaying it there.
That said, I agree that these are low level issues, and I wouldn’t worry much about them.
Situations when we truly need dependencies are currently fairly exotic. We may run into more such situations when we have more complicated checker interactions (say when we seriously try to model the C++ standard library - i’m pretty sure that’s gonna be quite a mess), but for now our hierarchy of checkers is fairly flat in practice.
I agree, but I guess when the time comes, an already mature dependency system would be one less thing to worry about, right? Also, we currently have 46 dependencies registered, and have a total of 164 checkers, meaning that a good percentage of them is affected – would you say that a significant portion of these shouldn’t depend on one another? If its not too much trouble, do you have an example on top of your head where you believe its appropriate, and one where its unnecessary?
(You can always see the list of dependencies in /tools/clang/include/clang/StaticAnalyzer/Checkers/Checkers.inc)
- Allow checkers to depend on packages.
- Create the hidden(!) coreModeling package, separate all the modeling from core to it, leaving core as a set of highly recommended, but no longer mandatory checkers. coreModeling would be a dependency of all pathsensitive checkers.
This is equivalent to annotating path-sensitive checkers as such. This would allow, say, clang-tidy enable path-insensitive checkers without bringing in path-sensitive core checkers.
I’m worried that i need to remember to add a dependency every time i make a path-sensitive check. Can we enforce it somehow, or even do that automatically? 'Cause at run-time we already know whether any path-sensitive checkers are enabled (by looking at how many subscribers do path-sensitive callbacks have). Can we use that information to automatically bring in path-sensitive core checkers when path-sensitive analysis was requested, or is it too late at this point?
Aye, it sounds totally possible! Resolving dependencies has to be done in order, but coreModeling would be a special case – no checker should directly depend on it, so we would totally be fine registering them after every other checker, depending on CheckerManager::hasPathSensitiveCheckers().