[analyzer][RFC] Our stance on checker dependencies and disabling core checkers

Hi!

Our stance has long been that, despite being visible to the user, core checkers shouldn’t be enabled or disabled by hand, because they do important modeling that any path-sensitive checker may depend on; turning them off can cause incorrect reports and crashes. Since then, however, a series of patches has given us the ability to express dependencies and to hide modeling checkers.

The core doesn’t only do modeling, however: it emits diagnostics as well, and these diagnostics may contain false positives, sometimes to the degree where getting rid of them is desirable, yet we explicitly state that core shouldn’t be disabled. And this problem doesn’t only affect core itself: disabling any checker that both emits diagnostics and is a dependency of another checker will disable the dependent checkers as well, which isn’t always the intent.

When I originally implemented the checker dependency system, my immediate goal was to fix a bug causing inconsistent checker names, but I firmly believe it’s time to make it even more rigid:

  • Don’t allow dependency checkers to emit diagnostics. Since the list of these checkers can easily be retrieved from Checkers.td, assertions could be used to enforce this. This would solve the issue of forgetting to create CheckerProgramPointTags for subcheckers for good, and a helpful assertion message could guide checker developers.
  • Make all dependency checkers hidden. If no dependent checkers are enabled, let the analyzer disable them. Disabling a non-hidden checker should only mean that the diagnostics from it are undesired. (A rough standalone sketch of this resolution rule follows the list.)
  • Allow checkers to depend on packages.
  • Create the hidden(!) coreModeling package, separate all the modeling out of core into it, and leave core as a set of highly recommended, but no longer mandatory, checkers. coreModeling would be a dependency of all path-sensitive checkers.
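To make the second bullet concrete, here is a minimal, self-contained sketch of the resolution rule I have in mind. The checker names, the dependency table, and the helper function are made up for illustration; the real data would be generated from Checkers.td:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Made-up dependency table; in reality this would be generated from Checkers.td.
static const std::map<std::string, std::vector<std::string>> Dependencies = {
    {"unix.Malloc", {"unix.DynamicMemoryModeling"}},
    {"cplusplus.Move", {"cplusplus.SmartPtrModeling"}},
};

// Compute the effective checker set: a hidden dependency checker is enabled
// exactly when at least one checker depending on it is enabled, and is never
// enabled or disabled by hand.
static std::set<std::string>
resolveEnabledCheckers(const std::set<std::string> &UserEnabled) {
  std::set<std::string> Effective = UserEnabled;
  bool Changed = true;
  while (Changed) { // keep going until transitive dependencies settle
    Changed = false;
    for (const auto &Entry : Dependencies)
      if (Effective.count(Entry.first))
        for (const std::string &Dep : Entry.second)
          Changed |= Effective.insert(Dep).second;
  }
  return Effective;
}

int main() {
  // Enabling only unix.Malloc silently pulls in its modeling dependency;
  // disabling unix.Malloc would simply leave the modeling checker out.
  auto Effective = resolveEnabledCheckers({"unix.Malloc"});
  return Effective.count("unix.DynamicMemoryModeling") ? 0 : 1;
}
```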

I think this would allow users to opt out of any undesired diagnostics, without fear of causing instabilities. It would simultaneously create a far more precise representation of dependencies, since no checker depends on another’s diagnostics. Also note that while this API change is significant, it’s also totally backward-compatible, as nothing would change on the user-facing side.

Would love to hear your feedback on this!

Cheers,
Kristóf

Back when we were just discussing checker dependencies, I wasn’t sure we needed them. My point was that if we instead split all checkers into checkers that emit actual warnings but don’t do modeling (and therefore don’t need to depend on each other) and checkers that only do modeling but never do checking (and therefore never need to be turned off), our hierarchy of checkers becomes fairly flat and there’s no need to write down dependencies at all.

With these first two bullets we get closer than ever to that solution, right?

Even if we ever need to turn off modeling-checkers (e.g., they have horrible bugs in them and we suggest disabling them as a workaround), in most cases it’s fine to keep their respective checking-checkers on (they’ll simply not find any bugs: say, MallocChecker will be unable to find bugs if memory is never marked as allocated or freed). If it’s not fine to keep them on - well, just turn them off manually, given that you are already smart enough to turn off the modeling-checker.

Situations when we truly need dependencies are currently fairly exotic. We may run into more such situations when we have more complicated checker interactions (say, when we seriously try to model the C++ standard library - I’m pretty sure that’s gonna be quite a mess), but for now our hierarchy of checkers is fairly flat in practice.

This is equivalent to annotating path-sensitive checkers as such. This would allow, say, clang-tidy to enable path-insensitive checkers without bringing in path-sensitive core checkers.

I’m worried that I need to remember to add a dependency every time I make a path-sensitive check. Can we enforce it somehow, or even do that automatically? 'Cause at run-time we already know whether any path-sensitive checkers are enabled (by looking at how many subscribers the path-sensitive callbacks have). Can we use that information to automatically bring in path-sensitive core checkers when path-sensitive analysis was requested, or is it too late at this point?

Hi!

Our stance has long been that, despite being visible to the user, core checkers shouldn’t be enabled or disabled by hand, because they do important modeling that any path-sensitive checker may depend on; turning them off can cause incorrect reports and crashes. Since then, however, a series of patches has given us the ability to express dependencies and to hide modeling checkers.

The core doesn’t only do modeling, however: it emits diagnostics as well, and these diagnostics may contain false positives, sometimes to the degree where getting rid of them is desirable, yet we explicitly state that core shouldn’t be disabled. And this problem doesn’t only affect core itself: disabling any checker that both emits diagnostics and is a dependency of another checker will disable the dependent checkers as well, which isn’t always the intent.

When I originally implemented the checker dependency system, my immediate goal was to fix a bug causing inconsistent checker names, but I firmly believe it’s time to make it even more rigid:

  • Don’t allow dependency checkers to emit diagnostics. Since the list of these checkers can easily be retrieved from Checkers.td, assertions could be used to enforce this. This would solve the issue of forgetting to create CheckerProgramPointTags for subcheckers for good, and a helpful assertion message could guide checker developers.
  • Make all dependency checkers hidden. If no dependent checkers are enabled, let the analyzer disable them. Disabling a non-hidden checker should only mean that the diagnostics from it are undesired.

Back when we were just discussing checker dependencies, I wasn’t sure we needed them. My point was that if we instead split all checkers into checkers that emit actual warnings but don’t do modeling (and therefore don’t need to depend on each other) and checkers that only do modeling but never do checking (and therefore never need to be turned off), our hierarchy of checkers becomes fairly flat and there’s no need to write down dependencies at all.

Sounds about right! In this case, just like the const keyword in a disciplined codebase, dependencies wouldn’t be needed. However, we could use dependencies to enforce this, among other things, like the use of checker tags. And now that I think about it, we really do need some sort of dependency system to keep the checker naming bug in the abyss, though I can’t confidently say this is the only solution. I’ll keep this in the back of my mind and either try to prove that we need it or mention alternatives.

With these first two bullets we get closer than ever to that solution, right?

Yup. We would only be able to enforce it with asserts, but I suspect we have at least one test case for each checker that emits a report, so theoretically, it’s not even “getting closer”, but rather actually nailing it! Theoretically.

Even if we ever need to turn off modeling-checkers (e.g., they have horrible bugs in them and we suggest disabling them as a workaround), in most cases it’s fine to keep their respective checking-checkers on (they’ll simply not find any bugs: say, MallocChecker will be unable to find bugs if memory is never marked as allocated or freed). If it’s not fine to keep them on - well, just turn them off manually, given that you are already smart enough to turn off the modeling-checker.

I vaguely remember reading in your workbook that checkers are relatively lightweight compared to what the core is doing (I also remember something like a double-digit percentage of the time being spent in SymbolReaper alone), but why keep a checker enabled when we know it’ll do nothing? Also, if we know it won’t do anything, the output of -analyzer-list-enabled-checkers would be more precise if it didn’t list it.

That said, I agree that these are low-level issues, and I wouldn’t worry much about them.

Situations when we truly need dependencies are currently fairly exotic. We may run into more such situations when we have more complicated checker interactions (say, when we seriously try to model the C++ standard library - I’m pretty sure that’s gonna be quite a mess), but for now our hierarchy of checkers is fairly flat in practice.

I agree, but I guess when the time comes, an already mature dependency system would be one less thing to worry about, right? :slight_smile: Also, we currently have 46 dependencies registered and a total of 164 checkers, meaning that a good percentage of them is affected – would you say that a significant portion of these shouldn’t depend on one another? If it’s not too much trouble, do you have an example off the top of your head where you believe a dependency is appropriate, and one where it’s unnecessary?

(You can always see the list of dependencies in /tools/clang/include/clang/StaticAnalyzer/Checkers/Checkers.inc)

  • Allow checkers to depend on packages.
  • Create the hidden(!) coreModeling package, separate all the modeling out of core into it, and leave core as a set of highly recommended, but no longer mandatory, checkers. coreModeling would be a dependency of all path-sensitive checkers.

This is equivalent to annotating path-sensitive checkers as such. This would allow, say, clang-tidy to enable path-insensitive checkers without bringing in path-sensitive core checkers.

I’m worried that I need to remember to add a dependency every time I make a path-sensitive check. Can we enforce it somehow, or even do that automatically? 'Cause at run-time we already know whether any path-sensitive checkers are enabled (by looking at how many subscribers the path-sensitive callbacks have). Can we use that information to automatically bring in path-sensitive core checkers when path-sensitive analysis was requested, or is it too late at this point?

Aye, it sounds totally possible! Resolving dependencies has to be done in order, but coreModeling would be a special case – no checker should directly depend on it, so we would totally be fine registering its checkers after every other checker, based on CheckerManager::hasPathSensitiveCheckers().
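A rough standalone sketch of that special case (the package contents and the helper are made up; the real hook would live somewhere in the checker registration code): once every explicitly requested checker and its dependencies have been resolved, coreModeling is appended exactly when something path-sensitive ended up enabled.

```cpp
#include <set>
#include <string>

// Made-up names standing in for the checkers that would move into the
// hidden coreModeling package.
static const std::set<std::string> CoreModelingCheckers = {
    "coreModeling.Dereference", "coreModeling.CallAndMessage"};

// Called after all user-requested checkers and their dependencies have been
// resolved; that is the point where CheckerManager::hasPathSensitiveCheckers()
// becomes meaningful. Since no checker may depend on coreModeling directly,
// it is safe to append it last, based purely on whether any path-sensitive
// checker ended up enabled.
void enableCoreModelingIfNeeded(std::set<std::string> &Enabled,
                                bool HasPathSensitiveCheckers) {
  if (HasPathSensitiveCheckers)
    Enabled.insert(CoreModelingCheckers.begin(), CoreModelingCheckers.end());
}
```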

Absolutely!! I mean, let’s take, say, MallocChecker. It only does modeling within its own isolated GDM trait; it’s invisible to other families of checkers. In this case, if the model is turned off, the checker has no effect - it effectively gets silently disabled, so we don’t need an additional facility to get it disabled. Most of the time we also don’t need an additional facility to make sure the model is enabled when the checker is enabled - because it doesn’t make much sense to disable the model in the first place. Such dependencies should still be there, just for the sake of discipline and sanity (and also for sorting out problems with checker names), but they aren’t an indication that our dependency trees are actually complex enough to motivate a facility for dependency management.

On the contrary, the dependency between MoveChecker and SmartPtrModel is non-trivial: we made the latter specifically to fix common false positives of the former, so we recommend enabling SmartPtrModel when the user wants MoveChecker. This is an example where things truly get complicated, and it will get even more complicated in the future. I think this is the actual motivation we have for checker dependencies, and I’m very happy that it’s in place.
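To illustrate what “its own isolated GDM trait” means in code, here is roughly the usual idiom, sketched with made-up names; this is not MallocChecker’s actual implementation:

```cpp
// A sketch, not MallocChecker's real code. The program-state map registered
// below is keyed on a trait class that is private to this checker's .cpp
// file, so no other checker family can read or write it: if the modeling
// checker is disabled, the map simply stays empty and the checking part
// finds nothing to report.
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

using clang::ento::SymbolRef;

// 0 = allocated, 1 = released; a real checker would use a proper state class.
REGISTER_MAP_WITH_PROGRAMSTATE(MyAllocationMap, SymbolRef, unsigned)
```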

Also, we currently have 46 dependencies registered and a total of 164 checkers, meaning that a good percentage of them is affected – would you say that a significant portion of these shouldn’t depend on one another? If it’s not too much trouble, do you have an example off the top of your head where you believe a dependency is appropriate, and one where it’s unnecessary?

(You can always see the list of dependencies in /tools/clang/include/clang/StaticAnalyzer/Checkers/Checkers.inc)

I mean, let’s take, say, MallocChecker. It only does modeling within its own isolated GDM trait; it’s invisible to other families of checkers. In this case, if the model is turned off, the checker has no effect - it effectively gets silently disabled, so we don’t need an additional facility to get it disabled. Most of the time we also don’t need an additional facility to make sure the model is enabled when the checker is enabled - because it doesn’t make much sense to disable the model in the first place. Such dependencies should still be there, just for the sake of discipline and sanity (and also for sorting out problems with checker names), but they aren’t an indication that our dependency trees are actually complex enough to motivate a facility for dependency management.

On the contrary, the dependency between MoveChecker and SmartPtrModel is non-trivial: we made the latter specifically to fix common false positives of the former, so we recommend enabling SmartPtrModel when the user wants MoveChecker. This is an example where things truly get complicated, and it will get even more complicated in the future. I think this is the actual motivation we have for checker dependencies, and I’m very happy that it’s in place.

Ah, very interesting. I actually think the contrary – subcheckers are the ones that really drive this dependency development forward, because for them it’s a feature: it provides a safe, easy-to-use interface. Adding more functionality to it could further reduce boilerplate code and improve stability and consistency.

On the other hand, the dependency between MoveChecker and SmartPtrModel is far simpler, as the two don’t interact directly at all. For them, the now-gone inter-checker API header would’ve been sufficient, as long as it got the job done.
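(To clarify what I meant by subcheckers above, here is roughly the pattern, sketched with made-up names that don’t mirror any real checker: one checker object provides the modeling, and several user-facing checker names merely toggle which of its diagnostics get emitted, each under its own tag.)

```cpp
// A standalone sketch of the "subchecker" idea; names are made up and do not
// mirror any real checker's implementation.
struct DynamicMemoryModelingSketch {
  // The modeling itself (tracking allocations in the program state) runs
  // whenever this object is registered at all. The flags correspond to
  // separate user-facing checker names and only decide which diagnostics may
  // be emitted, each under that subchecker's own CheckerProgramPointTag.
  bool EnableMallocDiagnostics = false;    // e.g. "unix.Malloc"
  bool EnableNewDeleteDiagnostics = false; // e.g. "cplusplus.NewDelete"
};

// Each user-facing name gets its own tiny registration function that merely
// flips a flag on the shared checker object.
void registerMallocDiagnostics(DynamicMemoryModelingSketch &Shared) {
  Shared.EnableMallocDiagnostics = true;
}
void registerNewDeleteDiagnostics(DynamicMemoryModelingSketch &Shared) {
  Shared.EnableNewDeleteDiagnostics = true;
}
```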

We seem to agree that an even more thought-out dependency system would be good, and that separating checkers into modeling and diagnostic portions would be ideal, so I’ll detail my proposal further:

I googled it, apparently “modelling” is British, and “modeling” is American, so for this mail I’m just gonna roll with American. :^)

  1. Don’t allow dependency checkers (which we’d implicitly regard as modeling checkers) or modeling checkers to emit diagnostics
    a.) This implies that if we imagine checker dependencies as a directed graph where any given checker’s parents are the checkers it depends on, then any given checker that emits diagnostics is a leaf: it either has no parent, or its parent is a modeling checker, and the path leading to the root of the graph contains only modeling checkers. Since Checkers.inc contains every checker that is a dependency, assert()ing that the ProgramPointTag associated with a report does not belong to a dependency checker (given that we have at least one test case for each checker that emits a report) would enforce that every report is emitted with the correct tag. (A standalone sketch of this check follows the list.)
    b.) The problem this wouldn’t solve is modeling subcheckers. Making addTransition take the tag as a mandatory argument would help with this, but wouldn’t enforce it, and would result in significant code duplication. We define subcheckers as checkers that don’t have their own checker objects (they are basically glorified checker options), and I don’t even have any ideas for how we could catch these cases. There are two things that make this a tad better: for one, we don’t have a checker like that just yet; for another, this would only affect developers when they are inspecting the ExplodedGraph, and with some experience they may be able to catch and fix it.
    c.) This would pretty much make it so that disabling a checker would mean suppressing its diagnostics, and nothing else.

  2. Make all modeling checkers hidden
    a.) Hidden checkers are defined as development-only checkers. Riding on this excuse, I envision no expectations from end users regarding them. We mentioned that maybe, if a dependency checker gets disabled but a dependent checker is enabled later in the invocation, the dependency should be re-enabled, but I think there is no need to implement this functionality if we enforce this rule along with the first one. Anybody who disables a development-only checker should know what that means, and any non-developer advised to do so is probably doing it to get rid of a crash or similar bad behavior.
    b.) Modeling checkers really are implementation details, and this move would make sure that whatever the user enables/disables, the analyzer would be responsible for any undesired side effects.

  3. Allow checkers to depend on packages
    Does this need any justification? We should be able to depend on whether the sun is out or not, but definitely on packages.

  4. Create the hidden(!) coreModeling package, separate all the modeling out of core into it, and leave core as a set of highly recommended, but no longer mandatory, checkers. coreModeling would be a dependency of all path-sensitive checkers.
    a.) Remember how we have to remember to enable core in every test case where at least a single path-sensitive checker is enabled? Well, from now on, not enabling core would be an even better idea, because we would have all the necessary modeling without the obstruction of core diagnostics. Also, one less thing to keep in mind!
    b.) “Make an interface easy to use correctly, and hard to use incorrectly” (Scott Meyers): This would be a drastic improvement in that department.
    c.) Don’t allow any checker to directly depend on a coreModeling checker. This would allow us to evaluate whether any path-sensitive checkers are enabled (which we can only do after we’ve loaded everything else into CheckerManager) and enable coreModeling accordingly. This would of course increase the pressure on coreModeling: if an error in this package slips into a release, we’re screwed, though this has pretty much always been the case; my proposal only formalizes it a little better.

  5. Regard certain operations that modify the ProgramState as modeling operations, and regard checkers that perform modeling operations as modeling checkers.
    a.) This would make sure that the ExplodedGraph remains the same (aside from the diagnostic reports) regardless of which non-developer, non-alpha checkers are enabled. We could come up with other heuristics as well, such as regarding checkers that add regular range constraints to a value as modeling checkers.
    b.) Regard the creation of sink nodes as modeling operations. I bet we could somehow combine the functions that create regular sink nodes and sink error nodes, so that a bug report can be a sink node without its checker being regarded as a modeling checker.
    c.) Modifications of regular range constraints are also modeling operations.
    d.) Document this well, make adding new modeling operations simple, and make the rule easy for reviewers to remember.
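As a standalone illustration of the leaf property and the assert() from point 1.a (the checker names, the dependency edges, and the diagnostics set are made up; the real data would come from Checkers.inc and the tags used in bug reports):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Made-up dependency graph: each checker maps to the checkers it depends on.
static const std::map<std::string, std::vector<std::string>> DependsOn = {
    {"unix.Malloc", {"unix.DynamicMemoryModeling"}},
    {"unix.DynamicMemoryModeling", {"coreModeling"}},
};

// Made-up set of checkers known to emit diagnostics.
static const std::set<std::string> EmitsDiagnostics = {"unix.Malloc"};

// Point 1.a: anything that another checker depends on is a modeling checker,
// so it must never show up as the checker tag of a bug report. Equivalently,
// every diagnostic-emitting checker is a leaf of the dependency graph.
void assertReportTagIsNotADependency(const std::string &TagCheckerName) {
  for (const auto &Entry : DependsOn)
    for (const std::string &Dep : Entry.second)
      assert(Dep != TagCheckerName &&
             "A dependency (modeling) checker emitted a diagnostic!");
}

int main() {
  // Simulate report emission from every diagnostic checker we know about:
  // all of them are leaves, so none of the asserts fire.
  for (const std::string &Checker : EmitsDiagnostics)
    assertReportTagIsNotADependency(Checker);
}
```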

What do you think? This is more of a checklist of things I’d like to see implemented, accompanied by some of my design philosophies, rather than the order in which I plan to implement them.

We seem to agree that an even more thought-out dependency system would be good, and that separating checkers into modeling and diagnostic portions would be ideal, so I’ll detail my proposal further:

I googled it, apparently “modelling” is British, and “modeling” is American, so for this mail I’m just gonna roll with American. :^)

  1. Don’t allow dependency checkers (which we’d implicitly regard as modeling checkers) or modeling checkers to emit diagnostics
    a.) This implies that if we imagine checker dependencies as a directed graph where any given checker’s parents are the checkers it depends on, then any given checker that emits diagnostics is a leaf: it either has no parent, or its parent is a modeling checker, and the path leading to the root of the graph contains only modeling checkers. Since Checkers.inc contains every checker that is a dependency, assert()ing that the ProgramPointTag associated with a report does not belong to a dependency checker (given that we have at least one test case for each checker that emits a report) would enforce that every report is emitted with the correct tag.
    b.) The problem this wouldn’t solve is modeling subcheckers. Making addTransition take the tag as a mandatory argument would help with this, but wouldn’t enforce it, and would result in significant code duplication. We define subcheckers as checkers that don’t have their own checker objects (they are basically glorified checker options), and I don’t even have any ideas for how we could catch these cases. There are two things that make this a tad better: for one, we don’t have a checker like that just yet; for another, this would only affect developers when they are inspecting the ExplodedGraph, and with some experience they may be able to catch and fix it.

Realistically, this is such a low-level issue that I don’t think it’s worth wasting development time on it.

c.) This would pretty much make it so that disabling a checker would mean suppressing its diagnostics, and nothing else.

  2. Make all modeling checkers hidden
    a.) Hidden checkers are defined as development-only checkers. Riding on this excuse, I envision no expectations from end users regarding them. We mentioned that maybe, if a dependency checker gets disabled but a dependent checker is enabled later in the invocation, the dependency should be re-enabled, but I think there is no need to implement this functionality if we enforce this rule along with the first one. Anybody who disables a development-only checker should know what that means, and any non-developer advised to do so is probably doing it to get rid of a crash or similar bad behavior.
    b.) Modeling checkers really are implementation details, and this move would make sure that whatever the user enables/disables, the analyzer would be responsible for any undesired side effects.

  3. Allow checkers to depend on packages
    Does this need any justification? We should be able to depend on whether the sun is out or not, but definitely on packages.

  4. Create the hidden(!) coreModeling package, separate all the modeling out of core into it, and leave core as a set of highly recommended, but no longer mandatory, checkers. coreModeling would be a dependency of all path-sensitive checkers.
    a.) Remember how we have to remember to enable core in every test case where at least a single path-sensitive checker is enabled? Well, from now on, not enabling core would be an even better idea, because we would have all the necessary modeling without the obstruction of core diagnostics. Also, one less thing to keep in mind!
    b.) “Make an interface easy to use correctly, and hard to use incorrectly” (Scott Meyers): This would be a drastic improvement in that department.
    c.) Don’t allow any checker to directly depend on a coreModeling checker. This would allow us to evaluate whether any path-sensitive checkers are enabled (which we can only do after we’ve loaded everything else into CheckerManager) and enable coreModeling accordingly. This would of course increase the pressure on coreModeling: if an error in this package slips into a release, we’re screwed, though this has pretty much always been the case; my proposal only formalizes it a little better.

  5. Regard certain operations that modify the ProgramState as modeling operations, and regard checkers that perform modeling operations as modeling checkers.
    a.) This would make sure that the ExplodedGraph remains the same (aside from the diagnostic reports) regardless of which non-developer, non-alpha checkers are enabled. We could come up with other heuristics as well, such as regarding checkers that add regular range constraints to a value as modeling checkers.

Oops, I meant to delete the second sentence and leave it as a separate point. I should proofread my emails better :slight_smile:

b.) Regard the creation of sink nodes as modeling operations. I bet we could somehow combine the functions that create regular sink nodes and sink error nodes, so that a bug report can be a sink node without its checker being regarded as a modeling checker.
c.) Modifications of regular range constraints are also modeling operations.
d.) Document this well, make adding new modeling operations simple, and make the rule easy for reviewers to remember.

Let’s actually internalize this point, because now that I think about it, it is really about detecting whether a checker should reside in apiModeling, rather than simply categorizing checkers into modeling and diagnostic portions. For instance, I plan to split UninitializedObjectChecker into a checker that emits the current default diagnostics, turn the pointer-chasing option into its own subchecker, and move the actual logic to the hidden UninitializedObjectModeling checker. It’s clear that the modeling portion wouldn’t affect the rest of the analysis at all (hence we can disable it if both diagnostic checkers are also disabled), while a checker like DynamicMemoryModeling creates sinks and affects the analysis even when none of its subcheckers is enabled.

14.08.2019 1:12, Artem Dergachev via cfe-dev wrote:

Hi!

Our stance has long been that, despite being visible to the user, core checkers shouldn't be enabled or disabled by hand, because they do important modeling that any path-sensitive checker may depend on; turning them off can cause incorrect reports and crashes. Since then, however, a series of patches has given us the ability to express dependencies and to hide modeling checkers.

The core doesn't only do modeling, however: it emits diagnostics as well, and these diagnostics may contain false positives, sometimes to the degree where getting rid of them is desirable, yet we explicitly state that core shouldn't be disabled. And this problem doesn't only affect core itself: disabling any checker that both emits diagnostics and is a dependency of another checker will disable the dependent checkers as well, which isn't always the intent.

When I originally implemented the checker dependency system, my immediate goal was to fix a bug causing inconsistent checker names, but I firmly believe it's time to make it even more rigid:

* Don't allow dependency checkers to emit diagnostics. Since the list of these checkers can easily be retrieved from Checkers.td, assertions could be used to enforce this. This would solve the issue of forgetting to create CheckerProgramPointTags for subcheckers for good, and a helpful assertion message could guide checker developers.
* Make all dependency checkers hidden. If no dependent checkers are enabled, let the analyzer disable them. Disabling a non-hidden checker should only mean that the diagnostics from it are undesired.

Back when we were just discussing checker dependencies, I wasn't sure we needed them. My point was that if we instead split all checkers into checkers that emit actual warnings but don't do modeling (and therefore don't need to depend on each other) and checkers that only do modeling but never do checking (and therefore never need to be turned off), our hierarchy of checkers becomes fairly flat and there's no need to write down dependencies at all.

With these first two bullets we get closer than ever to that solution, right?

Just some field notes. This design has a very important feature: reproducible analysis. If we had this split, the warnings would always be the same regardless of which checkers are enabled or disabled. Unfortunately, without it, the analysis depends on which checkers can terminate the analysis path or add additional transitions, so the warnings of the same checker can differ depending on which other checkers are enabled alongside it.

I very much agree with you, though this issue is a tough nut to crack. Say my esoteric codebase manages to fool a checker that emits fatal error nodes; in that case I may want to disable that checker altogether, diagnostics and modeling included, as it screws with the rest of the analysis as well. I lean towards not directly exposing modeling checkers for users to tinker with, but unfortunately we’re far from being so precise that we could confidently make this a developer-only feature.

On the other hand, I have only heard of such an issue with one of our internal checkers, which introduced practically infinite state splits, so I can’t confidently say that this is a common problem either. Do you guys have any experience with this?

23.08.2019 15:50, Kristóf Umann wrote:

    Just some field notes. This design has a very important feature:
    reproducible analysis. If we had this split, the warnings would always
    be the same regardless of which checkers are enabled or disabled.
    Unfortunately, without it, the analysis depends on which checkers can
    terminate the analysis path or add additional transitions, so the
    warnings of the same checker can differ depending on which other
    checkers are enabled alongside it.

I very much agree with you, though this issue is a tough nut to crack. Say my esoteric codebase manages to fool a checker that emits fatal error nodes; in that case I may want to disable that checker altogether, diagnostics and modeling included, as it screws with the rest of the analysis as well. I lean towards not directly exposing modeling checkers for users to tinker with, but unfortunately we're far from being so precise that we could confidently make this a developer-only feature.

On the other hand, I have only heard of such an issue with one of our internal checkers, which introduced practically infinite state splits, so I can't confidently say that this is a common problem either. Do you guys have any experience with this?

Yes, we have often seen this problem when introducing a new path-sensitive checker or disabling one. It usually led to some warnings of other checkers appearing or disappearing. The only solution we had was to fix the set of checkers we run the analyzer with, but that didn't work when a new checker was added.