[analyzer] Alpha checker statuses

I wanted to give more visibility to the discussion on the status of different Static Analyzer "alpha" (unfinished) checkers that's going on on Phabricator (⚙ D57858 [analyzer] Add a new frontend flag to display all checker options). Story so far: We're trying to figure out how many of them can be finished with relatively little effort (and therefore we should probably ask around to gather feedback) and how many are clearly not ready for any sort of use and should be hidden until the most glaring bugs are fixed. For now we officially treat all alpha checkers as the latter - so don't use them!

This discussion is important because different people's codebases are ridiculously different, so it's almost impossible to estimate the quality and usefulness of static analysis unless as many varied codebases as possible are involved.

>>! In D57858#1500668, @Szelethus wrote:
> `IteratorChecker` is a prime example that still suffers from not ideal FP/TP ratio, but users at Ericsson value it plenty enough not to be bothered by it. Many of the changes here and in private directly address user bug reports (That's just one, but I do remember having others around too!).

Once it has visitor path notes about where did it get its iterators from, some of the iterator checks should definitely be considered for being turned on by default. Especially the mismatched iterator check that doesn't rely on hardcore constraint solving. The current upstream version is not in good shape though; i just tried it on LLVM and it currently crashes all over the place with "Symbol comparison must be a `SymIntExpr`" assertion (pls ask me for repros if they aren't obvious). Also it has false mismatched iterator positives on `A.insert(B.begin(), B.end())`.

>> In ⚙ D57858 [analyzer] Add a new frontend flag to display all checker options, @dkrupp wrote:
> These are the alpha checkers that we are testing in Ericsson:

Let me undig my last year's attempt to take a look at alpha checkers. The most common "limb" to "miss" in the alpha checkers is the "bug visitor" functionality that'd add checker-specific path notes to the report, which is almost inevitably necessary for any path-sensitive checker. Bug reports without path notes are hard to understand, but that's one thing that your users won't tell you: they often just don't have their good taste to realize that bug reports shouldn't be so hard to understand. The users often take it for granted that they have to figure out themselves where do these values come from, but it's still our job to not force them to.

> alpha.core.BoolAssignment

Yes, i agree that this one's pretty useful. It's currently missing a visitor that explains why does the analyzer think that the value is non-true/false, which is often necessary to understand the warning.

> alpha.core.CastSize

This one's indeed relatively quiet, but i'm seeing ~50 false positives caused by stuffing metadata at the beginning of a dynamically allocated buffer. I.e., allocate a buffer that's 4 bytes larger than necessary, use these 4 bytes for our own bookkeeping and provide a pointer to the rest of the buffer to be used for the actual value. I don't see an easy way to fix these false positives, so i don't see how to move this out of alpha.

> alpha.core.Conversion

Interestingly, i haven't seen this one trigger on our codebases. So i don't have an opinion here. There's a chance it might be a good opt-in check. Do you have an open-source project in mind on which this check is actually useful?

> alpha.core.DynamicTypeChecker

I really root for enabling this checker (and generally improving our dynamic type-checking), but for now i'm seeing ~600 false positives in projects that use custom RTTI (something like `dyn_cast`) and ~300 more Objective-C-specific false positives. I can take a look and try to reduce some of the custom RTTI ones if you're interested in figuring out how to fix them; i don't remember if they are easy to fix.

> alpha.core.SizeofPtr

This checker does indeed find interesting bugs sometimes, but i'm overwhelmed by ~300 false positives in which the sizeof of a pointer is taken intentionally. This is especially annoying when the pointer is hidden behind a typedef and the user doesn't need to know whether it's a pointer or not.

> alpha.core.TestAfterDivZero

I don't see any positives of this checker, but this checker is crazy and shouldn't have been done this way. It's a "must-problem" check and we don't have any sort of infrastructure to even display this kind of bug report properly after we find it, let alone to properly find it. We need a more-or-less full-featured data flow analysis engine before we make an attempt on such checker.

> alpha.cplusplus.DeleteWithNonVirtualDtor

I don't see any positives of this checker. Is it any better than the compiler warning that we have for all polymorphic classes that have no virtual destructors?

> alpha.security.MallocOverflow

This one's extremely loud for me (~1500 false positives). It looks as if it warns on every `malloc(x * sizeof(int))` (due to potential integer overflow during multiplication) so i just don't see it working as an AST-based check. We should probably rewrite it on top of taint analysis and then it'll need a constraint solver that knows how to multiply things.

Like, this is the point where i'd like to ask how does this happen that you don't see these false positives. Is this checker actually quiet for you? Or are your users addressing these warnings somehow?

> alpha.security.MmapWriteExec

I don't see any positives of this checker. It probably needs a visitor (which is trivial) and it definitely needs a solution for different values of macros on different platforms that'd be better than just setting them as a config flag.

> alpha.security.ReturnPtrRange

I don't see any positives of this checker. It definitely needs a visitor and we might as well join it with the generic array bound checker. Also need to figure out how to deal with, say, vector::end() which is supposed to return an out-of-bound pointer.

> alpha.security.taint.TaintPropagation

I believe that the generic taint engine is currently very solid, but specific checks that we have on top of it are very much unfinished: they simply don't react on any sort of validation that can be used to remove the taint. Normally that'll involve consulting the constraint manager (in case of tainted integers) or modeling validation routines (in case of more complicated objects).

> alpha.unix.BlockInCriticalSection

I have ~10 positives and i didn't have a look at them back then; might be good. This checker needs a visitor to explain why do we think we're in a critical section.

> alpha.unix.Chroot

I don't see any positives of this checker. This checker needs a visitor to highlight chroot.

> alpha.unix.PthreadLock

Uhm, i have a few patches on this checker that i never committed:
- ⚙ D37806 [analyzer] PthreadLock: Fix return values of XNU lock functions.
- ⚙ D37807 [analyzer] PthreadLock: Add the other XNU rwlock unlock functions.
- ⚙ D37809 [analyzer] PthreadLock: Refactor, use PostCall API. NFC.
- ⚙ D37812 [analyzer] PthreadLock: Escape the pointers.
- ⚙ D37963 [analyzer] PthreadLock: Don't track dead regions.
And it still needs a visitor. And support for more APIs, 'cause there are still false positives caused by unobvious POSIX APIs that release the mutex (sometimes conditionally). And once it's done, i'll be seeing no positives of this checker; it sounds like a good checker to have, but it doesn't seem to find mistakes that are *easy to make*.

> alpha.unix.SimpleStream
> alpha.unix.Stream

Those need, at least:
- A bug visitor.
- A suppress-on-sink behavior. Otherwise they warn on every assert between open and close (~200 false positives for me).
- Pointer escape support.
Also i vaguely remember that the non-simple checker is known to cause more state splits than necessary.

> alpha.unix.cstring.NotNullTerminated

Hmm, this check looks like a walking .addTransition() bug (unintended state split) when invoked from getCStringLength(). Also it doesn't seem to be disabled when the checker is disabled, so i guess we're kinda "using" it too. But it's also too quiet to matter, as it pretty much only warns when you're trying to compute a strlen() of a function pointer.

> alpha.unix.cstring.OutOfBounds

https://bugs.llvm.org/show_bug.cgi?id=41729 and it also needs a visitor for the index.

  • György Orbán

I wanted to give more visibility to the discussion on the status of
different Static Analyzer “alpha” (unfinished) checkers that’s going on
on Phabricator (https://reviews.llvm.org/D57858). Story so far: We’re
trying to figure out how many of them can be finished with relatively
little effort (and therefore we should probably ask around to gather
feedback) and how many are clearly not ready for any sort of use and
should be hidden until the most glaring bugs are fixed. For now we
officially treat all alpha checkers as the latter - so don’t use them!

Just to add to the summary: despite our “disapproval” of using alpha checkers, we always made them very visible, since if you wanted to list the available checkers, you inevitable came across them (and they are also lexicographically greater than the rest of them).

Due to the lack of branches in our SVN repository, we used the alpha package to make development incremental, which inevitable resulted in some checkers in there being unfinished and unstable, while others merely need some finishing touches, and could be used despite being rough around the patches.

The discussion came up in a patch that plans to expose checker options, that always existed but were never listable, unless you read the source code. However, just like alpha checkers, many of these also hide features under development, while other would genuinely be useful to fine tune the analyzer for a specific project.

This discussion is important because different people’s codebases are
ridiculously different, so it’s almost impossible to estimate the
quality and usefulness of static analysis unless as many varied
codebases as possible are involved.

! In D57858#1500668, @Szelethus wrote:
IteratorChecker is a prime example that still suffers from not
ideal FP/TP ratio, but users at Ericsson value it plenty enough not to
be bothered by it. Many of the changes here and in private directly
address user bug reports (That’s just one, but I do remember having
others around too!).

Once it has visitor path notes about where did it get its iterators
from, some of the iterator checks should definitely be considered for
being turned on by default. Especially the mismatched iterator check
that doesn’t rely on hardcore constraint solving. The current upstream
version is not in good shape though; i just tried it on LLVM and it
currently crashes all over the place with “Symbol comparison must be a
SymIntExpr” assertion (pls ask me for repros if they aren’t obvious).
Also it has false mismatched iterator positives on A.insert(B.begin(), B.end()).

In https://reviews.llvm.org/D57858#1501065, @dkrupp wrote:
These are the alpha checkers that we are testing in Ericsson:

Let me undig my last year’s attempt to take a look at alpha checkers.
The most common “limb” to “miss” in the alpha checkers is the “bug
visitor” functionality that’d add checker-specific path notes to the
report, which is almost inevitably necessary for any path-sensitive
checker. Bug reports without path notes are hard to understand, but
that’s one thing that your users won’t tell you: they often just don’t
have their good taste to realize that bug reports shouldn’t be so hard
to understand. The users often take it for granted that they have to
figure out themselves where do these values come from, but it’s still
our job to not force them to.

The fundamental problem here is, in my opinion, that “alpha” doesn’t describe many of these checkers too well. I think once a checker is stable and has a “reasonable” true positive/false positive ratio, we should move them out of alpha status: not only would we be able to gather invaluable feedback for these, but users might appreciate the feature even with a couple shortcomings. However, just because they are not falling apart on their own, these aren’t always production ready – how about we introduce a “beta” package?

Alpha checkers would be incomplete, incorrect and unstable by definition, and would be hidden from non-developers. Beta checkers would receive a disclaimer that they might emit too much false positives to be production ready and don’t emit ideal bug reports in terms of readability, but are considered stable and usable.

alpha.core.BoolAssignment

Yes, i agree that this one’s pretty useful. It’s currently missing a
visitor that explains why does the analyzer think that the value is
non-true/false, which is often necessary to understand the warning.

This would be an ideal candidate to be moved to a beta package.

alpha.core.CastSize

This one’s indeed relatively quiet, but i’m seeing ~50 false positives
caused by stuffing metadata at the beginning of a dynamically allocated
buffer. I.e., allocate a buffer that’s 4 bytes larger than necessary,
use these 4 bytes for our own bookkeeping and provide a pointer to the
rest of the buffer to be used for the actual value. I don’t see an easy
way to fix these false positives, so i don’t see how to move this out of
alpha.

alpha.core.Conversion

Interestingly, i haven’t seen this one trigger on our codebases. So i
don’t have an opinion here. There’s a chance it might be a good opt-in
check. Do you have an open-source project in mind on which this check is
actually useful?

alpha.core.DynamicTypeChecker

I really root for enabling this checker (and generally improving our
dynamic type-checking), but for now i’m seeing ~600 false positives in
projects that use custom RTTI (something like dyn_cast) and ~300 more
Objective-C-specific false positives. I can take a look and try to
reduce some of the custom RTTI ones if you’re interested in figuring out
how to fix them; i don’t remember if they are easy to fix.

alpha.core.SizeofPtr

This checker does indeed find interesting bugs sometimes, but i’m
overwhelmed by ~300 false positives in which the sizeof of a pointer is
taken intentionally. This is especially annoying when the pointer is
hidden behind a typedef and the user doesn’t need to know whether it’s a
pointer or not.

alpha.core.TestAfterDivZero

I don’t see any positives of this checker, but this checker is crazy and
shouldn’t have been done this way. It’s a “must-problem” check and we
don’t have any sort of infrastructure to even display this kind of bug
report properly after we find it, let alone to properly find it. We need
a more-or-less full-featured data flow analysis engine before we make an
attempt on such checker.

alpha.cplusplus.DeleteWithNonVirtualDtor

I don’t see any positives of this checker. Is it any better than the
compiler warning that we have for all polymorphic classes that have no
virtual destructors?

Maybe this one too.

How about this:

-analyzer-checker-help: Displays production ready (non-modeling) checkers, and beta checkers with a disclaimer for each description. Don’t forget to patch clang-tidy!
-analyzer-checker-help-developer: Displays only developer (modeling, debug) checkers (renamed from *-hidden).
-analyzer-checker-help-alpha: Displays only alpha (non-modeling) checkers with very scary disclaimers both around the list and for each checker description.

Note how each of these is mutually exclusive. I think we shouldn’t make a flag that displays all checkers, but should allow the following invocation to do so:

clang -cc1 -analyzer-checker-help -analyzer-checker-help-alpha -analyzer-checker-help-hidden

  • György Orbán

I wanted to give more visibility to the discussion on the status of
different Static Analyzer “alpha” (unfinished) checkers that’s going on
on Phabricator (https://reviews.llvm.org/D57858). Story so far: We’re
trying to figure out how many of them can be finished with relatively
little effort (and therefore we should probably ask around to gather
feedback) and how many are clearly not ready for any sort of use and
should be hidden until the most glaring bugs are fixed. For now we
officially treat all alpha checkers as the latter - so don’t use them!

Just to add to the summary: despite our “disapproval” of using alpha checkers, we always made them very visible, since if you wanted to list the available checkers, you inevitable came across them (and they are also lexicographically greater than the rest of them).

Due to the lack of branches in our SVN repository, we used the alpha package to make development incremental, which inevitable resulted in some checkers in there being unfinished and unstable, while others merely need some finishing touches, and could be used despite being rough around the patches.

rough around the edges*

Just wanted to also point out that help is very welcome with implementing the missing functionality in the checkers that we’ve been discussing here.

(more replies inline)

This makes sense to me and i think this is how it should have always been done. I’ll be comfy to move it straight to core as long as someone stuffs a bugreporter::trackExpressionValue() into it and we both test it and see no obvious FP patterns specific to this checker. Unless we do this, i probably won’t be comfy putting it into beta :slight_smile: Dunno, why duplicate a compiler warning that’s already on by default and more aggressive than the checker?

I implemented what we discussed here in the following manner: Each checker or checker option must either be in:

  • Alpha: Unfinished, users are strongly discouraged from using it.
  • Beta: Stable but might produce more false positives and the emitted reports might lack proper explanation
  • Released: Production ready, very few false positives with easy to understand bug reports.

Any checker or checker option in addition can be marked as hidden, meaning that it’s a developer feature.

Just wanted to also point out that help is very welcome with implementing the missing functionality in the checkers that we’ve been discussing here.

(more replies inline)

How about this:

-analyzer-checker-help: Displays production ready (non-modeling) checkers, and beta checkers with a disclaimer for each description. Don’t forget to patch clang-tidy!
-analyzer-checker-help-developer: Displays only developer (modeling, debug) checkers (renamed from *-hidden).
-analyzer-checker-help-alpha: Displays only alpha (non-modeling) checkers with very scary disclaimers both around the list and for each checker description.

Note how each of these is mutually exclusive. I think we shouldn’t make a flag that displays all checkers, but should allow the following invocation to do so:

clang -cc1 -analyzer-checker-help -analyzer-checker-help-alpha -analyzer-checker-help-hidden

This makes sense to me and i think this is how it should have always been done.

Note that hidden checkers may only show up in -analyzer-checker-help-developer. Options for implicit checkers are NOT implicitly hidden, but options for alpha checkers are implicitly marked alpha.

Mmm, i still don’t understand what’s the purpose of the beta package.

I can understand the idea of “here’s a new checker, we addressed all issues we know about but there may be other issues that we might have missed, so please try it out if you’re into this sort of stuff”. For now such checkers are usually outright turned on by default.

But you’re proposing to put checkers into beta when they have known classes of false positives that are relatively easy to fix. So it’s not for gathering feedback (we already have enough) and there’s no plan on addressing feedback rapidly either (because if you have the capability to address feedback rapidly, there’s no reason to start with feedback not addressed). It sounds as if we’ll still bounce back all the bug reports against beta checkers with “hey, this is beta, we know it’s broken, right?”. If that’s the case, then they’re not really different from alpha checkers and i wouldn’t be comfortable providing them to the users.

Mmm, i still don’t understand what’s the purpose of the beta package.

I can understand the idea of “here’s a new checker, we addressed all issues we know about but there may be other issues that we might have missed, so please try it out if you’re into this sort of stuff”. For now such checkers are usually outright turned on by default.

But you’re proposing to put checkers into beta when they have known classes of false positives that are relatively easy to fix. So it’s not for gathering feedback (we already have enough) and there’s no plan on addressing feedback rapidly either (because if you have the capability to address feedback rapidly, there’s no reason to start with feedback not addressed). It sounds as if we’ll still bounce back all the bug reports against beta checkers with “hey, this is beta, we know it’s broken, right?”. If that’s the case, then they’re not really different from alpha checkers and i wouldn’t be comfortable providing them to the users.

Hmm. I guess you do have a valid point here. My initial thoughts were to move alpha checkers we are actually using and found useful (which in itself imo is a good reason) to beta in order to display them under -analyzer-checker-help, and hide the ugliest stuff deep in the depths of -analyzer-checker-help-alpha. As you said very nicely, “let clang be the single source of truth”.

At the end of the day, we could just make CodeChecker gather the list of checkers by using both -analyzer-checker-help and -analyzer-checker-help-alpha, which would include the really nasty stuff as well, but we’ve managed it so far.

Should I abandon https://reviews.llvm.org/D62092 and remove the beta category in https://reviews.llvm.org/D62094, would be fine with me closing this project for good?

Sounds good! Let’s delay this until we have a clear vision of what should this package be about.

Hello,

I think that a typical beta checker could be the iterator checkers. They are stable, but some of them produces tons of false positives which are very difficult to fix. Enabling an alpha can crash your whole analysis on some of your files but there is no such danger for the beta. They are only very noisy but you can filter your results after running them. So you do not loose other findings because of them, but on some projects their results are of no real use. These checkers may or may not be enabled by default, but the users are not discouraged to using them.

Regards,

Ádám

Hi,

Yes, maybe this extra beta category would create another checker grouping confusion.

I think even alpha checkers should not crash (at least not on well known open source projects. Llvm clang, postgresql, openssl etc.).

So probably alpha checkers should not be accepted until they can be driven to crash easily. And more or less this is the current practice too.

What would be nice to have is the listing possibility of the checkers in this grouping:

  • be able to list stable checkers only

  • alpha checkers only (which may be giving some FPs or returning unreasonable explanations)

  • be able to list checkers that are meant for developers only: debug checkers, modeling checkers

Some users would like to use immature checkers to find some more true positives at the cost of receiving more FPs or hard to understand reports.

What we could do is to add Artem’s notes on the specific checkers to the checker documentation as limitation.

Then users would know what they can and what they should not expect from an alpha. And it would be a hint for developers too what can be improved.

I will start a patch for this…

Regards,

Daniel