Clang Static Analyzer: False Positive Suppression Support

Xazax-hun · August 18, 2016, 8:17am

Hi!

In the CodeChecker tool (https://github.com/Ericsson/codechecker) we support suppressing false positive bugs as a post processing step. After all the issues were reported, CodeChecker filters out the suppressed ones, and those will not be displayed to the user.

In our experience, however, and external tool only suppression suppport is not sufficient for certian reasons:

When a checker generates a sink node during analysis, the rest of the path will not be covered by the Static Analyzer. Unfortunately, if the user suppress the bug in an external tool, the coverage will not get better. This way false positive results can hide true positive results regardless of suppression.
The compiler could do better job diagnosing ill formed suppressions (invalid source range, typo in checker name etc).
Tools that are developed on top of the compiler need not to introduce their own customized solution for suppression.

It is beneficial to have a suppression format that is standard across all clang tools. So the same format could be used to:

Suppress clang warnings
Suppress clang static analyzer warnings
Suppress clang tidy warnings

There are two main approaches to suppress warnings that I can think of right now:

Suppress using comments (or pragma):

This is a good solution for code evolution. The comments are moving together with the code. Edits are less likely to invalidate comments.
This is an intrusive solution, the source code needs to be changed to suppress an issue. This is both good, because the suppressions are version controlled and bad, because suppressing in 3rd party libraries might be problematic.
For path sensitive checks, it is not always self evident at which point of the path should a suppression comment be added.

Suppress using hashes:

It is hard to come up with a reliable hash to identify bugs. There always might be corner cases when the hashes of different bugs collide or the hash of a bug changes due to an edit that in fact should not affect the bug.
It is non intrusive.
The user do not need to think about at which point should the suppression comment be inserted.

I would suggest the suppress comment road.

The syntax could be something like:

// clang suppress warning-or-checker-name [optional line offset][optional column range] [comment]

The warning-or-checker-name could be a regex, so multiple checks can be suppressed at the same time.

Column ranges are useful when multiple errors are reported for the same line.

Line offsets are useful when one do not want to break the flow of a multi line code snippet with a comment, so this way warnings could be suppressed without too much negative effects on the readability.

Comment is for documentation purposes, the user can document why does she think that this is a false positive.

What do you think?

Regards,

Gábor

Ben_Craig · August 18, 2016, 2:20pm

My company has an implementation of comment suppression that we haven’t tried to upstream yet. It doesn’t address all the issues you mentioned though.

The general comment format is something like: // clang_sa_ignore [deadcode.DeadStores, cplusplus] plus whatever other text the user wants

The comment must be on the same line as the last element in the path. The line offset that you proposed seems like a reasonable extension.

The comment only suppresses the diagnostic. This happens well after all the path sensitive nodes have been created. Stated differently, the comment suppression does not help the analyzer uncover new problems.

One thing to note is that it isn’t necessarily obvious which checker generated a warning, particularly when you only have the console output to work with. So we changed our analyzer to also emit the category of any static analyzer warnings.

I’ll start some internal conversations and see if we can start getting some of those patches upstreamed.

alexfh · August 18, 2016, 5:24pm

In clang-tidy we’re trying to provide natural ways to mute diagnostics (e.g. warnings related to implicit casts can be suppressed by adding the corresponding explicit cast). It’s the preferred way, but it’s not always possible, so additionally clang-tidy supports // NOLINT suppression comments without a way to silence specific diagnostics/checks. // NOLINT mutes all diagnostics on the same line.

Cpplint, where we took the idea from, additionally supports // NOLINT(category) that allows targeted suppression of diagnostics, and // NOLINTNEXTLINE (with an optional (category) as well), which mutes diagnostics on the following line.

NOLINT was a reasonable solution for clang-tidy, since many cpplint checks are implemented in clang-tidy as well and suppressions (where needed by both tools) should not be repeated with a different syntax.

I’m not particularly interested in significantly improving suppression mechanisms, since in most cases it’s better to either improve the analysis or modify the code. The frequency of suppressions should stay low to avoid warning blindness and letting the code with suppression comments.

For compiler diagnostics I’m against introducing a comment-based suppression system at all. It’s not the compiler’s business to analyze comments.

Aditya_K · August 18, 2016, 7:00pm

My company has an implementation of comment suppression that we haven’t tried to upstream yet. It doesn’t address all the issues you mentioned though.
The general comment format is something like: // clang_sa_ignore [deadcode.DeadStores, cplusplus] plus whatever other text the user wants
The comment must be on the same line as the last element in the path. The line offset that you proposed seems like a reasonable extension.

The comment only suppresses the diagnostic. This happens well after all the path sensitive nodes have been created. Stated differently, the comment suppression does not help the analyzer uncover new problems.
One thing to note is that it isn’t necessarily obvious which checker generated a warning, particularly when you only have the console output to work with.
So we changed our analyzer to also emit the category of any static analyzer warnings.
I’ll start some internal conversations and see if we can start getting some of those patches upstreamed.

Hi Ben,
It will be great to get that set of patches upstream. I wrote them a couple of years ago. Once they are upstreamed, I’ll get a chance to improve it further. For example, having hash-based implementation is also useful in some cases
where the project is unwilling to go the comment-based (intrusive) route.

Thanks,
-Aditya

p23_power · August 18, 2016, 8:41pm

Hi,

Aditya_K · August 19, 2016, 3:34pm

I had the same confusion while implementing the support for suppressing warnings because different teams would have different preferences. So I decided to implement both.

My implementation would take a YAML file which was generated by the clang static analyzer itself.

There are several advantages of hashing-based method:

It helps in tracking the history of reports by tracking just one file.
That YAML/JSON file can also be used by any IDEs to display a graphical view of control flow that led to the bug.
The suppression mechanism can be external to clang (if one choses to do so).
The hashing based method is easier to implement and maintainable both for clang source and the projects using it.

The comment based approach is intrusive to the project and binds the project to clang static analyzer tool in that sense. It is difficult to maintain that functionality in the sense it relies heavily on the format of error report generated by the clang static analyzer. Changing the error category or, moving the checker to another class would require a lot of change in the project. The one big advantage is that this approach enforces very active maintenance of the project.

I think it will be good to have both kinds of support so that projects would have more flexibility.

Thanks,
-Aditya

Xazax-hun · August 22, 2016, 8:28am

Hi!

It is great that there are several implementations lurking around. It is even better that some people are willing to upstream them. I would be happy to review them as well. I think other features like not letting suppressed bugs degrade the coverage can be added later on.

AnnaZaks · August 24, 2016, 11:05pm

While I agree with Alexander that it is best if users change their code to teach the analysis about it or make their intent more obvious by adding asserts or casts, unfortunately, not all static analyzer warnings could be suppressed that way. We do see requests for better suppression mechanisms come up from time to time.

The main in-source suppression mechanism that the clang static analyzer supports is the ‘clang_analyzer’ macro. I am not sure if you tried using it or not… It might be more difficult to use than other in-code suppressions, might not be as esthetically appealing, and is definitely not something that could be used by other tools. However, this is what is recommended right now: http://clang-analyzer.llvm.org/faq.html#exclude_code, so I am curious what are the main limitations of it that you are seeing.

The static analyzer already has support for suppression hashes in tree. The CodeChecker tool (https://github.com/Ericsson/codechecker) is using them to provide user workflows such as baselining and issue suppression. The Ericsson team has sent an email to this list about their plans on adding the tool to the llvm codebase, so please, take a look at that thread (http://lists.llvm.org/pipermail/cfe-dev/2016-April/048538.html) and comment if you are interested! I hope they will start that process soon.

With regards to suppression using comments or pragmas, even though they are not an ideal way of suppressing path-sensitive errors coming from the analyzer, there seems to be a high demand for this feature, so I think it would be a valuable addition. However, there are caveats here that I’d like to raise:

I really dislike using the checker names in the suppressions or in user code. The are not user visible or discoverable, as someone has already mentioned, but moreover they were not designed to be user-friendly. We did not think about them as API, and unfortunately, they leaked out. (For one, the package names and how the packages are split up is not very good.) Also, there is no reason why we should suppress on checker granularity and not on a bug type granularity. I would suggest exploring the route of giving user-friendly and meaningful names to bug types and possibly using those when we need to specify the type of a bug to suppress. On the other hand, suppressing reports of a certain type is an icing on the cake and I think it should be done as a second step after we agree on how the generic suppression should work.
The second problem is related to suppressing path-sensitive checks at the line of the report. Unfortunately, Gabor did not go into details about how the suppressions would actually work, so I’ll try to fill in some gaps here. The simplest solution is to change the diagnostic engine not to show the report if it occurs on the line that is suppressed. That will not work very well because often the cause of the path-sensitive issue is not at the line where the bug is reported. Consider a leak example where the analyzer triggers a report of a leak on different paths when the object becomes dead. If the user wanted to suppress such a leak, they’d potentially have to suppress it in many places - along each path where the object is leaked. A potential solution here is to use a “uniquing location” of the report, which could either be the location of the report or something else, such as the allocation site. Once the suppressed report is generated, we’d store that uniquing location and suppress all other reports with the same uniquing location.
I am not convinced that analyzing after a suppressed bug is a desirable feature. It might result in unexpected analysis results or cascading warnings. Having this might be similar to not having sinks at all. We can experiment with relaxing the sinks in some cases, but much more investigation is needed here.

Thanks for pushing this forward!
Anna.

p23_power · August 24, 2016, 11:56pm

Hi Anna,

devincoughlin · August 25, 2016, 1:54am

Hi!

In the CodeChecker tool (https://github.com/Ericsson/codechecker) we support suppressing false positive bugs as a post processing step. After all the issues were reported, CodeChecker filters out the suppressed ones, and those will not be displayed to the user.

In our experience, however, and external tool only suppression suppport is not sufficient for certian reasons:

When a checker generates a sink node during analysis, the rest of the path will not be covered by the Static Analyzer. Unfortunately, if the user suppress the bug in an external tool, the coverage will not get better. This way false positive results can hide true positive results regardless of suppression.

I’m a bit worried that in many cases suppressing one issue along a path and removing it a sink would simply cause another false positive along that path. I’m not convinced that the increase in coverage — along a path that we already know has a false positive — is worth the risk of requiring multiple suppressions.

The compiler could do better job diagnosing ill formed suppressions (invalid source range, typo in checker name etc).

Tools that are developed on top of the compiler need not to introduce their own customized solution for suppression.

It is beneficial to have a suppression format that is standard across all clang tools. So the same format could be used to:

Suppress clang warnings

Suppress clang static analyzer warnings

Suppress clang tidy warnings

There are two main approaches to suppress warnings that I can think of right now:

Suppress using comments (or pragma):

I think pragmas are definitely the way to go for in-source suppression (as compared to comments). They offer a nice way to specify a code region in which issues can be suppressed and are in a format that the compiler can easily understand. They could be wrapped in macros to provide a nicer interface if needed. Unlike tools that live outside of clang, if we stick the suppression in clang we don’t need to resort to comments for compatibility.

While in general I think it is better to add assertions to teach the analyzer about invariants rather than explicitly suppressing diagnostics, there are some cases where assertions don’t help:

Warnings about class/struct declarations (for example, Ben Craig’s PaddingChecker or the ObjC -dealloc checker). These declarations even be #ifdef’ed out when running under the analyzer because other code relies on them! Pragmas seem like the right choice here.
Leaks and other typestate-like issues. There is usually no way to assert “I already freed this thing”. Pragmas could help here. Another possibility is to add some kind of analyzer-specific function that when called means “Suppress any diagnostic involving the value passed to this call”.

The syntax could be something like:

// clang suppress warning-or-checker-name [optional line offset][optional column range] [comment]

I think it is probably better to do this on a per-diagnostic basis rather than a per-checker basis. This would let the user suppress the kind of bug and not be concerned with which particular checker happens to emit the diagnostic. While the checker names are already in some sense API (they are included in build scripts), exposing them in this was does increase the API surface.

Devin

devincoughlin · August 25, 2016, 5:31pm

Hi Anna,

I have found the text output to be not very helpful in understanding typical static analyzer reports — and in some cases actively harmful since an unhelpful report is more likely to be interpreted as a false positive. I view the text output as primarily a mechanism for testing the analyzer.

I also think a good goal is to eventually get the static analyzer itself out of the business of generating HTML. It seems to me that decisions about the user workflow of how to present/filter/navigate issues are much better implemented outside of clang itself. I agree that it is important to be able to run the analyzer and view reports without setup — but I think we can achieve this without sticking the UI logic in the compiler.

I think the biggest benefit of using a yaml/json file is for issue baselining rather than suppression and I would support incorporating this functionality in clang for that purpose. It is important that the format be easily diffable and stored in version control — i.e. not sqlite database.

In my view, the key different between baselining and suppression is that suppressions typically have lifecycle comments associated when them. These comments often include:

A justification for the suppression. This is typically an explanation of why the diagnostic is a false positive.
A criterion for when the suppression can be removed. This is often in the form of a bug filed against the static analyzer to fix the false positive or a bug filed against the codebase under analysis to follow proper coding conventions (e.g., for memory management and ownership).

Suppression lifecycle comments are important so that when the code in question changes someone can evaluate whether the suppression is still needed and make sure it is not hiding a true positive in the changed codebase. These comments can either be represented in the code itself (in which case the external yaml/json suppressions file would presumably not be needed) or viewed and stored by some external tool like CodeChecker (in which case this suppression/filtering logic is probably better implemented there).

Devin

Xazax-hun · August 29, 2016, 2:59pm

Hi!

I will try to summarize the discussion, let me know if I forgot something.

So far the consensus is that, people are not convinced that false positive suppression is useful for compiler warnings or clang tidy. For both the warnings and clang tidy checks the false positives can usually be suppressed by changing the code (often in a natural way). In case it is not possible to do so, or the warning makes no sense, one should report a bug. (And clang tidy already has a way to suppress results using comments, however that is not general enough for the static analyzer.) However most of us thinks that it is useful to have false positive suppression for the clang static analyzer. The reason is that, there are cases, where code can not be changed in a way that suppresses the warning. One can not use asserts to assert that something is freed. And one can not use the clang_analyzer macro to suppress some of the warnings.

There are some false positive suppression code for the static analyzer available privately and the might be open sourced.

There are two main approaches for suppression.

Hashes:

Good for 3rd party code or when teams share the same codebase and have different preferences regarding how to handle suppressions
It is better to store it in json/yaml/textual format, so it can be committed to source control systems
The user do not need to think about how to suppress an issue

Comments or pragmas:

For path sensitive checks either the uniqueing location must be suppressed or multiple locations which is noisy. In the former case the user might need to read the documentation to be able to properly suppress an issue
Using pragmas one can introduce convenience macros, it might also be better to suppress code regions
It is better to suppress based on bug type or something like that rather than using the name of the checker (which is not intended to leak out)
It is good idea to have line offsets to avoid too much noise when reading code
Anything against pragmas?

In both cases:

It is crucial to think about suppression lifetime management, so make it easy to review false positives later on (e.g.: a flag to report a note when a suppression comment or hash is unused? also, it is crucial to be able to add comments to suppressions, why is it a false positive, what is the criterion to remove the suppression)
We should not mix baselining with supression, we should think about them as separate problems.
Suppression inside clang increases the compatibility between 3rd party tools
In the long run clang might no longer provide HTML output, it might be generated by a 3rd party tool (or another tool in the repository). It might not be crucial to feed suppressions back to text output.

What about continuing the analysis after a sink node was suppressed?

It is more likely to find more false positives on such paths.
In some cases however (if the checker fails to understand the domain specific logic, but the analyzer still has a good understanding of the code), it might make sense. So maybe it might make sense to continue the analysis, when a non-core check result was suppressed, and turn off that check for the rest of the path?
In the former case maybe the better alternative to make the check not to generate a sink.

Some of us thinks that it might make sense to implement both hash and pragma or comment based suppression.

Regards,

Gábor

AnnaZaks · August 29, 2016, 4:49pm

Hi!

I will try to summarize the discussion,

Thanks!

let me know if I forgot something.

So far the consensus is that, people are not convinced that false positive suppression is useful for compiler warnings or clang tidy. For both the warnings and clang tidy checks the false positives can usually be suppressed by changing the code (often in a natural way). In case it is not possible to do so, or the warning makes no sense, one should report a bug.

Compiler warnings can already be suppressed with pragmas.

(And clang tidy already has a way to suppress results using comments, however that is not general enough for the static analyzer.) However most of us thinks that it is useful to have false positive suppression for the clang static analyzer. The reason is that, there are cases, where code can not be changed in a way that suppresses the warning. One can not use asserts to assert that something is freed. And one can not use the clang_analyzer macro to suppress some of the warnings.

There are some false positive suppression code for the static analyzer available privately and the might be open sourced.

There are two main approaches for suppression.

Hashes:

Good for 3rd party code or when teams share the same codebase and have different preferences regarding how to handle suppressions

It is better to store it in json/yaml/textual format, so it can be committed to source control systems

The user do not need to think about how to suppress an issue

Comments or pragmas:

For path sensitive checks either the uniqueing location must be suppressed or multiple locations which is noisy. In the former case the user might need to read the documentation to be able to properly suppress an issue

I believe using the uniquing location can be done automatically in the analyzer as it can be inferred from the regular location.

Using pragmas one can introduce convenience macros, it might also be better to suppress code regions

It is better to suppress based on bug type or something like that rather than using the name of the checker (which is not intended to leak out)

It is good idea to have line offsets to avoid too much noise when reading code

Anything against pragmas?

Another reason pragmas are better than comments is because this is what the compiler uses for suppressions.

Ben_Craig · August 29, 2016, 5:50pm

I’m not opposed to pragmas in concept, but I don’t like the typical way they are used for compiler warning suppression. In particular, I strongly dislike the following pattern as the “default” pattern for warning suppression:

#if clang_analyzer
#pragma static analyzer push
#pragma static analyzer disable deadcode
#endif
// bad code here
#if clang_analyzer
#pragma static analyzer pop
#endif

Things I don’t like about that pattern:

verbose
requires modifying code before and after the offending line of code
usually requires compiler specific guards.
if you forget the trailer, code still works, but you lose diagnostics
if you don’t know the push / pop pattern, you end up losing diagnostics

I’m fine with that pattern existing in addition to a much more terse and friendly syntax. I can see the push / pop pattern being useful for blocks of code. Just don’t make that the default.

Here’s a pattern that I’d prefer:
// Common header…
#ifdef clang_analyzer
#define CLANG_SUPPRESS_WARNING(category, line_offset) _Pragma(category, line_offset)
#else
#define CLANG_SUPPRESS_WARNING(category, line_offset)
#endif
// Source file…
CLANG_SUPPRESS_WARNING(deadcode, +1)
// bad code here

devincoughlin · August 29, 2016, 6:30pm

Using pragmas one can introduce convenience macros, it might also be better to suppress code regions

It is better to suppress based on bug type or something like that rather than using the name of the checker (which is not intended to leak out)

It is good idea to have line offsets to avoid too much noise when reading code

Anything against pragmas?

Another reason pragmas are better than comments is because this is what the compiler uses for suppressions.

I’m not opposed to pragmas in concept, but I don’t like the typical way they are used for compiler warning suppression. In particular, I strongly dislike the following pattern as the “default” pattern for warning suppression:

#if clang_analyzer
#pragma static analyzer push
#pragma static analyzer disable deadcode
#endif
// bad code here
#if clang_analyzer
#pragma static analyzer pop
#endif

Things I don’t like about that pattern:

verbose

I think a certain amount of verbosity is preferable here to steer users away from using it unless it is actually needed. This is a suppression of last resort.

requires modifying code before and after the offending line of code

usually requires compiler specific guards.

if you forget the trailer, code still works, but you lose diagnostics

if you don’t know the push / pop pattern, you end up losing diagnostics

We can always diagnose this. Well-named macros would also mitigate the problem:

CLANG_ANALYZER_SUPPRESSION_BEGIN()
CLANG_ANALYZER_SUPPRESSION_END()

I’m fine with that pattern existing in addition to a much more terse and friendly syntax. I can see the push / pop pattern being useful for blocks of code. Just don’t make that the default.

Here’s a pattern that I’d prefer:
// Common header…
#ifdef clang_analyzer
#define CLANG_SUPPRESS_WARNING(category, line_offset) _Pragma(category, line_offset)
#else
#define CLANG_SUPPRESS_WARNING(category, line_offset)
#endif
// Source file…
CLANG_SUPPRESS_WARNING(deadcode, +1)
// bad code here

Any offsets that aren’t +/-1 strike me as very fragile in the face of code evolution. They would make it very easy to have bad merges that lose the suppression. Also: what do you envision the meaning of “line” being within a macro expansion? Would this make it impossible to suppress diagnostics in an expansion itself?

Devin

Topic		Replies	Views
Going forward with [[clang::suppress]] Static Analyzer	8	598	December 12, 2023
Reporting false positives detected by Clang static analyzer Clang Frontend	6	102	April 13, 2016
[Analyzer] File based support for SA suppression Static Analyzer	0	108	October 24, 2017
Static Analyzer (annotations, path sensitive analysis) Clang Frontend	2	87	May 31, 2013
Help in suppressing warnings from static analyzer Clang Frontend	2	127	September 19, 2017

Clang Static Analyzer: False Positive Suppression Support

Related topics