[RFC] Emitting auditable SARIF logs from Clang

Motivation

Compiler warnings can identify potential security, reliability, and correctness problems in C and C++ source code. However, it can be challenging for a project to “get clean” and “stay clean” on a particular warning. There may be hundreds of existing instances of that warning in the codebase, and it could take weeks or months to fix them all. Meanwhile, developers are also writing new code, potentially introducing new instances of that warning. And throughout this process, security experts may want to keep an eye on the changes, to make sure that developers are only suppressing instances of a warning when it is truly safe to do so.

Scenarios

Auditing

Before expending the effort to fix a particular set of compiler warnings in a project, it is important to understand the current state of the project with respect to those warnings. If the project already enables the relevant warnings, then this is simply a matter of building the project and inspecting the diagnostics emitted in the build log. However, many projects enable only those warnings that they expect their developers to address immediately, so the build log will not contain any warnings that the project is not yet interested in.

In order for the project owners to observe the state of the project with respect to warnings that are not yet enabled, we propose extending the compiler to generate an “audit log”. The project owners will specify the set of warnings they are interested in auditing, and the compiler will emit a separate file listing all instances of those warnings found by the compiler, even if those warnings were disabled or suppressed by one of the existing diagnostic control mechanisms (e.g., -Wno-<id>, #pragma clang diagnostic ignored, etc.). The auditors can then scan this audit log to estimate how much effort it would be to adopt a particular compiler warning on this project, and how vulnerable the project might currently be to the defects detected by those warnings. The auditors can combine this data with other information about the project in making this decision. For example, security auditors might look for projects that are known to have a large attack surface area and have a significant number of “uninitialized variable” warnings, and focus their adoption efforts on such projects while saving lower-risk projects for later.

By making the audit log a separate file, the project’s developers continue to see the same set of warnings that they were seeing before auditing was enabled, so their day-to-day workflow is not disrupted.

Adoption

Once the decision has been made to adopt a particular new set of compiler warnings for a project, the audit log can feed into additional tooling to help the project’s developers track their progress on addressing the new warnings. The project would enable the new warnings in its build configuration, but then suppress existing instances of those warnings as a “baseline”. Any new instances of the warning would show up in the build as usual, making it easy to stop new issues from being introduced while the developers are trying to clean up the existing ones. Meanwhile, the developers can work through the backlog of baseline warnings over time, using the audit log to verify that those previously-suppressed warnings are now fixed.

Requirements

To build tooling to help with this process, we need information about the warnings beyond what Clang currently provides. In particular, we need to know:

  • All of the warnings that are present in a particular build of a project, including those warnings that were suppressed via a #pragma clang diagnostic ignored "...", a warning suppression mapping, or other such mechanism. This lets the project’s developers track which of the warning instances that they marked as “baseline” have been fixed, and which remain in the codebase. It also helps auditors track the overall state of those warnings in the project, and lets them review the decision to suppress particular warning instances, either at PR time, or at a later point.
  • For each warning that is suppressed, the location of the suppression itself. This makes it easier to tell how the warning was suppressed, especially when the suppression is located far away from the code where the warning was reported.
  • The set of warnings that were checked for in each translation unit (TU), regardless of whether any instances of a particular warning were actually emitted for that TU. This lets us ensure that the right warnings are enabled in the first place. A compilation that produces no warnings about unsafe API usage has two very different meanings depending on whether that warning was enabled or not.

Clang has most of the above information available internally, or can readily compute it. We just need a way to emit it so that a warning management system can consume it later.

This RFC proposes emitting this information using the Static Analysis Results Interchange Format (SARIF). SARIF is a JSON-based standard for representing the results of static analysis tools, including information to support auditing of those results.

Usage

To generate a SARIF audit log, the user specifies the new --sarif-log option, plus zero or more instances of the new -Waudit=id option, where id is the name of a warning group (the same as for the existing -Wid option).

clang -c file.cpp -Waudit=uninitialized -Waudit=deprecated-declarations -Wuninitialized --sarif-log

This will instruct the compiler to generate a SARIF log file that contains all instances of the warnings specified by -Waudit=id, including instances whose diagnostic level was ignored. In addition, a SARIF rule object will be emitted for every warning specified by -Waudit=id, regardless of whether any instances of that warning were emitted.

Note that -Waudit=id does not actually change the initial level of the specified warning. Thus, in the above command line, only the uninitialized warnings are initially at level warning; the deprecated-declarations warnings are initially at level ignored, although even those ignored instances will be emitted to the audit log (with suppression information).
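To make the separation between “audited” and “enabled” concrete, here is a toy Python model of the proposed flags. It is only a sketch: the function names are made up for illustration, and it deliberately ignores warnings that are on by default, warning group nesting, and in-source #pragma changes.

```python
def classify_flags(args):
    """Toy model: return (audited, enabled) sets of warning-group names."""
    audited, enabled = set(), set()
    for arg in args:
        # Check the audit spellings first, since they also start with "-W".
        if arg.startswith("-Waudit="):
            audited.add(arg[len("-Waudit="):])
        elif arg.startswith("-Wno-audit="):
            audited.discard(arg[len("-Wno-audit="):])
        elif arg.startswith("-Wno-"):
            enabled.discard(arg[len("-Wno-"):])
        elif arg.startswith("-W"):
            enabled.add(arg[len("-W"):])
    return audited, enabled


def in_audit_log(group, audited, enabled):
    # Audited warnings are always logged; non-audited ones only when their
    # level is not "ignored" (approximated here by "enabled on the command line").
    return group in audited or group in enabled


args = ["-c", "file.cpp", "-Waudit=uninitialized",
        "-Waudit=deprecated-declarations", "-Wuninitialized", "--sarif-log"]
audited, enabled = classify_flags(args)
print(sorted(audited), sorted(enabled))
```

In this model, deprecated-declarations is audited without being enabled, which matches the command line above.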

The default path for the audit log is the path of the object file with its extension replaced with .sarif (e.g., file.sarif). This path can be changed by specifying the new --sarif-log=path option.

Proposed Changes

New command-line settings

--[no-]sarif-log

This switch enables (or disables) generation of the SARIF log file. Default: disabled.

--sarif-log=path

This switch specifies the path of the SARIF log file. Default: the path of the object file, with its extension changed to .sarif. Specifying this switch implicitly enables --sarif-log.

-W[no-]audit=id

This switch marks the warnings in the specified group as “audited”. Audited warnings are emitted to the SARIF log with additional information described in a later section. Default: not audited. This switch does not affect the level of the specified warnings (i.e., -Waudit=uninitialized does not imply -Wuninitialized).

By design, there is no mechanism for changing the “audited” state of a diagnostic from within the source code, because that would defeat the point of auditing. While the user is of course free to disable auditing for a warning on the command line (either by -Wno-audit=id, or by just never specifying -Waudit=id in the first place), the SARIF log records which warnings were audited vs. not audited. Thus, the consumer of the SARIF log can audit the lack of auditing of warnings.

See Why do we need a new switch to enable auditing for a warning? below for more on why this switch is needed.

Warning IDs and Names

The SARIF format requires a stable string identifier for each warning. Clang’s warnings do not currently have user-visible IDs. We will add support for specifying a stable ID for each warning in the .td file entry for that warning. All IDs will be prefixed with clang., to distinguish them from warning IDs from the Clang static analyzer.

For warnings that do not yet specify a stable ID, we will use the in-source name of the warning (e.g., clang.warn_uninit_var). Should someone later decide to change the in-source name of the warning, they would have to explicitly set the stable ID of that warning to the old name.

Note that SARIF does provide a way for a rule to declare a list of “deprecated IDs”, so once we do assign a stable ID to a warning, we could also emit its original in-source ID in its deprecatedIds property.

Also, note that SARIF specifically suggests that a rule’s id property is not intended to convey information about the rule to the user. Thus, the existing in-source names of warnings might well be acceptable as a SARIF rule ID, despite not being particularly friendly.

The SARIF format also allows a rule to specify a name property with a more human-readable name. We will allow a Clang diagnostic to specify this name in its .td definition. If not specified in the .td, no name property will be emitted for that diagnostic.

The SARIF Log File

The SARIF log file serves as a machine-readable representation of the usual diagnostics output. This serves essentially the same purpose as the serialized diagnostics log (controlled by the --serialize-diagnostics switch), except that the output can be consumed by IDEs, viewers, and results management tools that understand SARIF.

Audited vs. Non-Audited Warning Differences

The SARIF information generated for a particular warning depends on whether that warning is audited (controlled via -Waudit=id) or not. The differences are as follows:

runs[0].results[]

For a non-audited warning, a result object will be emitted for each instance of that warning, unless that warning had a diagnostic level of ignored at the point in the code where the warning was discovered. This is the same set of warning instances that would be emitted to the console output. Note that a warning that is off by default and never enabled on the command line or via #pragma clang diagnostic will always have a level of ignored, and thus will not be emitted.

For an audited warning, a result object will be generated for every instance of that warning, even if the warning’s level was ignored.

runs[0].results[].suppressions[]

For a non-audited warning the suppressions property of each result will be an empty array, since the warning would not have been emitted if it had been suppressed.

For an audited warning, if the warning’s diagnostic level was ignored at the point in the code where the warning was discovered, the suppressions array will contain a single element describing what caused that warning’s level to be ignored (e.g., the #pragma clang diagnostic ignored directive, the warning suppression map file, etc.). If the warning’s level was not ignored, the suppressions array will be empty.
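As a sketch of how a consumer might use this, the following Python splits the results of a log into active and suppressed instances. The embedded log is a hand-written, hypothetical example (the messages and the exact output Clang would produce are assumptions); only the results and suppressions property names come from SARIF itself.

```python
import json

# Hypothetical, hand-written log fragment for illustration only.
SARIF_TEXT = """
{
  "version": "2.1.0",
  "runs": [{
    "tool": {"driver": {"name": "clang"}},
    "results": [
      {"ruleId": "clang.warn_uninit_var",
       "message": {"text": "variable 'x' is uninitialized"},
       "suppressions": []},
      {"ruleId": "clang.warn_uninit_var",
       "message": {"text": "variable 'y' is uninitialized"},
       "suppressions": [{"kind": "inSource"}]}
    ]
  }]
}
"""


def split_by_suppression(log):
    """Return (active, suppressed) result lists for the first run."""
    active, suppressed = [], []
    for result in log["runs"][0]["results"]:
        # A missing or empty suppressions array means "not suppressed".
        if result.get("suppressions"):
            suppressed.append(result)
        else:
            active.append(result)
    return active, suppressed


log = json.loads(SARIF_TEXT)
active, suppressed = split_by_suppression(log)
print(len(active), "active,", len(suppressed), "suppressed")
```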

runs[0].tool.driver.rules[]

For a non-audited warning, a rule object will be emitted only if at least one result for that rule is present in the log.

For an audited warning, a rule object will be emitted regardless of whether any results were emitted for that rule.
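One use of this is telling “checked and clean” apart from “never checked”. A minimal sketch, assuming results reference their rule through ruleId (SARIF also allows ruleIndex, which this ignores) and using made-up rule IDs:

```python
def rules_without_results(run):
    """List rule IDs that appear in the rules array but have no results."""
    rule_ids = [rule["id"] for rule in run["tool"]["driver"].get("rules", [])]
    seen = {result.get("ruleId") for result in run.get("results", [])}
    return [rule_id for rule_id in rule_ids if rule_id not in seen]


# Hypothetical run object: two rules were checked, one produced a result.
run = {
    "tool": {"driver": {"name": "clang", "rules": [
        {"id": "clang.warn_uninit_var"},
        {"id": "clang.warn_deprecated"},
    ]}},
    "results": [
        {"ruleId": "clang.warn_uninit_var", "message": {"text": "..."}},
    ],
}
print(rules_without_results(run))
```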

runs[0].invocations[0].ruleConfigurationOverrides[]

For a non-audited warning, no configurationOverride object will be emitted for that rule.

For an audited warning, a configurationOverride object will be emitted for that rule, with the parameters property of its configuration containing a property named clang/audited with the value true. SARIF does not appear to have a way to distinguish rules for which all suppressed results are emitted from rules for which only unsuppressed results are emitted, so we’ll record that distinction here. We should consider working with the SARIF committee to add such a capability to a future version of the SARIF specification.
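A consumer could recover the set of audited rules roughly as follows. This is a sketch under assumptions: it expects each override to identify its rule by descriptor.id (SARIF also permits an index), and it reads clang/audited from configuration.parameters, which is where SARIF 2.1.0 places rule parameters.

```python
def audited_rule_ids(run):
    """Collect IDs of rules whose override carries clang/audited == true."""
    audited = set()
    for invocation in run.get("invocations", []):
        for override in invocation.get("ruleConfigurationOverrides", []):
            params = override.get("configuration", {}).get("parameters", {})
            if params.get("clang/audited") is True:
                audited.add(override["descriptor"]["id"])
    return audited


# Hypothetical run fragment with one audited rule.
run = {
    "invocations": [{
        "executionSuccessful": True,
        "ruleConfigurationOverrides": [
            {"descriptor": {"id": "clang.warn_uninit_var"},
             "configuration": {"parameters": {"clang/audited": True}}},
            {"descriptor": {"id": "clang.warn_unused_variable"},
             "configuration": {"level": "warning"}},
        ],
    }],
}
print(audited_rule_ids(run))
```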

Rationale / Alternatives

Why SARIF?

The SARIF format can represent all of the necessary information, and Clang already has some limited support for emitting SARIF. The Clang static analyzer (via --analyzer-output=sarif) can already emit a SARIF log containing the warnings reported by the analyzer, including rich information about the relevant control flow paths that lead to the warning. The Clang compiler itself (via -fdiagnostics-format=sarif) also has the ability to emit a SARIF log, although this support is still unstable. It makes sense to extend this support rather than inventing a new, non-standard format.

Why do we need a new switch to enable auditing for a warning?

The two main requirements for the SARIF log are that it contain all instances of each audited warning, including suppressed instances, and that it record which warnings were audited, even if there are no instances of a particular warning. Without a way to explicitly specify which warnings are audited, we would have to determine the set of audited warnings another way. The following options were considered; we can revisit them if there’s a strong aversion to the -Waudit=id switch.

Option 1: All warnings are audited

If we decided to just enable auditing for all Clang warnings, we would be able to generate a SARIF log that meets the two main requirements. However, that log would be unacceptably large.

We expect auditing and policy enforcement to care about a relatively small number of specific warnings (e.g., security-relevant warnings). If we made all warnings audited, though, then the SARIF log would contain a (suppressed) result for every instance of every warning in the source file and all of its included headers. This would include pedantic warnings that nobody cares about, portability warnings that are irrelevant for a single-platform codebase, etc. There would likely be more of these uninteresting results than interesting results, at least doubling the size of the log without benefit.

Even if the number of uninteresting warning instances were manageable, we would still be emitting rule metadata for every Clang warning, regardless of whether it actually generated any warning instances. The metadata for these hundreds of diagnostics would take up a large amount of space on its own, again with no benefit.

Option 2: Only warnings that are enabled on the command line are audited

So why not just enable auditing for the warnings that are enabled on the command line via -Wid? The first problem here is that all of the existing diagnostics command-line options control the initial level of each warning, but the source code can change that level in either direction via #pragma clang diagnostic. If a warning is initially ignored, but later elevated to warning via a #pragma, do we consider it as audited? If so, we need to record every suppressed instance of every warning, so that we can emit those suppressed results if the warning does get elevated later on.

Note: The latest version of the Clang User’s Manual claims that “upgrading” a warning that was initially disabled on the command line is not possible “yet”, but empirically, this seems to be supported in current versions of Clang.

The second problem is that this approach provides no distinction between “not audited” and “audited but suppressed for this translation unit”. When trying to clean up all instances of a particular warning across a codebase, it is common to fix warnings file-by-file or subproject-by-subproject. When the developer is ready to fix the warnings in a particular file, they enable that warning just for that file. They then fix whatever warnings are reported in that file before moving on to the next file. The warning remains disabled in all of the files that the developer has not reached yet. We want the warnings from the not-yet-reached files to show up as suppressed results in the SARIF log, so that the developer can track what warning instances remain to be fixed.

Related Work

[RFC] Add a new text diagnostics format that supports nested diagnostics
The linked RFC makes some great improvements to the human-readable diagnostic output, but those should be orthogonal to the changes proposed here, which focus on machine-readable output.

[RFC] Hardening mode for the compiler
Part of the linked RFC proposes that hardening mode could enable additional warnings to go along with any other language and code generation changes. This is certainly helpful for projects trying to adopt more secure coding practices, which is the scenario targeted by the changes proposed here as well, but for now, the two RFCs appear orthogonal.

Future Work

Tracking Warnings over Time

One important capability for a warning management system is tracking warnings over time: spotting when new warnings get introduced, or when known warnings get fixed. This requires some way to tell whether two warnings, reported against different commits of the same repo, are “the same” warning. There are a few potential approaches to this problem:

  1. Add a unique ID to each suppressed warning in source code, via a comment or attribute. Advantages: moves with the code; visible to developers working on the code. Disadvantage: Unsightly, especially if there are a lot of suppressions, such as when a codebase first starts to adopt a particular warning and has to suppress all existing instances of that warning as a “baseline”.
  2. Compute a “fingerprint” of the relevant code. Advantages: Can be used without modifying source code, so it works well for “baselining” scenarios; we have such an algorithm in the Clang static analyzer already. Disadvantages: Less visible to developers; no fingerprinting algorithm is resilient to exactly the right set of changes.
  3. Tracking lines through Git history. Advantage: Can be used without modifying source code, so it works well for “baselining” scenarios. Disadvantages: Less visible to developers, requires access to Git commit history, still a bit of a research project.
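To illustrate what option 2 could look like, here is a deliberately naive fingerprint: a hash of the rule ID plus the whitespace-stripped text of the flagged line and its neighbors. This is not the Clang static analyzer’s algorithm, and the function name and parameters are invented for this sketch. It survives line shifts and reindentation, but not edits to the hashed lines themselves.

```python
import hashlib


def fingerprint(rule_id, lines, line_no, context=1):
    """Hash the rule ID with the flagged line (1-based) and its neighbors."""
    lo = max(0, line_no - 1 - context)
    hi = min(len(lines), line_no + context)
    # Drop all whitespace so reformatting does not change the hash.
    normalized = ["".join(line.split()) for line in lines[lo:hi]]
    payload = "\n".join([rule_id] + normalized)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


before = ["int f() {", "  int x;", "  return x;", "}"]
# Same code after a comment was added above it and it was reindented.
after = ["// new comment", "int f() {", "\tint x;", "\treturn x;", "}"]

same = (fingerprint("clang.warn_uninit_var", before, 3)
        == fingerprint("clang.warn_uninit_var", after, 4))
print(same)
```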

Without more experience with how any of the above approaches work in practice, we’re not yet ready to propose a particular approach to integrate into Clang. We’d like to solicit other ideas from the community, and will share the results of our own experiments with the community.

Unification with Clang Static Analyzer

We would like to have the Clang Static Analyzer and the regular Clang compiler emit their warnings to the same SARIF log, with the same support for auditing. While the implementation of this RFC will attempt to reuse code between the compiler and analyzer where practical, unifying the two different sets of command line options, as well as the different suppression mechanisms, will be deferred to a future RFC.

CC

@AaronBallman @envp @steakhal @t-rasmud @dtarditi @usama

GCC is also picking up SARIF support in a few places already (recent talk on this).

Hi @dbartol, to me, this RFC is well motivated, and offers a very detailed plan. I like it.

  • There are already tools that offer auditing for compiler warnings, like CodeChecker. I understand, though, that tools like that are outside of the LLVM ecosystem and might not fit existing build pipelines, so in that sense it still makes sense to me to have these auditing capabilities built into Clang. Those tools could build on top of these audit logs in the future to simplify their implementation.

  • I didn’t know of --serialize-diagnostics before. Thanks for sharing that.

  • About “Tracking Warnings over Time”, I’m highly against putting unique IDs to each warning appearance. IDs rarely stay unique for long in the world of copy-paste or AI generated code. Not to mention the ergonomics.
    SARIF already has the notion of issue hashes, which (as you mentioned) are also generated by the Clang Static Analyzer. CodeChecker has a couple of different ways of generating them. I’d recommend having a look.
    Your 3rd option also looked interesting, using git history too. I think that also deserves to be investigated as a stretch goal, but probably for a first implementation I’d go with some static way of generating an issue hash.

  • “Unification with Clang Static Analyzer”: This raised an interesting question for me. Unlike the usual compiler diagnostics, CSA might move warnings around, introduce new ones, and make some go away much more often across major Clang releases than regular compiler or clang-tidy diagnostics do. The question is, how should this audit tool be used when upgrading the Clang host toolchain? What stability will the SARIF report offer?
    I guess that once a project adopts warning auditing, it should change either the code or the tooling, but not both at the same time. So, as they upgrade, they can simply create a new “baseline” snapshot just like they did the first time. But even this would require preserving issue hashes across major Clang versions.

Great idea! I’m very supportive of more work in managing clang diagnostics.

Some feedback:

  • Centering the design around SARIF is a wise choice.
  • Adding a new output file is a wise choice as well. Build systems and other tools often parse compiler output, so keeping the textual compiler build log as-is is helpful if at all possible.
    • It might seem like a small API change, but allowing existing users of clang to keep their build logs as-is but add SARIF reports on top will be incredibly valuable to allowing tools like package managers, build systems, IDEs, CI tooling, etc., to add value without disrupting existing users.
  • On the other high-level design choices about whether to use this proposal, audit all warnings, or only audit traditional -W warnings that were specifically enumerated, I like the choices in this proposal.
  • Users like it when GCC and Clang have analogous flag spellings, so spelling the SARIF log flag -fdiagnostics-add-output=some_file.sarif instead of --sarif-log and --sarif-log=path would be a nice-to-have.
  • Assuming -Waudit=everything works as one would expect, the -Waudit flag sounds great.
  • I completely agree with a new switch to enable the auditing for the new warnings. Long term, maybe we can come up with some in-project cxx.toml or somesuch to better model what we use existing -W flags for. In other words, make C++ project configuration work more like other language ecosystems. But that’s likely out of scope for this RFC.
  • I agree that canonical IDs for clang warnings are needed and should be in scope for this design proposal. Can someone explain the attractiveness of using “in-source name” as part of the identifier in contrast to the warnings’ names according to -W flags? I would expect it to be a breaking bug to dramatically change the ways those are spelled, so they could function relatively well as an alternative choice.
    • If the in-source names are used as identifiers, will it be easy for users to understand how they should alter their flags and/or in-source annotations if they want to suppress a warning?
  • The audited vs. non-audited warning results in SARIF basically make sense to me.
    • Question: If I had warnings flags like --sarif-log -Waudit=all (assume no other warnings flags like -Wall, etc.), what sort of message would we expect in the suppression annotation for a typical -Wall result like clang.misleading-indentation? Would it note the suppression is traced to compiler defaults? Or would it not explain a precise source for the suppression?
  • SARIF does not appear to have a way to distinguish rules for which all suppressed results are emitted from rules for which only unsuppressed results are emitted

    • I could see having two different result sections I guess? Possibly clang could describe itself as having two drivers, one of them being the audit mechanism. Though the design as proposed seems good enough for me, though I haven’t actually tried to integrate this proposed design with anything else yet.

Overall, I think this looks great. Let me know if there’s anything I can do to help.

IMO we should just develop more ergonomic and portable ways to suppress diagnostics inline in code, in the file, in the repo, etc. If we have those and people still want to track an instance of a warning across changes to the repo, maybe we revisit. But I’m skeptical we can come up with something that a compiler can implement that reasonably tracks a warning across arbitrary changes to source code – file names, regex-based renames, code reformatting, etc. Inline suppressions reasonably would survive those sorts of changes.

I’m thinking something like:

[[deprecated("Call sayHello() instead.")]]
void sayHello_legacy() {
  // Suppression Note: We will delete the below call
  // when we delete sayHello_legacy.
  [[suppress("clang.deprecated","gcc.deprecated")]]
  sayHello();
}

The above would be analogous to clang-tidy NOLINT comments, though something that would survive a preprocessor pass. Having nice ways to apply suppressions to entire files and blocks of code would be needed as well.

Note that the C++ standards folks are already very interested in granular and standard ways to enable and disable diagnostics as part of the “profiles” proposals. There’s a decent chance that mechanisms like these will end up needing to be implemented anyway. See “C++ Profiles: The Framework” (P3589) for something that’s very similar.

Thank you for this RFC! Overall, I’m in support of the general idea.

FWIW, we already have the ability to emit diagnostics to SARIF via -fdiagnostics-format=sarif; I worry that --sarif-log will confuse users into thinking it relates to the default diagnostic output format. I wonder if it makes more sense to name it --sarif-audit-log instead or maybe even --diagnostic-audit-log?

We do, though, don’t we? -Wuninitialized is a user-visible ID. Or do you need better granularity for when one warning group emits several distinct diagnostics?

Another situation that comes to mind is that we still (unfortunately) have a number of places where we emit a custom diagnostic rather than one in a .td file. Do you have ideas on how to support those? Like: llvm-project/clang/lib/CodeGen/CGOpenMPRuntime.cpp at fedbe384519115b25b193db2882b18b6bf253eaa · llvm/llvm-project · GitHub (note, we should not be creating custom diagnostics ever, so all of these uses should eventually migrate to emitting a regular diagnostic through the usual means.)

Having the flags match GCC seems reasonable, but note that GCC’s -fdiagnostics-add-output flag is a little complicated: It supports a sequence of key-value pairs to control the details, like output file path, SARIF version, optional SARIF contents, etc. For example: -fdiagnostics-add-output=sarif:version=2.1,file=./results.sarif. Does syntax like that fit well into Clang’s command-line parsing infrastructure? That approach allows an arbitrary number of different SARIF logs with different settings, which doesn’t sound bad, but also seems a bit extravagant if we have to do a lot of implementation work to make it happen. It would just be command-line parsing implementation, though, since our diagnostics infrastructure already supports multiple sinks.
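For a sense of how small the parsing problem is, here is a Python sketch that splits the example value above into a scheme and key-value options. It is not GCC’s or Clang’s parser, the function name is invented, and it makes no attempt to handle file paths that contain commas or colons.

```python
def parse_add_output(value):
    """Split 'scheme:key=value,key=value' into (scheme, options dict)."""
    scheme, _, rest = value.partition(":")
    options = {}
    # filter(None, ...) skips the empty string produced when there are no options.
    for pair in filter(None, rest.split(",")):
        key, sep, val = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key] = val
    return scheme, options


print(parse_add_output("sarif:version=2.1,file=./results.sarif"))
```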

Even if the GCC approach doesn’t fit, we still need to think about providing similar controls for SARIF details. That could just mean more switches (--sarif-log-version=2.1, --[no-]sarif-log-include-suppressions, etc.). And, related to what @AaronBallman mentioned in a different comment, we’d probably need parallel switches for controlling the stderr SARIF output from -fdiagnostics-format=sarif, which kind of makes the GCC approach look more attractive.

I expect that we would emit a SARIF suppression object something like this:

"suppressions": [
  {
    "kind": "external",
    "location": {
      "message": {
        "text": "Disabled by default"
      }
    }
  }
]   

Note that a SARIF location object with no actual file/line/column is legal SARIF, although unusual.

I haven’t actually tried to integrate the proposed design with anything else yet either:) I’d like to start with what I’ve proposed, but as I try to integrate it into a real workflow over the next few months, I’ll learn more about how well it actually works. The “two drivers” approach could be worth exploring if the proposed design doesn’t work in practice.

We have user-visible names for warning groups (-Wuninitialized), but not for individual warnings within a group (warn_uninit_var vs. warn_uninit_self_reference_in_init). For SARIF, and warning management in general, the big question is whether we should distinguish between individual warnings within a group. This would affect several different aspects:

SARIF representation

Each SARIF result object is associated with a specific ruleId. The tool can provide all sorts of metadata about that rule in a reportingDescriptor object. If that metadata needs to be different for different warnings within a group, we need a unique ruleId for each warning. Examples of metadata that are likely to be different: description, fullDescription, help, helpUri.

Warning Evolution

When managing warnings and suppressions over time, we need to consider how the set of warnings and their organization change across compiler versions. If an individual warning can move to a different warning group, we need a way to match existing instances of that warning to the new instances reported via the new group. Having a stable ID for the individual warning means that the ruleId doesn’t change at all when the warning moves to a different group. If we only use group names, we can’t tell the difference between the warnings that were moved to the new group and the warnings that remained in the old group.

Auditing

When an auditor (for example, your security team) looks at the set of warning instances being tracked in your project, it’s helpful for them to break that set of warning instances down by the actual conditions that caused that warning instance to be emitted. Each warning within a warning group is reported using different analysis and/or different heuristics. Some of those may be more prone to false positives than others. Some may represent more serious, or more exploitable, vulnerabilities. Filtering or prioritizing based on individual warnings within a group can help the auditors focus their remediation efforts more efficiently.

Feedback to Tool Authors

As an author of static analysis checkers and compiler warnings, I want to know which warnings are finding the most real-world issues, which warnings are rarely firing, and which warnings are being suppressed as false positives. By monitoring SARIF results from official builds, PRs, and developer desktop builds, I can focus my effort on improving the warnings that need it, but I need to know which specific warnings those are.

Suppression and Configuration

I’m least worried about stable IDs for individual warnings for the two scenarios where we already use warning groups: suppression and command-line configuration. While eventually I’d like to be able to suppress or configure individual warnings in source code via a stable ID, I don’t think that’s strictly necessary to achieve the sort of warning management workflow I’m working towards. Even if we did extend the -W switches and #pragma clang diagnostic to understand individual warning IDs, there would be no reason to remove or deprecate the existing group-based mechanism.

Also, any SARIF results should specify the warning’s group ID in the message, just like the console text diagnostic emitter does.

Yes, this is probably the most important insight into managing warnings over time!

We have to accept that any new toolchain will bring changes to warnings along with it. It may introduce entirely new warnings or warning groups, remove existing warnings, and, most frustratingly, it may change the heuristics and analysis such that new instances of existing warnings are reported in previously clean code.

Note that this problem isn’t limited to compiler updates. Suppose your project updates to a new version of an SDK, or a new version of a package. If that new SDK or package marks one of its symbols as [[deprecated]], the consuming project will see new instances of the “use of deprecated declaration” warning.

I’ll split the problem into two parts:

Changes in results

This covers the case where the new version of the tool changes the analysis and/or heuristics for an existing warning, such that the set of results emitted for that warning changes.

Disappearing results

If a warning was previously reported on a particular piece of code, but the new tool no longer reports that instance of the warning, then this is equivalent to the project itself “fixing” that instance of the warning. The warning is gone, and no longer needs to be tracked. Hooray!

New results

If the new version of the tool starts reporting a warning on code that previously did not have an instance of that warning, then this is equivalent to the project itself introducing a new warning at that location. It’s now up to the project to eventually fix or suppress that warning. The warning management system (outside of the scope of this RFC) would add the new result to the project’s backlog to be addressed later.

Relocated results

Suppose the new version of the tool changes the implementation of a warning to report at a different location in the source code. For example, a memory leak warning might have originally reported the leak at the point of allocation, but a new version of the tool changes to report the leak at the last use of the pointer to the allocation. Without additional help from the tool, the warning management system will see this as a “fix” for the warning at the original location, plus a new instance of that warning at the new location. SARIF can help in two ways here:

First, because SARIF provides a way to specify “related locations” and even control-flow paths, it can make it easier for the warning to add new interesting locations without changing the “primary” location of the warning at all. For example, the memory leak warning could still be reported at the point of allocation, but with a related location at the point of last use.

Second, if the warning implementation provides custom partialFingerprints for each result, it could leave the partial fingerprint the same (based on the allocation itself) while changing the reporting location to the point of last use. The warning management tool would take the partial fingerprints into account when matching warnings across a change.
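Here is a rough Python sketch of that matching step. It keys each result on its partialFingerprints when present and falls back to rule, file, and line otherwise. The rule ID clang.leak, the fingerprint name allocSite/v1, and the helper names are all made up for illustration.

```python
def match_key(result):
    """Prefer partialFingerprints; fall back to rule + file + line."""
    fingerprints = result.get("partialFingerprints")
    if fingerprints:
        return (result["ruleId"], tuple(sorted(fingerprints.items())))
    loc = result["locations"][0]["physicalLocation"]
    return (result["ruleId"], loc["artifactLocation"]["uri"],
            loc["region"]["startLine"])


def diff_results(baseline, current):
    """Classify match keys as fixed, new, or unchanged between two runs."""
    old = {match_key(r) for r in baseline}
    new = {match_key(r) for r in current}
    return {"fixed": old - new, "new": new - old, "unchanged": old & new}


def make_result(line, alloc_fingerprint=None):
    # Builds a hypothetical leak result reported at the given line of a.c.
    result = {
        "ruleId": "clang.leak",
        "locations": [{"physicalLocation": {
            "artifactLocation": {"uri": "a.c"},
            "region": {"startLine": line}}}],
    }
    if alloc_fingerprint is not None:
        result["partialFingerprints"] = {"allocSite/v1": alloc_fingerprint}
    return result


# Reporting location moved from line 10 to line 42, fingerprint unchanged.
diff = diff_results([make_result(10, "abc")], [make_result(42, "abc")])
print({kind: len(keys) for kind, keys in diff.items()})
```

Without the fingerprint, the same relocation shows up as one fixed result plus one new result, which is the failure mode described above.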

Warning Refactoring

This covers when the tool introduces or removes whole classes of warnings, splits a warning into multiple warnings, or merges multiple warnings into one warning.

New Warnings

If the project cares about this new class of warnings, this can be handled the same way as new instances of an existing warning: the new instances would just be put on the backlog. If the project doesn’t care about this new class of warnings, it just disables the warning, or never enables auditing for it in the first place.

Removed Warnings

These are all treated as “fixed”, although the auditing configuration might have to be updated to not try to audit the no-longer-existent warning.

Split Warnings

If the tool splits an existing warning into two or more separate warnings for different cases, SARIF’s deprecatedIds property can help. For example, if warning “A” were split into warning “B” and warning “C”, then the SARIF metadata for both “B” and “C” would set the deprecatedIds property to [ “A” ]. If the warning management tool saw an instance of warning “B” that otherwise matched a previous instance of warning “A”, the deprecatedIds property would tell the warning management tool to treat that instance of “B” as if it were the original instance of “A”.

Merged Warnings

If the tool merges two or more separate warnings into a single warning, the deprecatedIds property can help there, too. For example, if warnings “X” and “Y” were merged into warning “Z”, the metadata for “Z” would set its deprecatedIds property to [ “X”, “Y” ], so that a new instance of “Z” would match a previously-tracked instance of either “X” or “Y”.
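Both the split and merge cases reduce to the same lookup on the consumer side. A minimal sketch, using the placeholder rule names from the examples above and helper names invented here:

```python
def build_alias_map(rules):
    """Map each current rule ID to the set of IDs it deprecates."""
    aliases = {}
    for rule in rules:
        for old_id in rule.get("deprecatedIds", []):
            aliases.setdefault(rule["id"], set()).add(old_id)
    return aliases


def same_rule(new_id, old_id, aliases):
    """True if a new result's rule matches a previously tracked rule ID."""
    return new_id == old_id or old_id in aliases.get(new_id, set())


# Rule metadata as the new tool version might emit it.
rules = [
    {"id": "B", "deprecatedIds": ["A"]},       # A was split into B and C
    {"id": "C", "deprecatedIds": ["A"]},
    {"id": "Z", "deprecatedIds": ["X", "Y"]},  # X and Y were merged into Z
]
aliases = build_alias_map(rules)
print(same_rule("B", "A", aliases), same_rule("Z", "Y", aliases),
      same_rule("B", "X", aliases))
```

The rule ID check would be combined with whatever location or fingerprint matching the warning management tool already performs.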