[RFC] Restructure Clang's diagnostic objects to allow for nested diagnostics

cjdb · April 27, 2023, 6:27pm

Introduction

Our tooling emits diagnostics on-the-fly, which makes it extremely difficult (if not impossible) to move the toolchain towards the proposed new diagnostic model. In the context of Clang, this means that it’s: difficult for users to see where one diagnostic (or part thereof) stops and another begins; impossible for SARIF diagnostics to engage in progressive disclosure; and occasionally interleaves completely separate diagnostics by accident. This proposal explores a solution resolving all three problems.

Acknowledgements

A huge thanks to @denik, @dblaikie, @AaronBallman, and @zygoloid for reviewing this proposal ahead of time.

Motivation

There are two kinds of diagnostics: top-level diagnostics that identify a problem in the source (errors, warnings, and remarks), and nested diagnostics that augment top-level diagnostics (presently just notes and fix-it hints). Clang currently does not acknowledge this hierarchy, and treats everything as a top-level diagnostic, which is really simple: the compiler gathers the relevant information and then passes it on to the user. Clang has largely gotten away with this, but a truly robust diagnostics model needs to structure the representation so that it is clear where one diagnostic stops and the next one starts. This will impact both the unstructured text diagnostics that we emit today, and the resulting diagnostics from the structured model that we are implementing over time.

Improve the structure without changing it

Consider the following program that’s ill-formed because it doesn’t have an appropriate selection for std::ranges::begin.

#include <ranges>

struct S {
  void begin();
  void end();
};

int main()
{
    S s;
    std::ranges::begin(s);
}

Clang currently produces the diagnostics as a simple listing, without regard for context. It is difficult to see how notes correspond to one another: the second note is a property of the candidate template ignored above it, but it can easily be mistaken as a property of the error that it sits under. Diagnostics with recursive contexts can improve the readability of the diagnostic, making it easier to identify bits of the context that matter to the reader. For example, the notes nested under the second tend to become noise as one becomes familiar with ranges, and it will be easier to mentally filter out anything nested under __member_begin over time, than it is when they’re all aligned together.

The text presented doesn’t necessarily mean that we would output this for the unstructured text model: indenting text that’s intended for the console is likely to make matters worse than present, although we could emit something fixed that at least tells the reader what “level” they’re reading. The more interesting case is the support for SARIF. The current SARIF specification doesn’t allow us to nest notes and fix-it hints inside one another, and so diagnostics are rendered similarly to the above by a diagnostic consumer (although they are collapsible in VS Code). There’s currently a SARIF work item to permit diagnostics to form a hierarchy so that it is possible for notes to only be presented as a part of progressive disclosure, instead of always being present.

Diagnostics will never appear out-of-order

Although the grouping of diagnostics mostly benefits the structuring as described above, Clang’s emit-and-forget model causes some diagnostics to appear out-of-order, and thus confuse readers. Richard Smith notes that this category of bugs tends to be fixed fairly quickly after discovery, but that there are a few unsolvable ones lingering. One such example is that the AST deserialisation mechanism can emit errors, and can be invoked at any point, including between a diagnostic and its notes. This can be somewhat frustrating for users, who will probably (and understandably) treat the second note as a bogus part of the second error’s context. Solving this problem simultaneously gives users more context in similar cases (since all the relevant notes fall into the correct position) and eliminates notes that are seemingly unrelated.

Design

Since the current design assumes that all diagnostics have the same hierarchy, we currently have a single type for all diagnostic identifiers, and doesn’t impact how we process anything: we just do something along the lines of Diag(Location, diag::Code) and get back the canonical error type. This won’t suffice moving forward, since Diag(Location, ErrorCode) needs to distinguish between top-level diagnostics and nested diagnostics at compile-time. To achieve this, we need to change what TableGen emits: there needs to be at least two categories of diagnostics: top_level_diag::Code for errors, warnings, remarks; and nested_diag::Code for everything else (currently just notes). Splitting these further, into error::Code, warn::Code, etc. may have other benefits, and we should evaluate that before settling on a dichotomy. If possible, we may like to consider privately deriving NestedDiag and TopLevelDiag from PartialDiagnostic, to reduce the amount of repeated design.

From here, we should be able to adapt existing diagnostics without impacting terribly much. For example, consider this snippet from Sema::BuildCXXNestedNameSpecifier:

Current

Diag(IdInfo.IdentifierLoc, diag::err_expected_class_or_namespace)
    << IdInfo.Identifier << getLangOpts().CPlusPlus;
if (NamedDecl *ND = Found.getAsSingle<NamedDecl>())
  Diag(ND->getLocation(), diag::note_entity_declared_at)
      << IdInfo.Identifier;

After implementing this proposal’s changes

TopLevelDiag Error =
   Diag(IdInfo.IdentifierLoc, top_level_diag::err_expected_class_or_namespace)
    << IdInfo.Identifier << getLangOpts().CPlusPlus;
if (NamedDecl *ND = Found.getAsSingle<NamedDecl>())
  Error.AddNote(ND->getLocation(), nested_diag::entity_declared_at)
    << IdInfo.Identifier;

In the second snippet, we create a TopLevelDiag object to represent the error, instead of a SemaDiagBuilder. The two types are largely the same, except that TopLevelDiag stores NestedDiag objects. NestedDiag is a logically identical type, but doesn’t have name equivalence with TopLevelDiag, so that we aren’t able to nest top-level diagnostics.

The difference between DiagBuilder and SemaDiagBuilder is that SemaDiagBuilder takes semantic information such as SFINAE into account, and uses this to produce context notes (or similar). Since this information only needs to be stored once, we can refactor the compiler so that this can be held by TopLevelDiag; NestedDiag should hold a pointer to this information and access it as necessary. D141451 identifies that some Sema diagnostics may need to cross the boundary into the optimiser in order to provide the richest experience. This may mean that the emission model needs to change so that we buffer all diagnostics until compilation completes, so that additional information can be added as we go (here’s an example of how the transition may look). Assert handlers ought to be aware of this so that they dump everything before crashing.

Deployment

The design put forward makes it possible for us to progressively add context to all of Clang’s diagnostics, meaning that any contextless diagnostics are unaffected by this change. In simple situations like the example presented in this paper, where the context is immediately available, we intend to write a Clang Tidy pass that will automate the deployment of this design. Aaron Ballman estimates that around three-quarters of the diagnostics will be cleaned up this way, and the remainder will need to be done by hand. Identifying which ones are left will be easy enough, as we can lean on making NestedDiag nodiscard. Once partially deployed, Clang will need to adopt a policy requiring contributors to pass notes to error objects, so that we don’t end up with an ever-growing backlog of things to triage. We also consider it reasonable for reviewers to ask contributors to adapt diagnostics as they make contributions to those diagnostics (e.g. someone is correcting a typo), which helps to speed up the transition process; however, reviewers shouldn’t be blocking on this, as the conversions might be more involved than adapting Diag(Loc, EC) to Error.AddNote(Loc, NC).

Xazax-hun · April 27, 2023, 8:11pm

I like the direction; I have a couple of questions about the implications.

Does this approach change how we should continue after the first error? Consider the following code snippet:

void g() {
    Foo f;
    Foo::staticMethod();
}

Clang will emit the following diagnostics:

<source>:2:5: error: unknown type name 'Foo'
    Foo f;
    ^
<source>:3:5: error: use of undeclared identifier 'Foo'
    Foo::staticMethod();
    ^
2 errors generated.

These two errors have the same root cause and I think Clang could potentially figure this out while parsing. Should the second error be nested under the first one? Or should we have one new top level error with both nested under them?

What should be the guidelines about nesting for other tools like the Clang Static Analyzer?

Consider the following snippet:

void deref(int *p) { *p = 42; }

void foo() {
    deref(nullptr);
}

Running CSA (--analyze --analyzer-output text) will produce the following output:

<source>:1:25: warning: Dereference of null pointer (loaded from variable 'p') [core.NullDereference]
void deref(int *p) { *p = 42; }
                        ^
<source>:4:11: note: Passing null pointer value via 1st parameter 'p'
    deref(nullptr);
          ^~~~~~~
<source>:4:5: note: Calling 'deref'
    deref(nullptr);
    ^~~~~~~~~~~~~~
<source>:1:25: note: Dereference of null pointer (loaded from variable 'p')
void deref(int *p) { *p = 42; }
                      ~ ^
1 warning generated.

What should be the rules for nesting? Should we have all the notes on the same level? Should we nest when we enter a new function call? Should we nest when we enter a new syntactical structure like a branch or loop?

Summoning @NoQ, just in case he has some opinions.

cjdb · April 27, 2023, 9:17pm

These are both errors, so they’re distinct top-level diagnostics for now. Unifying them might be neat, but that would be separate work.

That would come down to how the tools choose to emit the diagnostics. Just restructuring the diagnostics internally won’t actually change how they’re output, so if a project really doesn’t want to change their output (and have their own emitter), they should be unaffected.

A nested note is a note that pertains to a prior note. In the above example, it looks like the nesting would be

warning
  note 1
    note 2
  note 3

It looks like note 3 is the same as the warning, so I don’t actually know where to put it without examining the code.

nickdesaulniers · April 28, 2023, 9:26pm

The suggestion to have notes refer back to the corresponding and relevant warning or error diagnostic LGTM.

bcardosolopes · April 28, 2023, 11:34pm

Sounds like overall goodness for diagnostic emission, excited to see improvements in this direction.

This is pretty cool.

cor3ntin · April 29, 2023, 8:27am

This sounds like an excellent idea!
There is somewhat precedent for that with fixit, which are always attached back to a diagnostic.

AaronBallman · May 5, 2023, 7:08pm

I like the direction this is going, thank you for suggesting it!

Idle question: do you think we could make a precommit CI check that looks for new code introducing diag::note_ so that precommit CI fails if the user isn’t creating a nested diagnostic for it? (It’s important that this would not trigger on existing code so as not to block patches with unrelated changes, only changed lines in a patch.) That would be a huge help to reviewers with catching the problem.

Agreed – if it’s easy to convert it over when modifying that code, we should just convert it over rather than hope it’s done in a follow-up patch. I’m fine leaving this to reviewer discretion.

cjdb · May 5, 2023, 7:21pm

Do you have a vague idea of how this would work? Best I can come up with is having a script that checks the note is in both the added and removed lines of the diff (and maybe a cross-reference in case it was renamed).

erichkeane · May 5, 2023, 7:34pm

You might consider creating a TableGen’ed “new” type for notes, perhaps renaming all the old ones LegacyNote (rather than Note). Then just grep the diff for an added LegacyNote.

THEN, make it so that Diag will NOT accept the id of this NewNote type (perhaps just named Note here) by making it its own type (not a DiagID), and only being ‘accepted’ by this error object for addition?

AaronBallman · May 6, 2023, 12:58pm

Or the added lines of the diff (could be brand new code instead of modified code). IIRC, we have some similar logic for clang-format related failures so that we don’t bark about surrounding code with formatting issues.

Topic		Replies	Views
RFC: Improving Clang’s Diagnostics Clang Frontend	44	6961	April 28, 2023
Diagnostic Improvements Clang Frontend	9	107	November 18, 2008
Splitting the diagnostic text field into a summary field and a reason field Clang Frontend clang	2	413	December 9, 2022
[clang-tidy] Highlight "custom range" in diagnostic of check Clang Frontend	1	75	October 23, 2019
RFC: c++ diagnostics Clang Frontend	2	130	April 5, 2010