Splitting the diagnostic text field into a summary field and a reason field

While implementing the infrastructure changes for the proposed Clang diagnostics improvements, it was discovered that I was conflating multiple issues that can be independently implemented. As articulated in the improvements thread, the diagnostics that Clang currently issues are terse. We currently aim to balance getting the necessary information across and to not bog the user down in text that they don’t need. This is a product of having unstructured text that’s output to a terminal, but structured diagnostics need not have this model. Structured diagnostics that are consumed by a second tool to render the diagnostics for humans, and can provide an interface that progressively discloses information at a user’s behest.

Progressive disclosure within the context of a single diagnostic requires identifying what information should be presented first, and what information may sometimes need a bit of effort to reveal. Progressive disclosure isn’t really something that tooling can work out on its own (unless AI is involved, which isn’t an equitable expectation). The current Clang model requires us to put the entire diagnostic in a single string. By splitting up the string into a statement that summarises the problem, and a string that can potentially give more background, we can provide tooling with a way to present relevant information at appropriate times.

Design

The following design is intended to be backwards-compatible with the current diagnostic model. We intend to add a second (optional) configurable field to the diagnostic interface that allows us to provide a reason. The following is an example of what that might look like.

Before

class Diagnostic<string text, DiagClass DC, Severity defaultmapping> {
  ...
  string Text = text;
}
...
class Error<string str> : Diagnostic<str, CLASS_ERROR, SEV_Error>, SFINAEFailure {
 bit ShowInSystemHeader = 1;
}

After

class DiagReason<string value> {
  string Value = value;
}
def NO_REASON_YET : DiagReason<"">;
 
class Diagnostic<string summary, DiagClass DC, Severity defaultmapping, DiagReason reason> {
  ...
  string Summary = summary;
  ...
  DiagReason Reason = reason;
}
...
 
class Error<string str, DiagReason reason = NO_REASON_YET> : Diagnostic<str, CLASS_ERROR, SEV_Error, reason>, SFINAEFailure {
 bit ShowInSystemHeader = 1;
}
...

This will make it possible for diagnostics to have a reason, but doesn’t require tens of thousands of reasons to immediately manifest for this design to be viable by defaulting them to the empty string. When writing a diagnostic, it will be good for developers to assume that the reason will be displayed, because when it’s present, it will be. To be fully backwards-compatible, we intend for diagnostics that are split over the summary field and the reason field to be concatenated in the form summary + ": " + reason for the command line. This will make it possible for diagnostics to be presented as they already are (barring rephrasing). Finally, there won’t be any changes to the way parameters are consumed: we’ll still be using %0, %select, etc., in both the summary field and the reason field.

1 Like

Looping in @njames93, @NoQ, @ymand for awareness. I think there was a desire in the past to make the diagnostics experience more uniform across the tools like Clang, Clang Tidy, and the Clang Static Analyzer, so I wonder if we want to make some changes in Tidy and CSA as well.

@cjdb could you add some examples to this proposal how this reason field would be used? I am specifically interested in what information would we add to the Reason field instead of emitting an additional note.

In general, I agree with the direction of making warning messages more structured internally, I just want to avoid confusions about when to emit notes when to add a reason etc. Preferably, we should have clear guidelines on these.

@cjdb could you add some examples to this proposal how this reason field would be used? I am specifically interested in what information would we add to the Reason field instead of emitting an additional note.

In general, I agree with the direction of making warning messages more structured internally, I just want to avoid confusions about when to emit notes when to add a reason etc. Preferably, we should have clear guidelines on these.

tl;dr I don’t think there’s a clear boundary between the two, and it will take time to produce advice. For now, we should continue to write notes, as the new diagnostic model is far off into the horizon. A summary should be considered the headline of the diagnostic, such as candidate template ignored; and the reason should be what follows that, such as expected a type, got a class template.

Ideally, when in structured output mode, notes would become a part of the error/warning/remark. Although it conflates a few ideas, the web-based section of my original proposal demos the distinction between a summary, a reason, and context (notes are a form of context). If you’re interested in the phrasing component, please head over to Let’s get Clang diagnostics translatable! for that discussion.

SARIF doesn’t yet support the model I’d like to see, but here’s what it would roughly look like as an arbitrary JSON object:

{
  "summary": "we couldn't instantiate `llvm::sort(Locations)`",
  "source_location": {
    "path": "/tmp/llvm-project/clang/lib/Lex.cpp",
    "line": 512,
    "column": 81
  },
  "reason": "`llvm::sort` tried to add `std::list<SourceLocation>::iterator` with an `int `, but we can't find an appropriate `operator+` overload.",
  "snippet": "...",
  "notes": [
    // contains overload candidates, template instantiation (if necessary)
  ]
}

There are various things about phrasing and diagnostic emission order here that make the above example substantially different to the status quo, and those are off-topic here. What is interesting in this discussion is that the summary explains what is wrong (we couldn’t instantiate a template), and the reason augments the summary by briefly explaining why compilation failed (the template does something our parameter doesn’t support). Notes are then used to provide additional, probably more detailed, insight into the problem.