Abstract
Clang diagnostics are alright. That is: they get the job done more often than not, but leave a lot
of room for improvement, particularly for non-experts. This document details how one could improve
Clang so that its diagnostics cater to a much broader audience of programmers.
Summary
We propose to change the way in which diagnostics are presented to users, so that they have a better user experience:
- Add a new Clang diagnostic engine that emits machine-parsable Static Analysis Results Interchange Format (SARIF) in lieu of unstructured text.
- Pave a way for new, clearer, diagnostic messages to be added.
- Tease tooling that can consume machine-parsable diagnostics for a better user experience.
Motivation
A Clang diagnostic is usually just a phrase that matter-of-factly states a problem in the code as output to a console. Clang’s current philosophy on diagnostics is to value terseness and impersonal language. The claim is that keeping diagnostics under 80 columns in length avoids terminal wrapping and forces the author to think about the important point. The first premise is observably false: the author of this document routinely has short diagnostics wrap when fonts are extra large. As we’ll explore in this section, the second isn’t necessarily achieved either; moreover, in the form Clang delivers them now, terse and impersonal diagnostics may work well for C and C++ experts, but are often lacking for less adept users of these languages.
In their WG21 paper P2429 ‘Concepts Error Messages for Humans’, Sy Brand cites various tweets poking fun at compiler diagnostics as a part of the paper’s motivation. Two more examples include tweets by @Red_shirt_no2 (and the reply about crying) and @miniciv. Brand then goes on to cite the annual ISOCPP developer surveys, noting that compiler diagnostics are a top complaint from average C++ programmers. Brand’s paper details a lot of advantages about both the usefulness of compiler diagnostics and how friendly diagnostics improve developer experience. It also contrasts C++ diagnostics against diagnostics from various modern programming languages, which is something we’ll explore as well. In short, their paper is an excellent corequisite for this document, and we’ll be citing it a fair bit.
Because the ISOCPP survey results aren’t particularly nuanced, we conducted a dedicated public survey asking users both how satisfied they are with the status quo and what they’d like to see. According to Twitter analytics, over 15’760 people have seen the survey, although only 110 have responded. We learnt that the vast majority of respondents are reasonably satisfied with Clang’s diagnostics when compared to other C++ compilers (mean=3.9, median=4). There does not appear to be consensus when comparing Clang’s C++ diagnostics to those of other programming languages, although slightly more people have expressed dissatisfaction than satisfaction (mean=3, median=3). Roughly two thirds said that it takes them minutes to understand Clang diagnostics, with the remainder being almost equally divided between seconds (16.7%) and hours (18.5%). About half stated that they “never” find themselves spending an hour or more trying to understand a diagnostic that, presented differently, may have been solvable in a matter of moments. The others are clustered into 1–2 times a week (32.4%) and 2–4 times a week (13%), and very few said 4–8 times or 9+ times a week.
We are still working through the feedback for the free-form question:
“When thinking about how other software presents errors, what do you appreciate most?”
This was answered by around half of the participants. From the discussed data, it appears that most participants are happy with Clang diagnostics when Clang is compared against its peers (GCC and MSVC), but many have flagged that their productivity could be improved in some way if we invest in new ways to communicate with developers.
We also asked respondents to rank the three areas of diagnostics they feel should be prioritised when triaging improvements. The choices were overload sets, implicit conversions, undeclared symbols, templates, and concepts. Although we can very clearly say that templates have an overwhelming majority for first place, it isn’t currently clear what we should be prioritising second and third. To assess this, we’ll count the results using single-transferable vote with the weighted inclusive Gregory counting method, to ensure that we fairly triage all of the issues.
Diagnostics are documentation
We posit that compiler diagnostics are a form of documentation. Wikipedia defines documentation as “any communicable material that is used to describe, explain, or instruct regarding some attributes of an object, system, or procedure, such as its parts, assembly, installation, maintenance, and use”. Compiler diagnostics are the way in which the tool communicates with the human, describing why the source cannot be translated to a target. Clang diagnostics also sometimes explain and instruct how to fix invalid source code. It’s non-traditional documentation, but it is still a form of documentation nevertheless.
In their conference talk Documentation in the Era of Concepts and Ranges, Sy Brand and Christopher Di Bella identify and discuss six fundamental attributes of good documentation. They claim that “good documentation” is fit-for-context, clear, digestible, complete, accessible, and up-to-date. We will not be focusing on up-to-dateness in this section, as that’s an issue of maintenance, but we will discuss it in the legacy section later on. Most of these attributes centre around documentation authors knowing their audience, using tools to maximise quality, measuring the quality of one’s documentation on several axes, and ensuring that documentation tools aren’t misused. We summarise their points to spare readers needing to watch an hour-long video mostly focussed on library documentation.
Fit-for-context
Documentation that’s fit-for-purpose is inextricably linked to knowing one’s audience. In the context of Clang, that means that we need to consider whether we are writing diagnostics that reflect the perspective of the user or the perspective of the compiler. These are two very different audiences: most users are not experts in the core language, and that means the way in which we communicate to them needs to be very different than the way in which we communicate with compiler developers (who are the people writing these diagnostics). We also hypothesise that hard-to-understand diagnostics discourage newcomers. Because there are far more non-expert users than compiler developers, we should be prioritising their needs ahead of our own.
Furthermore, humans are not the only audience of compiler diagnostics: scripts can and do listen to compiler diagnostics and act upon them. When analysing the status quo, we need to consider that the audience of a popular general-purpose programming language such as C++ is absurdly vast, and that there are many categories of “audience” in the mix, and act appropriately.
Based on the commentary from our survey, it seems that Clang very much is not framing its diagnostics from the perspective of a user. Although there were many comments that implied this, two responses to the question ‘When thinking about how other software presents errors, what do you appreciate most?’ jumped out:
“Tells what I did wrong and how to fix it rather than speaking in standardese”
“Error diagnostics feel like they’re designed with users in mind, whereas with cpp it feels like the main goal is just to print the state of the compiler when it gave up”.
Clarity
Clear documentation is documentation that communicates a message to the user that answers their questions, and doesn’t cause confusion or leave them with more questions than they had at the beginning. An outcome of clear documentation is that users should be able to at least recall and describe the communication from the author. In the case of Clang, a clear diagnostic would be one that results in the user being able to explain why their source is ill-formed, and hopefully identify a reasonable resolution.
Clang diagnostics for simple things can be good enough. For example, when one forgets the closing brace on a function definition, they’re told that a closing brace is expected to match a specific opening brace. In this particular example, the Clang diagnostic is slightly better than the GCC one, because it points out the location of the lonely brace in addition to where the closing one should be. Not all Clang diagnostics for simple code are top-tier though: GCC is far clearer when it comes to keeping track of standard library names with missing headers.
An intermediate example might be operators with mismatched operands. The output here is likely functional for most programmers, but folks who are learning C++ in the era of consistent comparison operations may struggle to understand why 0 == x works, while 0 + x does not. It doesn’t help that the diagnostic draws attention to the fact that there’s “no known conversion from ‘int’ to ‘s’ for 1st argument”. Keeping with the ‘flat’ and ‘terse’ tone for now, a more informative message might look something like:
error: don't know how to add an 'int' and an 's'
15 | 0 + x;
   | ~ ^ ~
   | |   |
   | int s
note: 'operator+(int, s)' was not found, tried to convert parameters for known operator+ candidates
note: 's operator+(s, int)' is not viable because 'int' is not convertible to 's' for argument 1
This is an immediate improvement because it clearly states the problem from a human’s perspective as opposed to a compiler’s perspective: the problem according to the programmer is likely to be that 0 + x isn’t working, not that there’s no viable operator+. It then immediately follows on with context about why the operands are invalid instead of just listing all the possible alternatives out of context. Unfortunately, because it uses the same structure as the current design, it’s restricted in its ability to help the person receiving the diagnosis. Richard Smith also notes that printing out the full signature as above can create readability problems for larger signatures.
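To make the example concrete, the following is a minimal sketch of a type that reproduces the behaviour described above. The definition of 's' is our assumption (this document only shows the diagnostics that mention it), and 'can_add' and 'can_equal' are hypothetical helpers introduced purely for illustration.

```cpp
#include <type_traits>
#include <utility>

// Assumed definition of 's': the document only shows diagnostics that mention it.
struct s {
    int value;

    // Hidden friends, found by argument-dependent lookup when an operand is an 's'.
    friend bool operator==(s lhs, int rhs) { return lhs.value == rhs; }
    friend s operator+(s lhs, int rhs) { return s{lhs.value + rhs}; }
};

// Hypothetical helper: true when 'L + R' is a well-formed expression.
template <class L, class R, class = void>
struct can_add : std::false_type {};

template <class L, class R>
struct can_add<L, R, std::void_t<decltype(std::declval<L>() + std::declval<R>())>>
    : std::true_type {};

// Hypothetical helper: true when 'L == R' is a well-formed expression.
template <class L, class R, class = void>
struct can_equal : std::false_type {};

template <class L, class R>
struct can_equal<L, R, std::void_t<decltype(std::declval<L>() == std::declval<R>())>>
    : std::true_type {};

static_assert(can_add<s, int>::value, "x + 0 uses operator+(s, int) directly");
static_assert(!can_add<int, s>::value, "0 + x fails: 'int' is not convertible to 's'");
static_assert(can_equal<s, int>::value, "x == 0 uses operator==(s, int) directly");

#if __cplusplus >= 202002L
// Since C++20, 'a == b' also considers the reversed candidate 'b == a',
// so '0 == x' finds operator==(s, int). No such rewrite exists for operator+.
static_assert(can_equal<int, s>::value, "0 == x works through the reversed candidate");
#endif
```

Compiled as C++20, every assertion holds; compiled as C++17, the guarded assertion is skipped because the reversed candidate does not exist there. This asymmetry between the two operators is exactly what the current diagnostic leaves the reader to work out alone.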
Responses from our survey strongly suggest that Clang diagnostics, while often satisfying with respect to other C++ compilers, can be rather opaque and lead to a lot of user frustration depending on the error. The survey quotes from the previous section are relevant here too. Other relevant commentary includes:
“Clang is very, very bad at telling you why overload resolution failed, which clashes with modern library design that leans on overload resolution to try to provide a good user experience…”
“Suggestions for fixes. Explanation for why it’s wrong”
“Concisely telling me where the error is, why it’s wrong and how to fix it”.
Digestibility
While “clear” documentation focuses on being able to understand the content of a message, digestibility concerns itself with the structure and presentation of that message. Examples of facilitating digestibility can include breaking text into paragraphs, using different font styles (e.g. bold, italic, etc.), using different font faces (e.g. sans-serif font for descriptions, code font for snippets, etc.), using punctuation, putting code that can’t fit into prose text into its own block, and using headings to section different topics.
Again, responses indicate that Clang can do a lot of work to improve digestibility. Colour and improved formatting received a fair number of responses, collapsible trees were mentioned several times, and so was visualisation. The suggested “interactive webpage” option to ‘How would you prefer the information is presented to you?’ holds around 40% of the vote (people could select multiple options). Multiple people flagged the presentation of template backtraces and candidates for overload resolution as a source of frustration, which is probably unsurprising to the reader.
Completeness
Completeness is about ensuring that all the relevant information has been made available. Several people complained that Clang doesn’t provide enough context in certain circumstances, so we suspect that Clang diagnostics aren’t complete.
Accessibility
Accessibility concerns itself with ensuring that information is easily available for everyone. Some examples include making sure that documentation can be read by screen readers, is scalable in size, is translated to other natural languages, and uses inclusive language.
Clang aims to be accessible by providing diagnostics with a way to be translated to other natural languages and by limiting lines to eighty columns where possible. Because Clang currently emits diagnostics with a terminal in mind, there’s not much more that it can do.
Aaron Ballman points out that LLVM documentation is reasonably accessible for the in-progress release, and that older versions of LLVM (e.g. LLVM 14!) become difficult to access. This isn’t a huge concern for this effort, but it would be nice if LLVM had ways to find version-specific information more easily.
Up-to-date
Documentation that’s up-to-date isn’t stale, and ensures that there’s no link rot. Documentation that is outdated is worse than useless. This is a maintenance issue. No complaints about diagnostics being out-of-date were observed and there aren’t any that we can recall either.
Improving the status quo
Rather than doing the traditional thing of writing diagnostics to the console as unstructured text, we could emit structured output, which will make the diagnostics machine-parsable and substantially more powerful, and open the door for tooling to make diagnostics extensible. Clang is already incredibly robust thanks to its plug-and-play AST, and we can improve this by emitting machine-parsable diagnostics (making them plug-and-play too). At the time of writing, we consider there to be three main ways to present diagnostics to the user: to the console (structure-independent), to an IDE, and as HTML (see below for a mock). These do not need to be the only ways to communicate diagnostics, and motivated users can take advantage of the structured output to write their own fantastic communication medium. Many respondents also requested that we provide some form of structured output.
We break console diagnostics into two modes: the first is the incumbent unstructured diagnostic model we currently use, and the second is structured output that can be either written to console or to file. This allows people who have Hyrum’s Law’d themselves into a situation to not have to change things overnight (or ever), and also allows for structured output to be available as raw text, which will be useful for machine parsing and a change of pace for readers. IDEs can consume this structured output to produce more intuitive diagnostics for their users based on the IDE structure. We drive the web-based approach below using the mock.
Structured output can be represented in multiple ways: we could use XML, JSON, YAML, or something domain-specific. We’ve chosen JSON for two main reasons. The first is that JSON is a way to structure data and has a limited set of interpretable types; the vcpkg team from Microsoft expands on this a fair bit. The second is that using JSON allows Clang to emit the Static Analysis Results Interchange Format (SARIF), an open standard based on JSON, to more easily communicate with other tools that can present the information in different ways for humans. The Clang static analyser already uses SARIF, so we may be able to leverage existing code into the mainstream compiler. Although SARIF is a good starting point, we may need to extend it to facilitate the described design. For example, it isn’t currently clear whether SARIF can represent a digestible context or links to documentation. Any extensions that we make in Clang should be proposed to SARIF so that there exists a canonical way for tooling to use these facilities (it’s conceivable that GCC or MSVC may attempt to follow suit once Clang has broken the mould).
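As a rough illustration of what such output might contain, the sketch below serialises a single diagnostic into a minimal SARIF 2.1.0 log. The 'diagnostic' record, 'json_escape', and 'to_sarif' are hypothetical names rather than Clang APIs, and only a handful of SARIF properties are populated; nothing here is meant to describe how Clang's static analyser produces its SARIF.

```cpp
#include <cassert>
#include <string>

// Hypothetical record for one diagnostic; the field names are illustrative only.
struct diagnostic {
    std::string level;  // a SARIF level such as "error", "warning", or "note"
    std::string message;
    std::string file;
    int line;
    int column;
};

// Escapes quotes, backslashes, and control characters for use inside a JSON string.
std::string json_escape(const std::string& text) {
    const char* hex = "0123456789abcdef";
    std::string out;
    for (char c : text) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:
                if (static_cast<unsigned char>(c) < 0x20) {
                    out += "\\u00";
                    out += hex[(c >> 4) & 0xF];
                    out += hex[c & 0xF];
                } else {
                    out += c;
                }
                break;
        }
    }
    return out;
}

// Builds a SARIF log with one run, one tool, and one result.
std::string to_sarif(const diagnostic& d) {
    assert(d.line >= 1 && d.column >= 1);  // SARIF line and column numbers are 1-based
    std::string out;
    out += "{\n";
    out += "  \"version\": \"2.1.0\",\n";
    out += "  \"runs\": [{\n";
    out += "    \"tool\": {\"driver\": {\"name\": \"clang\"}},\n";
    out += "    \"results\": [{\n";
    out += "      \"level\": \"" + json_escape(d.level) + "\",\n";
    out += "      \"message\": {\"text\": \"" + json_escape(d.message) + "\"},\n";
    out += "      \"locations\": [{\"physicalLocation\": {\n";
    out += "        \"artifactLocation\": {\"uri\": \"" + json_escape(d.file) + "\"},\n";
    out += "        \"region\": {\"startLine\": " + std::to_string(d.line) +
           ", \"startColumn\": " + std::to_string(d.column) + "}\n";
    out += "      }}]\n";
    out += "    }]\n";
    out += "  }]\n";
    out += "}\n";
    return out;
}
```

Everything beyond the message and its location (a reason, reference links, structured context) has no field in this sketch, which is the kind of gap that may require the extensions discussed above.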
A web-based approach
Below is a mock of an entirely redesigned approach to presenting errors to users outside of an IDE. It is entirely hypothetical and represents neither a finished product nor the only solution.
There are several advantages to this approach. The first, and hopefully the most noticeable point is that everything has been grouped. We can only see the diagnostics for a source file when it has been selected. Diving a little deeper into this, we see that individual diagnostics and their components are also collapsible, introducing progressive disclosure. This means that relevant information becomes available only as the user requests it. This also offers the opportunity for us to coalesce repeated diagnostics. Condensing repeated diagnostics reduces the amount of noise for identical mistakes in source at different locations.
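Coalescing could be as simple as grouping diagnostics by their summary and keeping every location at which the summary occurred. The sketch below assumes hypothetical 'diagnostic' and 'location' records; a real implementation would likely need a more robust key than the summary text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical records; the names are illustrative and not part of Clang.
struct location {
    std::string file;
    int line;
};

struct diagnostic {
    std::string summary;
    location where;
};

// Groups diagnostics by summary so that an identical mistake is listed once,
// together with every location at which it was reported.
std::map<std::string, std::vector<location>> coalesce(
    const std::vector<diagnostic>& diagnostics) {
    std::map<std::string, std::vector<location>> grouped;
    for (const diagnostic& d : diagnostics) {
        assert(!d.summary.empty());  // grouping by an empty summary would be meaningless
        grouped[d.summary].push_back(d.where);
    }
    return grouped;
}
```

A front end could then show one collapsible entry per summary and reveal the individual locations only when the user expands it.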
As with its predecessor, the problem is stated up-front in a user-digestible fashion. It then describes the problem in a way that feels as if a peer is collaborating with them, rather than a machine talking at them. Brand, who cites ‘How a computer should talk to people’ and ‘Compilers as Assistants’, emphasises that how a computer program’s message is worded affects how it is received by the human programmer. This is the motivation for a conversational tone in the reason section, as opposed to the formal tone that Clang currently employs. It also adopts the use of “we” and present tense like Flow, as noted in Brand’s section on other languages. There have traditionally been concerns about internationalisation in this situation: to maximise user experience, we should strive to make the user feel as if the implementer, rather than the implementation, is talking with the programmer, even if this means that we need to put more effort into ensuring that translations are just as good (i.e. we should not compromise internationalisation for the sake of English speakers’ UX, nor should we forsake anyone’s UX for the sake of easy internationalisation).
A list of resources is presented to the user: in this case, the user has opted to see both the cppreference documentation and N4860 (note they’re both hosted on llvm.org), for demonstration purposes. This is derived from both the Elm and Rust compilers, which provide messages that allow users to research more, if necessary. C++ is a language with a lot of sharp edges, and error messages aren’t always enough to diagnose a problem in obscure corner cases. For example, int const volatile& x = static_cast<int&&>(0); is ill-formed, but it’s fairly obscure as to why, given that int const& x = static_cast<int&&>(0); is allowed, and int const volatile& should be a “superset” of int const&, right? This sadly isn’t the case, and the compiler’s diagnostic doesn’t make that apparent, so links to dcl.init.ref#5.1.1 and dcl.init.ref#5.2 may be useful (this is an advanced thing to be doing, so links to standardese would likely be involved). We’ll discuss the feasibility of such an idea in the design section. Only a handful of respondents to our survey said that providing links to cppreference would impede their work and an overwhelming majority said that it would be at least somewhat helpful. Far fewer people agreed that providing the standardese would always be helpful, but it still appears to have popular support (albeit ranked lower in priority with respect to cppreference).
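The corner case can be checked mechanically. The sketch below uses std::is_convertible as a stand-in for the reference initialisations above; it is our illustration rather than anything taken from the mock.

```cpp
#include <type_traits>

// An rvalue 'int' may bind to 'int const&'...
static_assert(std::is_convertible<int&&, int const&>::value,
              "a const lvalue reference can bind to an rvalue");

// ...but not to 'int const volatile&': an lvalue reference may bind to an
// rvalue only when it is const-qualified and not volatile-qualified.
static_assert(!std::is_convertible<int&&, int const volatile&>::value,
              "a volatile-qualified lvalue reference cannot bind to an rvalue");

// The extra qualifier is not a problem in itself: the same reference type
// binds to an lvalue, and an rvalue reference may carry both qualifiers.
static_assert(std::is_convertible<int&, int const volatile&>::value,
              "binding to an lvalue is fine");
static_assert(std::is_convertible<int&&, int const volatile&&>::value,
              "an rvalue reference may be const volatile");
```

The middle assertion is the surprising one, and a diagnostic that linked to the relevant wording would save the reader from having to discover the rule by experiment.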
Finally, context central to the diagnostics is presented in a more readable fashion. Providing context is always critical, but the way in which it is presented has received criticism from participants of our survey. A problem with the status quo is that Clang produces unstructured text. What we collected from our survey suggests that people would appreciate some form of structured data, although how that data is to be presented has far, far less consensus.
C++ is a complex language with many ways for programmers to encounter diagnostics. The design below outlines a new way to facilitate diagnostic presentation, but it does not make recommendations for replacing specific diagnostics.
Design
Adopting SARIF for structured output
We need to consider how diagnostics are output in order to provide users with the information they need to perform their duties in the most understandable format. How we choose to internally represent this has a lot of implications for user experience too. Unstructured text is great for humans using Clang on the command line (or somewhere that mirrors a command line), because it’s convenient for simple diagnostics such as a missing semicolon or quote. By appropriating the SARIF components of Clang’s static analyser into the Clang compiler, we can buy into a standard form of communication at a relatively low cost.
Due to the existing model being baked into C++ programming as we know it, the proposed design won’t replace the existing diagnostic engine, but will instead be an opt-in alternative (e.g. -femit-diagnostics-as=sarif -femit-diagnostics-to=/tmp/diagnostics). Once it is properly mature and possibly widely adopted, we might consider deprecating today’s status quo and, after a few more releases, swapping the defaults, but it will take a lot of time to permanently remove the existing engine. The best way for us to fast-track returning to a single diagnostic engine is to create a new tool that consumes SARIF as input and emits the current diagnostics as output, but we are very far away from completing that at present.
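The last step of such a tool is small once the structured data is in hand. The sketch below assumes the SARIF has already been parsed into a hypothetical 'diagnostic' record (the JSON parsing itself is omitted) and prints the familiar file:line:column: level: message shape of the first line of today's diagnostics.

```cpp
#include <cassert>
#include <string>

// Hypothetical record holding the fields recovered from one SARIF result.
struct diagnostic {
    std::string file;
    int line;
    int column;
    std::string level;  // "error", "warning", ...
    std::string message;
};

// Renders the first line of a classic, unstructured diagnostic.
std::string render_classic(const diagnostic& d) {
    assert(d.line >= 1 && d.column >= 1);  // source positions are 1-based
    return d.file + ":" + std::to_string(d.line) + ":" + std::to_string(d.column) +
           ": " + d.level + ": " + d.message;
}
```

Reproducing the source snippet, caret line, and notes would need more data from the structured output, which is part of why the complete tool is still a long way off.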
As suggested above, we propose two flags: one to indicate how diagnostics are emitted (e.g. SARIF, unstructured, etc.), and one to indicate where diagnostics are written (e.g. as a path, to an IP address, stdout, stderr, etc.). The default -femit-diagnostics-as value is unstructured. The default -femit-diagnostics-to value is stderr (if a user really wants to write to a file called stderr in the current working directory, they can use ./stderr).
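The defaults and the ./stderr escape hatch can be sketched as follows. The two flag spellings are the ones proposed above; 'diagnostic_options', 'parse_diagnostic_flags', and the other helper names are hypothetical, and this is not how Clang's option parsing is actually implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical holder for the two proposed settings, with the proposed defaults.
struct diagnostic_options {
    std::string format = "unstructured";
    std::string destination = "stderr";
};

bool starts_with(const std::string& text, const std::string& prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Returns the part of 'argument' that follows 'flag'.
std::string value_of(const std::string& argument, const std::string& flag) {
    assert(starts_with(argument, flag));  // callers check the prefix first
    return argument.substr(flag.size());
}

// Scans the command line; a later occurrence of a flag overrides an earlier one.
diagnostic_options parse_diagnostic_flags(const std::vector<std::string>& arguments) {
    const std::string as_flag = "-femit-diagnostics-as=";
    const std::string to_flag = "-femit-diagnostics-to=";
    diagnostic_options options;
    for (const std::string& argument : arguments) {
        if (starts_with(argument, as_flag)) {
            options.format = value_of(argument, as_flag);
        } else if (starts_with(argument, to_flag)) {
            options.destination = value_of(argument, to_flag);
        }
    }
    return options;
}

// The bare names "stdout" and "stderr" mean the standard streams; anything
// else, including "./stderr", is treated as a path.
bool is_standard_stream(const std::string& destination) {
    return destination == "stdout" || destination == "stderr";
}
```

Treating only the bare names as streams keeps the common case short while leaving every file name reachable through an explicit path.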
This is a user-experience problem
As articulated in the motivation, we consider compiler diagnostics to be a form of documentation: they specifically document what doesn’t constitute a conforming program. This means that we need to prepare information for users that is audience-appropriate, clear, digestible, accurate, and widely-accessible. We intend to work with UX experts to determine what information is genuinely critical, what information is optional, and how this information can be presented by canonical LLVM tools. Given that SARIF is a standard format, users who are dissatisfied with our proposed canonical tools will be free to build alternatives to fit their needs or preferences.
We propose having at least the following information made available separately:
Field | Description |
Diagnostic category | Error, warning, remark |
Source location | |
Summary | Similar to current diagnostics, states what the problem is. |
Reason | Explains why the diagnostic was generated in a friendly manner. |
Context | Relevant source info such as considered overload resolution candidates, template backtraces, etc. These should be structured, rather than appearing as plaintext. |
Potential fixes | |
Reference material | cppreference, C++ standard drafts, etc. |
Glossary | Separates identifying type aliases and template parameters from the message. |
By separating these out, we can build tools that can produce diagnostics with a digestible format that suits a user’s needs. For example, a web server might generate a web page that lists all the errors with their summaries by default, and then shows more detailed information by selecting a specific error. A server could also have the responsibility of de-duplicating repeated errors. The UX team that we consulted mentioned that syntax highlighting may be a benefit in error messages.
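One way to picture the separation is as a record with one member per row of the table, plus a renderer that decides how much of it to show. The type and function names below are hypothetical, and the two-level collapsed and expanded view is only one of many possible presentations.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class category { error, warning, remark };

struct source_location {
    std::string file;
    int line;
    int column;
};

// Hypothetical record with one member per proposed field.
struct structured_diagnostic {
    category kind;
    source_location where;
    std::string summary;
    std::string reason;
    std::vector<std::string> context;
    std::vector<std::string> potential_fixes;
    std::vector<std::string> reference_material;
    std::vector<std::string> glossary;
};

std::string category_name(category kind) {
    switch (kind) {
        case category::error:   return "error";
        case category::warning: return "warning";
        case category::remark:  return "remark";
    }
    return "unknown";
}

// Collapsed view: category, summary, and location only.
// Expanded view: additionally shows the reason and the potential fixes.
std::string render(const structured_diagnostic& d, bool expanded) {
    assert(d.where.line >= 1);  // source lines are 1-based
    std::string out = category_name(d.kind) + ": " + d.summary + " (" +
                      d.where.file + ":" + std::to_string(d.where.line) + ")\n";
    if (!expanded) {
        return out;
    }
    out += "reason: " + d.reason + "\n";
    for (const std::string& fix : d.potential_fixes) {
        out += "fix: " + fix + "\n";
    }
    return out;
}
```

Because each field is stored separately, a different consumer could just as easily show the context first for experts, or hide the reference material entirely.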
The summary is similar to the non-note diagnostics that we currently receive, which helps to make the diagnostic fit for purpose. To help improve clarity, summaries should be user-oriented. This can mean changing diagnostics to present information from the user’s perspective, rather than the compiler’s perspective. It may also mean that we’ll need to change how we talk about certain things such as instantiation errors and overload resolution failure, which currently provide a technically accurate reason, but not necessarily an easily understandable one.
The reason is a longer description that expands on the summary’s short description. This section attempts to help the user understand why their code is failing as best as possible, and relates the description back to their code. Its language should be inclusive and help the programmer feel as if the compiler is trying to help them, rather than hinder them.
The reference material is a list of likely-relevant documentation that is hosted on LLVM-owned servers. While the reason section will provide an explanation of why things went wrong, it is not the right spot for providing the user with reference-like information. Instead, we offer the most relevant links to cppreference (or equivalent) and the final draft for whichever C++ standard the user is compiling against, in the format of Draft C++ Standard: Contents. Our survey shows that significantly more people find cppreference documentation useful, so we should prioritise this over C++ standard wording. This section is the most likely to go stale, and we will need to have a plan to ensure that the cppreference material remains up to date, as well as citing applied defect reports as necessary.
The context region collects all the “noisy” bits, such as the list of overload candidates and template backtraces. Finally, the glossary is a collection of type aliases that are used elsewhere in the diagnostic. When the programmer uses std::string, we should be presenting std::string as often as possible, rather than presenting diagnostics that use std::basic_string<char> or string (where string = std::basic_string<char>).
One survey respondent suggested that we “burn compiler time in error cases to generate better diagnostics. Just don’t hit the fast path of correct code”. Despite C++ compile-times being painfully slow, we think this is a suggestion worth exploring, because it may drastically improve the quality of life for developers and result in getting them working binaries faster than messages with less context. We intend to work closely with UX experts to determine what is an acceptable performance hit. One approach may be to reduce the default maximum number of errors encountered, and abort once we reach that ceiling.
Clang web server
This proposal intends to establish a web server as one of the canonical diagnostic servers. This would allow a user to be presented with a radically different UI that would make it easier to organise information in a visual and collapsible format. The server need not be hosted on a machine actually on the web: one can self-host clang-web-diagnosticsd or host it on a network-local machine (this is in fact the encouraged model).
Due to the highly interactive nature of browser interfaces, this will require lots of consultation with UX engineers to ensure that the interface caters to as many community needs as possible.
Choice of language
The way in which we communicate with the user is critical to this effort’s success. How we choose to word things determines how our tools will be perceived. Clang currently prefers to be overly brief in its wording and be impersonal. We would prefer it if Clang instead provided as much information as possible to get the programmer on the right path and felt as if the compiler were a really caring university tutor instead of a blunt machine. We also think Flow’s approach of using first-person plural wording is worth considering, since it adds a personal touch to the compiler, and should hopefully make the programmer feel as if the compiler developer were talking to the programmer, rather than the compiler program.
Clang’s developer documentation says that it uses impersonal language to aid with translation. Our design uses personal and conversational language, and we intend to provide translators with guidance on how we intend our message to be communicated to people: it is then at their discretion how to word the diagnostics so that they’re culturally appropriate for the chosen locale.
Project legacy
While we will be driving the initial effort, we would like for there to be serious community buy-in. That is: we’ll be contributing the structural changes and other tooling, but will need the community’s assistance in the more fine-grained things such as specific diagnostics, if we’re to complete this in a reasonable timeframe.
Alternative designs
Staying within the existing text framework
The original design for this project was to remain within the existing diagnostics framework that Clang has provided since forever. That is, the original design proposed making fundamental changes to how the front-end presented diagnostics, but still presented them to a text-based console. Because this proposal considers diagnostics to be a form of documentation, the proposal aimed to structure the text output in such a way that it became more digestible. The final result looked something like the following.
========== On source.cpp:15 ==========
------- Error summary -------
Invalid operands to binary expression ('int' and 's')
15 | 0 + x;
   | ~ ^ ~
   | |   |
   | int s
------- Reason -------
'0 + x' is invalid because we aren't able to find 'operator+(int, s)', nor can we find a compatible 'operator+' overload by converting the parameters.
------- Potential fixes -------
- Add an overload for 'operator+(int, s)' (recommendation: make it a hidden friend in the body of 's')
- Add the conversion operator 's::operator int() const'
------- cppreference pages that may provide insight -------
- https://cppreference.llvm.org/w/cpp/language/operator_arithmetic
- https://cppreference.llvm.org/w/cpp/language/operators
- https://cppreference.llvm.org/w/cpp/language/overload_resolution
------- C++20 standard draft -------
- https://c++20.llvm.org/stable-name-1
- https://c++20.llvm.org/stable-name-2#1
- https://c++20.llvm.org/stable-name-3
------- Context -------
If you'd like to see which overloads were considered, you can compile again with '-fdiagnostic-show-overload-candidates'.
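For reference, the two potential fixes named in the mock output could look like the sketch below. The type names are ours, chosen so that both variants can sit in one file; in practice the user would apply one of them to 's' itself.

```cpp
// Fix 1: add 'operator+(int, s)' as a hidden friend, as the mock recommends.
struct s_with_overload {
    int value;

    friend constexpr s_with_overload operator+(s_with_overload lhs, int rhs) {
        return {lhs.value + rhs};
    }
    friend constexpr s_with_overload operator+(int lhs, s_with_overload rhs) {
        return {lhs + rhs.value};
    }
};

// Fix 2: add a conversion operator to 'int', so that '0 + x' converts 'x'
// and then uses the built-in addition for 'int'.
struct s_with_conversion {
    int value;

    friend constexpr s_with_conversion operator+(s_with_conversion lhs, int rhs) {
        return {lhs.value + rhs};
    }
    constexpr operator int() const { return value; }
};

static_assert((0 + s_with_overload{2}).value == 2, "fix 1: the result is still an 's'");
static_assert(0 + s_with_conversion{2} == 2, "fix 2: the result is a plain 'int'");
```

The two fixes are not equivalent: the first keeps the result in the user's type, while the second silently yields an 'int', which is the sort of trade-off a reason section could spell out.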
While this design improves digestibility, the change in diagnostics has significant drawbacks due to remaining in the console. Firstly, it substantially increases the overall amount of text that one has to read, and even though it’s now sectioned, the design has been overfitted for beginners due to its heavy exposition in reasoning and hiding of context by default. Worse, since important context is hidden by default, it makes the overall process worse for intermediate and expert programmers, since they need to recompile with new flags to get the context, which will invalidate build caches (and if they permanently leave them on, then hiding the context doesn’t reduce the verbosity at all).
The reason this design has been abandoned is outlined in the motivation: many of the ideas presented in this section aren’t suitable for a text-based terminal, and would be better off in a more interactive environment.
Open questions
Finally, we need to address the technical impact to the Clang codebase. Richard Smith notes that we haven’t addressed any of the following:
- How will this be integrated into the Clang codebase?
- How invasive will the changes be?
- What are the criteria for determining if this experiment is successful?
- How do we plan to maintain this if it’s successful?
- How do we plan to roll this back if it’s not?
- How much harder does this make it to implement new functionality requiring new diagnostics?
We intend to begin answering these questions within the next fortnight, and provide a preliminary report as a follow-up to this document.