[RFC] Improving Clang's middle and back end diagnostics


Clang performs most semantic analysis in the front end as part of Sema. Sema produces diagnostics (Diags), such as warnings and errors, with precise source locations (SourceLocation) that tell users exactly where a mistake was made in their sources.

Not all problems affecting compilation are diagnosable in the front end of the compiler. When we encounter such issues, we can sometimes provide diagnostics; in other cases we crash, assert, produce nonsensical code, etc. There is limited support for backend diagnostics (clang::BackendConsumer::DiagnosticHandlerImpl), and when we do implement diagnostics this way, a few issues arise.

Problem 1: ad-hoc debug info leads to duplication in IR

In most people’s experience, enabling debug info slows down compilation and linking; there’s a lot of information produced in terms of locations and types. So unless explicitly enabled, the default mode is to keep debug info disabled.

Debug info contains information such as DILocations, DIScopes, and DISubroutines. These together can tell you precisely where in the source languages something went wrong. But not if debug info was never generated in the first place. When I refer to ad hoc debug info in this document, I’m referring to IR metadata that is explicitly not DILocation or related Metadata subclasses intended for the emission of debug information in the output of compilation.

Many issues related to support for inline asm are only diagnosed in the backend. To facilitate generating diagnostics even when debug info may not be enabled, clang unconditionally attaches a !srcloc metadata node to the call/callbr instruction.

call void asm "", ""(), !srcloc !7

The above is with -g0 to disable debugging. With debug info enabled, we get

call void asm "", ""(), !dbg !15, !srcloc !16

Either !srcloc or !dbg provides enough information to reconstruct a precise SourceLocation which is necessary to provide a diagnostic with file+line+col info to the user. Instead, llvm and clang both have code to decode these custom !srcloc metadata rather than rely on debug info, because debug info may or may not be available.

Similarly, to support __attribute__((warning(""))) and __attribute__((error(""))), clang will attach !srcloc metadata to calls of such functions.

call void @bar(), !srcloc !7

And with debug info enabled:

call void @bar(), !dbg !15, !srcloc !16

In this sense, ad hoc metadata like !srcloc is zero cost. Or more precisely, we only pay for generating and maintaining this metadata when inline asm or calls to functions attributed with error/warning attributes exist in source. But, when debug info is enabled, we pay this cost twice.

Problem 2: loss of context that makes comprehending diagnostics challenging

Consider the following code:

static void foo (int i) { asm (""::"i"(i)); }

void bar (int x) { foo (x); }

Today in Clang, this produces the following diagnostic at -O2:

<source>:1:32: error: invalid operand for inline asm constraint 'i'
static void foo (int i) { asm (""::"i"(i)); }

This obfuscates where the problem lies; the issue is that x is not an immediate value, which is a requirement of the "i" input constraint. While the constraint could be modified, it would have been more helpful to pinpoint that the specific expression foo(x) was the lone problematic site. (This code example is also a fun parlor trick answer to the question "write code that only compiles when optimizations are enabled.")

Next, let’s look at one more case. GCC’s -Waggressive-loop-optimizations (enabled by default) can report the specific problematic instance:

int a[4096];
void foo (void) {
    for (int i = 0; i <= 4096; ++i)
        a[i] = 42;
}

Featuring a classic off-by-one error, this produces the diagnostic:

<source>: In function 'foo':
<source>:4:14: warning: iteration 4096 invokes undefined behavior [-Waggressive-loop-optimizations]
    4 |         a[i] = 42;
      |         ~~~~~^~~~
<source>:3:23: note: within this loop
    3 |     for (int i = 0; i <= 4096; ++i)
      |                     ~~^~~~~~~

Clang reports:

<source>:1:5: warning: 'a' is an unsafe buffer that does not perform bounds checks [-Wunsafe-buffer-usage]
int a[4096];
<source>:4:9: note: used in buffer access here
       a[i] = 42;

This comes via -Wunsafe-buffer-usage, which was added in Clang 16, though it’s off by default and not enabled by -Wall or -Wextra. That’s probably because even with the off-by-one fixed, we still get the same diagnostic.

This is better than I was expecting, but note that Sema relies on building an ArraySubscriptGadget to catch this. Shouldn’t SCEV be able to notice the same thing? Could SCEV report this? Would it need more ad hoc debug info to report back precisely where the loop was vs. where the array access was located in the original sources?

Another case where this comes up is in relation to _FORTIFY_SOURCE. Consider the following code sample:

#include <string.h>
__attribute__((error("bad memcpy"))) void bad(void);

static void *my_memcpy(void *restrict dest, size_t dest_size,
                       const void *restrict src, size_t src_size,
                       size_t n) {
    if (n > dest_size || n > src_size)
        bad();
    return memcpy(dest, src, n);
}

void my_driver (void) {
    unsigned char src [42], dst [42];
    my_memcpy(dst, 42, src, 42, 0);
    my_memcpy(dst, 42, src, 42, 42);
    my_memcpy(dst, 42, src, 42, 4096);
    my_memcpy(dst, 42, src, 42, 1);
}

Today in Clang, when compiled with optimizations enabled we get:

<source>:9:9: error: call to 'bad' declared with 'error' attribute: bad memcpy

Great! The compiler caught that we have an array out of bounds access, and did so at compile time. The issue is that it didn’t tell us precisely that one of the calls to my_memcpy (the one with a value of 4096 for n) was problematic. Now imagine that you have hundreds of calls to my_memcpy in your driver. And clang tells you there’s a bug. Somewhere. Let’s contrast that with the diagnostics produced by GCC for the above test case:

In function 'my_memcpy',
    inlined from 'my_driver' at <source>:16:5:
<source>:8:9: error: call to 'bad' declared with attribute error: bad memcpy
    8 |         bad();
      |         ^~~~~

As a developer, this helps me pinpoint that the call to my_memcpy on line 16 col 5 in function my_driver is the problematic call site.

How could we emulate such an improved developer experience with Clang? Solving this problem is more important to me in this RFC than solving problem 1.

Solution 1: more ad hoc debug info

⚙ D141451 [clang] report inlining decisions with -Wattribute-{warning|error} proposes recording inlining decisions in metadata in IR. This is done during inline substitution. While it can give us a more precise chain of which inlining decisions led to the problematic call to my_memcpy (see previous example), it doesn’t have precise line information for each of the individual calls. I wouldn’t say this is necessary to debug such an issue as a user, but it leaves something to be desired. It also duplicates work and memory usage when there is debug info to update; DILocations may have an inlinedAt DIScope. These chains can be followed to understand inlining decisions made.

One way to improve on that might be to also emit more ad hoc debug info for each call site. Reading through CGDebugInfo.cpp, it’s not clear to me how we could limit debug info (ad hoc or not) to just call expressions.

Solution 2: enable location-tracking-only mode unconditionally

Clang already has a mode that emits debug info metadata in IR with a flag set so that the debug info does not persist into the backend’s output: clang::codegenoptions::LocTrackingOnly, which exists to support emitting optimization remarks (such as the various -Rpass= flags). If we defaulted to this mode, we would always have the debug info metadata that contains chains of inlining decisions. This would allow the middle end (or backend; MIR can refer to the original IR) to emit diagnostics more precisely (or rather, diagnostics that are precise wrt file+line+col info). It would also allow us to remove all ad hoc location info such as !srcloc.

⚙ D145994 WIP is a reimplementation of D141451 that demonstrates that the same information wrt inlining decisions can be emitted when such debug info exists in IR. An improvement in this implementation is that it can display file+line+col info for each call in a chain, which the initial implementation cannot do.

Isn’t this going to blow up compile-time and memory usage?

Initial measurements show a 1% slowdown (N=30) when compiling the Linux kernel with clang vs the baseline (both intentionally not emitting debug info into the object file; I’d expect no difference when debug info was intentionally generated). This includes time to link and overhead from GNU Make. I measured no change in maximum resident set size for such builds.

For a release build (i.e. no debug info in the binary) of clang itself using a bootstrap of clang defaulting to LocTrackingOnly, I did measure a 2.77% slowdown (N=2) and 1.69% higher peak RSS.

LLVM Compile-Time Tracker shows a geomean slowdown of 4.77% of wall time to build various open source projects (ranging from 1.95% to 6.9%). That said, the results page above has a banner stating “Warning: The wall-time metric is very noisy and not meaningful for comparisons between specific revisions“ so take from that what you will. Max-rss also balloons for some thinLTO projects to +20.8% probably due to duplication of debug info, particularly from headers that are included into multiple CUs.
Thanks @tstellar and @nikic for helping me collect those metrics.

Is this worse for <other codebase>? IDK, you tell me; it’s a one-line change to clang to measure.

LocTrackingOnly omits type information and does not result in .debug_info (and friends) generation in ELF object files. That said, it does emit DILocations for IR instructions even when they would not be necessary to support inline asm or fn-attr-error diagnostics (i.e. DILocations for ret instructions for instance).

We might be able to claw some of this lost compile time back if we could easily limit the debug info to just call instructions, though this might limit our ability to more easily support diagnostics from the middle or back ends like -Waggressive-loop-optimizations. Also noted in Solution 1 is that it’s non-obvious how to make such a change in Clang.

I’m not sure how else we might try to support an improved diagnostic experience with clang and make these more zero cost. I’m also not sure whether the community considers an immediate 1-6.9% compile-time regression a worthwhile trade for an immediate simplification of middle-end and backend diagnostics, and for the potential of better middle-end and backend diagnostics in the future.

Solution 3: have diagnostic differences based on debug info

What if we produced different qualities of diagnostics based on whether the sources were built with debug info? I.e. we provide initial diagnostics that don’t contain inlining decisions, but produce a note saying that if you recompile with debug info enabled, we could provide additional information?

It’s an idea that I’m not too enthused about. It would require two different implementations based on whether debug info vs ad hoc debug info was emitted, though we could probably eliminate the IR duplication when debug info was requested.

Also, I’m not sure whether we make different decisions wrt inline substitution when debug info is enabled or not. I hope we don’t, but if we do, it would be a poor user experience if we told a user there’s an error and to enable debug info for more info, they did so, and the issue then went away due to changes in inlining.

I also don’t think there’s precedent for this, and the ergonomics as a user aren’t great (IMO).

Solution 4: enable location tracking conditional on some new flag

What if rather than say "recompile with -g for more info" (solution 3) we said "recompile with -<new flag>" instead?

I also don’t think there’s precedent for this, and the ergonomics as a user aren’t great (IMO).

Solution 5: forget IR representations entirely

-Wframe-larger-than= is another backend-specific diagnostic, produced by the prolog/epilog inserter. How does that work? Rather than emit ad hoc debug info, Clang’s CodeGenAction retains a vector of pairs mapping a hash of every emitted llvm::Function’s name to its corresponding FullSourceLoc. This trades explicit representation in IR for memory usage in clang. Unlike metadata, which is attached to Instructions or Functions and gets cleaned up when those anchors are removed from the Module, we always retain the corresponding key+value pairs.

We could change the current implementation of asm and attribute-warning/attribute-error to do something similar today; record the FullSourceLoc for every call instruction.

We would still need a way of querying decisions made by the inliner after the fact, but maybe we can come up with some other interface for that separately?

Solution 6: have Sema modify CodeGenOpts.DebugInfo when encountering ErrorAttr

Not sure this is possible: FWICT, CodeGenModule’s DebugInfo is constructed BEFORE Sema detects any ErrorAttr. I.e., by the time Sema sees there’s an ErrorAttr in the sources, CodeGenModule has already made the decision not to emit debug info.

Any other ideas other than the ones I’ve laid out above? Thoughts on any of the above solutions? If there’s no feedback, I will simply pursue landing the original idea from solution 1 (⚙ D141451 [clang] report inlining decisions with -Wattribute-{warning|error}).

cc @AaronBallman @dblaikie for explicit feedback.

Even 1% globally is a significant regression for debug info that we aren’t actually going to query for common codebases. (Most code doesn’t normally contain either inline asm or warning/error attributes, and I’m not really convinced by other potential diagnostics.)

Currently, debug info is applied via IRBuilder::SetCurrentDebugLocation(), which is very convenient for applying debug info everywhere, but obviously less convenient for applying it selectively. You’d have to drive the computation a different way: make the code that constructs the relevant calls explicitly request debug info.

re: -Waggressive-loop-optimizations: In the past, we’ve intentionally avoided generating diagnostics from optimizations. This is based on past feedback about other optimizations: diagnostics produced this way tend to be emitted inconsistently because they depend on how code is optimized, and there are often unfixable false positives due to warnings on impossible codepaths. Instead, we’ve focused on a combination of static analysis and sanitizers. (This is also why _FORTIFY_SOURCE does not normally use “warning” or “error” attributes, outside the Linux kernel’s implementation.)

+1 to the current perf cost of just using loc tracking being a bit much; you could probably do a hack-ish prototype by modifying IRBuilder to only attach the debug location to call instructions (though all that checking might be expensive & make the comparison not representative… & maybe there’s a cheaper way to do it). IRBuilder::AddMetadataToInst could check if the instruction is a call first, then inside the loop check if the metadata is debug info (or perhaps record the index of the debug info metadata when it’s added to MetadataToCopy, or store it in a separate member variable entirely for this experiment) and skip adding it if it’s not a call.

& yeah, generally we don’t want to make more diagnostics optimization-based.

@efriedma-quic what do you think, then, about the basic question of “if, and how, to improve the diagnostic experience for inline asm/attr-warning/attr-error”? Would you prefer the existing ad-hoc srcloc metadata be used more broadly with its own inlining semantics/updating/tracking? Something opt-in (either an existing or new/dedicated flag)? None of these options seem worth the improved diagnostic experience? Some other solutions?

Fundamentally, if we’re tracking this information through inlining, we have to attach some sort of location to every function definition in the file… but that shouldn’t be a measurable cost, I think, with a reasonably lean representation of that information. So I don’t think there’s any reason to avoid improvements in this area.

Not sure off the top of my head if DWARF locations allow us to derive a macro backtrace like we can with clang SourceLocation. And I’m not sure about the overhead; the overhead should be roughly comparable to !srcloc metadata, I think? I can’t think of any other reason to stick with !srcloc.

It might be worth looking to see if we can make using DWARF cheaper. Looking a bit more at the representation, DebugLoc isn’t really very efficient; I mean, it’s not intentionally wasting space, but using a unique Metadata allocation for every unique line/column number has to have significant overhead. Shoving a clang SourceLocation into every LLVM IR instruction would be basically free.

Thanks everyone for the feedback thus far!

(This is also why _FORTIFY_SOURCE does not normally use “warning” or “error” attributes, outside the Linux kernel’s implementation.)

Lest other readers think that modifying the Linux kernel’s implementation of _FORTIFY_SOURCE would be a valid solution, it’s important to note that the other implementations are deficient at providing compile-time diagnostics for overflows that are detectable at compile time, though only after optimizations have run. Example: Compiler Explorer

Compare that with the kernel’s impl: Compiler Explorer which does catch such mistakes at compile time.

That said, we have run into issues where missed optimizations lead to diagnostics that aren’t pretty to work around in the sources. Example:

we have to attach some sort of location to every function definition in the file

We do already track basic function definitions locations unconditionally. See clang::BackendConsumer::ManglingFullSourceLocs. We keep that info around even when the Function gets DCE’d from IR, which to me sounds like a memory leak.

But to improve these diagnostics we need a breadcrumb trail of the inlining decisions made. That includes both the locations of calls and a “chain” of inline substitutions that occurred. (Solution 1 above has the chain; Solution 2 above has the chain and the locations of calls.)

Not sure off the top of my head if DWARF locations allow us to derive a macro backtrace like we can with clang SourceLocation.

DWARF does not encode that macro expansion occurred. That is a deficiency with the current approach, though it’s no worse than the competition wrt reporting what looks like a call chain.

but using a unique Metadata allocation for every unique line/column number has to have significant overhead. Shoving a clang SourceLocation into every LLVM IR instruction would be basically free.

Sorry, I’m having trouble picturing what you had in mind. Today, an example DILocation Metadata node for a call site might look like:

tail call void @foo() #2, !dbg !13, !srcloc !19
!13 = !DILocation(line: 5, column: 3, scope: !14, inlinedAt: !15)
!19 = !{i32 79}

(Where the immediate value 79 is a clang SourceLocation’s raw encoding, which is just a uint32_t). Is your suggestion to replace the line and column info with the raw SourceLocation as is done today for the ad-hoc !srcloc? Does that save us much? We would still be allocating a Metadata node for each line+column combo. Oh, it looks like every Instruction does already contain a DebugLoc.

Wouldn’t we have to coordinate such a change to DILocation with other out of tree front ends?

Or did you have something else in mind?

I’d expect this to be caught by clang up front, similar to builtins with operands that are required to be immediates. It would require some more code duplication of the constraint rules.

In other words, it would reject valid code.

When we generate diagnostics for constructs which aren’t wrong, it forces people to ignore real issues. I mean, there’s a tradeoff here; LLVM IR optimizers can find issues that static analysis might not find, but minimizing the false positive rate for diagnostics is very important. (I think the Linux kernel developers are willing to tolerate a higher false positive rate than most users would be happy with.)

Right, we already reserve 64 bits in each llvm::Instruction to specifically encode debug locations. The proposal would be to change the encoding.

I haven’t quite thought it through… it would probably have to be a little more complicated to deal with inlining, and LTO. But something along those lines, yes; we’re doing a bunch of unnecessary work re-encoding the information.

Strong +1 to this – optimization-based diagnostics lead to a rather poor user experience in many situations. (Code works locally under development, gets checked in, post-commit CI finds issues because it did an optimized build. User checks their code for problems in their IDE (-fsyntax-only), has no issues, goes to build their code and now there’s warnings. Some optimizations may differ in behavior based on analysis cut points, so the code may have no warnings for developer A but then warn on developer B’s machine. And so on.)

I think it’s totally appropriate to add new remark diagnostics that are generated from the optimizer (so users can learn about missed inlining or other optimizer decisions), but we should be very wary of adding any new warning diagnostics and I think error diagnostics (aside from the error attribute or -Werror use) are something we basically never want to come out of the backend.

but we should be very wary of adding any new warning diagnostics and I think error diagnostics (aside from the error attribute or -Werror use) are something we basically never want to come out of the backend.

Fully agree. Just look at the disaster with GCC and -Warray-bounds (that comes as standard in -Wall).

The HLSL compiler does enable location tracking by default to be able to generate source locations for diagnostics that are surfaced from IR. It is a significant performance regression (I think around 10%). I added a flag to DXC to opt-out which hurts the quality of the diagnostics but has a big win in compile time. :frowning_face: