Background
Clang performs most semantic analysis in the front-end as part of Sema. Sema produces diagnostics (Diags) such as warnings and errors with precise source locations (SourceLocation) to show users exactly where a mistake was made in their sources.
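For reference, the front-end pattern looks something like the following sketch; the wrapper function is mine, but the Sema::Diag call and diagnostic ID are the standard in-tree pattern:

// Minimal sketch of how Sema reports a diagnostic against a precise
// SourceLocation; diagnoseBadOperands is a hypothetical helper.
#include "clang/AST/Expr.h"
#include "clang/Basic/DiagnosticSema.h"
#include "clang/Sema/Sema.h"

void diagnoseBadOperands(clang::Sema &S, clang::BinaryOperator *E) {
  S.Diag(E->getExprLoc(), clang::diag::err_typecheck_invalid_operands)
      << E->getLHS()->getType() << E->getRHS()->getType()
      << E->getSourceRange();
}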
Not all problems affecting compilation are diagnosable in the front-end of the compiler. When we encounter such issues, we can sometimes provide diagnostics; in other cases we crash, assert, produce nonsensical code, etc. There is limited support for backend diagnostics (clang::BackendConsumer::DiagnosticHandlerImpl), and when we do implement diagnostics this way there are a few issues.
Problem 1: ad hoc debug info leads to duplication in IR
In most people's experience, enabling debug info slows down compilation and linking; a lot of information is produced in terms of locations and types. So, unless explicitly requested, debug info is disabled by default.
Debug info contains information such as DILocations, DIScopes, and DISubprograms. Together these can tell you precisely where in the source language something went wrong, but not if debug info was never generated in the first place. When I refer to ad hoc debug info in this document, I'm referring to IR metadata that is explicitly not DILocation or the related Metadata subclasses intended for the emission of debug information in the output of compilation.
Many issues related to support for inline asm are only diagnosed in the backend. To facilitate generating diagnostics even when debug info may not be enabled, clang unconditionally attaches a !srcloc metadata node to the call/callbr instruction:
call void asm "", ""() !srcloc !7
The above is with -g0 to disable debug info. With debug info enabled, we get:
call void asm "", ""() !dbg !15, !srcloc !16
Either !srcloc or !dbg provides enough information to reconstruct a precise SourceLocation, which is necessary to provide a diagnostic with file+line+col info to the user. Instead, llvm and clang both have code to decode this custom !srcloc metadata rather than rely on debug info, because debug info may or may not be available.
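To make that concrete, here's a minimal sketch (not the exact code in clang; the helper name is mine) of recovering a SourceLocation from a !srcloc cookie. Clang stores the result of SourceLocation::getRawEncoding() as integer operand(s) of the !srcloc node:

// Hedged sketch: recover the clang SourceLocation stored in a !srcloc node.
// getSrcLoc is a hypothetical helper; clang's real handling lives in
// clang/lib/CodeGen/CodeGenAction.cpp. For inline asm, !srcloc may carry one
// cookie per line of the asm string; this reads only the first.
#include "clang/Basic/SourceLocation.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Metadata.h"

static clang::SourceLocation getSrcLoc(const llvm::CallBase &Call) {
  if (llvm::MDNode *SrcLoc = Call.getMetadata("srcloc"))
    if (auto *Cookie = llvm::mdconst::dyn_extract<llvm::ConstantInt>(
            SrcLoc->getOperand(0)))
      return clang::SourceLocation::getFromRawEncoding(Cookie->getZExtValue());
  return clang::SourceLocation(); // invalid: caller falls back to "<unknown>"
}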
Similarly, to support __attribute__((warning(""))) and __attribute__((error(""))), clang will attach !srcloc metadata to calls of such functions:
call void @bar(), !srcloc !7
And with debug info enabled:
call void @bar(), !dbg !15, !srcloc !16
In this sense, ad hoc metadata like !srcloc is zero cost. Or more precisely, we only pay for generating and maintaining this metadata when inline asm or calls to functions attributed with error/warning attributes exist in source. But when debug info is enabled, we pay this cost twice.
Problem 2: loss of context that makes comprehending diagnostics challenging
Consider the following code:
static void foo (int i) { asm (""::"i"(i)); }
void bar (int x) {
  foo(42);
  foo(0);
  foo(x);
  foo(-1);
}
Today in Clang, this produces the following diagnostic at -O2:
<source>:1:32: error: invalid operand for inline asm constraint 'i'
static void foo (int i) { asm (""::"i"(i)); }
                               ^
This obfuscates where the problem lies; the issue is that x is not an immediate value, which is a requirement of the "i" input constraint. While the constraint could be modified, it would have been more helpful to pinpoint that the specific expression foo(x) was the lone problematic site. (This code example is also a fun parlor trick answer to the question "write code that only compiles when optimizations are enabled".)
Next, let's look at one more case. GCC's -Waggressive-loop-optimizations (enabled by default) can report the specific problematic iteration:
int a[4096];
void foo (void) {
    for (int i = 0; i <= 4096; ++i)
        a[i] = 42;
}
Featuring a classic off-by-one, this produces the diagnostic:
<source>: In function 'foo':
<source>:4:14: warning: iteration 4096 invokes undefined behavior [-Waggressive-loop-optimizations]
    4 |         a[i] = 42;
      |         ~~~~~^~~~
<source>:3:23: note: within this loop
    3 |     for (int i = 0; i <= 4096; ++i)
      |                     ~~^~~~~~~
Clang reports:
<source>:1:5: warning: 'a' is an unsafe buffer that does not perform bounds checks [-Wunsafe-buffer-usage]
int a[4096];
~~~~^~~~~~~
<source>:4:9: note: used in buffer access here
        a[i] = 42;
        ^
This is via -Wunsafe-buffer-usage, which was added in clang-16, though it's off by default and not enabled by -Wall or -Wextra, probably because even with the off-by-one fixed, we still get the same diagnostic.
This is better than I was expecting, but note that Sema relies on building an ArraySubscriptGadget to catch this. Shouldn't SCEV be able to notice the same thing? Could SCEV report this? Would it need more ad hoc debug info to report back precisely where the loop was vs where the array access was located in the original sources?
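As a thought experiment, here's a rough sketch of the kind of check SCEV could support. This is not an existing pass; mayOverrunGlobal is a hypothetical helper, and it ignores details like the access size and whether the loop provably reaches the offending iteration:

// Hedged sketch: ask SCEV whether a GEP into a fixed-size global can reach
// an offset at or past the object's end.
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Instructions.h"

static bool mayOverrunGlobal(llvm::GetElementPtrInst &GEP,
                             llvm::ScalarEvolution &SE,
                             const llvm::DataLayout &DL) {
  auto *GV = llvm::dyn_cast<llvm::GlobalVariable>(
      GEP.getPointerOperand()->stripPointerCasts());
  if (!GV || !GV->getValueType()->isSized() || !SE.isSCEVable(GEP.getType()))
    return false;
  // Byte offset of the access relative to the start of the global, per SCEV.
  const llvm::SCEV *Diff = SE.getMinusSCEV(SE.getSCEV(&GEP), SE.getSCEV(GV));
  llvm::APInt MaxOff = SE.getUnsignedRangeMax(Diff);
  // Flag accesses whose maximum offset reaches or passes the object's size.
  return MaxOff.uge(DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
}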
Another case where this comes up is in relation to _FORTIFY_SOURCE. Consider the following code sample:
#include <string.h>

__attribute__((error("bad memcpy"))) void bad(void);
static void *my_memcpy(void *restrict dest, size_t dest_size,
                       const void *restrict src, size_t src_size,
                       size_t n) {
    if (n > dest_size || n > src_size)
        bad();
    return memcpy(dest, src, n);
}

void my_driver (void) {
    unsigned char src [42], dst [42];
    my_memcpy(dst, 42, src, 42, 0);
    my_memcpy(dst, 42, src, 42, 42);
    my_memcpy(dst, 42, src, 42, 4096);
    my_memcpy(dst, 42, src, 42, 1);
}
Today in Clang, when compiled with optimizations enabled we get:
<source>:9:9: error: call to 'bad' declared with 'error' attribute: bad memcpy
        bad();
        ^
Great! The compiler caught that we have an out-of-bounds array access, and did so at compile time. The issue is that it didn't tell us precisely that one of the calls to my_memcpy (the one with a value of 4096 for n) was problematic. Now imagine that you have hundreds of calls to my_memcpy in your driver. And clang tells you there's a bug. Somewhere. Let's contrast that with the diagnostics produced by GCC for the above test case:
In function 'my_memcpy',
    inlined from 'my_driver' at <source>:16:5:
<source>:8:9: error: call to 'bad' declared with attribute error: bad memcpy
    8 |         bad();
      |         ^~~~~
As a developer, this helps me pinpoint that the call to my_memcpy on line 16 col 5 in function my_driver is the problematic call site.
How could we emulate such an improved developer experience with Clang? Solving this problem is more important to me in this RFC than solving problem 1.
Solution 1: more ad hoc debug info
D141451 ([clang] report inlining decisions with -Wattribute-{warning|error}) proposes recording inlining decisions in metadata in IR. This is done during inline substitution. While it can give us a more precise chain of which inlining decisions led to the problematic call to my_memcpy (see the previous example), it doesn't have precise line information for each of the individual calls. I wouldn't say this is necessary to debug such an issue as a user, but it leaves something to be desired. It also duplicates work and memory usage when there is debug info to update; a DILocation may have an inlinedAt DILocation, and these chains can be followed to understand the inlining decisions that were made.
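For reference, walking such a chain with real debug info is straightforward; a minimal sketch (printInlineChain is my name, not something in tree):

// Print an instruction's location followed by each call site it was
// inlined through, using the inlinedAt links debug info already maintains.
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/raw_ostream.h"

static void printInlineChain(const llvm::Instruction &I) {
  for (const llvm::DILocation *Loc = I.getDebugLoc().get(); Loc;
       Loc = Loc->getInlinedAt())
    llvm::errs() << Loc->getFilename() << ":" << Loc->getLine() << ":"
                 << Loc->getColumn() << "\n";
}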
One way to improve that perhaps is to also emit more ad hoc debug info for each call site. Reading through CGDebugInfo.cpp, it’s not clear to me how we could limit debug info (ad hoc or not) to just call expressions.
Solution 2: enable location-tracking-only debug info unconditionally
Clang already has the option to emit debug info metadata in IR with a flag set so that the debug info does not persist into the backend's output: clang::codegenoptions::LocTrackingOnly, which exists to support emitting optimization remarks (such as the various -Rpass= flags). If we defaulted to this mode, then we would always have the debug info metadata which contains chains of inlining decisions. This would allow the middle end (or backend; MIR can refer back to the original IR) to emit more precise diagnostics (or rather, diagnostics that are precise wrt file+line+col info). It would also allow us to remove all ad hoc location info such as !srcloc.
D145994 (WIP) is a reimplementation of D141451 that demonstrates that the same information wrt inlining decisions can be emitted when such debug info exists in IR. An improvement in this implementation is that it can display file+line+col info for each call in a chain, which the initial implementation cannot do.
Isn’t this going to blow up compile-time and memory usage?
Initial measurements show a 1% slowdown (N=30) when compiling the Linux kernel with clang vs the baseline (both intentionally not emitting debug info into the object file; I’d expect no difference when debug info was intentionally generated). This includes time to link and overhead from GNU Make. I measured no change in maximum resident set size for such builds.
For a release build (i.e. no debug info in the binary) of clang itself using a bootstrap of clang defaulting to LocTrackingOnly, I did measure a 2.77% slowdown (N=2) and 1.69% higher peak RSS.
LLVM Compile-Time Tracker shows a geomean slowdown of 4.77% of wall time to build various open source projects (ranging from 1.95% to 6.9%). That said, the results page has a banner stating "Warning: The wall-time metric is very noisy and not meaningful for comparisons between specific revisions", so take from that what you will. Max-rss also balloons for some ThinLTO projects to +20.8%, probably due to duplication of debug info, particularly from headers that are included into multiple CUs.
Thanks @tstellar and @nikic for helping me collect those metrics.
Is this worse for <other codebase>? IDK, you tell me; it's a one-line change to clang to measure.
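For what it's worth, that one-line change is roughly the following (a sketch; the exact hook in CompilerInvocation's option handling may differ, and the helper name is hypothetical):

// Hedged sketch: if the user did not request debug info, still track
// locations in IR so middle/back end diagnostics stay precise.
#include "clang/Basic/CodeGenOptions.h"

static void defaultToLocTracking(clang::CodeGenOptions &Opts) {
  if (Opts.getDebugInfo() == clang::codegenoptions::NoDebugInfo)
    Opts.setDebugInfo(clang::codegenoptions::LocTrackingOnly);
}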
LocTrackingOnly omits type information and does not result in .debug_info (and friends) being generated in ELF object files. That said, it does emit DILocations for IR instructions even when they would not be necessary to support inline asm or fn-attr-error diagnostics (e.g. DILocations for ret instructions).
We might be able to claw back some of this lost compile time if we could easily limit the debug info to just call instructions, though this might limit our ability to more easily support diagnostics from the middle or back ends like -Waggressive-loop-optimizations. As noted in Solution 1, it's also non-obvious how to make such a change in Clang.
I'm not sure how else we might try to support an improved diagnostic experience with clang while keeping it closer to zero cost. I'm also not sure whether the community considers an immediate 1-6.9% compile-time regression a worthwhile trade for an immediate simplification of middle end and back end diagnostics, plus the potential for better middle end and back end diagnostics in the future.
Solution 3: have diagnostic differences based on debug info
What if we produced different qualities of diagnostics based on whether the sources were built with debug info? I.e. we provide initial diagnostics that don’t contain inlining decisions, but produce a note saying that if you recompile with debug info enabled, we could provide additional information?
It's an idea that I'm not too enthused about. It would require two different implementations based on whether debug info or ad hoc debug info was emitted, though we could probably eliminate the IR duplication when debug info was requested.
Also, I'm not sure whether we make different decisions wrt inline substitution when debug info is enabled or not. I hope we don't, but if we do, it would be a poor user experience if we told a user there's an error and to enable debug info for more info, they did so, and then the issue went away due to changes in inlining.
I also don't think there's precedent for this, and the ergonomics as a user aren't great (IMO).
Solution 4: enable location tracking conditional on some new flag
What if, rather than saying "recompile with -g for more info" (solution 3), we said "recompile with -<new flag>" instead?
As with solution 3, I don't think there's precedent for this, and the ergonomics as a user aren't great (IMO).
Solution 5: forget IR representations entirely
-Wframe-larger-than= is another backend-specific diagnostic, diagnosed by the prologue/epilogue inserter. How does that work? Rather than emit ad hoc debug info, Clang's CodeGenAction retains a vector of pairs of hashes of every emitted llvm::Function's name and the corresponding FullSourceLoc. This trades explicit representation in IR for memory usage in clang. Unlike metadata attached to Instructions or Functions, which gets cleaned up when those anchors are removed from the Module, we always retain the corresponding key+value pairs.
We could change the current implementation of asm and attribute-warning/attribute-error to do something similar today: record the FullSourceLoc for every call instruction.
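A sketch of what that side table might look like (the class and the key scheme are hypothetical; the real -Wframe-larger-than= machinery keys on hashes of mangled function names):

// Hedged sketch: a clang-side table mapping a stable per-call-site id to its
// FullSourceLoc. Unlike IR metadata, entries here outlive the call
// instruction, even after the optimizer deletes it.
#include "clang/Basic/SourceLocation.h"
#include "llvm/ADT/DenseMap.h"
#include <cstdint>

class CallSiteLocTable {
  // Key: some stable id we would also attach to the llvm::CallInst, e.g.
  // the raw SourceLocation encoding; value: the front-end location.
  llvm::DenseMap<uint64_t, clang::FullSourceLoc> Locs;

public:
  void recordCallSite(uint64_t Id, clang::FullSourceLoc Loc) {
    Locs.try_emplace(Id, Loc);
  }
  clang::FullSourceLoc lookup(uint64_t Id) const {
    auto It = Locs.find(Id);
    return It != Locs.end() ? It->second : clang::FullSourceLoc();
  }
};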
We would still need a way of querying decisions made by the inliner after the fact, but maybe we can come up with some other interface for that separately?
Solution 6: have Sema modify CodeGenOpts.DebugInfo when encountering ErrorAttr
I'm not sure this is possible. FWICT, CodeGenModule's DebugInfo is constructed BEFORE Sema detects any ErrorAttr, i.e. by the time Sema sees there's an ErrorAttr in the sources, CodeGenModule has already made the decision not to emit debug info.
Any ideas other than the ones I've laid out above? Thoughts on any of the above solutions? If there's no feedback, I will simply pursue landing the original idea from solution 1 (D141451: [clang] report inlining decisions with -Wattribute-{warning|error}).
cc @AaronBallman @dblaikie for explicit feedback.