Debug Locations for Optimized Code

Hi Kostya,

So… a bunch of folks (not all CC’d - feel free to add anyone who seems relevant, I haven’t gone back over the commits to check who’s made what changes) have been making improvements to optimized code debug information - much of it around making sure sample profiles are accurate.

This mostly entails removing debug locations from instructions that are moved between basic blocks.

The justification for this has usually been backed up by “it’ll make users debug experience better too” because the debugger won’t jump around so much, or give users misleading situations like “why am I stepped into a line of code in the ‘if’ block when the condition was clearly false” (because we raised a load that was done in both blocks up to before the branch and had to pick one of the two locations for it)

This may lead to reduction in quality of debug info for other uses - in particular in ASan and MSan at least.

ASan and MSan care a lot about loads and stores and the location where those came from - less so about whether the load or store was sunk into or out of a loop, etc. Dropping the locations entirely means the user has far less information about which load or store caused the invalid memory access. (The ‘if’ case is still tricky and would be confusing to a *San user as it was to a debug user.)

I don’t know what the right, if any, solution to this is - but I thought I should bring it up in case you or anyone else wanted to puzzle it over & see if the competing needs/desires might need to be considered.

  • Dave

From: "David Blaikie via llvm-dev" <llvm-dev@lists.llvm.org>
To: "llvm-dev" <llvm-dev@lists.llvm.org>, "Kostya Serebryany"
<kcc@google.com>
Sent: Tuesday, December 6, 2016 10:38:50 PM
Subject: [llvm-dev] Debug Locations for Optimized Code

Hi Kostya,

So... a bunch of folks (not all CC'd - feel free to add anyone who
seems relevant, I haven't gone back over the commits to check who's
made what changes) have been making improvements to optimized code
debug information - much of it around making sure sample profiles
are accurate.

This mostly entails removing debug locations from instructions that
are moved between basic blocks.

The justification for this has usually been backed up by "it'll make
users debug experience better too" because the debugger won't jump
around so much, or give users misleading situations like "why am I
stepped into a line of code in the 'if' block when the condition was
clearly false" (because we raised a load that was done in both
blocks up to before the branch and had to pick one of the two
locations for it)

This may lead to reduction in quality of debug info for other uses -
in particular in ASan and MSan at least.

ASan and MSan care a lot about loads and stores and the location
where those came from - less so about whether the load or store was
sunk into or out of a loop, etc. Dropping the locations entirely
means the user has far less information about which load or store
caused the invalid memory access. (The 'if' case is still tricky and
would be confusing to a *San user as it was to a debug user.)

I think that it is always potentially confusing: Having ASAN complain about a memory access from a loop that the code hasn't entered yet or has already completed is also difficult to decipher.

I don't know what the right, if any, solution to this is - but I
thought I should bring it up in case you or anyone else wanted to
puzzle it over & see if the competing needs/desires might need to be
considered.

One thing that I recall being discussed was changing the way that we set the is_stmt flag in the DWARF line-table information. As I understand it, we currently set this flag for the first instruction in any sequence that is on the same line. This is, in part, why the debugger appears to jump around when stepping through code with speculated instructions, etc. If we did not do this for out-of-place instructions, then we might be able to keep for debugging information for tools while still providing a reasonable debugging experience.

-Hal

I don't know what the right, if any, solution to this is - but I
thought I should bring it up in case you or anyone else wanted to
puzzle it over & see if the competing needs/desires might need to be
considered.

One thing that I recall being discussed was changing the way that we
set the is_stmt flag in the DWARF line-table information. As I
understand it, we currently set this flag for the first instruction in
any sequence that is on the same line. This is, in part, why the
debugger appears to jump around when stepping through code with
speculated instructions, etc. If we did not do this for out-of-place
instructions, then we might be able to keep for debugging information
for tools while still providing a reasonable debugging experience.

When we are looking at a situation where an instruction is merely *moved*
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging experience
without perturbing other consumers (although one still wonders whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem. In that case there is no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience (also
a less misleading profile).

My personal opinion is that having sanitizers *rely* on debug info for
accurate source attribution is just asking for trouble. It happens to
work at –O0 but cannot be considered reliable in the face of optimization.
IMO this is a fundamental design flaw; debug info is best-effort and full
of ambiguities, as shown above. Sanitizers need a more reliable
source-of-truth, i.e. they should encode source info into their own
instrumentation.

--paulr

I don't see how ASan and debuggers are different. It feels like both need
reasonably accurate source location attribution for any instruction. ASan
just happens to care more about loads and stores than interactive stepping
debuggers.

It really doesn't make sense for ASan to invent another mechanism to track
source location information. Any mechanism we build would be so redundant
with debug info that, as an implementation detail, we would find a way to
make them use the same storage when possible. With that in mind, maybe we
should really find a way to mark source locations as "hoisted" or "sunk" so
that we can suppress them from our line tables or do something else clever.

I don’t see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

ASan cares about accurate source location info for specific instructions, the ones that do something ASan cares about. The source attributions for any other instruction is irrelevant to ASan. The source attributions for these instructions must survive optimization.

Debuggers care about useful source location info for sets of instructions, i.e. the instructions related to some particular source statement. If that set is only 90% complete/accurate, instead of 100%, generally that doesn’t adversely affect the user experience. If you step past statement A, and happen to execute one or two instructions from the next statement B before you actually stop, generally that is not important to the user. Debuggers are able to tolerate a moderate amount of slop in the source attributions, because absolute accuracy is not critical to correct operation of the debugger. This is why optimizations can get away with dropping attributions that are difficult to represent accurately.

ASan should be able to encode source info for just the instructions it cares about, e.g. pass an index or other encoded representation to the RT calls. Being actual parameters, they will survive any correct optimization, unlike today’s situation where multiple calls might be merged by an optimization, damaging the correctness of ASan reports. (We’ve see this exact thing happen.) ASan does not need a line table mapping all instructions back to their source; it needs a parameter at each call (more or less). It does need a file table, that’s the main bit of redundancy with debug info that I see happening.

–paulr

Should it turn out that the needs of instrumentation versus debugging are truly irreconcilable (and I hope they aren't) we could add an instrumentation debugger tuning option as a last resort.

-- adrian

I don’t see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

ASan cares about accurate source location info for specific instructions, the ones that do something ASan cares about. The source attributions for any other instruction is irrelevant to ASan. The source attributions for these instructions must survive optimization.

Debuggers care about useful source location info for sets of instructions, i.e. the instructions related to some particular source statement. If that set is only 90% complete/accurate, instead of 100%, generally that doesn’t adversely affect the user experience. If you step past statement A, and happen to execute one or two instructions from the next statement B before you actually stop, generally that is not important to the user. Debuggers are able to tolerate a moderate amount of slop in the source attributions, because absolute accuracy is not critical to correct operation of the debugger. This is why optimizations can get away with dropping attributions that are difficult to represent accurately.

ASan should be able to encode source info for just the instructions it cares about, e.g. pass an index or other encoded representation to the RT calls. Being actual parameters, they will survive any correct optimization, unlike today’s situation where multiple calls might be merged by an optimization, damaging the correctness of ASan reports. (We’ve see this exact thing happen.) ASan does not need a line table mapping all instructions back to their source; it needs a parameter at each call (more or less). It does need a file table, that’s the main bit of redundancy with debug info that I see happening.

From: "Paul Robinson" <paul.robinson@sony.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "David Blaikie" <dblaikie@gmail.com>
Cc: llvm-dev@lists.llvm.org
Sent: Wednesday, December 7, 2016 9:39:16 AM
Subject: RE: [llvm-dev] Debug Locations for Optimized Code

>> I don't know what the right, if any, solution to this is - but I
>> thought I should bring it up in case you or anyone else wanted to
>> puzzle it over & see if the competing needs/desires might need to
>> be
>> considered.
> One thing that I recall being discussed was changing the way that
> we
> set the is_stmt flag in the DWARF line-table information. As I
> understand it, we currently set this flag for the first instruction
> in
> any sequence that is on the same line. This is, in part, why the
> debugger appears to jump around when stepping through code with
> speculated instructions, etc. If we did not do this for
> out-of-place
> instructions, then we might be able to keep for debugging
> information
> for tools while still providing a reasonable debugging experience.

When we are looking at a situation where an instruction is merely
*moved*
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging
experience
without perturbing other consumers (although one still wonders
whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

When we are looking at a situation where two instructions are
*merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem. In that case there is
no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience
(also
a less misleading profile).

Is there a reason why we must only have one location for every instruction? If not, why not merge them and keep them all?

-Hal

I don’t see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

I think they’re closer than they appear below.

ASan cares about accurate source location info for specific instructions, the ones that do something ASan cares about. The source attributions for any other instruction is irrelevant to ASan. The source attributions for these instructions must survive optimization.

Kostya can correct me if I’m wrong - but I don’t believe there’s a requirement that the must survive anymore than debug info locations.

I believe the sanitizers run on similar requirements about impact on optimizations - they probably don’t want to adversely perturb optimizations by adding a more strict location tracking system that was undroppable (maybe I’m wrong here) like intrinsics. I think this is perhaps the critical point - if ASan has the same “don’t mess with optimization” requirement as debug info, and it needs high accuracy, it can be no higher than debug info /can/ be (even if it’s not that accurate now). If that’s the case, then we should endeavor to make debug info (if only for the instructions ASan cares about) as accurate ASan needs, and that benefits all debug info consumers.

Now, if there’s a competing need for what information (as I brought up in this thread) hopefully we can have a conversation about what those competing needs look like - how to address them (if we can reconcile the different needs, or need different tuning mode, etc).

From: “Paul Robinson” <paul.robinson@sony.com>
To: “Hal Finkel” <hfinkel@anl.gov>, “David Blaikie” <dblaikie@gmail.com>
Cc: llvm-dev@lists.llvm.org
Sent: Wednesday, December 7, 2016 9:39:16 AM
Subject: RE: [llvm-dev] Debug Locations for Optimized Code

I don’t know what the right, if any, solution to this is - but I
thought I should bring it up in case you or anyone else wanted to
puzzle it over & see if the competing needs/desires might need to
be
considered.
One thing that I recall being discussed was changing the way that
we
set the is_stmt flag in the DWARF line-table information. As I
understand it, we currently set this flag for the first instruction
in
any sequence that is on the same line. This is, in part, why the
debugger appears to jump around when stepping through code with
speculated instructions, etc. If we did not do this for
out-of-place
instructions, then we might be able to keep for debugging
information
for tools while still providing a reasonable debugging experience.

When we are looking at a situation where an instruction is merely
moved
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging
experience
without perturbing other consumers (although one still wonders
whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

When we are looking at a situation where two instructions are
merged or
combined into one, and the original two instructions had different
source locations, that’s a separate problem. In that case there is
no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience
(also
a less misleading profile).

Is there a reason why we must only have one location for every instruction? If not, why not merge them and keep them all?

Not a requirement - of course we could keep them all with some kind of ordered list and even potentially include a “this is the one we would’ve picked” info (eg: the first one’s the one we would pick today, if we would’ve picked one rather than none) so we could be backwards compatible if desired.

That would be a lot of engineering work to plumb through LLVM the notion of multiple debug locations, I think.

I’m not sure how DWARF (or CodeView) and its consumers currently copes with multiple locations - it’s probably technically possible to describe using the line table format (not sure if it’s intentional/documented for that purpose), but existing consumers might have to be fixed not to trip over it.

It’d certainly be cute/fun/nice to have the extra fidelity (though all extra fidelity also comes at a size cost to the IR and the resulting object/executable files).

Not sure anyone’s in a position to sign up for that work right now - but maybe someone is. (looks like Apple’s making a bit of a push on optimized debug info quality at the moment)

  • David

FYI, if we do end up deciding that asan needs a stronger guarantee than debug info can provide, we have another mechanism in tree which is available for this purpose. Operand bundles can be associated with a callsite today to provide a guaranteed side table of value locations at runtime. We use the "deopt" bundle type for exactly this purpose and it's explicitly part of the design to be stronger than debug info and accept the performance impact that implies while trying to minimize it as much as possible. We might have to extend the notion of operand bundles to other instruction types, but the fundamental mechanism is already in the IR.

Philip

I suspect that you misunderstand where ASan instrumentation is added. Unlike UBSan, which is added by Clang during initial IR generation, ASan instrumentation is added late (at the EP_OptimizerLast extension point).

You are correct, I did not know that.

I don’t see any better way to get the location information at that point than using the existing debug info.

Let us distinguish between information carried around in metadata, and information emitted to the object file.

Seems like it would be completely feasible to flag a DebugLoc instance as “do not emit” and yet retain it in the metadata, rather than erasing it (overwriting it with DebugLoc()). Then the ASan instrumentation can extract the file/line info from it, while we still decline to generate it in the DWARF line table. ASan gets the info it needs, the debugger doesn’t get the information that will distress the user.

Another possibility is to flag the DebugLoc for a moved/combined instruction as “not a statement” and cause this flag to suppress the DWARF “is_stmt” flag. That idea would need a little more baking but feels like it could have potential. Some debuggers might have to learn to pay attention to the flag, but that’s the debugger’s problem.

–paulr

Is there a reason why we must only have one location for every instruction? If not, why not merge them and keep them all?

Not a requirement - of course we could keep them all with some kind of ordered list and even potentially include a “this is the one we would’ve picked” info (eg: the first one’s the one we would pick today, if we would’ve picked one rather than none) so we could be backwards compatible if desired.

That would be a lot of engineering work to plumb through LLVM the notion of multiple debug locations, I think.

I’m not sure how DWARF (or CodeView) and its consumers currently copes with multiple locations - it’s probably technically possible to describe using the line table format (not sure if it’s intentional/documented for that purpose), but existing consumers might have to be fixed not to trip over it.

Technically the DWARF encoding of the line table does allow it, I’ve seen it happen, but not with the intent of describing two real source locations; it was by accident. (And was one of the things that prompted me to submit patch D27492.) I seriously doubt any DWARF consumer takes the trouble to look for it. It’s really not clear how a debugger should respond to seeing two source locations for one instruction.

–paulr

my 2c.

the sanitizers rely on debug info to produce human-readable error messages,
and I agree with Reid that it’s unwise to have a parallel way of encoding the source locations.

Well, we have something like this in the clang coverage already… Right?
(I never particularly liked this design decision).
But since the debug info is known to be unreliable it kind of made sense. Grrr.
And since the coverage instrumentation is applied early (in clang) we can do it.
asan/etc don’t have this luxury.

The sanitizers do not actually rely hard on the correctness of debug info,

but lots of tests in compiler-rt expect the debug info to be sane.

If we break debug info in a way that affects the sanitizers two things may happen:
a) some of the existing *san tests in compiler-rt will start failing. That’s usually easy to fix.
b) all tests will continue working but users will be getting less readable reports – and we will learn about it 6 months from the time of breakage.
That’s less welcome, but I am not sure if we can do something here.

–kcc

In fact, we are already using "parallel" debug info. Frame layout
descriptions encode line numbers for local variable declarations. They
don't include file names to keep object size under control, and we
can't really afford to add more duplicate debug info.

If the sanitizers already rely on the debug info line table, which
already has a file table, why can't frame layout descriptions use
that as well?
--paulr

> From: Evgenii Stepanov [mailto:eugeni.stepanov@gmail.com]
> Sent: Wednesday, December 07, 2016 4:51 PM
> To: Kostya Serebryany
> Cc: Robinson, Paul; llvm-dev@lists.llvm.org
> Subject: Re: [llvm-dev] Debug Locations for Optimized Code
>
> In fact, we are already using "parallel" debug info. Frame layout
> descriptions encode line numbers for local variable declarations. They
> don't include file names to keep object size under control, and we
> can't really afford to add more duplicate debug info.
>
> > my 2c.
> >
> > the sanitizers rely on debug info to produce human-readable error
> messages,

If the sanitizers already rely on the debug info line table, which
already has a file table, why can't frame layout descriptions use
that as well?

the frame layout descriptions contain names of the local variables, which
are not part of line tables.

If the sanitizers already rely on the debug info line table, which
already has a file table, why can’t frame layout descriptions use
that as well?

the frame layout descriptions contain names of the local variables, which are not part of line tables.

Sorry, I was not clear, but I was actually responding to this statement:

Frame layout
descriptions encode line numbers for local variable declarations. They
don’t include file names to keep object size under control, and we
can’t really afford to add more duplicate debug info.

This suggests that omitting file info from frame layout descriptions is a problem. But surely whoever is reading the descriptions can get file info from the line table, so there is no need to duplicate that info—in short, why is this a problem in frame layouts?

I assume you build your own frame layout descriptions because you want to work with –gmlt info instead of requiring full –g info? If you have full –g then the variable info is in the .debug_info section already.

–paulr

If the sanitizers already rely on the debug info line table, which
already has a file table, why can't frame layout descriptions use
that as well?

the frame layout descriptions contain names of the local variables, which
are not part of line tables.

Sorry, I was not clear, but I was actually responding to this statement:

> Frame layout
> descriptions encode line numbers for local variable declarations. They
> don't include file names to keep object size under control, and we
> can't really afford to add more duplicate debug info.

This suggests that omitting file info from frame layout descriptions is a
problem. But surely whoever is reading the descriptions can get file info
from the line table, so there is no need to duplicate that info—in short,
why is this a problem in frame layouts?

I assume you build your own frame layout descriptions because you want to
work with –gmlt info instead of requiring full –g info? If you have full
–g then the variable info is in the .debug_info section already.

Yes. Most of the *big* asan users can't use -g due to enormous code size
and thus fall back to -gmlt.
So, asan should work well with gmlt

Ultimately I agree with Adrian here. I’d also need a compelling argument that not only are these things irreconcilable, but that there’s nothing we can do as far as information or even dwarf changes to ameliorate the problems.

Thanks.

-eric