Switch to ld.bfd tombstone behavior by default

In short: Perhaps we should switch lld to the bfd-style tombstoning behavior for a release or two, letting users opt-in to testing with the new -1/-2 tombstoning in the interim, before switching to the new tombstone by default (while still having the flag to switch back when users find surprise places that can’t handle the new behavior).

In long:
https://reviews.llvm.org/D81784 and follow-on patches modified the behavior of lld with regards to resolving relocations from debug sections to dead code (either comdat deduplicated, or gc-sections use).

A very quick summary of the situation:

Original Behavior:

  • bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0 elsewhere

  • gold/lld: 0+addend everywhere
    Limitations/bugs:

  • bfd/gold/lld

  • doesn’t support 0 as a valid executable address without ambiguities- gold/lld

  • ambiguities with large gc’d functions combined with a .text mapping that starts in relative low addresses

  • premature debug_range termination with zero-length functions (Clang produces these with __builtin_unreachable or non-void return type functions without a return statement)
    New behavior:

  • -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier there)

  • -1 elsewhere

  • linker flag to customize to other values if desired
    Known issues:

  • lldb’s line table parsing can’t handle -1 well at all (essentially unusable)

  • gdb’s line table parsing ends up with different handling when breaking on gc’d functions (minor functionality issue)

  • chromium/firefox have some tools that were broken: https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5

I think there’s enough risk in this work (even given the small number of bugs found so far), given there’s a pretty wide array of debug info consumers out there, that we should change lld’s default to match the long-lived bfd strategy. This would address my original motivation for raising all this (empty functions prematurely terminating the list), while letting users who want to experiment with it, or need it (like Alexey), can opt-in to the -1/-2 behavior.

I’m not sure how to get the word out to DWARF consumers that they should consider this new experimental behavior. Ray’s done a good job evangelizing/discussing this with gdb and lldb at least - and of course having turned it on by default briefly has found some users (like Chromium) that we probably wouldn’t have found no matter how long we left this as an experimental option… so some things are going to break when we switch no matter what.

P.S: Sony’s already been using the -1 technique with their debugger and linker for a while, so they may want to keep this on by default for SCE - but I’m not sure how to do that in-tree. Clang doesn’t know which lld version it’s running, so whether the flag can be specified, I would think? (so it’d be hard to have Clang go “if SCE and LLD, pass the flag to use -1”, I think) - if there is a way to make that decision in the compiler driver+linker, then we’d have a question of “default new behavior except when tuning for LLDB and GDB” or “default bfd behavior except when tuning for SCE”.

Thanks for the write-up!

In short: Perhaps we should switch lld to the bfd-style tombstoning
behavior for a release or two, letting users opt-in to testing with the new
-1/-2 tombstoning in the interim, before switching to the new tombstone by
default (while still having the flag to switch back when users find
surprise places that can't handle the new behavior).

In long:
⚙ D81784 [ELF] Resolve relocations in .debug_* referencing (discarded symbols or ICF folded section symbols) to tombstone values and follow-on patches modified the behavior
of lld with regards to resolving relocations from debug sections to dead
code (either comdat deduplicated, or gc-sections use).

A very quick summary of the situation:

Original Behavior:

  - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
  elsewhere
  - gold/lld: 0+addend everywhere

Limitations/bugs:

  - bfd/gold/lld
     - doesn't support 0 as a valid executable address without ambiguities
  - gold/lld
     - ambiguities with large gc'd functions combined with a .text mapping
     that starts in relative low addresses
     - premature debug_range termination with zero-length functions (Clang
     produces these with __builtin_unreachable or non-void return
type functions
     without a return statement)

New behavior:

  - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
  there)
  - -1 elsewhere
  - linker flag to customize to other values if desired

Known issues:

  - lldb's line table parsing can't handle -1 well at all (essentially
  unusable)

Pavel Labath will fix this soon ⚙ D83957 [lldb/DWARF] Don't get confused by line sequences with tombstone values
This is an unhandled address-space wraparound problem.
This pattern is potentially common - and other downstream DWARF
consumers might make similar line table handling mistakes.

  - gdb's line table parsing ends up with different handling when breaking
  on gc'd functions (minor functionality issue)

This is just a behavior difference, not affecting users.
It did break a test if linked with LLD (gdb intrinsically has lots of
failing tests even if built with GCC+GNU ld).

Previous behavior (when an address is zero): a breakpoint on a
--gc-sections discarded function will be redirected to a larger line
number with debug info, even if that line can be an unrelated different
function.
New behavior is that the breakpoint is on a wrapped-around small address.

GDB 9.3 will restore the previous behavior
(sourceware.org Git - binutils-gdb.git/commit)

I think there's enough risk in this work (even given the small number of
bugs found so far), given there's a pretty wide array of debug info
consumers out there, that we should change lld's default to match the
long-lived bfd strategy. This would address my original motivation for
raising all this (empty functions prematurely terminating the list), while
letting users who want to experiment with it, or need it (like Alexey), can
opt-in to the -1/-2 behavior.

I think we can only confidently say that there is enough risk in using
tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
other .debug_* can cause problems. Hope others can chime in.

With consideration for satefy for the upcoming release/11.x, we can make
two choices:

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
b) .debug_ranges&.debug_loc => -2, other .debug_* => 0

Delaying .debug_line => -1 for one or two release sounds good to me.
So LLD 11 or 12 linked binaries can be debugged by LLDB 10. This is a
nice property.

This write-up proposes b), but I'd say a) is likely sufficient. With the
available information, I cannot yet say that a) will have more risk.

  - chromium/firefox have some tools that were broken:
  1102223 - chromium - An open-source project to help move the web forward. - Monorail

This is potentially related to other .debug_* (not .debug_line)
I hope Chromium developers can chime in here:) The breakage was
unfortunate but I don't know how we could have avoided that. IMHO this
is no different from "clang started to emit a new DW_FORM_* and a
postprocessing tool of .debug chokes on that" Whether we want to
suppress that particular DW_FORM_* definitely should depend on how
likely it can cause problems, but we can't yet say we have to hold off
on a feature for a solved (precisely, mitigated) problem.

I'm not sure how to get the word out to DWARF consumers that they should
consider this new experimental behavior. Ray's done a good job
evangelizing/discussing this with gdb and lldb at least - and of course
having turned it on by default briefly has found some users (like Chromium)
that we probably wouldn't have found no matter how long we left this as an
experimental option... so some things are going to break when we switch no
matter what.

Thank you for following up with some GNU folks on their lists!
If folks want to follow along the thread:
https://sourceware.org/pipermail/binutils/2020-June/111376.html

We have informed binutils, elfutils-devel (elfutils has a few debug
tools) and gdb. I don't recall that anyone has thought about problems
with a tombstone value.

P.S: Sony's already been using the -1 technique with their debugger and
linker for a while, so they may want to keep this on by default for SCE -
but I'm not sure how to do that in-tree.

Clang doesn't know which lld
version it's running, so whether the flag can be specified, I would think?
(so it'd be hard to have Clang go "if SCE and LLD, pass the flag to use
-1", I think) - if there is a way to make that decision in the compiler
driver+linker, then we'd have a question of "default new behavior except
when tuning for LLDB and GDB" or "default bfd behavior except when tuning
for SCE".

I've been involed in another thread on SHF_LINK_ORDER (SHF_LINK_ORDER's original semantics make upgrade difficult ).
We may need a way to tell codegen about the used linker.

pcc proposed -mbinutils-version= - This is nice in that some MC
decisions related to -fno-integrated-as can use this option as well.
jyknight proposed -mlinker-version= and syntax like -fuse-ld=bfd:2.34

This may get more complex if the generated object file want to be linked
with more than one linker. This discussion probably deserves its own
thread.

Currently, where we want different defaults in our LLD to upstream LLD, our downstream port overrides the behaviour, if the linker name matches our downstream linker name. The behaviour isn’t at all related to the input file targets etc. This is so that the linker can be run without using the clang driver to provide all the various arguments/keep things simpler in our restricted environment etc. We might change this in the future, but either way, as long as there’s a configurable behaviour we can easily override, then it’s fine.

It would help if we get a heads up if a default behaviour like this is changing, since sometimes it might cause us problems/not get noticed by our testing etc.

Thanks for the write-up!

>In short: Perhaps we should switch lld to the bfd-style tombstoning
>behavior for a release or two, letting users opt-in to testing with the new
>-1/-2 tombstoning in the interim, before switching to the new tombstone by
>default (while still having the flag to switch back when users find
>surprise places that can't handle the new behavior).
>
>In long:
>https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>of lld with regards to resolving relocations from debug sections to dead
>code (either comdat deduplicated, or gc-sections use).
>
>A very quick summary of the situation:
>
>Original Behavior:
>
> - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
> elsewhere
> - gold/lld: 0+addend everywhere
>
>Limitations/bugs:
>
> - bfd/gold/lld
> - doesn't support 0 as a valid executable address without ambiguities
> - gold/lld
> - ambiguities with large gc'd functions combined with a .text mapping
> that starts in relative low addresses
> - premature debug_range termination with zero-length functions (Clang
> produces these with __builtin_unreachable or non-void return
>type functions
> without a return statement)
>
>New behavior:
>
> - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
> there)
> - -1 elsewhere
> - linker flag to customize to other values if desired
>
>Known issues:
>
> - lldb's line table parsing can't handle -1 well at all (essentially
> unusable)

Pavel Labath will fix this soon https://reviews.llvm.org/D83957
This is an unhandled address-space wraparound problem.
This pattern is potentially common - and other downstream DWARF
consumers might make similar line table handling mistakes.

That's the thing - I'm not sure we can really classify them as
"mistakes". I think bfd.ld's tombstoning behavior is about the only
thing we can reasonably say DWARF consumers should /probably/ be
expected to handle - and I'd imagine in many cases they haven't been
written intentionally to handle it, but whatever behavior they have
has accidentally been sufficient for their needs.

> - gdb's line table parsing ends up with different handling when breaking
> on gc'd functions (minor functionality issue)

This is just a behavior difference, not affecting users.
It did break a test if linked with LLD (gdb intrinsically has lots of
failing tests even if built with GCC+GNU ld).

Previous behavior (when an address is zero): a breakpoint on a
--gc-sections discarded function will be redirected to a larger line
number with debug info, even if that line can be an unrelated different
function.
New behavior is that the breakpoint is on a wrapped-around small address.

GDB 9.3 will restore the previous behavior
(https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)

>
>I think there's enough risk in this work (even given the small number of
>bugs found so far), given there's a pretty wide array of debug info
>consumers out there, that we should change lld's default to match the
>long-lived bfd strategy. This would address my original motivation for
>raising all this (empty functions prematurely terminating the list), while
>letting users who want to experiment with it, or need it (like Alexey), can
>opt-in to the -1/-2 behavior.

I think we can only confidently say that there is enough risk in using
tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
other .debug_* can cause problems.

Given how many DWARF parsers we've had to cleanup or migrate off in
transitioning to DWARFv5 inside Google, I think that's a fair bit of
evidence the set of parsers isn't as narrow/closed as we'd like and
thus the number of places that might have issues isn't known/easily
exhaustively tested. So I'm not so concerned about the bugs we've
seen, but what that might indicate about the things that we don't know
about and can't test (because we don't know about them). (of course,
the flipside of that is that if we don't know about them, we can't
tell people who own them to go check if they work with this opt-in
feature - so they'll break whenever we turn this on by default - but
perhaps in the interim we can get at least a few big LLD customers to
deploy the feature and flush out some of the issues - happy for Google
to use this internally, hopefully we can encourage Apple folks, Sony
of course already has this semantic so nothing to find there most
likely - Chromium, maybe Firefox, etc)

With consideration for satefy for the upcoming release/11.x, we can make
two choices:

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
b) .debug_ranges&.debug_loc => -2, other .debug_* => 0

Delaying .debug_line => -1 for one or two release sounds good to me.
So LLD 11 or 12 linked binaries can be debugged by LLDB 10. This is a
nice property.

This write-up proposes b), but I'd say a) is likely sufficient. With the
available information, I cannot yet say that a) will have more risk.

Risk is about the unknowns - and it still seems like a lot of
unknowns. While there are probably many more consumers that read
.debug_line than other sections, reading debug_info (for instance) is
necessary for inline frames in symbolizing - still probably one of the
most common uses of DWARF I'd guess. (what about stack unwinding using
debug_frame? that'd worry me a bit if anyone got /that/ wrong because
of this change)

> - chromium/firefox have some tools that were broken:
> https://bugs.chromium.org/p/chromium/issues/detail?id=1102223#c5

This is potentially related to other .debug_* (not .debug_line)
I hope Chromium developers can chime in here:) The breakage was
unfortunate but I don't know how we could have avoided that. IMHO this
is no different from "clang started to emit a new DW_FORM_* and a
postprocessing tool of .debug chokes on that" Whether we want to
suppress that particular DW_FORM_* definitely should depend on how
likely it can cause problems, but we can't yet say we have to hold off
on a feature for a solved (precisely, mitigated) problem.

LLVM has no custom forms and I'd be super cautious about adding any
that were on by default because of how bad that breakage would be.

I'm not so concerned about the problems we know - but what they tell
us about the problems that might arise from use cases we don't know.
All the other projects out there that might have custom DWARF parsers
to do some ad-hoc things.

(also, ultimately - given how far-reaching this is, I think we'll want
some tidier flags that are more user-focussed. I'd hope for a flag
that gives BFD-like semantics (though I'd be OK with fixing debug_loc
(using 1 instead of 0) to work the same as debug_ranges while we're
there - a minor divergence from BFD, but highly likely to not cause
problems/fall out naturally from a simple implementation of parsing
that section) - something that's been in-use and tested by basically
everyone for decades. And another flag for the new semantics (-2 for
debug_loc/debug_ranges, -1 everywhere else). Customizable per-section
expression-based support I think is a recipe for platform divergence &
I'd rather it not be available/supported at all, but if you really
want to keep it in, I'd at least rather it not be the feature we
promote to users about how they can test/opt in/out of the behavior
when they're seeing breakages or want to test the future semantics)

Thanks for the write-up!

In short: Perhaps we should switch lld to the bfd-style tombstoning
behavior for a release or two, letting users opt-in to testing with the new
-1/-2 tombstoning in the interim, before switching to the new tombstone by
default (while still having the flag to switch back when users find
surprise places that can’t handle the new behavior).

In long:
https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
of lld with regards to resolving relocations from debug sections to dead
code (either comdat deduplicated, or gc-sections use).

A very quick summary of the situation:

Original Behavior:

  • bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
    elsewhere
  • gold/lld: 0+addend everywhere

Limitations/bugs:

  • bfd/gold/lld
  • doesn’t support 0 as a valid executable address without ambiguities
  • gold/lld
  • ambiguities with large gc’d functions combined with a .text mapping
    that starts in relative low addresses
  • premature debug_range termination with zero-length functions (Clang
    produces these with __builtin_unreachable or non-void return
    type functions
    without a return statement)

New behavior:

  • -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
    there)
  • -1 elsewhere
  • linker flag to customize to other values if desired

Known issues:

  • lldb’s line table parsing can’t handle -1 well at all (essentially
    unusable)

Pavel Labath will fix this soon https://reviews.llvm.org/D83957
This is an unhandled address-space wraparound problem.
This pattern is potentially common - and other downstream DWARF
consumers might make similar line table handling mistakes.

That’s the thing - I’m not sure we can really classify them as
“mistakes”. I think bfd.ld’s tombstoning behavior is about the only
thing we can reasonably say DWARF consumers should /probably/ be
expected to handle - and I’d imagine in many cases they haven’t been
written intentionally to handle it, but whatever behavior they have
has accidentally been sufficient for their needs.

  • gdb’s line table parsing ends up with different handling when breaking
    on gc’d functions (minor functionality issue)

This is just a behavior difference, not affecting users.
It did break a test if linked with LLD (gdb intrinsically has lots of
failing tests even if built with GCC+GNU ld).

Previous behavior (when an address is zero): a breakpoint on a
–gc-sections discarded function will be redirected to a larger line
number with debug info, even if that line can be an unrelated different
function.
New behavior is that the breakpoint is on a wrapped-around small address.

GDB 9.3 will restore the previous behavior
(https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)

I think there’s enough risk in this work (even given the small number of
bugs found so far), given there’s a pretty wide array of debug info
consumers out there, that we should change lld’s default to match the
long-lived bfd strategy. This would address my original motivation for
raising all this (empty functions prematurely terminating the list), while
letting users who want to experiment with it, or need it (like Alexey), can
opt-in to the -1/-2 behavior.

I think we can only confidently say that there is enough risk in using
tombstone value -1 in .debug_line, but I’d not say tombstone value -1 in
other .debug_* can cause problems.

Given how many DWARF parsers we’ve had to cleanup or migrate off in
transitioning to DWARFv5 inside Google, I think that’s a fair bit of
evidence the set of parsers isn’t as narrow/closed as we’d like and
thus the number of places that might have issues isn’t known/easily
exhaustively tested. So I’m not so concerned about the bugs we’ve
seen, but what that might indicate about the things that we don’t know
about and can’t test (because we don’t know about them). (of course,
the flipside of that is that if we don’t know about them, we can’t
tell people who own them to go check if they work with this opt-in
feature - so they’ll break whenever we turn this on by default - but
perhaps in the interim we can get at least a few big LLD customers to
deploy the feature and flush out some of the issues - happy for Google
to use this internally, hopefully we can encourage Apple folks, Sony
of course already has this semantic so nothing to find there most
likely - Chromium, maybe Firefox, etc)

From the other side — when we already switched new behavior ON as default —
then it is easier to discover all these unknown cases. We have an option restoring
old behavior(which should be fine for all of the current users). Thus, everybody
who needs old behavior is able to continue using it.

That makes transition more understandable. All problems would be explicitly seen. Then,
If we will know about new problems - we could adapt the current solution. Similar to this suggestion:

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* → -1

This would help while dwarf users are preparing their code to the new solution.

Is option restoring old behavior not enough to solve problems caused by new tombstone value?

>
>>
>> Thanks for the write-up!
>>
>> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >behavior for a release or two, letting users opt-in to testing with the new
>> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >default (while still having the flag to switch back when users find
>> >surprise places that can't handle the new behavior).
>> >
>> >In long:
>> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>> >of lld with regards to resolving relocations from debug sections to dead
>> >code (either comdat deduplicated, or gc-sections use).
>> >
>> >A very quick summary of the situation:
>> >
>> >Original Behavior:
>> >
>> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> > elsewhere
>> > - gold/lld: 0+addend everywhere
>> >
>> >Limitations/bugs:
>> >
>> > - bfd/gold/lld
>> > - doesn't support 0 as a valid executable address without ambiguities
>> > - gold/lld
>> > - ambiguities with large gc'd functions combined with a .text mapping
>> > that starts in relative low addresses
>> > - premature debug_range termination with zero-length functions (Clang
>> > produces these with __builtin_unreachable or non-void return
>> >type functions
>> > without a return statement)
>> >
>> >New behavior:
>> >
>> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> > there)
>> > - -1 elsewhere
>> > - linker flag to customize to other values if desired
>> >
>> >Known issues:
>> >
>> > - lldb's line table parsing can't handle -1 well at all (essentially
>> > unusable)
>>
>> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
>> This is an unhandled address-space wraparound problem.
>> This pattern is potentially common - and other downstream DWARF
>> consumers might make similar line table handling mistakes.
>
>That's the thing - I'm not sure we can really classify them as
>"mistakes". I think bfd.ld's tombstoning behavior is about the only
>thing we can reasonably say DWARF consumers should /probably/ be
>expected to handle - and I'd imagine in many cases they haven't been
>written intentionally to handle it, but whatever behavior they have
>has accidentally been sufficient for their needs.

>> > - gdb's line table parsing ends up with different handling when breaking
>> > on gc'd functions (minor functionality issue)
>>
>> This is just a behavior difference, not affecting users.
>> It did break a test if linked with LLD (gdb intrinsically has lots of
>> failing tests even if built with GCC+GNU ld).
>>
>> Previous behavior (when an address is zero): a breakpoint on a
>> --gc-sections discarded function will be redirected to a larger line
>> number with debug info, even if that line can be an unrelated different
>> function.
>> New behavior is that the breakpoint is on a wrapped-around small address.
>>
>> GDB 9.3 will restore the previous behavior
>> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
>>
>> >
>> >I think there's enough risk in this work (even given the small number of
>> >bugs found so far), given there's a pretty wide array of debug info
>> >consumers out there, that we should change lld's default to match the
>> >long-lived bfd strategy. This would address my original motivation for
>> >raising all this (empty functions prematurely terminating the list), while
>> >letting users who want to experiment with it, or need it (like Alexey), can
>> >opt-in to the -1/-2 behavior.
>>
>> I think we can only confidently say that there is enough risk in using
>> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> other .debug_* can cause problems.

>Given how many DWARF parsers we've had to cleanup or migrate off in
>transitioning to DWARFv5 inside Google, I think that's a fair bit of
>evidence the set of parsers isn't as narrow/closed as we'd like and
>thus the number of places that might have issues isn't known/easily
>exhaustively tested. So I'm not so concerned about the bugs we've
>seen, but what that might indicate about the things that we don't know
>about and can't test (because we don't know about them). (of course,
>the flipside of that is that if we don't know about them, we can't
>tell people who own them to go check if they work with this opt-in
>feature - so they'll break whenever we turn this on by default - but
>perhaps in the interim we can get at least a few big LLD customers to
>deploy the feature and flush out some of the issues - happy for Google
>to use this internally, hopefully we can encourage Apple folks, Sony
>of course already has this semantic so nothing to find there most
>likely - Chromium, maybe Firefox, etc)

From the other side — when we already switched new behavior ON as default —
then it is easier to discover all these unknown cases. We have an option restoring
old behavior(which should be fine for all of the current users). Thus, everybody
who needs old behavior is able to continue using it.

The cost to those users figuring out that it's a problem and finding
the flags/adding them, etc, is non-zero.
But, yes, as I said - eventually some of that will happen, sooner or
later. But reducing how much of it happens is valuable too.

That makes transition more understandable. All problems would be explicitly seen. Then,
If we will know about new problems - we could adapt the current solution. Similar to this suggestion:

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1

The issue I have with this is that it's reflective of the known
breakage in some tools that happen to have a quick turnaround/feedback
- and assumes all the other sections are significantly less likely to
have problems with this format - and I don't think we have enough data
to make that estimate.

I think having a clear flag for bfd behavior or new behavior would be
good, keeping it bfd-by-default for now, evangelizing (patch notes,
outreach to other projects and tool developers) the new functionality
& then changing the default shortly after an LLVM release (perhaps 12
- but perhaps longer) to give it time to bake for trunk-following
users (ample time for them to pick it up on their schedule, see what
other tools might break, whether they're isolated enough to expect a
quick turnaround or whether we should get them updated and then wait a
while longer.

This would help while dwarf users are preparing their code to the new solution.

Is option restoring old behavior not enough to solve problems caused by new tombstone value?

I don't personally think it is enough - given some of the known
breakage - I think that points to a fair bit of reasonably possible
unknown breakage. Having a few big (like Apple and Google) users
switch over by default & potentially flush out a few more issues (or
not -but build confidence that there aren't more issues) I think is an
appropriate way to roll out a change like this.

(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
(rnglists/loclists) or in debug_range/debug_loc that uses base address
specifiers, where the zero-without-addend doesn't have a chance to
make an empty range (because it's not a start/end pair, it's a
start+offset pair - so the offset remains non-zero)... if bfd
tombstoning breaks DWARFv5 parsing (or base address specifiers in
range/loc) in common consumers I might be more inclined to support
enabling new tombstoning by default sooner (though if we could enable
it just for DWARFv5 that might be nice - but not practical, since the
linker doesn't know what DWARF it's linking & could be linking
multiple versions))

- Dave

>
>>
>> Thanks for the write-up!
>>
>> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >behavior for a release or two, letting users opt-in to testing with the new
>> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >default (while still having the flag to switch back when users find
>> >surprise places that can't handle the new behavior).
>> >
>> >In long:
>> >⚙ D81784 [ELF] Resolve relocations in .debug_* referencing (discarded symbols or ICF folded section symbols) to tombstone values and follow-on patches modified the behavior
>> >of lld with regards to resolving relocations from debug sections to dead
>> >code (either comdat deduplicated, or gc-sections use).
>> >
>> >A very quick summary of the situation:
>> >
>> >Original Behavior:
>> >
>> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> > elsewhere
>> > - gold/lld: 0+addend everywhere
>> >
>> >Limitations/bugs:
>> >
>> > - bfd/gold/lld
>> > - doesn't support 0 as a valid executable address without ambiguities
>> > - gold/lld
>> > - ambiguities with large gc'd functions combined with a .text mapping
>> > that starts in relative low addresses
>> > - premature debug_range termination with zero-length functions (Clang
>> > produces these with __builtin_unreachable or non-void return
>> >type functions
>> > without a return statement)
>> >
>> >New behavior:
>> >
>> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> > there)
>> > - -1 elsewhere
>> > - linker flag to customize to other values if desired
>> >
>> >Known issues:
>> >
>> > - lldb's line table parsing can't handle -1 well at all (essentially
>> > unusable)
>>
>> Pavel Labath will fix this soon ⚙ D83957 [lldb/DWARF] Don't get confused by line sequences with tombstone values
>> This is an unhandled address-space wraparound problem.
>> This pattern is potentially common - and other downstream DWARF
>> consumers might make similar line table handling mistakes.
>
>That's the thing - I'm not sure we can really classify them as
>"mistakes". I think bfd.ld's tombstoning behavior is about the only
>thing we can reasonably say DWARF consumers should /probably/ be
>expected to handle - and I'd imagine in many cases they haven't been
>written intentionally to handle it, but whatever behavior they have
>has accidentally been sufficient for their needs.

>> > - gdb's line table parsing ends up with different handling when breaking
>> > on gc'd functions (minor functionality issue)
>>
>> This is just a behavior difference, not affecting users.
>> It did break a test if linked with LLD (gdb intrinsically has lots of
>> failing tests even if built with GCC+GNU ld).
>>
>> Previous behavior (when an address is zero): a breakpoint on a
>> --gc-sections discarded function will be redirected to a larger line
>> number with debug info, even if that line can be an unrelated different
>> function.
>> New behavior is that the breakpoint is on a wrapped-around small address.
>>
>> GDB 9.3 will restore the previous behavior
>> (sourceware.org Git - binutils-gdb.git/commit)
>>
>> >
>> >I think there's enough risk in this work (even given the small number of
>> >bugs found so far), given there's a pretty wide array of debug info
>> >consumers out there, that we should change lld's default to match the
>> >long-lived bfd strategy. This would address my original motivation for
>> >raising all this (empty functions prematurely terminating the list), while
>> >letting users who want to experiment with it, or need it (like Alexey), can
>> >opt-in to the -1/-2 behavior.
>>
>> I think we can only confidently say that there is enough risk in using
>> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> other .debug_* can cause problems.

>Given how many DWARF parsers we've had to cleanup or migrate off in
>transitioning to DWARFv5 inside Google, I think that's a fair bit of
>evidence the set of parsers isn't as narrow/closed as we'd like and
>thus the number of places that might have issues isn't known/easily
>exhaustively tested. So I'm not so concerned about the bugs we've
>seen, but what that might indicate about the things that we don't know
>about and can't test (because we don't know about them). (of course,
>the flipside of that is that if we don't know about them, we can't
>tell people who own them to go check if they work with this opt-in
>feature - so they'll break whenever we turn this on by default - but
>perhaps in the interim we can get at least a few big LLD customers to
>deploy the feature and flush out some of the issues - happy for Google
>to use this internally, hopefully we can encourage Apple folks, Sony
>of course already has this semantic so nothing to find there most
>likely - Chromium, maybe Firefox, etc)

From the other side — when we already switched new behavior ON as default —
then it is easier to discover all these unknown cases. We have an option restoring
old behavior(which should be fine for all of the current users). Thus, everybody
who needs old behavior is able to continue using it.

The cost to those users figuring out that it's a problem and finding
the flags/adding them, etc, is non-zero.
But, yes, as I said - eventually some of that will happen, sooner or
later. But reducing how much of it happens is valuable too.

That makes transition more understandable. All problems would be explicitly seen. Then,
If we will know about new problems - we could adapt the current solution. Similar to this suggestion:

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1

The issue I have with this is that it's reflective of the known
breakage in some tools that happen to have a quick turnaround/feedback
- and assumes all the other sections are significantly less likely to
have problems with this format - and I don't think we have enough data
to make that estimate.

I think having a clear flag for bfd behavior or new behavior would be
good, keeping it bfd-by-default for now, evangelizing (patch notes,
outreach to other projects and tool developers) the new functionality
& then changing the default shortly after an LLVM release (perhaps 12
- but perhaps longer) to give it time to bake for trunk-following
users (ample time for them to pick it up on their schedule, see what
other tools might break, whether they're isolated enough to expect a
quick turnaround or whether we should get them updated and then wait a
while longer.

This would help while dwarf users are preparing their code to the new solution.

Is option restoring old behavior not enough to solve problems caused by new tombstone value?

I don't personally think it is enough - given some of the known
breakage - I think that points to a fair bit of reasonably possible
unknown breakage. Having a few big (like Apple and Google) users
switch over by default & potentially flush out a few more issues (or
not -but build confidence that there aren't more issues) I think is an
appropriate way to roll out a change like this.

(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
(rnglists/loclists) or in debug_range/debug_loc that uses base address
specifiers, where the zero-without-addend doesn't have a chance to
make an empty range (because it's not a start/end pair, it's a
start+offset pair - so the offset remains non-zero)... if bfd
tombstoning breaks DWARFv5 parsing (or base address specifiers in
range/loc) in common consumers I might be more inclined to support
enabling new tombstoning by default sooner (though if we could enable
it just for DWARFv5 that might be nice - but not practical, since the
linker doesn't know what DWARF it's linking & could be linking
multiple versions))

- Dave

The argument that (.debug_line=>-1 is risky so other .debug_*=>-1 are risky) looks a bit
like a slippery slope argument to me.

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* => -1

has the merit that people will be aware that .debug_* sections can generally have -1
as tombstone values. This is a step towards the full transition (.debug_line => -1).
We keep it for one or two release so that LLD 11 or 12 linked binaries can be debugged by LLDB 10.

If we do b), I don't see how the situation will be different in llvm-project
12.x or 13.x and the same unsafety argument could be made against the
transition. I don't see how the half transition is more risky than compiler
generating a new DW_FORM_* which wasn't used before.

With -z dead-reloc-in-nonalloc=.debug_*=0 and a detailed release note, I don't
find a) more problematic than b).

>>
>> >
>> >>
>> >> Thanks for the write-up!
>> >>
>> >> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >> >behavior for a release or two, letting users opt-in to testing with the new
>> >> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >> >default (while still having the flag to switch back when users find
>> >> >surprise places that can't handle the new behavior).
>> >> >
>> >> >In long:
>> >> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>> >> >of lld with regards to resolving relocations from debug sections to dead
>> >> >code (either comdat deduplicated, or gc-sections use).
>> >> >
>> >> >A very quick summary of the situation:
>> >> >
>> >> >Original Behavior:
>> >> >
>> >> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> >> > elsewhere
>> >> > - gold/lld: 0+addend everywhere
>> >> >
>> >> >Limitations/bugs:
>> >> >
>> >> > - bfd/gold/lld
>> >> > - doesn't support 0 as a valid executable address without ambiguities
>> >> > - gold/lld
>> >> > - ambiguities with large gc'd functions combined with a .text mapping
>> >> > that starts in relative low addresses
>> >> > - premature debug_range termination with zero-length functions (Clang
>> >> > produces these with __builtin_unreachable or non-void return
>> >> >type functions
>> >> > without a return statement)
>> >> >
>> >> >New behavior:
>> >> >
>> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> >> > there)
>> >> > - -1 elsewhere
>> >> > - linker flag to customize to other values if desired
>> >> >
>> >> >Known issues:
>> >> >
>> >> > - lldb's line table parsing can't handle -1 well at all (essentially
>> >> > unusable)
>> >>
>> >> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
>> >> This is an unhandled address-space wraparound problem.
>> >> This pattern is potentially common - and other downstream DWARF
>> >> consumers might make similar line table handling mistakes.
>> >
>> >That's the thing - I'm not sure we can really classify them as
>> >"mistakes". I think bfd.ld's tombstoning behavior is about the only
>> >thing we can reasonably say DWARF consumers should /probably/ be
>> >expected to handle - and I'd imagine in many cases they haven't been
>> >written intentionally to handle it, but whatever behavior they have
>> >has accidentally been sufficient for their needs.
>>
>> >> > - gdb's line table parsing ends up with different handling when breaking
>> >> > on gc'd functions (minor functionality issue)
>> >>
>> >> This is just a behavior difference, not affecting users.
>> >> It did break a test if linked with LLD (gdb intrinsically has lots of
>> >> failing tests even if built with GCC+GNU ld).
>> >>
>> >> Previous behavior (when an address is zero): a breakpoint on a
>> >> --gc-sections discarded function will be redirected to a larger line
>> >> number with debug info, even if that line can be an unrelated different
>> >> function.
>> >> New behavior is that the breakpoint is on a wrapped-around small address.
>> >>
>> >> GDB 9.3 will restore the previous behavior
>> >> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
>> >>
>> >> >
>> >> >I think there's enough risk in this work (even given the small number of
>> >> >bugs found so far), given there's a pretty wide array of debug info
>> >> >consumers out there, that we should change lld's default to match the
>> >> >long-lived bfd strategy. This would address my original motivation for
>> >> >raising all this (empty functions prematurely terminating the list), while
>> >> >letting users who want to experiment with it, or need it (like Alexey), can
>> >> >opt-in to the -1/-2 behavior.
>> >>
>> >> I think we can only confidently say that there is enough risk in using
>> >> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> >> other .debug_* can cause problems.
>>
>> >Given how many DWARF parsers we've had to cleanup or migrate off in
>> >transitioning to DWARFv5 inside Google, I think that's a fair bit of
>> >evidence the set of parsers isn't as narrow/closed as we'd like and
>> >thus the number of places that might have issues isn't known/easily
>> >exhaustively tested. So I'm not so concerned about the bugs we've
>> >seen, but what that might indicate about the things that we don't know
>> >about and can't test (because we don't know about them). (of course,
>> >the flipside of that is that if we don't know about them, we can't
>> >tell people who own them to go check if they work with this opt-in
>> >feature - so they'll break whenever we turn this on by default - but
>> >perhaps in the interim we can get at least a few big LLD customers to
>> >deploy the feature and flush out some of the issues - happy for Google
>> >to use this internally, hopefully we can encourage Apple folks, Sony
>> >of course already has this semantic so nothing to find there most
>> >likely - Chromium, maybe Firefox, etc)
>>
>> From the other side — when we already switched new behavior ON as default —
>> then it is easier to discover all these unknown cases. We have an option restoring
>> old behavior(which should be fine for all of the current users). Thus, everybody
>> who needs old behavior is able to continue using it.
>
>The cost to those users figuring out that it's a problem and finding
>the flags/adding them, etc, is non-zero.
>But, yes, as I said - eventually some of that will happen, sooner or
>later. But reducing how much of it happens is valuable too.
>
>> That makes transition more understandable. All problems would be explicitly seen. Then,
>> If we will know about new problems - we could adapt the current solution. Similar to this suggestion:
>>
>> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
>
>The issue I have with this is that it's reflective of the known
>breakage in some tools that happen to have a quick turnaround/feedback
>- and assumes all the other sections are significantly less likely to
>have problems with this format - and I don't think we have enough data
>to make that estimate.
>
>I think having a clear flag for bfd behavior or new behavior would be
>good, keeping it bfd-by-default for now, evangelizing (patch notes,
>outreach to other projects and tool developers) the new functionality
>& then changing the default shortly after an LLVM release (perhaps 12
>- but perhaps longer) to give it time to bake for trunk-following
>users (ample time for them to pick it up on their schedule, see what
>other tools might break, whether they're isolated enough to expect a
>quick turnaround or whether we should get them updated and then wait a
>while longer.
>> This would help while dwarf users are preparing their code to the new solution.
>>
>> Is option restoring old behavior not enough to solve problems caused by new tombstone value?
>
>I don't personally think it is enough - given some of the known
>breakage - I think that points to a fair bit of reasonably possible
>unknown breakage. Having a few big (like Apple and Google) users
>switch over by default & potentially flush out a few more issues (or
>not -but build confidence that there aren't more issues) I think is an
>appropriate way to roll out a change like this.
>
>(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
>(rnglists/loclists) or in debug_range/debug_loc that uses base address
>specifiers, where the zero-without-addend doesn't have a chance to
>make an empty range (because it's not a start/end pair, it's a
>start+offset pair - so the offset remains non-zero)... if bfd
>tombstoning breaks DWARFv5 parsing (or base address specifiers in
>range/loc) in common consumers I might be more inclined to support
>enabling new tombstoning by default sooner (though if we could enable
>it just for DWARFv5 that might be nice - but not practical, since the
>linker doesn't know what DWARF it's linking & could be linking
>multiple versions))
>
>- Dave

The argument that (.debug_line=>-1 is risky so other .debug_*=>-1 are risky) looks a bit
like a slippery slope argument to me.

I'm not suggesting that one leads to the other, which is a slippery
slope (if you do X, then Y will be inevitable - when X and Y aren't
linked). I'm suggesting it's a good reminder that this change can
break consumers & I think the scope is large enough that we should be
more conservative here.

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* => -1

has the merit that people will be aware that .debug_* sections can generally have -1
as tombstone values. This is a step towards the full transition (.debug_line => -1).
We keep it for one or two release so that LLD 11 or 12 linked binaries can be debugged by LLDB 10.

If we do b), I don't see how the situation will be different in llvm-project
12.x or 13.x and the same unsafety argument could be made against the
transition.

I think opt-in deployment experience (eg: at Google, possibly through
Apple, ) would help provide confidence that at least a reasonable set
of tools are robust to this change.

  I don't see how the half transition is more risky than compiler
generating a new DW_FORM_* which wasn't used before.

LLVM hasn't added any custom forms that I know of, and I'd push back
pretty hard on adding one that was on-by-default. (even one that works
in gdb or lldb current release and is only used when tuning for the
respective debugger would be questionable to me - given existing
installations of debuggers that aren't necessarily the latest release)

>
>>
>> Thanks for the write-up!
>>
>> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >behavior for a release or two, letting users opt-in to testing with the new
>> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >default (while still having the flag to switch back when users find
>> >surprise places that can't handle the new behavior).
>> >
>> >In long:
>> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>> >of lld with regards to resolving relocations from debug sections to dead
>> >code (either comdat deduplicated, or gc-sections use).
>> >
>> >A very quick summary of the situation:
>> >
>> >Original Behavior:
>> >
>> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> > elsewhere
>> > - gold/lld: 0+addend everywhere
>> >
>> >Limitations/bugs:
>> >
>> > - bfd/gold/lld
>> > - doesn't support 0 as a valid executable address without ambiguities
>> > - gold/lld
>> > - ambiguities with large gc'd functions combined with a .text mapping
>> > that starts in relative low addresses
>> > - premature debug_range termination with zero-length functions (Clang
>> > produces these with __builtin_unreachable or non-void return
>> >type functions
>> > without a return statement)
>> >
>> >New behavior:
>> >
>> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> > there)
>> > - -1 elsewhere
>> > - linker flag to customize to other values if desired
>> >
>> >Known issues:
>> >
>> > - lldb's line table parsing can't handle -1 well at all (essentially
>> > unusable)
>>
>> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
>> This is an unhandled address-space wraparound problem.
>> This pattern is potentially common - and other downstream DWARF
>> consumers might make similar line table handling mistakes.
>
>That's the thing - I'm not sure we can really classify them as
>"mistakes". I think bfd.ld's tombstoning behavior is about the only
>thing we can reasonably say DWARF consumers should /probably/ be
>expected to handle - and I'd imagine in many cases they haven't been
>written intentionally to handle it, but whatever behavior they have
>has accidentally been sufficient for their needs.

>> > - gdb's line table parsing ends up with different handling when breaking
>> > on gc'd functions (minor functionality issue)
>>
>> This is just a behavior difference, not affecting users.
>> It did break a test if linked with LLD (gdb intrinsically has lots of
>> failing tests even if built with GCC+GNU ld).
>>
>> Previous behavior (when an address is zero): a breakpoint on a
>> --gc-sections discarded function will be redirected to a larger line
>> number with debug info, even if that line can be an unrelated different
>> function.
>> New behavior is that the breakpoint is on a wrapped-around small address.
>>
>> GDB 9.3 will restore the previous behavior
>> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
>>
>> >
>> >I think there's enough risk in this work (even given the small number of
>> >bugs found so far), given there's a pretty wide array of debug info
>> >consumers out there, that we should change lld's default to match the
>> >long-lived bfd strategy. This would address my original motivation for
>> >raising all this (empty functions prematurely terminating the list), while
>> >letting users who want to experiment with it, or need it (like Alexey), can
>> >opt-in to the -1/-2 behavior.
>>
>> I think we can only confidently say that there is enough risk in using
>> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> other .debug_* can cause problems.

>Given how many DWARF parsers we've had to cleanup or migrate off in
>transitioning to DWARFv5 inside Google, I think that's a fair bit of
>evidence the set of parsers isn't as narrow/closed as we'd like and
>thus the number of places that might have issues isn't known/easily
>exhaustively tested. So I'm not so concerned about the bugs we've
>seen, but what that might indicate about the things that we don't know
>about and can't test (because we don't know about them). (of course,
>the flipside of that is that if we don't know about them, we can't
>tell people who own them to go check if they work with this opt-in
>feature - so they'll break whenever we turn this on by default - but
>perhaps in the interim we can get at least a few big LLD customers to
>deploy the feature and flush out some of the issues - happy for Google
>to use this internally, hopefully we can encourage Apple folks, Sony
>of course already has this semantic so nothing to find there most
>likely - Chromium, maybe Firefox, etc)

From the other side — when we already switched new behavior ON as default —
then it is easier to discover all these unknown cases. We have an option restoring
old behavior(which should be fine for all of the current users). Thus, everybody
who needs old behavior is able to continue using it.

The cost to those users figuring out that it's a problem and finding
the flags/adding them, etc, is non-zero.
But, yes, as I said - eventually some of that will happen, sooner or
later. But reducing how much of it happens is valuable too.

I agree. Reducing the impact on customers is important.
Though in this case, as you said, it would be done anyway(sooner or later).
Finally, I do not know what is more important: a) get feedback faster with
the cost of involved users and requiring them to make efforts
to figure out the problem. or b) test with a smaller set of users,
find more problematic cases, make new tombstone value to be default later.
I am kind of preferring a). But, b) is more safe and is OK.

That makes transition more understandable. All problems would be explicitly seen. Then,
If we will know about new problems - we could adapt the current solution. Similar to this >suggestion:

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1

The issue I have with this is that it's reflective of the known
breakage in some tools that happen to have a quick turnaround/feedback
- and assumes all the other sections are significantly less likely to
have problems with this format - and I don't think we have enough data
to make that estimate.

We could extend that rule when/if new problems become known.
Or, customers could fix their code or use option returning old behavior
and it would probably not be necessary to extend the rule.

I think having a clear flag for bfd behavior or new behavior would be
good, keeping it bfd-by-default for now, evangelizing (patch notes,
outreach to other projects and tool developers) the new functionality
& then changing the default shortly after an LLVM release (perhaps 12
- but perhaps longer) to give it time to bake for trunk-following
users (ample time for them to pick it up on their schedule, see what
other tools might break, whether they're isolated enough to expect a
quick turnaround or whether we should get them updated and then wait a
while longer.

yes, doing clear flag for bfd behavior or new behavior,
and keep it bfd-by-default for now would be OK and is more safe.

This would help while dwarf users are preparing their code to the new solution.

Is option restoring old behavior not enough to solve problems caused by new tombstone >value?

I don't personally think it is enough - given some of the known
breakage - I think that points to a fair bit of reasonably possible
unknown breakage. Having a few big (like Apple and Google) users
switch over by default & potentially flush out a few more issues (or
not -but build confidence that there aren't more issues) I think is an
appropriate way to roll out a change like this.

(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
(rnglists/loclists) or in debug_range/debug_loc that uses base address
specifiers, where the zero-without-addend doesn't have a chance to
make an empty range (because it's not a start/end pair, it's a
start+offset pair - so the offset remains non-zero)... if bfd
tombstoning breaks DWARFv5 parsing (or base address specifiers in
range/loc) in common consumers I might be more inclined to support
enabling new tombstoning by default sooner (though if we could enable
it just for DWARFv5 that might be nice - but not practical, since the
linker doesn't know what DWARF it's linking & could be linking
multiple versions))

- From the point of view overlapping address ranges,
both bfd-solution(using 1) and old lld solution(using 0) are equal.
for the "start+offset pair" case they could result in overlapping address ranges.

- From the point of view correct parsing of DWARF 5 - it looks like they are equally good.

- From the point of view correct parsing of DWARF 4 - it looks bfd-solution(using 1)
is better than the old lld solution(using 0). When range list entry contains
addresses(start+end) which should be relocated and for the zero-length functions,
bfd-solution would result in range list entry: {1, 1}, while old lld solution
would result in {0, 0}, and match with the end of list entry.
That is the original problem that started this thread.

Though it looks like there still exist case when range list could be terminated earlier:

base address selection entry: {-1, address of deleted code}
following range list entry: {0, 0} << points to the same address as set by base address selection entry and has zero size.

after linker resolved relocations it would look like this, for bfd case:

base address selection entry: {-1, 1}
following range list entry: {0, 0} <<<<<<<<<<<

So there still exists {0,0} entry which could be considered as the end of list entry.

But old lld solution has the same problem, thus it would not be new.

- Additionally, AFAIK gdb has special processing for overlapped address
ranges starting from 0. Using bfd tombstone value could break that processing - I would check it.

Alexey.

>>
>> >
>> >>
>> >> Thanks for the write-up!
>> >>
>> >> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >> >behavior for a release or two, letting users opt-in to testing with the new
>> >> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >> >default (while still having the flag to switch back when users find
>> >> >surprise places that can't handle the new behavior).
>> >> >
>> >> >In long:
>> >> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>> >> >of lld with regards to resolving relocations from debug sections to dead
>> >> >code (either comdat deduplicated, or gc-sections use).
>> >> >
>> >> >A very quick summary of the situation:
>> >> >
>> >> >Original Behavior:
>> >> >
>> >> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> >> > elsewhere
>> >> > - gold/lld: 0+addend everywhere
>> >> >
>> >> >Limitations/bugs:
>> >> >
>> >> > - bfd/gold/lld
>> >> > - doesn't support 0 as a valid executable address without ambiguities
>> >> > - gold/lld
>> >> > - ambiguities with large gc'd functions combined with a .text mapping
>> >> > that starts in relative low addresses
>> >> > - premature debug_range termination with zero-length functions (Clang
>> >> > produces these with __builtin_unreachable or non-void return
>> >> >type functions
>> >> > without a return statement)
>> >> >
>> >> >New behavior:
>> >> >
>> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> >> > there)
>> >> > - -1 elsewhere
>> >> > - linker flag to customize to other values if desired
>> >> >
>> >> >Known issues:
>> >> >
>> >> > - lldb's line table parsing can't handle -1 well at all (essentially
>> >> > unusable)
>> >>
>> >> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
>> >> This is an unhandled address-space wraparound problem.
>> >> This pattern is potentially common - and other downstream DWARF
>> >> consumers might make similar line table handling mistakes.
>> >
>> >That's the thing - I'm not sure we can really classify them as
>> >"mistakes". I think bfd.ld's tombstoning behavior is about the only
>> >thing we can reasonably say DWARF consumers should /probably/ be
>> >expected to handle - and I'd imagine in many cases they haven't been
>> >written intentionally to handle it, but whatever behavior they have
>> >has accidentally been sufficient for their needs.
>>
>> >> > - gdb's line table parsing ends up with different handling when breaking
>> >> > on gc'd functions (minor functionality issue)
>> >>
>> >> This is just a behavior difference, not affecting users.
>> >> It did break a test if linked with LLD (gdb intrinsically has lots of
>> >> failing tests even if built with GCC+GNU ld).
>> >>
>> >> Previous behavior (when an address is zero): a breakpoint on a
>> >> --gc-sections discarded function will be redirected to a larger line
>> >> number with debug info, even if that line can be an unrelated different
>> >> function.
>> >> New behavior is that the breakpoint is on a wrapped-around small address.
>> >>
>> >> GDB 9.3 will restore the previous behavior
>> >> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
>> >>
>> >> >
>> >> >I think there's enough risk in this work (even given the small number of
>> >> >bugs found so far), given there's a pretty wide array of debug info
>> >> >consumers out there, that we should change lld's default to match the
>> >> >long-lived bfd strategy. This would address my original motivation for
>> >> >raising all this (empty functions prematurely terminating the list), while
>> >> >letting users who want to experiment with it, or need it (like Alexey), can
>> >> >opt-in to the -1/-2 behavior.
>> >>
>> >> I think we can only confidently say that there is enough risk in using
>> >> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> >> other .debug_* can cause problems.
>>
>> >Given how many DWARF parsers we've had to cleanup or migrate off in
>> >transitioning to DWARFv5 inside Google, I think that's a fair bit of
>> >evidence the set of parsers isn't as narrow/closed as we'd like and
>> >thus the number of places that might have issues isn't known/easily
>> >exhaustively tested. So I'm not so concerned about the bugs we've
>> >seen, but what that might indicate about the things that we don't know
>> >about and can't test (because we don't know about them). (of course,
>> >the flipside of that is that if we don't know about them, we can't
>> >tell people who own them to go check if they work with this opt-in
>> >feature - so they'll break whenever we turn this on by default - but
>> >perhaps in the interim we can get at least a few big LLD customers to
>> >deploy the feature and flush out some of the issues - happy for Google
>> >to use this internally, hopefully we can encourage Apple folks, Sony
>> >of course already has this semantic so nothing to find there most
>> >likely - Chromium, maybe Firefox, etc)
>>
>> From the other side — when we already switched new behavior ON as default —
>> then it is easier to discover all these unknown cases. We have an option restoring
>> old behavior(which should be fine for all of the current users). Thus, everybody
>> who needs old behavior is able to continue using it.

>The cost to those users figuring out that it's a problem and finding
>the flags/adding them, etc, is non-zero.
>But, yes, as I said - eventually some of that will happen, sooner or
>later. But reducing how much of it happens is valuable too.

I agree. Reducing the impact on customers is important.
Though in this case, as you said, it would be done anyway(sooner or later).
Finally, I do not know what is more important: a) get feedback faster with
the cost of involved users and requiring them to make efforts
to figure out the problem. or b) test with a smaller set of users,
find more problematic cases, make new tombstone value to be default later.
I am kind of preferring a). But, b) is more safe and is OK.

>> That makes transition more understandable. All problems would be explicitly seen. Then,
>> If we will know about new problems - we could adapt the current solution. Similar to this >suggestion:
>>
>> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1

>The issue I have with this is that it's reflective of the known
>breakage in some tools that happen to have a quick turnaround/feedback
>- and assumes all the other sections are significantly less likely to
>have problems with this format - and I don't think we have enough data
>to make that estimate.

We could extend that rule when/if new problems become known.
Or, customers could fix their code or use option returning old behavior
and it would probably not be necessary to extend the rule.

>I think having a clear flag for bfd behavior or new behavior would be
>good, keeping it bfd-by-default for now, evangelizing (patch notes,
>outreach to other projects and tool developers) the new functionality
>& then changing the default shortly after an LLVM release (perhaps 12
>- but perhaps longer) to give it time to bake for trunk-following
>users (ample time for them to pick it up on their schedule, see what
>other tools might break, whether they're isolated enough to expect a
>quick turnaround or whether we should get them updated and then wait a
>while longer.

yes, doing clear flag for bfd behavior or new behavior,
and keep it bfd-by-default for now would be OK and is more safe.

>> This would help while dwarf users are preparing their code to the new solution.
>>
>> Is option restoring old behavior not enough to solve problems caused by new tombstone >value?

>I don't personally think it is enough - given some of the known
>breakage - I think that points to a fair bit of reasonably possible
>unknown breakage. Having a few big (like Apple and Google) users
>switch over by default & potentially flush out a few more issues (or
>not -but build confidence that there aren't more issues) I think is an
>appropriate way to roll out a change like this.

>(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
>(rnglists/loclists) or in debug_range/debug_loc that uses base address
>specifiers, where the zero-without-addend doesn't have a chance to
>make an empty range (because it's not a start/end pair, it's a
>start+offset pair - so the offset remains non-zero)... if bfd
>tombstoning breaks DWARFv5 parsing (or base address specifiers in
>range/loc) in common consumers I might be more inclined to support
>enabling new tombstoning by default sooner (though if we could enable
>it just for DWARFv5 that might be nice - but not practical, since the
>linker doesn't know what DWARF it's linking & could be linking
>multiple versions))

- From the point of view overlapping address ranges,
both bfd-solution(using 1) and old lld solution(using 0) are equal.

gold and lld's solution was a bit more complicated than using zero (&
bfd's solution uses 1 in debug_ranges, but not in other debug
sections). lld and gold use 0+addend in ranges. 0 alone in ranges
would be more problematic than any of the other deployed solutions,
because it would lead to the debug_ranges contributions being
terminated early (0, 0 terminates the list - so debug_ranges for a CU
with one gc'd function mid-way through the range list would drop the
rest of the range lit) - that's why bfd uses 1 for debug_ranges rather
than 0 which it uses elsewhere (though the same fix should,
technically, be applied to debug_loc too for the same reasons as
debug_ranges). the 0+addend approach of gold/lld works OK (except for
the low address overlap) except when there's a zero-length function,
then 0+addend == 0 for the end of the range, andy ou get a 0,0 entry
with the premature range list termination issues.

for the "start+offset pair" case they could result in overlapping address ranges.

Not quite - the way bfd does things, at least in debug_ranges without
base address specifiers (bit hard to isolate for a consumer, really -
though they could special case zero base addresses... sort of) the
empty ranges are (1, 1) - they're actually empty. Whereas lld's
approach ends up with (0+addend, 0+addend) which is usually (0, size)
which has issues with overlap on systems that have a valid low address
range (or if you have sufficiently large functions).

- From the point of view correct parsing of DWARF 5 - it looks like they are equally good.

Yep, DWARFv5 is more complicated - bfd's approach would be marginally
better even for addr+offset encodings in debug_rnglists - since it'd
always produce a 0 for the addr, whereas gold/lld's behavior can in
some cases (eg: "void f1() { } __attribute__((nodebug)) void f2() { }
void f3() { }" with -fno-function-sections - you'll end up with the
range for f3 starting at a non-zero addend - this debug info will be
worse for bfd/ld (if the DWARF for this object was included, but the
whole .text section was discarded (eg: maybe there's a global variable
in this object which is linked in, but -gc-section is still able to
discard the whole .text section as unreferenced) then gold/lld's
approach would make f3 unidentifiable as dead code (because it doesn't
have a tombstone start) but bfd's approach would work (assuming zero
isn't a valid address))

- From the point of view correct parsing of DWARF 4 - it looks bfd-solution(using 1)
is better than the old lld solution(using 0). When range list entry contains
addresses(start+end) which should be relocated and for the zero-length functions,
bfd-solution would result in range list entry: {1, 1}, while old lld solution
would result in {0, 0}, and match with the end of list entry.
That is the original problem that started this thread.

Only comes up for zero-length functions, because gold/lld's approach
was 0+addend, not straight 0.

Though it looks like there still exist case when range list could be terminated earlier:

base address selection entry: {-1, address of deleted code}
following range list entry: {0, 0} << points to the same address as set by base address selection entry and has zero size.

That's a bug in the producer (though a good point - I've probably made
that bug in LLVM) - the linker can't solve that problem, since the
linker can't touch the literal unrelocated 0, 0.

after linker resolved relocations it would look like this, for bfd case:

base address selection entry: {-1, 1}
following range list entry: {0, 0} <<<<<<<<<<<

So there still exists {0,0} entry which could be considered as the end of list entry.

But old lld solution has the same problem, thus it would not be new.

- Additionally, AFAIK gdb has special processing for overlapped address
ranges starting from 0. Using bfd tombstone value could break that processing - I would check it.

Not sure I understand - presumably gdb's special processing is
intended to work with bfd's tombstoning, since it's been the most
common/prolific unix linker, the one intended to work with gdb (they
exist in the same repository) for decades, right?

- Dave

>>
>> >
>> >>
>> >> Thanks for the write-up!
>> >>
>> >> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >> >behavior for a release or two, letting users opt-in to testing with the new
>> >> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >> >default (while still having the flag to switch back when users find
>> >> >surprise places that can't handle the new behavior).
>> >> >
>> >> >In long:
>> >> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>> >> >of lld with regards to resolving relocations from debug sections to dead
>> >> >code (either comdat deduplicated, or gc-sections use).
>> >> >
>> >> >A very quick summary of the situation:
>> >> >
>> >> >Original Behavior:
>> >> >
>> >> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> >> > elsewhere
>> >> > - gold/lld: 0+addend everywhere
>> >> >
>> >> >Limitations/bugs:
>> >> >
>> >> > - bfd/gold/lld
>> >> > - doesn't support 0 as a valid executable address without ambiguities
>> >> > - gold/lld
>> >> > - ambiguities with large gc'd functions combined with a .text mapping
>> >> > that starts in relative low addresses
>> >> > - premature debug_range termination with zero-length functions (Clang
>> >> > produces these with __builtin_unreachable or non-void return
>> >> >type functions
>> >> > without a return statement)
>> >> >
>> >> >New behavior:
>> >> >
>> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> >> > there)
>> >> > - -1 elsewhere
>> >> > - linker flag to customize to other values if desired
>> >> >
>> >> >Known issues:
>> >> >
>> >> > - lldb's line table parsing can't handle -1 well at all (essentially
>> >> > unusable)
>> >>
>> >> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
>> >> This is an unhandled address-space wraparound problem.
>> >> This pattern is potentially common - and other downstream DWARF
>> >> consumers might make similar line table handling mistakes.
>> >
>> >That's the thing - I'm not sure we can really classify them as
>> >"mistakes". I think bfd.ld's tombstoning behavior is about the only
>> >thing we can reasonably say DWARF consumers should /probably/ be
>> >expected to handle - and I'd imagine in many cases they haven't been
>> >written intentionally to handle it, but whatever behavior they have
>> >has accidentally been sufficient for their needs.
>>
>> >> > - gdb's line table parsing ends up with different handling when breaking
>> >> > on gc'd functions (minor functionality issue)
>> >>
>> >> This is just a behavior difference, not affecting users.
>> >> It did break a test if linked with LLD (gdb intrinsically has lots of
>> >> failing tests even if built with GCC+GNU ld).
>> >>
>> >> Previous behavior (when an address is zero): a breakpoint on a
>> >> --gc-sections discarded function will be redirected to a larger line
>> >> number with debug info, even if that line can be an unrelated different
>> >> function.
>> >> New behavior is that the breakpoint is on a wrapped-around small address.
>> >>
>> >> GDB 9.3 will restore the previous behavior
>> >> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
>> >>
>> >> >
>> >> >I think there's enough risk in this work (even given the small number of
>> >> >bugs found so far), given there's a pretty wide array of debug info
>> >> >consumers out there, that we should change lld's default to match the
>> >> >long-lived bfd strategy. This would address my original motivation for
>> >> >raising all this (empty functions prematurely terminating the list), while
>> >> >letting users who want to experiment with it, or need it (like Alexey), can
>> >> >opt-in to the -1/-2 behavior.
>> >>
>> >> I think we can only confidently say that there is enough risk in using
>> >> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> >> other .debug_* can cause problems.
>>
>> >Given how many DWARF parsers we've had to cleanup or migrate off in
>> >transitioning to DWARFv5 inside Google, I think that's a fair bit of
>> >evidence the set of parsers isn't as narrow/closed as we'd like and
>> >thus the number of places that might have issues isn't known/easily
>> >exhaustively tested. So I'm not so concerned about the bugs we've
>> >seen, but what that might indicate about the things that we don't know
>> >about and can't test (because we don't know about them). (of course,
>> >the flipside of that is that if we don't know about them, we can't
>> >tell people who own them to go check if they work with this opt-in
>> >feature - so they'll break whenever we turn this on by default - but
>> >perhaps in the interim we can get at least a few big LLD customers to
>> >deploy the feature and flush out some of the issues - happy for Google
>> >to use this internally, hopefully we can encourage Apple folks, Sony
>> >of course already has this semantic so nothing to find there most
>> >likely - Chromium, maybe Firefox, etc)
>>
>> From the other side — when we already switched new behavior ON as default —
>> then it is easier to discover all these unknown cases. We have an option restoring
>> old behavior(which should be fine for all of the current users). Thus, everybody
>> who needs old behavior is able to continue using it.

>The cost to those users figuring out that it's a problem and finding
>the flags/adding them, etc, is non-zero.
>But, yes, as I said - eventually some of that will happen, sooner or
>later. But reducing how much of it happens is valuable too.

I agree. Reducing the impact on customers is important.
Though in this case, as you said, it would be done anyway(sooner or later).
Finally, I do not know what is more important: a) get feedback faster with
the cost of involved users and requiring them to make efforts
to figure out the problem. or b) test with a smaller set of users,
find more problematic cases, make new tombstone value to be default later.
I am kind of preferring a). But, b) is more safe and is OK.

>> That makes transition more understandable. All problems would be explicitly seen. Then,
>> If we will know about new problems - we could adapt the current solution. Similar to this >suggestion:
>>
>> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1

>The issue I have with this is that it's reflective of the known
>breakage in some tools that happen to have a quick turnaround/feedback
>- and assumes all the other sections are significantly less likely to
>have problems with this format - and I don't think we have enough data
>to make that estimate.

We could extend that rule when/if new problems become known.
Or, customers could fix their code or use option returning old behavior
and it would probably not be necessary to extend the rule.

>I think having a clear flag for bfd behavior or new behavior would be
>good, keeping it bfd-by-default for now, evangelizing (patch notes,
>outreach to other projects and tool developers) the new functionality
>& then changing the default shortly after an LLVM release (perhaps 12
>- but perhaps longer) to give it time to bake for trunk-following
>users (ample time for them to pick it up on their schedule, see what
>other tools might break, whether they're isolated enough to expect a
>quick turnaround or whether we should get them updated and then wait a
>while longer.

yes, doing clear flag for bfd behavior or new behavior,
and keep it bfd-by-default for now would be OK and is more safe.

>> This would help while dwarf users are preparing their code to the new solution.
>>
>> Is option restoring old behavior not enough to solve problems caused by new tombstone >value?

>I don't personally think it is enough - given some of the known
>breakage - I think that points to a fair bit of reasonably possible
>unknown breakage. Having a few big (like Apple and Google) users
>switch over by default & potentially flush out a few more issues (or
>not -but build confidence that there aren't more issues) I think is an
>appropriate way to roll out a change like this.

>(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
>(rnglists/loclists) or in debug_range/debug_loc that uses base address
>specifiers, where the zero-without-addend doesn't have a chance to
>make an empty range (because it's not a start/end pair, it's a
>start+offset pair - so the offset remains non-zero)... if bfd
>tombstoning breaks DWARFv5 parsing (or base address specifiers in
>range/loc) in common consumers I might be more inclined to support
>enabling new tombstoning by default sooner (though if we could enable
>it just for DWARFv5 that might be nice - but not practical, since the
>linker doesn't know what DWARF it's linking & could be linking
>multiple versions))

- From the point of view overlapping address ranges,
both bfd-solution(using 1) and old lld solution(using 0) are equal.

gold and lld's solution was a bit more complicated than using zero (&
bfd's solution uses 1 in debug_ranges, but not in other debug
sections). lld and gold use 0+addend in ranges.

right. I am talking about the same(though assumed that context
instead of detailed explanation). it should be "bfd-solution(using 1
for debug_ranges) and lld/gold solution(using 0+addend)".
Thank you for pointing into that.

0 alone in ranges
would be more problematic than any of the other deployed solutions,
because it would lead to the debug_ranges contributions being
terminated early (0, 0 terminates the list - so debug_ranges for a CU
with one gc'd function mid-way through the range list would drop the
rest of the range lit) - that's why bfd uses 1 for debug_ranges rather
than 0 which it uses elsewhere (though the same fix should,
technically, be applied to debug_loc too for the same reasons as
debug_ranges). the 0+addend approach of gold/lld works OK (except for
the low address overlap) except when there's a zero-length function,
then 0+addend == 0 for the end of the range, andy ou get a 0,0 entry
with the premature range list termination issues.

for the "start+offset pair" case they could result in overlapping address ranges.

Not quite - the way bfd does things, at least in debug_ranges without
base address specifiers (bit hard to isolate for a consumer, really -
though they could special case zero base addresses... sort of) the
empty ranges are (1, 1) - they're actually empty. Whereas lld's
approach ends up with (0+addend, 0+addend) which is usually (0, size)
which has issues with overlap on systems that have a valid low address
range (or if you have sufficiently large functions).

There is a case when size of address range is set not by relocated value but by constant value:

DWARF5:
DW_RLE_startx_length
DW_RLE_start_length

{address of deleted code, length}
{address of existing code, length}

DWARF4:
base address selection entry: {-1, address of deleted code}
following range list entry: {0, length}
base address selection entry: {-1, address of existing code}
following range list entry: {0, length}

In these cases linker do not see the length of address range
and could not change it(using 1, or 0, or 0+addend) to be zero length
address range. Thus address ranges could overlap.

So in both DWARF5 and DWARF4 there are cases when address ranges
could be overlapped and it could not be solved without new meaning for tombstone value.

- From the point of view correct parsing of DWARF 5 - it looks like they are equally good.

Yep, DWARFv5 is more complicated - bfd's approach would be marginally
better even for addr+offset encodings in debug_rnglists - since it'd
always produce a 0 for the addr, whereas gold/lld's behavior can in
some cases (eg: "void f1() { } __attribute__((nodebug)) void f2() { }
void f3() { }" with -fno-function-sections - you'll end up with the
range for f3 starting at a non-zero addend - this debug info will be
worse for bfd/ld (if the DWARF for this object was included, but the
whole .text section was discarded (eg: maybe there's a global variable
in this object which is linked in, but -gc-section is still able to
discard the whole .text section as unreferenced) then gold/lld's
approach would make f3 unidentifiable as dead code (because it doesn't
have a tombstone start) but bfd's approach would work (assuming zero
isn't a valid address))

I see. thanks.

- From the point of view correct parsing of DWARF 4 - it looks bfd-solution(using 1)
is better than the old lld solution(using 0). When range list entry contains
addresses(start+end) which should be relocated and for the zero-length functions,
bfd-solution would result in range list entry: {1, 1}, while old lld solution
would result in {0, 0}, and match with the end of list entry.
That is the original problem that started this thread.

Only comes up for zero-length functions, because gold/lld's approach
was 0+addend, not straight 0.

right. for zero-length functions.

Though it looks like there still exist case when range list could be terminated earlier:

base address selection entry: {-1, address of deleted code}
following range list entry: {0, 0} << points to the same address as set by base address selection entry and has zero size.

That's a bug in the producer (though a good point - I've probably made
that bug in LLVM) - the linker can't solve that problem, since the
linker can't touch the literal unrelocated 0, 0.

after linker resolved relocations it would look like this, for bfd case:

base address selection entry: {-1, 1}
following range list entry: {0, 0} <<<<<<<<<<<

So there still exists {0,0} entry which could be considered as the end of list entry.

But old lld solution has the same problem, thus it would not be new.

- Additionally, AFAIK gdb has special processing for overlapped address
ranges starting from 0. Using bfd tombstone value could break that processing - I would check it.

Not sure I understand - presumably gdb's special processing is
intended to work with bfd's tombstoning, since it's been the most
common/prolific unix linker, the one intended to work with gdb (they
exist in the same repository) for decades, right?

I think I misunderstood this: "I wonder how well the bfd tombstoning works in DWARFv5
(rnglists/loclists)". I read it as we would like to use bfd tombstoning(1 for ranges)
for rnglists/loclists also.

So, right. bfd's tombstoning works correctly with gdb until 1 is not used
for rnglists/loclists as tombstone value.

Alexey.

>>
>> >>
>> >> >
>> >> >>
>> >> >> Thanks for the write-up!
>> >> >>
>> >> >> >In short: Perhaps we should switch lld to the bfd-style tombstoning
>> >> >> >behavior for a release or two, letting users opt-in to testing with the new
>> >> >> >-1/-2 tombstoning in the interim, before switching to the new tombstone by
>> >> >> >default (while still having the flag to switch back when users find
>> >> >> >surprise places that can't handle the new behavior).
>> >> >> >
>> >> >> >In long:
>> >> >> >https://reviews.llvm.org/D81784 and follow-on patches modified the behavior
>> >> >> >of lld with regards to resolving relocations from debug sections to dead
>> >> >> >code (either comdat deduplicated, or gc-sections use).
>> >> >> >
>> >> >> >A very quick summary of the situation:
>> >> >> >
>> >> >> >Original Behavior:
>> >> >> >
>> >> >> > - bfd: 1 for debug_ranges(0 would prematurely terminate the list), 0
>> >> >> > elsewhere
>> >> >> > - gold/lld: 0+addend everywhere
>> >> >> >
>> >> >> >Limitations/bugs:
>> >> >> >
>> >> >> > - bfd/gold/lld
>> >> >> > - doesn't support 0 as a valid executable address without ambiguities
>> >> >> > - gold/lld
>> >> >> > - ambiguities with large gc'd functions combined with a .text mapping
>> >> >> > that starts in relative low addresses
>> >> >> > - premature debug_range termination with zero-length functions (Clang
>> >> >> > produces these with __builtin_unreachable or non-void return
>> >> >> >type functions
>> >> >> > without a return statement)
>> >> >> >
>> >> >> >New behavior:
>> >> >> >
>> >> >> > - -2 for DWARFv4 debug_loc, debug_ranges (-1 is a base address specifier
>> >> >> > there)
>> >> >> > - -1 elsewhere
>> >> >> > - linker flag to customize to other values if desired
>> >> >> >
>> >> >> >Known issues:
>> >> >> >
>> >> >> > - lldb's line table parsing can't handle -1 well at all (essentially
>> >> >> > unusable)
>> >> >>
>> >> >> Pavel Labath will fix this soon https://reviews.llvm.org/D83957
>> >> >> This is an unhandled address-space wraparound problem.
>> >> >> This pattern is potentially common - and other downstream DWARF
>> >> >> consumers might make similar line table handling mistakes.
>> >> >
>> >> >That's the thing - I'm not sure we can really classify them as
>> >> >"mistakes". I think bfd.ld's tombstoning behavior is about the only
>> >> >thing we can reasonably say DWARF consumers should /probably/ be
>> >> >expected to handle - and I'd imagine in many cases they haven't been
>> >> >written intentionally to handle it, but whatever behavior they have
>> >> >has accidentally been sufficient for their needs.
>> >>
>> >> >> > - gdb's line table parsing ends up with different handling when breaking
>> >> >> > on gc'd functions (minor functionality issue)
>> >> >>
>> >> >> This is just a behavior difference, not affecting users.
>> >> >> It did break a test if linked with LLD (gdb intrinsically has lots of
>> >> >> failing tests even if built with GCC+GNU ld).
>> >> >>
>> >> >> Previous behavior (when an address is zero): a breakpoint on a
>> >> >> --gc-sections discarded function will be redirected to a larger line
>> >> >> number with debug info, even if that line can be an unrelated different
>> >> >> function.
>> >> >> New behavior is that the breakpoint is on a wrapped-around small address.
>> >> >>
>> >> >> GDB 9.3 will restore the previous behavior
>> >> >> (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=a8caed5d7faa639a1e6769eba551d15d8ddd9510)
>> >> >>
>> >> >> >
>> >> >> >I think there's enough risk in this work (even given the small number of
>> >> >> >bugs found so far), given there's a pretty wide array of debug info
>> >> >> >consumers out there, that we should change lld's default to match the
>> >> >> >long-lived bfd strategy. This would address my original motivation for
>> >> >> >raising all this (empty functions prematurely terminating the list), while
>> >> >> >letting users who want to experiment with it, or need it (like Alexey), can
>> >> >> >opt-in to the -1/-2 behavior.
>> >> >>
>> >> >> I think we can only confidently say that there is enough risk in using
>> >> >> tombstone value -1 in .debug_line, but I'd not say tombstone value -1 in
>> >> >> other .debug_* can cause problems.
>> >>
>> >> >Given how many DWARF parsers we've had to cleanup or migrate off in
>> >> >transitioning to DWARFv5 inside Google, I think that's a fair bit of
>> >> >evidence the set of parsers isn't as narrow/closed as we'd like and
>> >> >thus the number of places that might have issues isn't known/easily
>> >> >exhaustively tested. So I'm not so concerned about the bugs we've
>> >> >seen, but what that might indicate about the things that we don't know
>> >> >about and can't test (because we don't know about them). (of course,
>> >> >the flipside of that is that if we don't know about them, we can't
>> >> >tell people who own them to go check if they work with this opt-in
>> >> >feature - so they'll break whenever we turn this on by default - but
>> >> >perhaps in the interim we can get at least a few big LLD customers to
>> >> >deploy the feature and flush out some of the issues - happy for Google
>> >> >to use this internally, hopefully we can encourage Apple folks, Sony
>> >> >of course already has this semantic so nothing to find there most
>> >> >likely - Chromium, maybe Firefox, etc)
>> >>
>> >> From the other side — when we already switched new behavior ON as default —
>> >> then it is easier to discover all these unknown cases. We have an option restoring
>> >> old behavior(which should be fine for all of the current users). Thus, everybody
>> >> who needs old behavior is able to continue using it.
>>
>> >The cost to those users figuring out that it's a problem and finding
>> >the flags/adding them, etc, is non-zero.
>> >But, yes, as I said - eventually some of that will happen, sooner or
>> >later. But reducing how much of it happens is valuable too.
>>
>> I agree. Reducing the impact on customers is important.
>> Though in this case, as you said, it would be done anyway(sooner or later).
>> Finally, I do not know what is more important: a) get feedback faster with
>> the cost of involved users and requiring them to make efforts
>> to figure out the problem. or b) test with a smaller set of users,
>> find more problematic cases, make new tombstone value to be default later.
>> I am kind of preferring a). But, b) is more safe and is OK.
>>
>>
>> >> That makes transition more understandable. All problems would be explicitly seen. Then,
>> >> If we will know about new problems - we could adapt the current solution. Similar to this >suggestion:
>> >>
>> >> a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* -> -1
>>
>> >The issue I have with this is that it's reflective of the known
>> >breakage in some tools that happen to have a quick turnaround/feedback
>> >- and assumes all the other sections are significantly less likely to
>> >have problems with this format - and I don't think we have enough data
>> >to make that estimate.
>>
>> We could extend that rule when/if new problems become known.
>> Or, customers could fix their code or use option returning old behavior
>> and it would probably not be necessary to extend the rule.
>>
>> >I think having a clear flag for bfd behavior or new behavior would be
>> >good, keeping it bfd-by-default for now, evangelizing (patch notes,
>> >outreach to other projects and tool developers) the new functionality
>> >& then changing the default shortly after an LLVM release (perhaps 12
>> >- but perhaps longer) to give it time to bake for trunk-following
>> >users (ample time for them to pick it up on their schedule, see what
>> >other tools might break, whether they're isolated enough to expect a
>> >quick turnaround or whether we should get them updated and then wait a
>> >while longer.
>>
>> yes, doing clear flag for bfd behavior or new behavior,
>> and keep it bfd-by-default for now would be OK and is more safe.
>>
>>
>> >> This would help while dwarf users are preparing their code to the new solution.
>> >>
>> >> Is option restoring old behavior not enough to solve problems caused by new tombstone >value?
>>
>> >I don't personally think it is enough - given some of the known
>> >breakage - I think that points to a fair bit of reasonably possible
>> >unknown breakage. Having a few big (like Apple and Google) users
>> >switch over by default & potentially flush out a few more issues (or
>> >not -but build confidence that there aren't more issues) I think is an
>> >appropriate way to roll out a change like this.
>>
>> >(minor doubt: I wonder how well the bfd tombstoning works in DWARFv5
>> >(rnglists/loclists) or in debug_range/debug_loc that uses base address
>> >specifiers, where the zero-without-addend doesn't have a chance to
>> >make an empty range (because it's not a start/end pair, it's a
>> >start+offset pair - so the offset remains non-zero)... if bfd
>> >tombstoning breaks DWARFv5 parsing (or base address specifiers in
>> >range/loc) in common consumers I might be more inclined to support
>> >enabling new tombstoning by default sooner (though if we could enable
>> >it just for DWARFv5 that might be nice - but not practical, since the
>> >linker doesn't know what DWARF it's linking & could be linking
>> >multiple versions))
>>
>> - From the point of view overlapping address ranges,
>> both bfd-solution(using 1) and old lld solution(using 0) are equal.

>gold and lld's solution was a bit more complicated than using zero (&
>bfd's solution uses 1 in debug_ranges, but not in other debug
>sections). lld and gold use 0+addend in ranges.

right. I am talking about the same(though assumed that context
instead of detailed explanation). it should be "bfd-solution(using 1
for debug_ranges) and lld/gold solution(using 0+addend)".
Thank you for pointing into that.

>0 alone in ranges
>would be more problematic than any of the other deployed solutions,
>because it would lead to the debug_ranges contributions being
>terminated early (0, 0 terminates the list - so debug_ranges for a CU
>with one gc'd function mid-way through the range list would drop the
>rest of the range lit) - that's why bfd uses 1 for debug_ranges rather
>than 0 which it uses elsewhere (though the same fix should,
>technically, be applied to debug_loc too for the same reasons as
>debug_ranges). the 0+addend approach of gold/lld works OK (except for
>the low address overlap) except when there's a zero-length function,
>then 0+addend == 0 for the end of the range, andy ou get a 0,0 entry
>with the premature range list termination issues.
>
>> for the "start+offset pair" case they could result in overlapping address ranges.

>Not quite - the way bfd does things, at least in debug_ranges without
>base address specifiers (bit hard to isolate for a consumer, really -
>though they could special case zero base addresses... sort of) the
>empty ranges are (1, 1) - they're actually empty. Whereas lld's
>approach ends up with (0+addend, 0+addend) which is usually (0, size)
>which has issues with overlap on systems that have a valid low address
>range (or if you have sufficiently large functions).
>

There is a case when size of address range is set not by relocated value but by constant value:

DWARF5:
DW_RLE_startx_length
DW_RLE_start_length

{address of deleted code, length}
{address of existing code, length}

DWARF4:
base address selection entry: {-1, address of deleted code}
following range list entry: {0, length}
base address selection entry: {-1, address of existing code}
following range list entry: {0, length}

In these cases linker do not see the length of address range
and could not change it(using 1, or 0, or 0+addend) to be zero length
address range. Thus address ranges could overlap.

So in both DWARF5 and DWARF4 there are cases when address ranges
could be overlapped and it could not be solved without new meaning for tombstone value.

>> - From the point of view correct parsing of DWARF 5 - it looks like they are equally good.
>
>Yep, DWARFv5 is more complicated - bfd's approach would be marginally
>better even for addr+offset encodings in debug_rnglists - since it'd
>always produce a 0 for the addr, whereas gold/lld's behavior can in
>some cases (eg: "void f1() { } __attribute__((nodebug)) void f2() { }
>void f3() { }" with -fno-function-sections - you'll end up with the
>range for f3 starting at a non-zero addend - this debug info will be
>worse for bfd/ld (if the DWARF for this object was included, but the
>whole .text section was discarded (eg: maybe there's a global variable
>in this object which is linked in, but -gc-section is still able to
>discard the whole .text section as unreferenced) then gold/lld's
>approach would make f3 unidentifiable as dead code (because it doesn't
>have a tombstone start) but bfd's approach would work (assuming zero
>isn't a valid address))

I see. thanks.

>
>> - From the point of view correct parsing of DWARF 4 - it looks bfd-solution(using 1)
>> is better than the old lld solution(using 0). When range list entry contains
>> addresses(start+end) which should be relocated and for the zero-length functions,
>> bfd-solution would result in range list entry: {1, 1}, while old lld solution
>> would result in {0, 0}, and match with the end of list entry.
>> That is the original problem that started this thread.
>
>Only comes up for zero-length functions, because gold/lld's approach
>was 0+addend, not straight 0.
>

right. for zero-length functions.

>> Though it looks like there still exist case when range list could be terminated earlier:
>>
>> base address selection entry: {-1, address of deleted code}
>> following range list entry: {0, 0} << points to the same address as set by base address selection entry and has zero size.
>
>That's a bug in the producer (though a good point - I've probably made
>that bug in LLVM) - the linker can't solve that problem, since the
>linker can't touch the literal unrelocated 0, 0.
>
>> after linker resolved relocations it would look like this, for bfd case:
>>
>> base address selection entry: {-1, 1}
>> following range list entry: {0, 0} <<<<<<<<<<<
>>
>> So there still exists {0,0} entry which could be considered as the end of list entry.
>>
>> But old lld solution has the same problem, thus it would not be new.
>>
>> - Additionally, AFAIK gdb has special processing for overlapped address
>> ranges starting from 0. Using bfd tombstone value could break that processing - I would check it.
>
>Not sure I understand - presumably gdb's special processing is
>intended to work with bfd's tombstoning, since it's been the most
>common/prolific unix linker, the one intended to work with gdb (they
>exist in the same repository) for decades, right?

I think I misunderstood this: "I wonder how well the bfd tombstoning works in DWARFv5
(rnglists/loclists)". I read it as we would like to use bfd tombstoning(1 for ranges)
for rnglists/loclists also.

So, right. bfd's tombstoning works correctly with gdb until 1 is not used
for rnglists/loclists as tombstone value.

Not quite parsing this, but I think we're on the same page - that
bfd's tombstoning "1 for debug_ranges/debug_loc, 0 (not 0+addend, but
absolute 0) for everything else (including debug_loclists and
debug_rnglists)" is probably the most likely to work for gdb, since
it's been deployed for a long time.

- Dave

PS: Fair point about base address specifiers being able to produce 0,0
entries - wouldn't mind fixing that in LLVM if I knew of an easy way
to test at compile-time whether the difference between two labels was
zero, then skip that entry in the lists entirely. Would save space and
address the original issue I had with debug_ranges terminating early.
(should've thought about that much earlier than my more alarmist "oh
deer, we need to fundamentally change how linkers resolve relocations
because everything's been broken and we just didn't realize it" -
fixing the compiler not to produce zero-length ranges would've been
less risky & probably still worth doing - though addressing the
broader issue to help with your situation of 0 as a valid address I
think is still good too)

Hi All,

In general I think we should adopt Dave’s plan here. The number of consumers that can (and have) been caught off guard by this change is just too high.

At the very least I think we should move this to opt in to the new tombstoning behavior by default and at most migrate to bfd’s behavior for both the current release and in the current tree. If we want to make this sort of change in the future by default I think we’re going to need to provide release notes about this and do aggressive outreach towards the consumers we do know before making the change. If anyone wants to drive that effort I’ll happily provide any help or assistance in getting you in touch with at least the consumers I know about.

This change and the need for it is also probably worth a quick message to dwarf-discuss@dwarfstd.org as well. Most of the major consumers and producers are on that list and it’s probably one of the easiest ways to get this change out there.

Any strong objections to this path in the meantime?

Thanks!

-eric

Sounds good to me from a release perspective.

>> Though it looks like there still exist case when range list could be terminated earlier:
>>
>> base address selection entry: {-1, address of deleted code}
>> following range list entry: {0, 0} << points to the same address as set by base address

selection entry and has zero size.

>
>That's a bug in the producer (though a good point - I've probably made
>that bug in LLVM) - the linker can't solve that problem, since the
>linker can't touch the literal unrelocated 0, 0.
>
>> after linker resolved relocations it would look like this, for bfd case:
>>
>> base address selection entry: {-1, 1}
>> following range list entry: {0, 0} <<<<<<<<<<<
>>
>> So there still exists {0,0} entry which could be considered as the end of list entry.
>>
>> But old lld solution has the same problem, thus it would not be new.
>>
>> - Additionally, AFAIK gdb has special processing for overlapped address
>> ranges starting from 0. Using bfd tombstone value could break that processing - I would check it.
>
>Not sure I understand - presumably gdb's special processing is
>intended to work with bfd's tombstoning, since it's been the most
>common/prolific unix linker, the one intended to work with gdb (they
>exist in the same repository) for decades, right?

I think I misunderstood this: "I wonder how well the bfd tombstoning works in DWARFv5
(rnglists/loclists)". I read it as we would like to use bfd tombstoning(1 for ranges)
for rnglists/loclists also.

So, right. bfd's tombstoning works correctly with gdb until 1 is not used
for rnglists/loclists as tombstone value.

Not quite parsing this, but I think we're on the same page - that
bfd's tombstoning "1 for debug_ranges/debug_loc, 0 (not 0+addend, but
absolute 0) for everything else (including debug_loclists and
debug_rnglists)" is probably the most likely to work for gdb, since
it's been deployed for a long time.

right.

Speaking of our needs - it would be enough to have a command line option
to switch -1/-2 tombstoning on, while bfd tombstoning would be default.

- Dave

PS: Fair point about base address specifiers being able to produce 0,0
entries - wouldn't mind fixing that in LLVM if I knew of an easy way
to test at compile-time whether the difference between two labels was
zero, then skip that entry in the lists entirely. Would save space and
address the original issue I had with debug_ranges terminating early.
(should've thought about that much earlier than my more alarmist "oh
deer, we need to fundamentally change how linkers resolve relocations
because everything's been broken and we just didn't realize it" -
fixing the compiler not to produce zero-length ranges would've been
less risky & probably still worth doing - though addressing the
broader issue to help with your situation of 0 as a valid address I
think is still good too)

agreed. It would be the most proper way to avoid zero-length ranges
at generation time if possible.

side note - another possibility is the optimizing tool like dsymutil which could remove
zero length address ranges. It would not help to correct parsing but
would save space.

Alexey.

Sounds good to me from a release perspective.

I think we need more input from the triage of
https://chromium-review.googlesource.com/c/chromium/src/+/2291352
whether it is just .debug_line or .debug_*

>Sounds good to me from a release perspective.

I think we need more input from the triage of
https://chromium-review.googlesource.com/c/chromium/src/+/2291352
whether it is just .debug_line or .debug_*

The issue we hit in Chromium is tracked here:
https://bugs.chromium.org/p/chromium/issues/detail?id=1105559 Doesn't
look like it has any more info at the moment.

a) .debug_ranges&.debug_loc => -2, .debug_line => 0, other .debug_* → -1
b) .debug_ranges&.debug_loc => -2, other .debug_* => 0

I am still of the opinion that we should just do a), not b).

Hi Ray :slight_smile:

While I understand the desire to say “we’ve updated everything we’ve found” the problem is that we don’t know that we’ve found even much of the uses or can guarantee that people can upgrade, say their gdb or breakpad, as fast as their compiler. Even worse it’s a change in behavior without much notice at all or even a particularly friendly way to recognize what they should do to fix it.

I still think that we do bfd locs with a decent option to change for at least the current release and sources and then, once we’re a little more certain we have everything that might want to parse dwarf (say by working with dwarf-discuss), we can change the default.

-eric

I still think that we do bfd locs with a decent option to change for at least the current release and sources and then, once we’re a little more certain we have everything that might want to parse dwarf (say by working with dwarf-discuss), we can change the default.

Given what’s been found, I think Eric/Dave are correct, use bfd behavior by default with an option to do the new thing. The option can be coded to let Sony (SCE debugger tuning) have it on by default.

Bringing it up on dwarf-discuss seems like a good idea too, you can refer to http://dwarfstd.org/ShowIssue.php?issue=200609.1 as part of that.

–paulr