>> > > > >
>> > > > >> As has been mentioned elsewhere, Sony generally fixes up references from
>> > > > >> debug info to stripped functions (of any length) using -1, because that’s a
>> > > > >> less-likely-to-be-real address than 0x0 or 0x1. (0x0 is a typical base
>> > > > >> address for shared libraries, I’d think using it has the potential to
>> > > > >> mislead various consumers.) For .debug_ranges we use -2, because both a
>> > > > >> 0/0 pair and a -1/-1 pair have a reserved meaning in that section.
>> > > > >>
>> > > > >
>> > > > >Any harm in using -2 everywhere, for consistency?
>> > > >
>> > > > When resolving a relocation, in certain cases we have to give an undefined
>> > > > symbol a value.
>> > > > This can happen with:
>> > > >
>> > > > * an undefined weak symbol
>> > > > * an undefined global symbol in --noinhibit-exec mode (a buggy
>> > > >   --gc-sections implementation can trigger this as well)
>> > > > * a relocation referencing an undefined symbol in a non-SHF_ALLOC section
>> > > >
>> > > > We always respect the addend in a relocation entry for absolute/PC-relative
>> > > > relocations, or at least for most of them (R_ARM_THM_PC8,
>> > > > R_AARCH64_ADR_PREL_PG_HI21, R_X86_64_64, local exec TLS relocation types, ...).
>> > > > Ignoring the addend (using -2 everywhere) will break this consistency.
>> > > >
>> > > > The relocated code may do pointer subtraction, which would work if addends
>> > > > were respected but will break if -2 is used everywhere.
>> > >
>> > > I suspect David meant "any harm to using -2 in all .debug_* sections?"
>> > > and not literally everywhere. Sony does special cases only for the
>> > > .debug_* sections.
>> >
>> > Right - thanks for the clarification.
>> >
>> > > I've been meaning to propose that DWARF v6 reserve a special address for
>> > > this kind of situation. Whether the committee would be willing to make
>> > > it be -1 or -2 for all targets, or make it target-defined, I don't know.
>> > > (Dreading the inevitable argument over whether addresses are signed or
>> > > unsigned, or more to the point whether they wrap. They've been unsigned
>> > > and wrapping was undefined on the small set of machines I'm familiar with.)
>> > > Certainly the toolchain community would benefit from making it be the
>> > > same everywhere.
>> > >
>> > > Personally I'd vote for -1, and make pre-v5 .debug_loc/.debug_ranges
>> > > sections be an extra-special case using -2. We can (I hope) standardize
>> > > on -1 for v6 onward, and document -1/-2 on the DWARF wiki as recommended
>> > > practice for prior versions.
>> >
>> > That'd make linking difficult - the unix linkers, at least, currently
>> > don't have to identify the DWARF version when linking. Having to pass
>> > an extra linking flag or have the linker parse any DWARF (what if an
>> > object file contains more than one CU & the linker has to apply
>> > different relocations in different parts of the object file because of
>> > that?) would be a significant cost/problem, I think.
>> >
>> > Though I like the tidiness of -1 everywhere, backwards compatibility
>> > with debug_ranges (& debug_loc similarly) is a problem. ld.bfd already
>> > special-cases debug_ranges (& should special-case debug_loc), so
>> > perhaps that's the solution: -2 for debug_ranges and debug_loc, -1
>> > everywhere else (which effectively means everywhere in DWARFv5
>> > onwards)?
>>
>> Exactly. Base it on the section name: .debug_loc and .debug_ranges
>> use -2, and all other .debug_* sections use -1. No explicit version
>> check needed.
>>
>> In terms of *specification*, DWARF v6 would say to use -1, and the
>> best-practices on the wiki would say "use -1, except for .debug_loc
>> and .debug_ranges use -2."
>
>Sounds pretty good to me.
Looks good to me, too.
>
>Ray - how do you feel about that? Do you think that's something lld
>would be able to do?
This seems fine. If my understanding is correct, for an R_X86_64_64
referencing sym + addend, the relocated value is:

  if is_defined(sym)
    return addr(sym) + addend
  if relocated_section is .debug_ranges or .debug_loc
    return -2 + addend
  // Every DWARF v5 section falls here
  return -1 + addend

I still want addend to take part in the computation. This makes
subtraction sound and provides a bit more information (the original
length).
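
To make the pseudocode concrete, here is a rough Python model of it. It is only a sketch, not lld's code: relocated_value and RESERVED_PAIR_SECTIONS are made-up names, an undefined symbol is represented by None, and values are truncated to 64 bits as an R_X86_64_64 field would be.

```python
MASK64 = (1 << 64) - 1  # R_X86_64_64 stores a 64-bit value

# Sections where the -1/-1 and 0/0 pairs are already reserved.
RESERVED_PAIR_SECTIONS = {".debug_ranges", ".debug_loc"}


def relocated_value(sym_addr, addend, section):
    """Model of the pseudocode; sym_addr is None for an undefined symbol."""
    if sym_addr is not None:
        return (sym_addr + addend) & MASK64
    base = -2 if section in RESERVED_PAIR_SECTIONS else -1
    return (base + addend) & MASK64


# Two references into the same discarded function, 6 bytes apart.
low = relocated_value(None, 0, ".debug_info")
high = relocated_value(None, 6, ".debug_info")
print(hex(low), hex(high), (high - low) & MASK64)
# 0xffffffffffffffff 0x5 6
```

Because the addend still takes part, the difference of the two relocated values is still 6 (modulo 2**64), which is the sense in which subtraction stays sound.
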
For Alexey's example, if we see [0xfffffffffffffffe, 0x0000000000000004)
in .debug_ranges, we know that the original length was 6. DWARF
consumers should accept [0xfffffffffffffffe, *) without emitting a
diagnostic, while still rejecting other pairs with low > high
(DWARFAddressRange::valid() needs an update).
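
A consumer-side check along these lines might look like the sketch below. It is a standalone Python model, not LLVM's DWARFAddressRange; classify_range and original_length are hypothetical names, and end-of-list and base address selection entries are assumed to have been handled before this point.

```python
MASK64 = (1 << 64) - 1
RANGES_TOMBSTONE = (-2) & MASK64  # 0xfffffffffffffffe


def classify_range(low, high):
    """Classify one [low, high) pair read from .debug_ranges."""
    if low == RANGES_TOMBSTONE:
        return "dead"      # discarded code: accept, no diagnostic
    if low > high:
        return "invalid"   # any other low > high pair is still an error
    return "live"


def original_length(low, high):
    """Recover high - low modulo 2**64, i.e. the difference of the addends."""
    return (high - low) & MASK64


low, high = 0xFFFFFFFFFFFFFFFE, 0x4
print(classify_range(low, high), original_length(low, high))  # dead 6
```

Note that this only recognises the pair as dead when the start addend is zero, so that low is exactly the tombstone value.
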

Ah, unfortunately, I don't think it would be OK to keep the addend. I
believe it needs to be handled sort of the way bfd ld does it (but with
-2 or -1, instead of 0 or 1) - dropping the addend.
Otherwise we'd be at risk of having both the start and end of the
function having non-zero addends (if they were in a comdat group with
some other code/another function, for instance) and then both would
wrap around and be indistinguishable from normal address ranges.
eg:

  __attribute__((section(".text.x"))) void f1() { }
  __attribute__((section(".text.x"))) void f2() { }
  int main() { }

  $ clang++ rng.cpp -fuse-ld=lld -Wl,-gc-sections -g && llvm-dwarfdump a.out
  DW_TAG_compile_unit
    DW_AT_ranges (0x00000000
       [0x0000000000000000, 0x0000000000000016)
       [0x0000000000400540, 0x0000000000400548))
  ...
    DW_TAG_subprogram
      DW_AT_low_pc (0x0000000000000000)
      DW_AT_high_pc (0x0000000000000006)
      DW_AT_name ("f1")
  ...
    DW_TAG_subprogram
      DW_AT_low_pc (0x0000000000000010)
      DW_AT_high_pc (0x0000000000000016)
      DW_AT_name ("f2")
  ...

Linking this with gold or lld leaves the low_pc of 'f2' non-zero (0x10,
in my case), because the addend is non-zero. That makes it impossible
to identify as dead code - even if using -2, since the addend would
still wrap it back around to a positive value.
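
The wraparound is easy to check with a few lines of arithmetic. A rough Python sketch, taking 0x10 and 0x16 (f2's offsets in the output above) as the addends of a hypothetical start/end pair; keep_addend and drop_addend are illustrative names, not linker functions:

```python
MASK64 = (1 << 64) - 1  # addresses modelled as unsigned 64-bit values


def keep_addend(tombstone, addend):
    """lld/gold-style: the addend still takes part, so the sum can wrap."""
    return (tombstone + addend) & MASK64


def drop_addend(tombstone, addend):
    """bfd-style: the addend is ignored for a discarded symbol."""
    return tombstone & MASK64


# Hypothetical pair for the discarded f2: start addend 0x10, end addend 0x16.
for name, fix in (("keep", keep_addend), ("drop", drop_addend)):
    low, high = fix(-2, 0x10), fix(-2, 0x16)
    print(name, hex(low), hex(high))
# keep 0xe 0x14
# drop 0xfffffffffffffffe 0xfffffffffffffffe
```

With the addend kept, [0xe, 0x14) looks like an ordinary (if low) address range; with it dropped, both ends are the tombstone and the entry is recognisable as dead.
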

Not the /most/ realistic example - there are cases where clang puts
multiple functions in the same comdat, though I think they still go in
separate sections. Maybe they don't actually need to be in separate
sections, since they're comdat'd together.

I don't think it's meaningful to talk about the length of a function
that doesn't exist/that there are no instructions for - so I don't think
there's a loss in fidelity in not having that information in the final
linked DWARF.
> I guess we'd need to probably have a conversation with the DWARF
> Committee and/or with debugger vendors (gdb and lldb) to ensure they'd
> be willing to make matching changes to support this...
>
> and since they might not support it out of the gate, perhaps we'd need
> it behind a flag for the current (albeit buggy) backwards
> compatibility? Or maybe it works well enough without explicit support
> already.
>
> So after implementing this, some tools could potentially stop working.
> I do not know of any such tools, so I am not sure whether that is a
> problem.
I hope we don't need a linker option (selecting -1+addend or 0+addend).
If unfortunately it can't be avoided, we probably need to discuss it
with binutils. I can do that.

Another thought. I wonder whether non-debug non-SHF_ALLOC sections
expect addr(undefined) to be 0. If they can live with any value, and if
we don't need the -2 special case (to avoid collision with the base
address selection entry -1),

I think we will need that special case due to DWARFv4 for a while yet.
Though, yes - if the current addr(undefined) variance (0 for bfd,
0+addend for lld/gold) extends beyond .debug_* sections, /maybe/ we
could make it configurable with a default, while keeping the hardcoded
exception for .debug_ranges and .debug_loc of using -2. Though I'd
worry a bit about more open-ended use cases for the non-debug
sections. At least for the debug sections we have a pretty good idea
of which consumers to go and talk to/test with, etc.

we can add a more generic option

  -z undef-address-in-nonalloc=-1

applying to every relocation referencing an undefined symbol in a
non-SHF_ALLOC section. Let -1 be the default value to make .debug_*
(excluding .debug_ranges and .debug_loc) happy.
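
For what it's worth, the combination being discussed (a configurable default plus the hardcoded .debug_ranges/.debug_loc exception) could be modelled roughly as below. This is only a sketch: -z undef-address-in-nonalloc is the option proposed in this thread, not an existing linker flag, parse_undef_address and undef_address are made-up names, and the addend question is left out entirely.

```python
MASK64 = (1 << 64) - 1

# Hardcoded exception: -1/-1 and 0/0 pairs are reserved in these sections.
FIXED_TOMBSTONES = {".debug_ranges": -2, ".debug_loc": -2}


def parse_undef_address(z_option, default=-1):
    """Parse the proposed (hypothetical) 'undef-address-in-nonalloc=N' keyword."""
    key, sep, value = z_option.partition("=")
    if key != "undef-address-in-nonalloc" or not sep:
        return default
    return int(value, 0)  # base 0 accepts -1, 0, 0x10, ...


def undef_address(section, configured):
    """Value for an undefined symbol referenced from a non-SHF_ALLOC section."""
    return FIXED_TOMBSTONES.get(section, configured) & MASK64


configured = parse_undef_address("undef-address-in-nonalloc=-1")
print(hex(undef_address(".debug_info", configured)))    # 0xffffffffffffffff
print(hex(undef_address(".debug_ranges", configured)))  # 0xfffffffffffffffe
```
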

If we end up blessing it as part of the DWARF spec, we probably
wouldn't want it to be user-configurable for the .debug_* sections, so
I'd hesitate to add that configurability to the linker lest we have to
revoke it to conform to DWARF (breaking flag compatibility with
previous versions of the linker, etc). Admittedly we'll be breaking
output compatibility with this change regardless, so potentially
having the flag as an escape hatch could be useful.