+ Ben Dunbobbin, whose name I take in vain below.
He's my local expert on weird ELF features.
From: David Blaikie <dblaikie@gmail.com>
Sent: Thursday, June 4, 2020 2:43 PM
To: Robinson, Paul <paul.robinson@sony.com>
Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
>
>
>
> > From: David Blaikie <dblaikie@gmail.com>
> > Sent: Wednesday, June 3, 2020 5:31 PM
> > To: Robinson, Paul <paul.robinson@sony.com>
> > Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
> > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
> >
> > >
> > > DWARF was designed in an era when COMDAT and ICF were not a thing, or at
> > > least not common, certainly not when talking about function code. The
> > > overhead of a unit occurred only once per translation unit, so that
> > > expense was reasonably amortized.
> > >
> > >
> > >
> > > Splitting functions into their own object-file sections and making them
> > > excludable is an evolution of compiler/linker technology that DWARF has
> > > not kept up with. The linker-friendly solutions (COMDAT DWARF) would put
> > > function-related .debug_* contributions into a section-group along with
> > > the function .text itself; this multiplies the total number of sections to
> > > deal with, regardless of the tactics used for the content of each
> > > per-function DWARF section. The fully DWARF-conformant solution would create
> > > one partial_unit per function, with the corresponding overhead of unit
> > > headers (especially painful in the .debug_line section). Alternatively we
> > > fragment DWARF into sections without headers and rely on the linker to
> > > make everything look right in the linked executable; this produces .o
> > > files that are not DWARF conformant (unless we can standardize this in
> > > DWARF v6) and would be a big hassle for consumers other than the linker.
> >
> > "object files don't contain DWARF, but they contain stuff that the
> > linker will turn into DWARF" wouldn't seem like the worst thing to me
> > - what sort of pre-linking parsing of DWARF use cases do you have in
> > mind, other than for our own compiler development uses?
>
> No, that wouldn't seem like the worst thing. Obviously llvm-dwarfdump
> would want to be able to report what's actually happening, but indeed
> all the other use-cases that come to mind are not looking at .o files.
>
> > (notwithstanding in-object Split DWARF (where the .dwo sections would
> > have to remain usable without linking) or the MachO style debug
> > info distribution model which is similar)
>
> I expect Split DWARF would be incompatible with fragments. I don't
> know details about MachO but seems likely the same is true there.
Yep, if they're sub-contribution regions, that wouldn't play well with
Split DWARF. (& full contribution isolation would have the DWARF header
overhead, etc)
I'd still be concerned about the ELF header overhead even of this
sub-contribution scheme, but could be interesting to see how it plays
out in practice.
All that said, to avoid burying the lede here, I'll splice something
from the end up here:
> Although the point is not to avoid tombstone values, but to do a more
> efficient job of editing the final DWARF to omit gc'd functions; it's no
> problem at all to use a tombstone value in .debug_addr IMO.
But the tombstone values are Alexey's underlying issue (this ongoing
design discussion for over a year now) & /sort/ of mine too recently
(which, unfortunately, is what's reinvigorated this discussion -
would've been nice if I/we/someone had identified this sooner &
could've helped Alexey in a more timely manner): Alexey is dealing
with a platform where 0 is a valid address so the lld/gold strategy of
resolving relocations to dead code to "0+addend" creates ambiguous
DWARF. I'm dealing with a case of zero-length functions ("int f1() {
}" or "void f2() { __builtin_unreachable(); }") causing early
termination of DWARFv4 range lists.
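To make the zero-length case concrete, here is a minimal sketch in Python. It assumes 8-byte addresses and that both relocations of a discarded zero-length function resolve to 0 (0 + addend, with an addend of 0). A DWARF v4 .debug_ranges list ends at the first (0, 0) pair, so the dead entry looks exactly like the terminator. The helper names are made up for illustration, and base address selection entries are ignored.

```python
import struct

END_OF_LIST = (0, 0)  # DWARF v4 .debug_ranges terminator

def encode_ranges(entries):
    """Pack (begin, end) pairs as 64-bit little-endian, then the terminator."""
    body = b"".join(struct.pack("<QQ", b, e) for b, e in entries)
    return body + struct.pack("<QQ", *END_OF_LIST)

def decode_ranges(data):
    """Read pairs until the first (0, 0) pair, as a v4 consumer would."""
    ranges = []
    for offset in range(0, len(data), 16):
        pair = struct.unpack_from("<QQ", data, offset)
        if pair == END_OF_LIST:
            break
        ranges.append(pair)
    return ranges

live_before = (0x1000, 0x1010)
dead_zero_length = (0, 0)      # assumption: both ends resolved to 0 + 0
live_after = (0x2000, 0x2020)

section = encode_ranges([live_before, dead_zero_length, live_after])
decoded = decode_ranges(section)
print([(hex(b), hex(e)) for b, e in decoded])
```

The consumer stops at the dead entry, so live_after is never seen even though it is still present in the section bytes.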
The reason for the DWARF-aware linker proposal was because the "let's
choose a better tombstone" discussion didn't go anywhere & people sort
of encouraged in this direction of "what if we didn't need a
tombstone/the linker fixed up the debug info instead". So if the DWARF
redundancy elimination doesn't address the issue of zero as a valid
address, it doesn't address Alexey's needs, unfortunately. 
But, upthread we had a tombstone discussion IIRC, which seemed to converge
on "-1 except .debug_loc/.debug_ranges use -2" didn't it? If we're still
going on about having the linker rewriting DWARF, then the fragmenting
idea is worth pursuing as an alternative to Alexey's current work.
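A small sketch of that tombstone convention as recalled here ("-1 except .debug_loc/.debug_ranges use -2"), in Python. The reason for the exception is that in those two sections an entry starting with the largest representable address is a base address selection entry, so -1 is already taken. The function name and the section set are illustrative, not anything lld actually exposes.

```python
# Sections where the all-ones address already has a meaning
# (base address selection entry), so the tombstone is one lower.
RESERVED_MAX_SECTIONS = {".debug_loc", ".debug_ranges"}

def tombstone(section, address_size=8):
    """Return the value a dead-code relocation would resolve to."""
    max_address = (1 << (8 * address_size)) - 1   # "-1" as an unsigned address
    if section in RESERVED_MAX_SECTIONS:
        return max_address - 1                    # "-2"
    return max_address

print(hex(tombstone(".debug_info")))
print(hex(tombstone(".debug_ranges")))
print(hex(tombstone(".debug_addr", address_size=4)))
```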
That said, I super appreciate the time you've put into writing this up
and it is valuable & I'd love to see some (even hand-crafted assembly)
prototypes, maybe do some back-of-the-envelope numbers to see whether
the ELF header overhead would be worth it, etc.
It would be nice to verify that the section-fragment idea would produce
something that looked usable. Hand-written assembly... would require
research into how to specify the right section attributes, but would
likely be less effort than trying to make LLVM do something plausible.
I'll see about creating an internal task for this.
> > But even then, I'm not sure how viable it would be - as Fangrui
> > pointed out on another thread about this: ELF section overhead itself
> > is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> > rather difficult to reconstruct header-less slice-and-diceable sections
> > in some cases. For type information (a reduced overhead version of
> > -fdebug-types-section) I could see it - but for functions, they need
> > to refer to addresses - preferably in the debug_addr section, and
> > that's accessed by index, so taking chunks out of it would break other
> > references to it, etc... adding the header would be expensive, and how
> > would the CU construct its DW_AT_ranges value if that has to be sliced
> > and diced? Again, some amount of linker magic might solve some of
> > these problems - but I think there's still a lot of overhead to making
> > a solution that's workable with a DWARF-agnostic linker (or even with
> > a DWARF aware one, but in an efficient amount of time/space where it's
> > not only usable for small programs, or for linking when you're
> > shipping a final production binary, etc)
>
> The idea we have blue-skied internally would work something like this
> (initially explicated in terms of the .debug_info section, then seeing
> how that tactic applies to other sections):
>
> There's a top fragment, containing the CU header and the CU DIE itself.
> Linker magic makes this first in the output file.
Quick curiosity: Is there existing linker magic for this? What does it
look like? I'd love to know so I can play around with hand crafted
prototypes/keep it in mind for such things.
Ben Dunbobbin did research into this some time ago, under the auspices
of a "COMDAT DWARF" investigation. He's part of Sony's linker team, and
it was a discussion with that team where I became convinced that the
fragmenting idea was feasible using existing defined ELF capabilities,
although perhaps in ways nobody had really taken advantage of. It
involved section groups and/or section ordering, but somebody much more
familiar with ELF than I am would have to explain it. I've cc'd Ben.
Regarding my discussion with our linker team:
They asked me whether it was feasible to use sections to subset the
DWARF, and I described the functional need (top & bottom fragments,
arbitrary stuff in between) and they thought the ELF section-group
and/or section-ordering features would be able to provide that.
I'm not aware that anyone actually tried prototyping that. The work
that James did (mentioned upthread) IIRC was using COMDAT and full
units with unit headers. My fading memory suggests the discussion
described just above was after that.
(basically the ability for an object file to say "here's the start and
end of my contribution to this section, and some bits that /can/ go in
the middle, but you can drop them if you like")
> Types also go here; certainly base types, and other file-scope types
> can be included here or put into type units. (Type units aren't
> fragmented, they are their own thing same as always.)
Separately, it might be worth considering putting types in such a
thing - but, yes, the "How do you reference them when they might be in
your unit or someone else's unit", etc, would have to be figured out.
I guess using an external symbol might be the solution there - again,
with a better understanding of the ^ mentioned linker magic, I'd
probably play around with hand crafting some examples just to see how
this could work.
> There's a matching bottom fragment, which is just the terminating NULL
> for the CU DIE; linker magic makes this last in the output file.
Last of all the contributions from this object file, not last in the
whole output file, right? (please excuse the pedantry, just double
checking)
The object file would (loosely speaking) have a ".debug_info.first",
some number of ".debug_info.excludable-middle", and a ".debug_info.last"
which would all be glommed together in first-middle-last order in the
output .debug_info section. I believe I was told that this would be
per-object-file, otherwise yeah it wouldn't work at all.
This is why we need input from somebody who actually knows ELF. 
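As a toy model of the first-middle-last ordering described above (not of any real ELF mechanism), the following Python sketch treats each object file as a top fragment, a list of per-function middle fragments, and a bottom fragment. The dictionary layout and the labels are invented for illustration; the only claims taken from the discussion are that ordering is per object file and that middle fragments go away with their function.

```python
def link_debug_info(objects, live_functions):
    """Concatenate fragments per object file: first, surviving middles, last."""
    output = []
    for obj in objects:
        output.append(obj["first"])
        for function, fragment in obj["middle"]:
            if function in live_functions:   # dropped along with the function's .text
                output.append(fragment)
        output.append(obj["last"])
    return output

a_o = {"first": "a.o:top",
       "middle": [("f", "a.o:f"), ("g", "a.o:g")],
       "last": "a.o:bottom"}
b_o = {"first": "b.o:top",
       "middle": [("h", "b.o:h")],
       "last": "b.o:bottom"}

layout = link_debug_info([a_o, b_o], live_functions={"f", "h"})
print(layout)
```

Each object file's bottom fragment lands after that file's own middles, not at the end of the whole output section.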
> Each function has its own fragment, which is in the same link-group
> (COMDAT or whatever) as the function's .text section; that way, if the
> function is discarded, so is the .debug_info fragment. Offhand I can't
> think of any cases (other than DW_AT_specification, addressed below) of
> references to a subprogram DIE from elsewhere,
The call_site DWARF would want to refer to a subprogram DIE, but that
could be handled by (first pass) having a declaration subprogram in
the initial fragment that the call_site could refer to using the usual
assembler-resolved CU-relative offset. Of course that'd mean a bunch
(probably the bigger part) of the function's DWARF footprint
wouldn't be deduplicated, but would address this part of the address
tombstone issue (if not using debug_addr) & reduce some of the DWARF -
the addresses are pretty big (if you're not pooling them), etc.
Ah, forgot about call_site. Yeah referring to a declaration should work.
> so it should be fine to
> discard the entire function fragment as needed. Linker magic puts all
> function fragments between the top and bottom fragments, in some
> indeterminate order. Each function fragment is the usual complete
> subtree, rooted in DW_TAG_subprogram.
Rooted at the top level (well, below the DW_TAG_compile_unit) DIE, as
you mention later - namespace, or whatever else.
Right, each fragment would be a complete subtree that would ordinarily
be a direct child of DW_TAG_compile_unit. With whatever DIE it needed.
> References to types are either
> to type units as normal, or to types in the top fragment. Note that
> these references do not require relocations; type units are by signature
> as always, and for types in the top fragment, the offsets into the top
> fragment are known at compile time.
>
> Inlined functions are described as part of the function they have been
> inlined into, being children of the function DIE. DW_AT_specification
> refers to the abstract declaration which is in its own fragment (or the
> top fragment, but that keeps the declaration from being elided if all
> references go away).
Yep, this overlaps with the call_site stuff I mentioned earlier - same
ideas. Either top fragment, or its own fragment. Keeping its own
fragment alive, and figuring out how to reference it (depending on
fragment layout/elision) would require some work, but I think it's
do-able. Might even be do-able so it can be deduplicated across CUs
(use a sec_offset form, use a linker-resolved relocation to it) - this
infrastructure would overlap with type deduplication without type
units too.
Though linker resolved relocations add more bytes...
> If functions are inside namespaces, each function fragment will need
> to have namespace DIEs around the function DIE. This adds overhead
> but it's pretty small.
>
> I hand-wave filling in the CU header's unit length. I'd expect a
> relocation with a reference to the bottom fragment should be able to
> compute the correct value.
*nod*
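The unit length computation can be sketched as follows, assuming a 32-bit DWARF v5 compile unit header (unit_length, version, unit_type, address_size, debug_abbrev_offset) and remembering that unit_length does not count its own 4 bytes. The DIE bytes are opaque placeholders and build_unit is a hypothetical helper; in a real link the value would come from a relocation against the bottom fragment rather than from Python.

```python
import struct

def build_unit(top, function_fragments, bottom=b"\x00"):
    """Glue top + surviving function fragments + bottom, then patch unit_length."""
    unit = bytearray(top + b"".join(function_fragments) + bottom)
    # unit_length covers everything after the 4-byte length field itself.
    struct.pack_into("<I", unit, 0, len(unit) - 4)
    return bytes(unit)

# unit_length=0 (to be patched), version=5, DW_UT_compile=0x01,
# address_size=8, debug_abbrev_offset=0
header = struct.pack("<IHBBI", 0, 5, 0x01, 8, 0)
top = header + b"\x01"                      # placeholder for the CU DIE
fragments = [b"\x02\xaa\xbb", b"\x02\xcc"]  # placeholder function subtrees

unit = build_unit(top, fragments)
unit_length = struct.unpack_from("<I", unit)[0]
print(len(unit), unit_length)
```

Dropping a function fragment shrinks the unit, and the patched length follows without touching the top fragment's other bytes.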
> That's the story for .debug_info; what about other sections?
>
> Sections referenced by index from .debug_info can't be fragmented;
> this would be: .debug_abbrev, .debug_addr, .debug_str_offsets.
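A toy illustration of why an index-referenced table cannot simply lose entries, using .debug_addr as the example. Everything here (the addresses, the function-to-index map, the lookup helper) is made up; the point is only that removing a slot shifts every later index, whereas overwriting the slot with a tombstone keeps the remaining indices valid.

```python
TOMBSTONE = 0xFFFFFFFFFFFFFFFF   # assumed tombstone value for .debug_addr

debug_addr = [0x1000, 0x2000, 0x3000, 0x4000]
addrx = {"f0": 0, "f1": 1, "f2": 2, "f3": 3}   # index each function's DIE uses

def lookup(table, index):
    """Resolve an addrx-style index, or None if it runs off the table."""
    return table[index] if index < len(table) else None

# Option 1: squeeze out f1's slot. Later indices now point at the wrong entry.
squeezed = debug_addr[:1] + debug_addr[2:]

# Option 2: leave the slot in place and overwrite it with a tombstone.
tombstoned = list(debug_addr)
tombstoned[addrx["f1"]] = TOMBSTONE

print(lookup(squeezed, addrx["f2"]), lookup(squeezed, addrx["f3"]))
print(lookup(tombstoned, addrx["f2"]), lookup(tombstoned, addrx["f3"]))
```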
>
> .debug_str doesn't need to be fragmented, linkers DTRT already.
(linkers deduplicate debug_str - but can they be made to remove
unreferenced strings too? in that case we'd have an interesting
tradeoff of maybe using FORM_strp rather than strx - if we wanted the
linker to be able to drop strings from dropped function definitions,
etc)
Future refinements are quite possible!
> .debug_macro contents are not tied to functions and won't be fragmented.
>
> .debug_loclists and .debug_rnglists should be fragmentable the same
> way as .debug_info; they exist only as extensions of .debug_info, and
> the range list for the CU itself is merely a concatenated set of
> contributions from each constituent function, so that should Just Work
> (although it won't be optimal, adjacent ranges won't be coalesced).
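The "concatenated but not coalesced" point can be shown with a short Python sketch. The per-function ranges are invented, and coalesce is what a DWARF-aware tool could do afterwards, not something a DWARF-agnostic linker would be expected to perform.

```python
def concatenate(contributions):
    """What plain section concatenation gives: one entry per function range."""
    return [r for contribution in contributions for r in contribution]

def coalesce(ranges):
    """Merge ranges that touch or overlap, as a DWARF-aware pass might."""
    merged = []
    for begin, end in sorted(ranges):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged

per_function = [
    [(0x1000, 0x1040)],   # f
    [(0x1040, 0x1080)],   # g, laid out right after f
    [(0x2000, 0x2010)],   # h, in another part of .text
]

cu_ranges = concatenate(per_function)
print(len(cu_ranges), len(coalesce(cu_ranges)))
```

Both lists describe the same addresses; the concatenated one is just longer than it needs to be.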
At least the way we currently emit loclists and rnglists is by using
an index (the header of loclists and rnglists has an index to offset
mapping) - like strx, this would make it hard/impossible for a
DWARF-agnostic linker to see through to find out which indexes were
actually used. We could potentially not use the loclistx/rnglistx
forms/indexes from fragments - instead using sec_offsets that would
make them relocatable/removable/etc. (so long as all the index-based
referenced lists came in the debug_loclist/debug_rnglist header
fragment)
Ah, I hadn't looked at how we do those lists. But sounds solvable.
> I believe the same is true for .debug_loc and .debug_ranges, although
> I haven't checked.
Yep, those ones are easier - there's no contribution header, they can
only be referenced via sec_offset, so slicing and dicing them is
cheap.
But the tombstone problem still exists for the CU's debug_ranges -
though /maybe/ it could be carefully constructed from fragments...
that's going to be a /lot/ of sections in the end though.
> .debug_aranges is functionally equivalent to the CU rangelist.
Yup. (as we've touched on before, we don't use aranges at Google -
instead relying on CU's ranges which are just a little more expensive
to retrieve - but no need to duplicate the data in both places - if
consumers really find the aranges worthwhile to avoid parsing a few
attributes on the CU DIE, perhaps a future spec could let
debug_aranges reference a range list? so that aranges and the CU could
share the same data?)
> .debug_line can work the same way as .debug_info but is worth a word.
> The top fragment has the header, including the directory/file lists
> because those are referenced by index. DW_LNE_define_file can't be
> used. Each function has a fragment containing the sequence for that
> function, starting with set_address and ending with end_sequence.
> The bottom fragment is empty, existing only to allow the length to
> be computed.
Yep - can't remove dead file and directory names, unfortunately - and
the line table's pretty compact, so not sure it'd be a great savings
(especially compared to the ELF section overhead - at the object file
size at least (though probably a small win for linked executable
size)). Chances are those strings (now in debug_line_str) would be
used /somewhere/ in the program, so linker string deduplication would
get most of the wins - just dead offset entries in the line table
header.
Sony does squeeze out the sequences for dead functions; I think it's
not a huge win, in terms of total debug info size, but the .debug_line
section does not let you skip dead sequences; you still have to parse
the whole thing. Our debugger guys were pleased at not having to
spend time doing something that useless. (Yeah it does mean the
linker has to parse the whole .debug_line section; but our theory is
that you probably run the debugger more than you run the linker, and
in any case you do it interactively, so debugger load time is probably
more annoying than some fractional increase in build/link time.)
The dir/file tables can't be squeezed, but one expects it's not a
huge cost with .debug_line_str having lots of deduplication
opportunities.
> .debug_line_str is a string section and requires nothing special.
>
> .debug_names ... haven't looked at it but I suspect either it doesn't
> survive or it has to be generated post-link (or by the linker).
Generally you're going to want a DWARF-aware linker for debug_names,
same as gdb-index, etc.
> .debug_frame I *think* can be fragmented, but I haven't taken the
> time to look at it to make sure.
>
> Those are all the sections I see in DWARF v5 Appendix B.
>
> So that's the blue-sky vision of linker-magic COMDAT DWARF, which
> took me about an hour to write down just now. There is certainly
> a non-trivial overhead in terms of ELF sections; in the general
> case we would have 5 per-function fragments (for .debug_info,
> .debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).
>
> Not small, but then other features in the works are using huge
> quantities of ELF sections too (section-per-basic-block).
That work's being scoped to be fairly selective about which basic
blocks it puts in unique sections - just those that are especially
performance sensitive, so the cost isn't as high as you might
otherwise imagine. Adding 5 new sections per function would be
probably a significantly larger growth than anything else I'm aware
of, but I haven't run the numbers by any means.
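For a rough sense of scale, here is the arithmetic using only the two figures from this thread: sizeof(Elf64_Shdr) = 64 and 5 per-function fragment sections. The function counts are arbitrary examples, not measurements, and this counts section headers only; section names in the string table, relocation sections and section-group sections would add to it.

```python
ELF64_SHDR_SIZE = 64          # sizeof(Elf64_Shdr), as quoted upthread
FRAGMENTS_PER_FUNCTION = 5    # .debug_info, _line, _rnglists, _loclists, _aranges

def section_header_bytes(function_count, fragments=FRAGMENTS_PER_FUNCTION):
    """Bytes of ELF section headers added by per-function debug fragments."""
    return function_count * fragments * ELF64_SHDR_SIZE

for count in (100, 1_000, 10_000):   # hypothetical functions per object file
    print(count, section_header_bytes(count))
```

That works out to 320 bytes of section headers per function before any DWARF content is counted.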
Doing it for *every* function would be the worst case, for when
you're trying to squeeze everything (gc + icf). We could likely
get wins if we did it just for the functions that today end up in
a COMDAT section (inline functions, template instantiations) which
previous research has found to be pretty significant (and major
motivation for the Program Repository work that we've previously
described at a Dev Meeting, https://llvm.org/devmtg/2016-11/#talk22)
Thanks again for the write up!
NP, it was fun to trot out this stuff.
--paulr