Smaller object file cost for .debug_names intended only for linking

@cmtice @ayermolo @maskray

Looking at the size impact of .debug_names indexes - a lot of it comes from string relocations. When using split DWARF and compression, the object size cost of .rela.debug_names is about 18% of the debug info in the object file, while .debug_names itself is closer to 10%. Uncompressed, it’s more like 9% for the rela section and 10% for .debug_names.

I’m wondering how folks feel about adding a flag/mode to the compiler that emits a .debug_names section that cannot be naively linked because it lacks string relocations. We could flag the .debug_names as SHF_EXCLUDE[1] so that a linker drops it unless it was asked to use it to compute a unified index. That ensures a naive link drops the section instead of creating a corrupted index (without string relocations, the offsets wouldn’t get updated and would point to the wrong offsets in the unified .debug_str, producing corrupted strings).

One could argue that in non-split DWARF with .debug_str_offsets we could share those offsets (in split DWARF most of the strings we need for the index won’t already be in the string table, or in the str_offsets table - so going through str_offsets wouldn’t save us (m)any relocations, etc) - requiring DWARF work to allow that encoding/reuse of the str offsets. But I don’t think that’d be better - at least for users of a .debug_names smart link/merge in the linker.

So, how would folks feel about that, as an idea? Then it’s a question of spelling, maybe -gpubnames=<something> (though there’s convention for -g<something>-pubnames (eg: -ggnu-pubnames)). -gpubnames=forlinker? (feels bad, but I can’t think of good names) - could check with GCC folks to see if they want to agree on something here.


  1. Used for single-file split DWARF: the dwo sections are emitted into the .o file, but marked SHF_EXCLUDE so the linker doesn’t link them into the final binary.

I am wrapping my head around the string relocations in .debug_names. Do you have a C example that compiles to assembly and annotates things like .long .Linfo_string7 and .long .Lnames4-.Lnames_entries0 to demonstrate what you are proposing?

We have precedents of SHF_EXCLUDE sections which require special merging of lld (.llvm.call-graph-profile). A linker unaware of the semantics simply discards the section. So I think marking .debug_names as SHF_EXCLUDE should be fine.
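The drop-unless-aware behavior described above can be modeled in a few lines. This is only a toy sketch: the function name and the tuple layout are hypothetical and do not correspond to any real linker API. The SHF_EXCLUDE value is the one defined in elf.h.

```python
# Section flag value as defined in <elf.h>.
SHF_EXCLUDE = 0x80000000


def link_sections(sections, debug_names_aware):
    """Toy model: decide what happens to each input section.

    sections is a list of (name, flags) tuples (hypothetical layout).
    Returns (copied, merged): sections copied as-is, and SHF_EXCLUDE
    sections that an aware linker would merge with custom logic.
    """
    copied, merged = [], []
    for name, flags in sections:
        if flags & SHF_EXCLUDE:
            # Never copied verbatim; only an aware linker consumes it.
            if debug_names_aware and name == ".debug_names":
                merged.append(name)
            continue
        copied.append(name)
    return copied, merged
```

With this model, a linker that does not know about the relocation-free index never emits it, so it cannot produce stale string offsets.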

Any particular aspects you’re trying to understand better? For the purposes of this conversation I would’ve thought it’d be enough to cover that .debug_names is an index for DWARF and contains string relocations for the names in the index - and that if the linker is doing a content-aware merge, it’ll know which things need effective relocation without the need for ELF relocation records to guide it.

But yeah, a simple void f1() { } compiled with clang -g -gpubnames produces this .debug_names section:

        .section        .debug_names,"",@progbits
        .long   .Lnames_end0-.Lnames_start0     # Header: unit length
.Lnames_start0:
        .short  5                               # Header: version
        .short  0                               # Header: padding
        .long   1                               # Header: compilation unit count
        .long   0                               # Header: local type unit count
        .long   0                               # Header: foreign type unit count
        .long   1                               # Header: bucket count
        .long   1                               # Header: name count
        .long   .Lnames_abbrev_end0-.Lnames_abbrev_start0 # Header: abbreviation table size
        .long   8                               # Header: augmentation string size
        .ascii  "LLVM0700"                      # Header: augmentation string
        .long   .Lcu_begin0                     # Compilation unit 0
        .long   1                               # Bucket 0
        .long   5863324                         # Hash in Bucket 0
        .long   .Linfo_string3                  # String in Bucket 0: f1
        .long   .Lnames0-.Lnames_entries0       # Offset in Bucket 0
.Lnames_abbrev_start0:
        .ascii  "\310\013"                      # Abbrev code
        .byte   46                              # DW_TAG_subprogram
        .byte   3                               # DW_IDX_die_offset
        .byte   19                              # DW_FORM_ref4
        .byte   0                               # End of abbrev
        .byte   0                               # End of abbrev
        .byte   0                               # End of abbrev list
.Lnames_abbrev_end0:
.Lnames_entries0:
.Lnames0:
        .ascii  "\310\013"                      # Abbreviation code
        .long   35                              # DW_IDX_die_offset
        .byte   0                               # End of list: f1
        .p2align        2, 0x0
.Lnames_end0:

The end result is one string relocation per unique unqualified indexed name in the debug info.
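As a sanity check on the listing, the "Hash in Bucket 0" value can be reproduced with the DJB hash that DWARF v5 uses for .debug_names. The sketch below is a simplified version that assumes the name needs no case folding (true for "f1"); the function name is just for illustration.

```python
def djb_hash(name: str) -> int:
    """32-bit DJB hash (h = h * 33 + byte, seeded with 5381).

    Simplification: DWARF v5 hashes the case-folded name; no folding
    is done here, which is equivalent for already-lowercase ASCII names.
    """
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


print(djb_hash("f1"))  # 5863324, matching the listing
```

The hash is an assemble-time constant, while the name itself is the `.long .Linfo_string3` that needs a relocation against .debug_str.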

Nice precedent to have, indeed

I am not quite following the point about split-dwarf. The relocations in .debug_names currently are into .debug_str section.

Sorry, I’ll use a few more words.

Specifically about the string relocation size cost - one tool we added in DWARFv5 to reduce this cost was .debug_str_offsets - but .debug_names can’t use that section. In a non-split DWARF build, it would be beneficial to be able to share the .debug_str_offsets between .debug_names and .debug_info - the strings and relocations would be needed for .debug_info anyway, and .debug_names would just piggy back on that existing info.

But in Split DWARF, if not for the index, those strings (the names of entities in the DWARF) would not be in the .debug_str section and would not have entries in .debug_str_offsets (the string names of entities would all be in the .dwo sections instead). So in Split DWARF, it would not be helpful to have .debug_names indirect through .debug_str_offsets - because there’d be nothing to piggy back on - it’d add another layer of indirection, but not reduce the cost/reduce the number of relocations.
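A toy count makes the argument concrete. The model below is an assumption for illustration only: every string referenced from the .o's .debug_info costs one relocated .debug_str_offsets slot, and every index name costs either one direct relocation (today) or one shared slot (the hypothetical indirection). The string sets are made up.

```python
def string_relocs(info_strings, index_names, via_str_offsets):
    """Count string relocations in one object file under the toy model."""
    info = set(info_strings)
    names = set(index_names)
    if via_str_offsets:
        # One relocated slot per distinct string, shared by both sections.
        return len(info | names)
    # One slot per .debug_info string plus one direct relocation per name.
    return len(info) + len(names)


# Non-split: the entity names are already needed by .debug_info.
non_split_info = ["f1", "int", "clang version", "a.c"]
# Split: only skeleton-unit strings stay in the .o (hypothetical set).
split_info = ["a.c", "a.dwo", "clang version"]
names = ["f1", "int"]

print(string_relocs(non_split_info, names, False),
      string_relocs(non_split_info, names, True))
print(string_relocs(split_info, names, False),
      string_relocs(split_info, names, True))
```

In the non-split case the indirection removes the per-name relocations; in the split case the count is unchanged and the index would additionally pay for the extra offset slots.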

Thanks for elaborating.
Looking at a local example created with trunk LLVM with split DWARF enabled, the strings in .debug_names entries for DIEs in the .dwo are in .debug_str. That makes sense, since llvm-dwp can de-dup strings (from memory) and rewrites .debug_str_offsets.dwo with new offsets, but it doesn’t touch the executable.

      Hash: 0xF73809C
      String: 0x00000004 "Foo2a"
      Entry @ 0xa7 {
        Abbrev: 0x26c
        Tag: DW_TAG_structure_type
        DW_IDX_type_unit: 0x00
        DW_IDX_die_offset: 0x00000021
      }
mainTypes.dwo:  file format elf64-x86-64

.debug_str.dwo contents:
0x00000000: "_Z3foov"
0x00000008: "foo"
0x0000000c: "int"
0x00000010: "f"
0x00000012: "c1"
0x00000015: "char"
0x0000001a: "Foo2a"

mainTypes.exe: file format elf64-x86-64


.debug_str contents:
0x00000000: "foo"
0x00000004: "Foo2a"

Spec is a bit vague on the subject.

6.1.1.4.6 Name Table
The name table immediately follows the hash lookup table. It consists of two arrays: an array of string offsets, followed immediately by an array of entry offsets. The items in both arrays are section offsets: 4-byte unsigned integers for the DWARF-32 format or 8-byte unsigned integers for the DWARF-64 format. The string offsets in the first array refer to names in the .debug_str (or .debug_str.dwo) section. The entry offsets in the second array refer to index entries, and are relative to the start of the entry pool area.
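The layout in the quoted paragraph can be read with two fixed-width array decodes. This is a minimal sketch assuming DWARF-32 and little-endian data; the function name and the sample entry offsets are hypothetical (the string offsets 0 and 4 are the ones from the .debug_str dump above).

```python
import struct


def parse_name_table(data: bytes, name_count: int):
    """Split a DWARF-32 name table into (string_offsets, entry_offsets)."""
    fmt = "<%dI" % name_count          # name_count little-endian u32 values
    size = struct.calcsize(fmt)
    string_offsets = struct.unpack_from(fmt, data, 0)
    entry_offsets = struct.unpack_from(fmt, data, size)
    return list(string_offsets), list(entry_offsets)


# Two names: string offsets 0 and 4, made-up entry offsets 0 and 9.
table = struct.pack("<4I", 0, 4, 0, 9)
print(parse_name_table(table, 2))
```

Only the first array holds values that depend on the final .debug_str layout; the entry offsets are relative to the entry pool and never need relocating.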

Which I guess brings up a different point. If we do change it to go through .debug_str_offsets, that’s against the spec, so it would require changes to LLVM tooling and would make it incompatible with GNU tools. Unless I am missing something?

Sorry. The whole Split DWARF thing is a bit of a red herring/distracting from the discussion a bit.

My point was roughly:

Relocations for strings in .debug_names are expensive, and it’d be nice to avoid them - how about we add a flag/some support that says “emit .debug_names but I know I’ll link them with a DWARF-aware linker and they don’t need relocations (& we can add a section flag that says “drop these sections” in case they get linked with a non-DWARF-aware linker, so they don’t get emitted in a corrupted state in the resulting executable)”.

The point about .debug_str_offsets was just “the solution that might occur to some people, especially those not using Split DWARF, would be to add a new feature to DWARF for .debug_names to be able to reference strings via .debug_str_offsets - thereby sharing existing string relocation entries already needed for the names of the entities in .debug_info”. But that won’t help with Split DWARF, where the names aren’t present in .debug_str anyway. So it’d be good to do the first thing (have a flag to omit the relocations entirely), and it doesn’t require spec changes/new DWARF features, etc anyway.

Ah got it! I narrowed in on the wrong part. :)
Yes, that does sound like a better alternative than going against the spec.

Does this stem from the fact that Google infra is sensitive to the size of .o files?

Yep

Thanks for the further detail.

I just checked a RelWithDebInfo build of ninja clang:

#!/usr/bin/ruby
totsz = totrel = 0
Dir['**/*.o'].each {|f|
  sz = File.size f   
  rel = 0
  `readelf -WS #{f}`.each_line{|l| next unless l=~/\.rela\.debug_names/; rel=l.scan(/ [[:xdigit:]]{6,}/)[2].to_i(16)}
  #puts "#{f}\t#{sz}\t#{rel}"
  totsz += sz
  totrel += rel
}
puts "#{totsz}\t#{totrel}\t#{1.0*totrel/totsz}"

Yes, .rela.debug_names sections account for about 10% of the total .o size.

%  ruby a.rb
851038608       88289592        0.10374334509627793
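The totals above can be read back as a relocation count. The sketch assumes every object is ELF64 using RELA relocations (24 bytes per record: r_offset, r_info and r_addend at 8 bytes each) and that the index is DWARF-32, so each relocation patches a 4-byte field. Not every record targets a string (the CU offsets are relocated too), so the count is an upper bound on names.

```python
ELF64_RELA_SIZE = 24   # r_offset + r_info + r_addend, 8 bytes each
PATCHED_FIELD_SIZE = 4  # DWARF-32 offset being relocated

total_object_bytes = 851038608      # first column of the output above
rela_debug_names_bytes = 88289592   # second column of the output above

reloc_count, remainder = divmod(rela_debug_names_bytes, ELF64_RELA_SIZE)
share = rela_debug_names_bytes / total_object_bytes
patched_bytes = reloc_count * PATCHED_FIELD_SIZE

print(reloc_count, remainder, patched_bytes, round(share, 4))
```

Each 4-byte offset in the index therefore carries a record six times its own size in the relocatable file.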

So the issue is that .debug_pubnames’s replacement, .debug_names, is too large for relocatable files.

  • hash lookup table. This is optional, so if we know we will use a smart linker, we can set bucket_count to zero to drop this table.
  • name table. The relocations referencing .debug_str are too expensive.
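A rough per-component breakdown shows where the bytes go. This is a sketch under stated assumptions: DWARF-32, ELF64 RELA (24 bytes per relocation), one relocation per name and per CU, and both hash arrays omitted when bucket_count is zero. The function name is hypothetical, and the header, abbreviations and entry pool are left out.

```python
def debug_names_cost(name_count, bucket_count, cu_count=1):
    """Approximate bytes per component of .debug_names and its relocations."""
    rela = 24  # sizeof(Elf64_Rela)
    return {
        "buckets": 4 * bucket_count,
        # The hashes array belongs to the optional hash lookup table.
        "hashes": 4 * name_count if bucket_count else 0,
        # One string offset plus one entry offset per name.
        "name_table": 8 * name_count,
        "string_relocs": rela * name_count,
        "cu_relocs": rela * cu_count,
    }


print(debug_names_cost(name_count=1, bucket_count=1))  # the f1 listing
print(debug_names_cost(name_count=1, bucket_count=0))  # hash table dropped
```

Under these assumptions, dropping the hash table saves about 4 bytes per name, while dropping the string relocations saves 24 bytes per name.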

This proposal is to have an opt-in option to create a variant of .debug_names that replaces the string offsets in the name table (and the associated relocations to .debug_str) with something that needs no relocations:

  • The string offsets can be assemble-time constants without relocations, indexing another string table
  • Alternatively, the string offsets can be replaced with embedded strings (.asciz $name) as in .debug_gnu_pubnames/(pre DWARF v5).debug_pubnames

Either way, removing the relocations will decrease the relocatable file size at the cost of losing SHF_MERGE|SHF_STRINGS deduplication. This variant only works for a linker aware of the format.

I support this proposal. Personally, I prefer embedded strings like in .debug_gnu_pubnames.

(If the string table approach is adopted, I think technically SHF_MERGE|SHF_STRINGS deduplication could be added to the linker, but the change would be a bit intrusive. The return on investment may not be so high.)

Remind me why .debug_names should not use .debug_str_offsets? I’d think that names in .debug_names would already be in .debug_str, because the .debug_info DIEs would be using those names.

When using Split DWARF, the names won’t already be in the .debug_str (they’ll be in .debug_str.dwo in the dwo/dwp files).

Yeah, if I only needed to solve this problem for non-Split DWARF I’d propose a DWARF feature to allow .debug_names to use .debug_str_offsets - but since that wouldn’t help (it’d increase the .o size due to the extra layer of indirection) the Split DWARF case, it’s not so interesting to me.

Yeah - that’d require changes to DWARF. By no means impossible, but a longer time scale to reach compatibility across toolchains, etc, in that case. Not that that’s terribly important - don’t know that bfd-ld or gold are going to get debug_names merging any time soon, and in the absence of that - we’re just building a .debug_names for lld anyway…

As for actually designing such a feature for DWARF - Either we’d have an offset table (which would be pretty similar to the current format - maybe the offsets could be smaller though, if you knew they were just local/for the object file) - or they are null terminated with no offset table and the consumer has to search through the strings. It would remove the offsets, which would save a little bit of space - not sure how much in total, though. I guess we could have a bit in the index header that says whether the strings are inline or offsets.

I support inline strings as in .debug_gnu_pubnames /(pre DWARF v5).debug_pubnames because:

  • it saves a .long offset, 4 bytes
  • the generated assembly is easier for humans to read and the object file is easy for a consumer to parse

A separate offset table/string table would introduce some complexity…

A separate string table can be separately compressed, but this does not yield a significant advantage over compressing .debug_names if a user cares much about relocatable file sizes.

I’m not sure it’s easy to parse for a consumer - in the sense that either it has embedded offsets, or the consumer has to read all the null terminated strings first so it can know for a given entry index, which string name is associated with it.

This is probably fine for merging the index (when you’re going to read everything anyway - though it does add some memory overhead even then, having to make/allocate the offset table in memory, essentially), but would be unfortunate when processing, let alone using, a large index - you wouldn’t want to have to walk all the strings in the table up-front (the purpose of the table is to do as little work up-front as possible).
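The up-front walk being described looks roughly like this. The layout is hypothetical (names stored back to back as null-terminated strings with no offset table) and the function names are made up for illustration.

```python
def index_inline_names(blob: bytes):
    """Walk every null-terminated name once and record where each starts.

    This is the O(total string bytes) pass a consumer must do before it
    can map "name number i" to a string. Raises ValueError if the last
    name is not terminated.
    """
    offsets, pos = [], 0
    while pos < len(blob):
        end = blob.index(b"\0", pos)
        offsets.append(pos)
        pos = end + 1
    return offsets


def name_at(blob: bytes, offsets, i: int) -> str:
    """Constant-time lookup, but only after the table above was built."""
    start = offsets[i]
    return blob[start:blob.index(b"\0", start)].decode()


names_blob = b"foo\0Foo2a\0"
offsets = index_inline_names(names_blob)
print(offsets, name_at(names_blob, offsets, 1))
```

A merging linker reads every name anyway, so the extra pass is cheap there; a debugger doing a single lookup would pay for the whole walk first.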

.rela.debug_names contains relocations that almost exclusively reference .debug_str. [RFC] RELLEB: A compact relocation format for ELF should alleviate the size increase issue, though the new format could take some time to be ready. (I am trying to bring attention to more folks)

The design has to make trade-offs. From the latest version of https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf#limitation:

RELLEB is not the most optimal format for sections like .rodata, .debug_names, .debug_line, and .debug_addr. These sections often have many relocations with the same type and symbol, a pattern that the generic RELR format (discussed below for dynamic relocations) could exploit more effectively.

Specifically for .debug_names, the RELLEB format results in a size(.relleb.debug_names) / size(.rela.debug_names) ratio of 27.7%. Modifying RELLEB to use the sign bit of the symbol index for omitting the relocation type (instead of the addend) could improve this ratio, but at the cost of larger .relleb.text sections.

But with Split DWARF, wouldn’t .debug_names be .debug_names.dwo, and be able to use .debug_str_offsets.dwo? It seems like another case where non-.dwo should use non-.dwo and .dwo should use .dwo.

Or does .debug_info.dwo not use .debug_str_offsets.dwo, given that reducing relocations is irrelevant in Split DWARF? But if we have both .debug_info.dwo and .debug_names.dwo referencing the same strings, it feels like going through .debug_str_offsets.dwo would (could) be a net win.

Nope - as with the indexing solution used pre-.debug_names (.debug_gnu_pubnames and .gdb_index) the index is kept in the .o/executable so that the index can be linked/merged and lookups can be made quickly without having to consult every .dwo file linearly.

This isn’t a perfect solution by any means - the index (& I think .debug_names will be/is larger than gnu_pubnames and gdb_index) is, last I checked, the largest part of the debug info that remains in the .o/executable, reducing the effectiveness of split DWARF (ie: still putting pressure on the linker)

One could argue that the indexes in the dwo files would be a viable strategy - it’s certainly what Apple/MachO does on iterative builds (leaving the indexes in the .o files, then only creating a unified index if a dsym is created) - though I don’t think that scales to the larger programs Google deals with, or the slower (networked) filesystem we store our dwo files on. The theory, at least, is that it’s valuable to not have to download dwo files if they aren’t needed, and to access an efficient/merged index rather than many scattered/small indexes.

.debug_info.dwo does necessarily use .debug_str_offsets.dwo for all references to .debug_str.dwo - because the dwp tool does not parse .debug_info.dwo (it just splats the units one after another together), the dwp only updates string references in .debug_str_offsets.dwo.
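The reason the indirection matters for dwp can be sketched as a string merge that only ever rewrites the offset tables. This is a simplified model, not llvm-dwp's actual code: the function name and input layout are hypothetical, and real tools work on raw section bytes.

```python
def merge_dwo_strings(units):
    """Merge per-.dwo string tables and rewrite their offset tables.

    units: list of (strings, str_offsets) where strings maps an old
    .debug_str.dwo offset to its text and str_offsets is that unit's
    .debug_str_offsets.dwo contents. .debug_info.dwo is never parsed.
    Returns (merged string bytes, rewritten offset tables).
    """
    merged = bytearray()
    new_offset = {}   # text -> offset in the merged table (de-dup)
    rewritten = []
    for strings, str_offsets in units:
        table = []
        for old in str_offsets:
            text = strings[old]
            if text not in new_offset:
                new_offset[text] = len(merged)
                merged += text.encode() + b"\0"
            table.append(new_offset[text])
        rewritten.append(table)
    return bytes(merged), rewritten


a = ({0: "foo", 4: "int"}, [0, 4])
b = ({0: "int", 4: "bar"}, [0, 4])
merged_str, tables = merge_dwo_strings([a, b])
print(merged_str, tables)
```

Because the units refer to strings by index into their offset table, the unit bytes can be concatenated untouched while the strings are de-duplicated.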

Yeah - for non-split DWARF it seems fine/valuable to have .debug_names able to use debug_str_offsets to reduce relocations. Just not high on my list of things that’ll help my users/use cases.

Yeah, I saw the posts - it’s certainly a good direction to consider/figure out - many of the improvements to debug info we’ve done lately (DWARFv5 rnglists, ranges everywhere, addr+offset form added to DWARFv6) have been to reduce relocations because of how expensive they are in object file size. Most of those tradeoffs, even with smaller relocations, are still valuable in Split DWARF at least because they help move bytes out of the objects and into the dwo file (they’re more valuable when they reduce the large relocation records too, so they might not’ve been top of the list had we had a more compact relocation encoding)

I’ll get to responding to the main post on relleb - I was a bit confused by some of the numbers (which numbers relate to reductions in overall size, on what build flags, versus reductions in sizes of specific sections (both numbers are useful to understand how much we’re saving compared to a theoretical maximum/best case scenario, etc)) - but the general idea seems good to me, especially in an environment like ours at Google where we control the compiler and linker completely. For any end-user project that’d apply to them too, but for folks shipping libraries it’s trickier of course.