[RFC] - Deduplication of debug information in linkers (LLD).

pogo59 · December 4, 2017, 6:49pm

Thanks for providing the experimental data! It clearly shows the value of type sections in DWARF.

Regarding why type sections are off by default, aside from the issue of consumers needing to understand them, there is a size penalty to type sections that becomes more evident in smaller projects (meaning, fewer compilation units). The size penalty can be balanced against the amount of deduplication for a net win, if you have enough duplicates that you can eliminate. But it is a tradeoff.

In Sony’s case, it is not uncommon for studios to do what are called “unity” builds, where you have basically one master .cpp file that does #include of each other .cpp file, giving you an LTO-like build. In this case the debug-info production will automatically produce only one copy of each type, and so using type sections would probably make the net debug info bigger. And of course an LTO build will deduplicate type info at the metadata level, with a similar effect.

So, I think whether type sections help or hurt will depend on how a particular project’s build procedure is set up. Clang/LLVM are set up to do lots of smaller compilations and link them all together, in a fairly traditional model, and that is where type sections will provide the most benefit. Your data, then, is essentially for a best-case scenario. Other kinds of projects will not benefit as much.

Regarding DWARF 5 and emitting type sections into the .debug_info section rather than the .debug_types section: The work to support DWARF 5 in LLVM has not gotten very far yet. Conforming to the standard in this respect is certainly on my list, however there are other features that Sony considers higher priority. If you or someone else wants to contribute that feature sooner, that would be excellent! Otherwise, we will get to it in due time.

Thanks,

–paulr

Gordon_Keiser1 · December 4, 2017, 9:23pm

An old co-worker told me that writing a dwarf support library was the most painful experience of his life due to the confusing standards documents, so it’s not surprising DWARF5 is going slow.

echristo · December 4, 2017, 10:59pm

This isn’t a particularly productive email - especially as a number of people on this list are current contributors to the standard. FWIW, DWARF 5 support is mostly waiting on one of us having the spare cycles to implement it, rather than on anything else.

That said, if you have specific feedback about confusing items I’m definitely happy to help figure out:

a) some better way to say it,
b) some other implementation to avoid it being confusing

Having partially implemented a couple of readers and writers at this point I agree that it’s not the friendliest of documents, but sometimes being inside of it makes it harder to see where it’s causing issues.

Thanks!

-eric

Rui_Ueyama · December 5, 2017, 5:08am

Thanks for providing the experimental data! It clearly shows the value of
type sections in DWARF.

Regarding why type sections are off by default, aside from the issue of
consumers needing to understand them, there is a size penalty to type
sections that becomes more evident in smaller projects (meaning, fewer
compilation units). The size penalty can be balanced against the amount of
deduplication for a net win, if you have enough duplicates that you can
eliminate. But it is a tradeoff.

By a size penalty, which do you mean, the size of the final executable or
the intermediate object files? If it is a size penalty of object files, how
much is that? I wonder if the current situation is a reasonable trade-off.

In Sony's case, it is not uncommon for studios to do what are called

George_Rimar · December 5, 2017, 1:50pm

So, I think whether type sections help or hurt will depend on how a particular project’s build procedure is set up. Clang/LLVM are set up to do lots of smaller compilations and link them all together, in a fairly traditional model, and that is where type sections will provide the most benefit. Your data, then, is essentially for a best-case scenario. Other kinds of projects will not benefit as much.

This inspired me to run additional tests on the LLVM binaries to see how much they gain when -fdebug-types-section is enabled.

(The full table of results is at the end of this mail.)

During the experiment I observed both an object size penalty and a size penalty for a single final executable:

The size of the .a files in LLVM/lib increases from 6.5GB to 7.7GB.
One binary, llvm-PerfectShuffle, was larger with the flag: its size changed from 120064 to 124952.

For all the others, using the flag usually gives a noticeable win (a size reduction of up to 41%).
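
For a sense of scale, the two penalties quoted above can be expressed as relative changes. This is only a small arithmetic sketch: the helper name `relative_change` is made up for illustration, the 6.5GB and 7.7GB figures are the rounded values from this message (so the first percentage is approximate), and the message gives no unit for the llvm-PerfectShuffle sizes, so bytes are assumed.

```python
def relative_change(before: float, after: float) -> float:
    """Return the size change as a percentage of the original size."""
    return (after - before) / before * 100.0


# Static libraries under LLVM/lib, rounded figures in GB.
archives = relative_change(6.5, 7.7)

# llvm-PerfectShuffle, the one binary that grew (unit assumed to be bytes).
perfect_shuffle = relative_change(120064, 124952)

print(f"archives:            {archives:+.1f}%")
print(f"llvm-PerfectShuffle: {perfect_shuffle:+.1f}%")
```

Under these assumptions the archives grow by roughly 18% and llvm-PerfectShuffle by roughly 4%, while the best case among the other binaries is the 41% reduction mentioned above.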

Regarding DWARF 5 and emitting type sections into the .debug_info section rather than the .debug_types section: The work to support DWARF 5 in LLVM has not gotten very far yet. Conforming to the standard in this respect is certainly on my list, however there are other features that Sony considers higher priority. If you or someone else wants to contribute that feature sooner, that would be excellent! Otherwise, we will get to it in due time.

Thanks,

–paulr

I am going to look at it more closely. At the least, I do not think LLD would currently work correctly with multiple .debug_info sections when building .gdb_index. We expect to see a unique .debug_info section in an object file and will probably do something wrong otherwise. It looks like llvm/DebugInfo needs to be fixed first, which also affects tools like llvm-dwarfdump and probably something else. I am going to investigate all of that.

Testing results:

pogo59 · December 5, 2017, 3:13pm

Thanks for providing the experimental data! It clearly shows the value of type sections in DWARF.

Regarding why type sections are off by default, aside from the issue of consumers needing to understand them, there is a size penalty to type sections that becomes more evident in smaller projects (meaning, fewer compilation units). The size penalty can be balanced against the amount of deduplication for a net win, if you have enough duplicates that you can eliminate. But it is a tradeoff.

By a size penalty, which do you mean, the size of the final executable or the intermediate object files? If it is a size penalty of object files, how much is that? I wonder if the current situation is a reasonable trade-off.

When we emit a type section instead of directly emitting the type to .debug_info, we effectively extract the type description and move it into the type section; however the type section also has overhead, consisting of a header and some wrapper around the type information, and possibly some additional context. This is obviously bigger than the original description. Also references to the type become bigger; at a minimum, they are each 8 bytes, rather than the usual 4 bytes. Repeat this overhead for each type moved to a type section. All of this results in a bigger intermediate object file. I have not tried to measure how much this is for “typical” compilation units. IIRC, LLVM chooses to move enums and aggregates into type units; it does not assess the size of a type description as part of its heuristic.
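
The per-type overhead described above can be sketched as a rough size model. This is an illustration, not a measurement of what LLVM emits: the names `TypeCost`, `inline_bytes`, `type_unit_bytes` and `object_growth` are hypothetical, `wrapper_bytes` is a placeholder for the wrapper and extra context (whose real size is not given here), and the 23-byte header assumes a DWARF v4 type unit in the 32-bit DWARF format. Any per-section cost in the object file format is ignored.

```python
from dataclasses import dataclass

REF4_SIZE = 4          # DW_FORM_ref4: 4-byte offset to a DIE in the same unit
REF_SIG8_SIZE = 8      # DW_FORM_ref_sig8: 8-byte type signature
TYPE_UNIT_HEADER = 23  # DWARF v4 type unit header, 32-bit DWARF format


@dataclass
class TypeCost:
    description_bytes: int  # size of the type's own description
    references: int         # references to the type from the compilation unit
    wrapper_bytes: int = 0  # assumed size of the wrapper and extra context


def inline_bytes(t: TypeCost) -> int:
    """Bytes used when the type is emitted directly into .debug_info."""
    return t.description_bytes + t.references * REF4_SIZE


def type_unit_bytes(t: TypeCost) -> int:
    """Bytes used when the type is moved into its own type unit."""
    unit = TYPE_UNIT_HEADER + t.wrapper_bytes + t.description_bytes
    return unit + t.references * REF_SIG8_SIZE


def object_growth(t: TypeCost) -> int:
    """Extra bytes in the object file caused by moving this type out."""
    return type_unit_bytes(t) - inline_bytes(t)


example = TypeCost(description_bytes=200, references=5, wrapper_bytes=10)
print(object_growth(example))  # 23 + 10 + 5 * (8 - 4) = 53
```

In this model the growth does not depend on the size of the type description itself, only on the fixed header, the wrapper and the number of references, which is why small types with many references are the least attractive candidates.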

If none of the type sections are duplicated in other object files, then the final executable grows by just as much as the object files do. To the extent that there are duplicates the linker can eliminate, you start to claw back the space consumed by the overhead. If you have enough duplicates to eliminate, you have a net size win in the executable.
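
The break-even point can be sketched with a simplified model. It assumes that every compilation unit carries an identical copy of the type, that the linker keeps exactly one copy of each type unit, and that references cost 4 bytes inline and 8 bytes via a type signature. The function names and the single `overhead` parameter (header plus wrapper, in bytes) are hypothetical, chosen only for this illustration.

```python
from typing import Optional


def linked_bytes_inline(copies: int, description: int, refs_per_cu: int) -> int:
    """Executable size cost when every CU keeps its own copy of the type."""
    return copies * (description + refs_per_cu * 4)


def linked_bytes_type_unit(copies: int, description: int, refs_per_cu: int,
                           overhead: int) -> int:
    """Executable size cost when the linker keeps a single type unit.

    Every CU still pays for its 8-byte references to the type.
    """
    return (description + overhead) + copies * refs_per_cu * 8


def break_even_copies(description: int, refs_per_cu: int,
                      overhead: int) -> Optional[int]:
    """Smallest number of copies for which type units are a net win.

    Returns None when the wider references alone outweigh the type.
    """
    saved_per_copy = description - refs_per_cu * 4
    if saved_per_copy <= 0:
        return None
    # Smallest integer N with N * saved_per_copy > description + overhead.
    return (description + overhead) // saved_per_copy + 1


print(break_even_copies(description=200, refs_per_cu=5, overhead=33))
```

With a single copy, as in a unity or LTO-style build, the type unit variant is always larger in this model, since there is nothing for the linker to eliminate.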

–paulr
