[Debuginfo][DWARF][LLD] Remove obsolete debug info in lld

Folks, we are working on reducing binary size and improving debug info quality.
To reduce binary size we use -ffunction-sections, so that unused code can be garbage collected.
When the linker garbage-collects sections, a lot of abandoned debug info is left behind.
Besides the inflated debug info size, we end up with overlapping address ranges and no way to distinguish valid ranges from garbage ones (D59553).
To resolve these two problems, we use an implementation extracted from dsymutil: https://reviews.llvm.org/D74169.
It adds a --gc-debuginfo command-line option to the linker to remove obsolete debug info.
Currently, it has the following limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types, multiple .debug_info sections, split DWARF, or ThinLTO.
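To illustrate the overlap problem, here is a sketch (not code from D74169). It assumes a platform whose .text starts at 0, and that references from .debug_* into garbage-collected sections end up resolved to address 0. Under those assumptions, the leftover ranges collide with ranges of live code. The CU names and addresses are made up, and `find_overlaps` is a hypothetical helper.

```python
from typing import List, Tuple

# (low_pc, high_pc, cu_name); ranges are half-open: [low_pc, high_pc).
Range = Tuple[int, int, str]


def find_overlaps(ranges: List[Range]) -> List[Tuple[str, str]]:
    """Return pairs of CU names whose address ranges overlap."""
    overlaps = []
    ordered = sorted(ranges)  # by low_pc, then high_pc, then name
    for i, (lo_a, hi_a, cu_a) in enumerate(ordered):
        for lo_b, _hi_b, cu_b in ordered[i + 1:]:
            if lo_b >= hi_a:
                # Later ranges start at or after lo_b, so none can overlap range a.
                break
            overlaps.append((cu_a, cu_b))
    return overlaps


ranges = [
    (0x00, 0x40, "main.cpp"),     # live code placed at the start of .text
    (0x40, 0x80, "util.cpp"),     # live code
    (0x00, 0x19, "gc_dead.cpp"),  # function removed by --gc-sections, resolved to 0
]
print(find_overlaps(ranges))  # [('gc_dead.cpp', 'main.cpp')]
```

A consumer looking up address 0x10 cannot tell which of the two CUs really describes it. This is the ambiguity D59553 is about.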

Following are the size/performance results for D74169:

A: -ffunction-sections --gc-sections
B: -ffunction-sections --gc-sections --gc-debuginfo
C: -ffunction-sections --gc-sections -fdebug-types-section
D: -ffunction-sections --gc-sections -gsplit-dwarf
E: -ffunction-sections --gc-sections --gc-debuginfo --compress-debug-sections=zlib

LLVM code base:

Hi Alexey,

Regarding the link performance timings, have you tried profiling to see whether there are any obvious performance improvements that could be made? A slowdown of 7x seems like an awfully large amount, given what this should be doing. Also, do you have an idea whether the slowdown grows exponentially or linearly with input size?

The problem is that if it is opt-in but the link-time cost is so high, it may put people off ever enabling it, which would be a shame, as the debugger load-time improvements seem worth having.

James

Broad question: Do you have any specific motivation/users/etc. for implementing this (if you can speak about it)? It might help motivate the work and help us understand what tradeoffs might be suitable for you/your users, etc.

In general, in the current state, I don’t have strong feelings either way about this going in as-is with the intent to improve it to make it more viable - or some of that work being done out-of-tree until it’s a more viable performance tradeoff. Mostly happy to leave that up to folks more involved with lld.

A couple of minor points…

Folks, we are working on reducing binary size and improving debug info quality.
To reduce binary size we use -ffunction-sections, so that unused code can be garbage collected.
When the linker garbage-collects sections, a lot of abandoned debug info is left behind.
Besides the inflated debug info size, we end up with overlapping address ranges and no way to distinguish valid ranges from garbage ones (D59553).
To resolve these two problems, we use an implementation extracted from dsymutil: https://reviews.llvm.org/D74169.
It adds a --gc-debuginfo command-line option to the linker to remove obsolete debug info.
Currently, it has the following limitations: it does not support DWARF5, modules, -fdebug-types-section, type units, .debug_types,

These last 3 ^ are all the same thing, FWIW (well, in DWARFv5 they go in .debug_info, but it’s the same feature).

multiple .debug_info sections, split DWARF, or ThinLTO.

Following are the size/performance results for D74169:

A: -ffunction-sections --gc-sections
B: -ffunction-sections --gc-sections --gc-debuginfo
C: -ffunction-sections --gc-sections -fdebug-types-section

^ not sure of the point of testing/showing comparisons with a situation that’s currently unsupported

D: -ffunction-sections --gc-sections -gsplit-dwarf
E: -ffunction-sections --gc-sections --gc-debuginfo --compress-debug-sections=zlib

LLVM code base:

Options | build time   | bin size     | lib size       |
A       | 54min (100%) | 19.0G (100%) | 15.0G (100.0%) |
B       | 65min (120%) | 9.7G ( 51%)  | 12.0G ( 80.0%) |
C       | 53min ( 98%) | 12.0G ( 63%) | 15.0G (100.0%) |
D       | 52min ( 96%) | 12.0G ( 63%) | 8.2G ( 55.0%)  |
E       | 64min (118%) | 5.3G ( 28%)  | 12.0G ( 80.0%) |


Clang binary:

Options | size         | link time    | used memory    |
A       | 1.50G (100%) | 9sec (100%)  | 9307MB (100%)  |
B       | 0.76G ( 50%) | 68sec (755%) | 15055MB (161%) |
C       | 0.82G ( 54%) | 8sec ( 89%)  | 8402MB ( 90%)  |
D       | 0.96G ( 64%) | 6sec ( 67%)  | 4273MB ( 46%)  |
E       | 0.43G ( 29%) | 77sec (855%) | 15000MB (161%) |


lldb loading time:

Options | time          | used memory   |
A       | 6.4sec (100%) | 1495MB (100%) |
B       | 4.0sec ( 63%) | 826MB ( 55%)  |
C       | 3.7sec ( 58%) | 877MB ( 59%)  |
D       | 4.3sec ( 67%) | 1023MB ( 69%) |
E       | 2.1sec ( 33%) | 478MB ( 32%)  |


I want to discuss the results and decide whether it is worth integrating D74169:

improvements:

  1. Reduces the size of debug info (by about 50%).
  2. Resolves overlapping address ranges (D59553).
  3. The reduced debug info size allows tools to run faster and use less memory.

drawbacks and unimplemented features:

  1. Link time increases (to 755% of the baseline for the clang binary).

The --gc-debuginfo option is off by default, so it would affect only those who need it and explicitly enable it.

I think the current DWARFLinker code could be optimized further to improve performance.

  2. Support for type units.

That could be implemented later.

Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware linker. With a DWARF-aware linker it’d be generally desirable not to have to add that object size overhead to get the linking improvements.

  3. DWARF5.

The current DWARFEmitter/DWARFStreamer implementation of DWARF generation does not support
DWARF5 (only the .debug_names table). At the same time, CodeGen/AsmPrinter/DwarfDebug.h already contains code
that implements most of DWARF5. It seems that DWARFEmitter/DWARFStreamer should be rewritten using
DwarfDebug/DwarfFile, though I am not sure whether it would be easy to reuse DwarfDebug/DwarfFile.
It would probably be necessary to factor out some intermediate layer of DwarfDebug/DwarfFile.

  4. Split DWARF support.

This solution does not currently work with split DWARF. But it could be useful for split DWARF in two ways:

a) The generation of the skeleton file could be changed so that address ranges pointing to garbage-collected
code would be replaced with lowpc=0, highpc=0. That would solve the problem of overlapping address
ranges (D59553).

This wouldn’t/couldn’t completely address the issue - because some address ranges would be in the .dwo files the linker can’t see - and they’d still end up with the interesting address ranges.

b) An approach similar to the dsymutil implementation could be used to generate monolithic debug info
from .dwo files. That suggestion comes from https://reviews.llvm.org/D74169#1888386.
That is, DWARFLinker could be taught to generate the same output as D74169, but with split DWARF as the input.

  5. -fmodules-debuginfo

That problem was described in this review: https://reviews.llvm.org/D54747#1505462. Currently, DWARFLinker/dsymutil has the same problem. It could be solved using the fact that DWARFLinker analyzes debug info: it could recognize the debug info generated for a module and keep it (compile units containing debug info for modules do not have low_pc/high_pc).
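Here is a minimal sketch of that CU-level rule. It assumes the linker knows, from relocations, which input sections each CU's address attributes point into. Module CUs, which carry no address attributes, are always kept. `CompileUnitInfo`, `code_sections` and `keep_compile_unit` are hypothetical names, not DWARFLinker API.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CompileUnitInfo:
    name: str
    # True if the CU DIE has DW_AT_low_pc, DW_AT_high_pc or DW_AT_ranges.
    has_address_attrs: bool
    # Hypothetical: input sections that the CU's address relocations refer to.
    code_sections: Set[str] = field(default_factory=set)


def keep_compile_unit(cu: CompileUnitInfo, live_sections: Set[str]) -> bool:
    if not cu.has_address_attrs:
        # Module (or other type-only) CU: it describes no code, so keep it.
        return True
    # Keep a code-describing CU only if some of its code survived --gc-sections.
    return bool(cu.code_sections & live_sections)


live = {".text.main"}
module_cu = CompileUnitInfo("Foo.pcm", has_address_attrs=False)
dead_cu = CompileUnitInfo("f.cpp", True, {".text._Z1fv"})
main_cu = CompileUnitInfo("main.cpp", True, {".text.main"})
print([keep_compile_unit(cu, live) for cu in (module_cu, dead_cu, main_cu)])
# [True, False, True]
```

Liveness is decided by section survival rather than by the address value. On a platform where .text starts at 0, an address of 0 alone cannot mark dead code.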

  6. -flto=thin

That problem was described in this review: https://reviews.llvm.org/D54747#1503720. It also exists in the current DWARFLinker/dsymutil implementation. I think that problem should be discussed further: it could probably be fixed by avoiding the generation of such incomplete declarations during ThinLTO,

Producing extra/redundant debug info in ThinLTO would be costly - in fact, ThinLTO could be doing more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type definition is known to exist in another Module, etc.)

I don’t know whether it’s still a problem, since that patch was reverted.

Hi Alexey,

Hi James, thank you for your comments. Please find my answers below:

Regarding the link performance timings, have you tried profiling to see whether there are any obvious performance improvements that could be made? A slowdown of 7x seems like an awfully large amount, given what this should be doing.

I do not see any easy fixes, but there are some possibilities for improving performance:

  1. ~10% improvement could probably be achieved by optimizing string pools
    (NonRelocatableStringpool/DwarfStringPool).

Measurements show that ~10 sec is spent in llvm::StringMapImpl::LookupBucketFor(). The problem
is that the same strings are added to the string pool again and again. Two attributes
with the same string value are each analyzed (hash calculated) and looked up in
the string pool, even if these strings are already in the string table (DW_FORM_strp, DW_FORM_strx).
The process could be optimized for string tables: if a string from the string table has been
accessed before, a reference into the string pool could be kept for it. This would eliminate
a lot of string pool lookups.
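Here is a sketch of that caching idea in Python, not the actual NonRelocatableStringpool/DwarfStringPool code. `StringPool` and `StrpResolver` are hypothetical stand-ins. Each object file remembers which pool entry a given .debug_str offset was interned as. Later DW_FORM_strp/strx uses of the same offset then skip hashing and comparing the string.

```python
from typing import Dict, List


class StringPool:
    """Deduplicating string pool; `lookups` counts hash-table probes by string."""

    def __init__(self) -> None:
        self._index: Dict[str, int] = {}
        self.strings: List[str] = []
        self.lookups = 0

    def intern(self, s: str) -> int:
        self.lookups += 1
        idx = self._index.get(s)
        if idx is None:
            idx = len(self.strings)
            self._index[s] = idx
            self.strings.append(s)
        return idx


class StrpResolver:
    """Hypothetical per-object-file helper: caches pool entries by .debug_str offset."""

    def __init__(self, pool: StringPool, debug_str: bytes) -> None:
        self.pool = pool
        self.debug_str = debug_str
        self._by_offset: Dict[int, int] = {}

    def resolve(self, offset: int) -> int:
        entry = self._by_offset.get(offset)
        if entry is None:
            # First use of this offset: read the NUL-terminated string and intern it.
            end = self.debug_str.index(b"\0", offset)
            entry = self.pool.intern(self.debug_str[offset:end].decode("utf-8"))
            self._by_offset[offset] = entry
        return entry


pool = StringPool()
resolver = StrpResolver(pool, b"Foo\0var\0f\0")
for _ in range(3):
    resolver.resolve(0)   # "Foo": hashed by string only the first time
resolver.resolve(4)       # "var"
print(pool.strings, pool.lookups)  # ['Foo', 'var'] 2
```

The cache key is an integer offset. In C++ this would be an integer-keyed map, which is cheaper than re-hashing the string contents on every attribute.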

  2. ~20-30% improvement by processing each object file in parallel.

Currently, all object files are analyzed sequentially and cloned sequentially;
cloning runs in parallel with analysis. That scheme could be changed so that
analysis and cloning are done in parallel for each object file.
That requires refactoring DWARFLinker and making the string pools and DeclContextTree
thread-safe.
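The proposed per-object scheme can be sketched as follows. This is a Python illustration under simplifying assumptions, not DWARFLinker code. `analyze` and `clone` are hypothetical stand-ins, and a lock stands in for whatever thread-safety mechanism the real string pool would use. Each worker both analyzes and clones one object file, and only the shared pool is synchronized.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class ThreadSafeStringPool:
    """String pool shared by all workers; a lock serializes insertions."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._index: Dict[str, int] = {}
        self.strings: List[str] = []

    def intern(self, s: str) -> int:
        with self._lock:
            idx = self._index.get(s)
            if idx is None:
                idx = len(self.strings)
                self._index[s] = idx
                self.strings.append(s)
            return idx


def analyze(obj: List[str]) -> List[str]:
    # Hypothetical stand-in for liveness analysis: drop entries for GC'd code.
    return [s for s in obj if not s.startswith("dead_")]


def clone(live: List[str], pool: ThreadSafeStringPool) -> List[int]:
    # Hypothetical stand-in for cloning: emit references into the shared pool.
    return [pool.intern(s) for s in live]


def link_all(objects: List[List[str]], workers: int = 4) -> List[List[str]]:
    pool = ThreadSafeStringPool()
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Each object file is analyzed and then cloned by the same worker,
        # independently of the other object files.
        results = list(executor.map(lambda obj: clone(analyze(obj), pool), objects))
    return [[pool.strings[i] for i in r] for r in results]


print(link_all([["f", "Foo", "dead_h"], ["g", "Foo"]]))  # [['f', 'Foo'], ['g', 'Foo']]
```

Pool indices in this sketch depend on thread scheduling. A real implementation would also need deterministic output, which this sketch does not attempt.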

  3. ~10-20% improvement by supporting type units.

Currently, dsymutil/DWARFLinker does not support type units. If type units were supported, the “analyzing” step could be skipped for a significant part of the debug info data, which would save time.
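Here is a sketch of why signatures let the analysis be skipped. If every type carries a hash computed by the compiler from its fully qualified name, the linker can deduplicate with a single lookup instead of walking the type's context. The hash below is illustrative and not necessarily the exact scheme LLVM uses for type units. `type_signature` and `dedup_types` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple


def type_signature(qualified_name: str) -> int:
    # Illustrative: a 64-bit hash of the type's fully qualified name, computed
    # once by the compiler. Not necessarily the exact scheme LLVM uses.
    digest = hashlib.md5(qualified_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def dedup_types(units: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first type unit for each signature; drop later duplicates.

    The lookup is by signature only: no walk over the type's enclosing
    context (namespaces, parent DIEs) is needed to decide equivalence.
    """
    kept: Dict[int, Tuple[str, str]] = {}
    for name, body in units:
        kept.setdefault(type_signature(name), (name, body))
    return list(kept.values())


units = [("ns::Foo", "def from a.o"), ("ns::Bar", "def from a.o"), ("ns::Foo", "def from b.o")]
print(dedup_types(units))
# [('ns::Foo', 'def from a.o'), ('ns::Bar', 'def from a.o')]
```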

  4. ~2-3% improvement could probably be achieved by optimizing the DWARF parser classes.
    The following is a list of ideas:

https://reviews.llvm.org/D78672#inline-720056
https://reviews.llvm.org/D78672#2000012
https://reviews.llvm.org/D78672#2000363

Also, do you have an idea whether the slowdown grows exponentially or linearly with input size?

It is linear. The following is the data for different runs (output size is the size of the overall binary):

Hi David, excuse me for the delayed answer; it took some time to prepare. Please find my answers below…

Broad question: Do you have any specific motivation/users/etc. for implementing this (if you can speak about it)? It might help motivate the work and help us understand what tradeoffs might be suitable for you/your users, etc.

There are two general requirements:

  1. Remove (or clean) invalid debug info.
  2. Optimize the DWARF size.

Our users have the following specifics:

  • an embedded platform that uses 0 as the start of the .text section.
  • a custom toolset that does not support all features yet (e.g. split DWARF).
  • tolerance of the link-time increase.
  • a need for a convenient way to share debug builds.

For the first point: we have the problem of “overlapping address ranges starting from 0” (D59553).
We use a custom solution, but a general solution like D74169 would be better here.

For the second point: split DWARF could be a good alternative for getting debug info of minimal size.
Still, it has drawbacks: it is not currently supported by our tools, it does not solve the “overlapping address ranges”
problem, and it is not very convenient to share (even using .dwp).

Thus, in the long term, D74169 looks like a good solution for us: it resolves the “overlapping address ranges”
problem, produces a binary of minimal size, is supported by current tools, and makes debug builds easy to share (a single binary of
minimal size).

In general, in the current state, I don’t have strong feelings either way about this going in as-is with the intent to improve it to make it more viable - or some of that work being done out-of-tree until it’s a more viable performance tradeoff. Mostly happy to leave that up to folks more involved with lld.

A couple of minor points…

C: -ffunction-sections --gc-sections -fdebug-types-section

^ not sure of the point of testing/showing comparisons with a situation that’s currently unsupported

That situation is currently supported (--gc-debuginfo is not used in this measurement).
“-fdebug-types-section” is supported functionality.
The purpose of these data is to compare the results of “-fdebug-types-section” and “--gc-debuginfo”.

  2. Support for type units.

That could be implemented later.

Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
linker. With a DWARF-aware linker it’d be generally desirable not to have to add that object size overhead to
get the linking improvements.

But DWARFLinker should work adequately with type units, since they are already implemented.
If someone uses -fdebug-types-section, it should work adequately together
with --gc-debuginfo (if --gc-debuginfo is accepted).
Right?

Another thing is that the idea behind type units has the potential to help a DWARF-aware linker work faster.
Currently, DWARFLinker analyzes context to understand whether types are the same or not.
But the context is known when the types are generated, so there is no need to spend time analyzing it.
If types could be compared without analyzing context, a DWARF-aware linker would work faster.
That is just an idea (not for immediate implementation): if types were stored in some “type table”
(instead of a COMDAT section group) and could be accessed through a hash id (like type units),
then it would be a solution requiring fewer bits to store while still allowing types to be compared
by hash id (without analyzing context).
In this case, the size increase would be small, and processing could be faster.

This is just an idea and could be discussed separately from the question of integrating D74169.

  4. Split DWARF support.

This solution does not currently work with split DWARF. But it could be useful for split DWARF in two ways:
a) The generation of the skeleton file could be changed so that address ranges pointing to garbage-collected
code would be replaced with lowpc=0, highpc=0. That would solve the problem of overlapping
address ranges (D59553).

This wouldn’t/couldn’t completely address the issue - because some address ranges would be in the .dwo files the linker can’t see - and they’d still end up with the interesting address ranges.

I see, thank you. So it would not be a complete solution.

  6. -flto=thin

That problem was described in this review: https://reviews.llvm.org/D54747#1503720. It also exists in the
current DWARFLinker/dsymutil implementation. I think that problem should be discussed further: it could
probably be fixed by avoiding the generation of such incomplete declarations during ThinLTO,

Producing extra/redundant debug info in ThinLTO would be costly - in fact, ThinLTO could be doing
more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
definition is known to exist in another Module, etc.)

I don’t know whether it’s still a problem, since that patch was reverted.

Yes. That patch was reverted, but this patch (D74169) has the same problem:
if D74169 were applied and --gc-debuginfo used, the structure type
definition would be removed.

DWARFLinker could handle that case (“removing definitions from some llvm Modules if the type
definition is known to exist in another Module”),
i.e. DWARFLinker could replace the declaration with the definition.

But that problem could be resolved more easily when the debug info is generated (probably without
a significant increase in debug info size):

Let’s look at an example:

0x0000000b: DW_TAG_compile_unit
              DW_AT_low_pc (0x0000000000201700)
              DW_AT_high_pc (0x0000000000201719)

0x0000002a:   DW_TAG_subprogram

0x00000043:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x0000000000000086 "_Z1fv")
                  DW_AT_low_pc (0x0000000000201700)
                  DW_AT_high_pc (0x0000000000201718)

0x00000057:       DW_TAG_variable
                    DW_AT_abstract_origin (0x0000000000000096 "var")

0x00000065:       NULL

0x00000073: DW_TAG_compile_unit
              DW_AT_stmt_list (0x00000080)

0x00000086:   DW_TAG_subprogram
                DW_AT_name ("f")
                DW_AT_inline (DW_INL_inlined)

0x00000096:     DW_TAG_variable
                  DW_AT_name ("var")
                  DW_AT_type (0x000000a9 "volatile Foo")

0x000000a1:     NULL

0x000000a9:   DW_TAG_volatile_type
                DW_AT_type (0x000000ae "Foo")

0x000000ae:   DW_TAG_structure_type
                DW_AT_name ("Foo")
                DW_AT_declaration (true)

0x000000c1: DW_TAG_compile_unit
              DW_AT_low_pc (0x0000000000000000)
              DW_AT_high_pc (0x0000000000000019)

0x000000e0:   DW_TAG_subprogram
                DW_AT_low_pc (0x0000000000000000)
                DW_AT_high_pc (0x0000000000000019)
                DW_AT_name ("f")

0x000000fd:     DW_TAG_variable
                  DW_AT_name ("var")
                  DW_AT_type (0x00000119 "volatile Foo")

0x00000119:   DW_TAG_volatile_type
                DW_AT_type (0x0000011e "Foo")

0x0000011e:   DW_TAG_structure_type
                DW_AT_name ("Foo")
                DW_AT_decl_line (1)

Here we have:

DW_TAG_compile_unit(0x0000000b) - compile unit containing the concrete instance of function “f”.
DW_TAG_compile_unit(0x00000073) - compile unit containing the abstract instance root of function “f”.
DW_TAG_compile_unit(0x000000c1) - compile unit containing the definition of function “f”.

The code for function “f” was deleted, so --gc-debuginfo deletes the compile unit DW_TAG_compile_unit(0x000000c1)
containing the “f” definition (since there is no corresponding code). But that unit holds the structure “Foo” definition
DW_TAG_structure_type(0x0000011e), which is referenced from DW_TAG_compile_unit(0x00000073)
via the declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case where the definition
was removed by ThinLTO and replaced with a declaration.
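The issue can be modeled as a liveness walk over the DIEs in the dump above. This is a simplified model, not DWARFLinker's actual algorithm. Live DIEs are found by following DW_AT_type/DW_AT_abstract_origin references from DIEs that describe surviving code. The walk reaches the declaration 0x000000ae but never the definition 0x0000011e, so dropping CU 0x000000c1 loses it. The hypothetical `resolve_declarations` fix-up models "replace the declaration with the definition". `Die`, `DIES` and `mark_live` are hypothetical names, and matching definitions by bare name is an assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Die:
    tag: str
    name: Optional[str] = None
    declaration: bool = False
    # Outgoing DW_AT_type / DW_AT_abstract_origin references (DIE offsets).
    refs: List[int] = field(default_factory=list)


# Offsets and references from the dump above; parent/child edges are omitted.
DIES: Dict[int, Die] = {
    0x43: Die("inlined_subroutine", refs=[0x86]),
    0x57: Die("variable", refs=[0x96]),
    0x86: Die("subprogram", "f"),
    0x96: Die("variable", "var", refs=[0xA9]),
    0xA9: Die("volatile_type", refs=[0xAE]),
    0xAE: Die("structure_type", "Foo", declaration=True),
    0xE0: Die("subprogram", "f"),
    0xFD: Die("variable", "var", refs=[0x119]),
    0x119: Die("volatile_type", refs=[0x11E]),
    0x11E: Die("structure_type", "Foo"),
}


def mark_live(roots: Iterable[int], dies: Dict[int, Die],
              resolve_declarations: bool) -> Set[int]:
    # Hypothetical name-based index of structure definitions (a real linker
    # would also have to compare the enclosing context, e.g. namespaces).
    definitions = {d.name: off for off, d in dies.items()
                   if d.tag == "structure_type" and not d.declaration}
    live: Set[int] = set()
    work = deque(roots)
    while work:
        off = work.popleft()
        if off in live:
            continue
        live.add(off)
        die = dies[off]
        work.extend(die.refs)
        if resolve_declarations and die.declaration and die.name in definitions:
            work.append(definitions[die.name])  # pull in the definition
    return live


# 0x43 describes code that survived; 0x57 is its child (listed explicitly
# because parent/child edges are not modeled). 0xe0's code was GC'd.
roots = [0x43, 0x57]
print(0x11E in mark_live(roots, DIES, resolve_declarations=False))  # False
print(0x11E in mark_live(roots, DIES, resolve_declarations=True))   # True
```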

Would it cost too much if the type definition were not replaced with a declaration in the “abstract instance root”?
There are more concrete instances than abstract instance roots,
so leaving the definition in the abstract instance root would probably not be too costly?

Alternatively, would it cost too much if the type definition were not replaced with a declaration when the declaration references a type from an unused function? (LTO could determine that the concrete function is not used.)

Thank you, Alexey.

Hi David, excuse me for the delayed answer; it took some time to prepare. Please find my answers below…

Broad question: Do you have any specific motivation/users/etc. for implementing this (if you can speak about it)? It might help motivate the work and help us understand what tradeoffs might be suitable for you/your users, etc.

There are two general requirements:

  1. Remove (or clean) invalid debug info.

Perhaps a simpler, more direct solution for your immediate needs might be a much narrower and more efficient linker-DWARF-awareness feature:

With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges without parsing the rest of the DWARF. /technically/ this isn’t guaranteed - rnglist entries can be referenced either directly, or by index. If all rnglists are referenced by index, then a linker could parse only the debug_rnglists section and rewrite ranges to remove any address ranges that refer to optimized-out code.

This would only be correct for rnglists that had no direct references to them (that were only referenced via the indexes) - but we could either implement it with that assumption, or add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists via rnglistx forms/indexes". If this DWARF-aware linking were to read the CU DIE (but not all the other DIEs), it /could/ also rewrite high/low_pc if the CU wasn’t using ranges… but that wouldn’t come up in the function-removal case, because then you’d have ranges anyway, so no need for that.

Such DWARF-aware rnglist linking could also simplify rnglists: in cases where functions end up laid out next to each other, the linker could coalesce their ranges.
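The rnglist rewrite described above could look roughly like this. It is a Python sketch under simplifying assumptions, not an lld or LLVM API. Each list is filtered against a liveness predicate, touching or overlapping ranges are coalesced, and the list offsets are recomputed. The fixed-size DW_RLE_start_end encoding and all helper names are assumptions. In a real linker, liveness would come from whether the relocation target section survived --gc-sections.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # half-open [start, end)

# Simplified encoding assumption: every range is written as DW_RLE_start_end
# (1-byte kind + two 8-byte addresses), followed by a 1-byte DW_RLE_end_of_list.
START_END_SIZE = 1 + 8 + 8
END_OF_LIST_SIZE = 1


def rewrite_list(ranges: List[Range], is_live: Callable[[Range], bool]) -> List[Range]:
    """Drop ranges of discarded code and coalesce ranges that touch or overlap."""
    merged: List[Range] = []
    for start, end in sorted(r for r in ranges if is_live(r)):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def rewrite_rnglists(lists: List[List[Range]], is_live: Callable[[Range], bool]):
    """Return the rewritten lists and their new offsets (relative to the first list).

    List sizes change, so later offsets move. DIEs that use DW_FORM_rnglistx
    stay valid once the offsets array is regenerated; direct DW_FORM_sec_offset
    references would have to be found and patched by parsing the DIEs.
    """
    new_lists = [rewrite_list(lst, is_live) for lst in lists]
    offsets, pos = [], 0
    for lst in new_lists:
        offsets.append(pos)
        pos += len(lst) * START_END_SIZE + END_OF_LIST_SIZE
    return new_lists, offsets


dead = {(0x0, 0x19)}  # e.g. a function removed by --gc-sections
lists = [[(0x1000, 0x1040), (0x0, 0x19), (0x1040, 0x1080)], [(0x2000, 0x2010)]]
print(rewrite_rnglists(lists, lambda r: r not in dead))
# ([[(4096, 4224)], [(8192, 8208)]], [0, 18])
```

Only the .debug_rnglists contents and the offsets array are touched here. That is why this approach should be much cheaper than full DWARF-aware linking.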

I imagine this could be implemented with very little overhead to linking, especially compared to the overhead of full DWARF-aware linking.

Though none of this fixes Split DWARF, where the linker doesn’t get a chance to see the addresses being used - but if you only want/need the CU-level ranges to be correct, this might be a viable fix, and quite efficient.

  2. Optimize the DWARF size.

Do your users care much about this? I imagine if they had significant DWARF size issues, they’d have significant link time issues and the kind of cost to link time this feature has would be prohibitive - but perhaps they’re sharing linked binaries much more often than they’re actually performing linking.

Our users have the following specifics:

  • an embedded platform that uses 0 as the start of the .text section.
  • a custom toolset that does not support all features yet (e.g. split DWARF).
  • tolerance of the link-time increase.
  • a need for a convenient way to share debug builds.

Sharing two files (executable and dwp) is significantly less useful than sharing one file?

For the first point: we have the problem of “overlapping address ranges starting from 0” (D59553).
We use a custom solution, but a general solution like D74169 would be better here.

If CU ranges are the only ones that need fixing, then I think the above solution might be as good/better - if more than CU ranges need fixing, then I think we might want to start talking about how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a specific blessed value that linkers would need to implement - because with Split DWARF there’s no way to solve the non-CU addresses at the linker.
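Here is a minimal sketch of both sides of such a "blessed value" convention. It assumes, for illustration only, that the value is all-ones for 64-bit addresses; the thread does not settle on a value. The linker writes this value instead of 0 for references into discarded sections, and consumers drop ranges that start with it. `TOMBSTONE`, `resolve_debug_reloc` and `live_ranges` are hypothetical names.

```python
from typing import List, Tuple

# Hypothetical "blessed" value for addresses of discarded code (64-bit target).
TOMBSTONE = 0xFFFFFFFFFFFFFFFF


def resolve_debug_reloc(section_kept: bool, section_addr: int, addend: int) -> int:
    """Linker side: value written for a .debug_* relocation against a section."""
    return section_addr + addend if section_kept else TOMBSTONE


def live_ranges(ranges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Consumer side: skip ranges whose start is the tombstone."""
    return [(lo, hi) for lo, hi in ranges if lo != TOMBSTONE]


# .text starts at 0, so 0 is a valid address and cannot mark dead code.
main_lo = resolve_debug_reloc(True, 0x0, 0x0)
main_hi = resolve_debug_reloc(True, 0x0, 0x40)
dead_lo = resolve_debug_reloc(False, 0x0, 0x0)
dead_hi = resolve_debug_reloc(False, 0x0, 0x19)
print(live_ranges([(main_lo, main_hi), (dead_lo, dead_hi)]))  # [(0, 64)]
```

The point of a reserved value is that it also works for split DWARF. A consumer can recognize it in .dwo-referenced addresses without the linker ever seeing those DIEs.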

For the second point: split DWARF could be a good alternative for getting debug info of minimal size.
Still, it has drawbacks: it is not currently supported by our tools, it does not solve the “overlapping address ranges”
problem, and it is not very convenient to share (even using .dwp).

Thus, in the long term, D74169 looks like a good solution for us: it resolves the “overlapping address ranges”
problem, produces a binary of minimal size, is supported by current tools, and makes debug builds easy to share (a single binary of
minimal size).

In general, in the current state, I don’t have strong feelings either way about this going in as-is with the intent to improve it to make it more viable - or some of that work being done out-of-tree until it’s a more viable performance tradeoff. Mostly happy to leave that up to folks more involved with lld.

A couple of minor points…

C: -ffunction-sections --gc-sections -fdebug-types-section

^ not sure of the point of testing/showing comparisons with a situation that’s currently unsupported

That situation is currently supported (--gc-debuginfo is not used in this measurement).

Ah, I was confused because it looks like/the description said it was…

“-fdebug-types-section” is supported functionality.
The purpose of these data is to compare the results of “-fdebug-types-section” and “--gc-debuginfo”.

OK

  2. Support for type units.

That could be implemented later.

Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
linker. With a DWARF-aware linker it’d be generally desirable not to have to add that object size overhead to
get the linking improvements.

But DWARFLinker should work adequately with type units, since they are already implemented.

Maybe - it’d be nice & all, but I don’t think it’s an outright necessity - if someone knows they’re using a DWARF-aware linker, they’d probably not use type units in their object files. It’s possible someone doesn’t know for sure & maybe they have pre-canned debug object files from someone else, etc.

If someone uses -fdebug-types-section, it should work adequately together
with --gc-debuginfo (if --gc-debuginfo is accepted).
Right?

Another thing is that the idea behind type units has the potential to help a DWARF-aware linker work faster.
Currently, DWARFLinker analyzes context to understand whether types are the same or not.

When you say “analyzes context” what do you mean? Usually I’d take that to mean “looks at things outside the type itself - like what namespace it’s in, etc” - which, yes, it should do that, but it doesn’t seem very expensive to do. But I guess you actually mean something about doing structural equivalence in some way, looking at things inside the type?

But the context is known when the types are generated, so there is no need to spend time analyzing it.
If types could be compared without analyzing context, a DWARF-aware linker would work faster.
That is just an idea (not for immediate implementation): if types were stored in some “type table”
(instead of a COMDAT section group) and could be accessed through a hash id (like type units),
then it would be a solution requiring fewer bits to store while still allowing types to be compared
by hash id (without analyzing context).
In this case, the size increase would be small, and processing could be faster.

This is just an idea and could be discussed separately from the question of integrating D74169.

  4. Split DWARF support.

This solution does not currently work with split DWARF. But it could be useful for split DWARF in two ways:
a) The generation of the skeleton file could be changed so that address ranges pointing to garbage-collected
code would be replaced with lowpc=0, highpc=0. That would solve the problem of overlapping
address ranges (D59553).

This wouldn’t/couldn’t completely address the issue - because some address ranges would be in the .dwo files the linker can’t see - and they’d still end up with the interesting address ranges.

I see, thank you. So it would not be a complete solution.

  6. -flto=thin

That problem was described in this review: https://reviews.llvm.org/D54747#1503720. It also exists in the
current DWARFLinker/dsymutil implementation. I think that problem should be discussed further: it could
probably be fixed by avoiding the generation of such incomplete declarations during ThinLTO,

Producing extra/redundant debug info in ThinLTO would be costly - in fact, ThinLTO could be doing
more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
definition is known to exist in another Module, etc.)

I don’t know whether it’s still a problem, since that patch was reverted.

Yes. That patch was reverted, but this patch (D74169) has the same problem:
if D74169 were applied and --gc-debuginfo used, the structure type
definition would be removed.

DWARFLinker could handle that case (“removing definitions from some llvm Modules if the type
definition is known to exist in another Module”),
i.e. DWARFLinker could replace the declaration with the definition.

But that problem could be resolved more easily when the debug info is generated (probably without
a significant increase in debug info size):

Let’s look at an example:

0x0000000b: DW_TAG_compile_unit
              DW_AT_low_pc (0x0000000000201700)
              DW_AT_high_pc (0x0000000000201719)

0x0000002a:   DW_TAG_subprogram

0x00000043:     DW_TAG_inlined_subroutine
                  DW_AT_abstract_origin (0x0000000000000086 "_Z1fv")
                  DW_AT_low_pc (0x0000000000201700)
                  DW_AT_high_pc (0x0000000000201718)

0x00000057:       DW_TAG_variable
                    DW_AT_abstract_origin (0x0000000000000096 "var")

0x00000065:       NULL

0x00000073: DW_TAG_compile_unit
              DW_AT_stmt_list (0x00000080)

0x00000086:   DW_TAG_subprogram
                DW_AT_name ("f")
                DW_AT_inline (DW_INL_inlined)

0x00000096:     DW_TAG_variable
                  DW_AT_name ("var")
                  DW_AT_type (0x000000a9 "volatile Foo")

0x000000a1:     NULL

0x000000a9:   DW_TAG_volatile_type
                DW_AT_type (0x000000ae "Foo")

0x000000ae:   DW_TAG_structure_type
                DW_AT_name ("Foo")
                DW_AT_declaration (true)

0x000000c1: DW_TAG_compile_unit
              DW_AT_low_pc (0x0000000000000000)
              DW_AT_high_pc (0x0000000000000019)

0x000000e0:   DW_TAG_subprogram
                DW_AT_low_pc (0x0000000000000000)
                DW_AT_high_pc (0x0000000000000019)
                DW_AT_name ("f")

0x000000fd:     DW_TAG_variable
                  DW_AT_name ("var")
                  DW_AT_type (0x00000119 "volatile Foo")

0x00000119:   DW_TAG_volatile_type
                DW_AT_type (0x0000011e "Foo")

0x0000011e:   DW_TAG_structure_type
                DW_AT_name ("Foo")
                DW_AT_decl_line (1)

Here we have:

DW_TAG_compile_unit(0x0000000b) - compile unit containing the concrete instance of function “f”.
DW_TAG_compile_unit(0x00000073) - compile unit containing the abstract instance root of function “f”.
DW_TAG_compile_unit(0x000000c1) - compile unit containing the definition of function “f”.

The code for function “f” was deleted, so --gc-debuginfo deletes the compile unit DW_TAG_compile_unit(0x000000c1)
containing the “f” definition (since there is no corresponding code). But that unit holds the structure “Foo” definition
DW_TAG_structure_type(0x0000011e), which is referenced from DW_TAG_compile_unit(0x00000073)
via the declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case where the definition
was removed by ThinLTO and replaced with a declaration.

Would it cost too much if the type definition were not replaced with a declaration in the “abstract instance root”?
There are more concrete instances than abstract instance roots,
so leaving the definition in the abstract instance root would probably not be too costly?

Alternatively, would it cost too much if the type definition were not replaced with a declaration when the declaration references a type from an unused function? (LTO could determine that the concrete function is not used.)

I don’t follow this example - could you provide a small concrete test case I could reproduce?

Oh, I guess this is happening perhaps because ThinLTO can’t know for sure that a standalone definition of ‘f’ won’t be needed - so it produces one in case one of the inlining opportunities doesn’t end up inlining. Then it turns out all calls got inlined, so the external definition wasn’t needed.

Oh, you’re suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker drops a CU without any code in it - even though… So far as I know, in LTO, LLVM directly references types across units if the CUs are all emitted in the same object file. (and if they weren’t in the same object file - then the abstract_origin couldn’t be pointing cross-CU).

I guess some basic things to say:

With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don’t end up being inlined. So we know it’ll be emitted (but might not be needed by the actual linker)
Any number of inline calls might exist - but we shouldn’t put the type information into those, because they aren’t guaranteed to emit it (if the inline function gets optimized away, there would be nothing to enforce the type being emitted) - and even if we forced the type information to be emitted into one object file that has an inline copy of the function - there’s no guarantee that object file will get linked in either.

So, no, I don’t think there’s much we can do to keep the size of object files down, while guaranteeing the type information will be emitted with the usual linker semantics.

Hi David, please find my comments inline:

Broad question: Do you have any specific motivation/users/etc. for implementing this (if you can speak about it)? It might help motivate the work and help us understand what tradeoffs might be suitable for you/your users, etc.

There are two general requirements:

  1. Remove (or clean) invalid debug info.

Perhaps a simpler, more direct solution for your immediate needs might be a much narrower
and more efficient linker-DWARF-awareness feature:

With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges
without parsing the rest of the DWARF. /technically/ this isn’t guaranteed - rnglist entries
can be referenced either directly, or by index. If all rnglists are referenced by index, then
a linker could parse only the debug_rnglists section and rewrite ranges to remove any
address ranges that refer to optimized-out code.

This would only be correct for rnglists that had no direct references to them (that were only
referenced via the indexes) - but we could either implement it with that assumption, or
add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
via rnglistx forms/indexes". If this DWARF-aware linking were to read the CU DIE (but not
all the other DIEs), it /could/ also rewrite high/low_pc if the CU wasn’t using ranges…
but that wouldn’t come up in the function-removal case, because then you’d have ranges anyway,
so no need for that.

Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
ended up being laid out next to each other, the linker could coalesce their ranges together.
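To make the rnglist idea concrete, here is a minimal Python model of what such a linker pass would do to one CU's range list: drop entries that point into sections removed by --gc-sections, then coalesce entries that became adjacent after layout. It is a sketch under stated assumptions: the real .debug_rnglists encoding (DW_RLE_* entries, the offsets table used by rnglistx) is not parsed, each entry is assumed to be a (section, offset, length) triple already recovered from relocations, and `rewrite_rnglist` / `section_addr` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (input section, offset within it, length): assumed to be recovered from
# the relocations attached to one CU's range list.
Entry = Tuple[str, int, int]
Range = Tuple[int, int]  # half-open [start, end) output address range


def rewrite_rnglist(entries: List[Entry], section_addr: Dict[str, int]) -> List[Range]:
    """Drop ranges into discarded sections and coalesce adjacent ones.

    section_addr maps each *kept* input section to its output address;
    sections removed by --gc-sections are simply absent from it.
    """
    live = sorted(
        (section_addr[sec] + off, section_addr[sec] + off + length)
        for sec, off, length in entries
        if sec in section_addr
    )
    merged: List[Range] = []
    for start, end in live:
        if merged and start <= merged[-1][1]:
            # Adjacent to (or overlapping) the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# f and h survive and are laid out back to back; g was garbage collected.
entries = [(".text.f", 0, 0x10), (".text.g", 0, 0x20), (".text.h", 0, 0x8)]
layout = {".text.f": 0x1000, ".text.h": 0x1010}
print([(hex(s), hex(e)) for s, e in rewrite_rnglist(entries, layout)])
# [('0x1000', '0x1018')]
```

In this model the pass needs only the relocations and the final layout, which the linker already has; no DIEs are read.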

I imagine this could be implemented with very little overhead to linking, especially compared
to the overhead of full DWARF-aware linking.

Though none of this fixes Split DWARF, where the linker doesn’t get a chance to see the
addresses being used - but if you only want/need the CU-level ranges to be correct, this
might be a viable fix, and quite efficient.

Yes, we have thought about that alternative. It would resolve our problem of invalid debug info
and would work much faster. So, if we do not get good results with D74169, we
will implement it. Do you think it would be useful to have this solution upstream?

  2. Optimize the DWARF size.

Do your users care much about this? I imagine if they had significant DWARF size issues,
they’d have significant link time issues and the kind of cost to link time this feature has would
be prohibitive - but perhaps they’re sharing linked binaries much more often than they’re
actually performing linking.

Yes, they do. They also have significant link-time issues.
So the current performance of D74169 is not really acceptable.
We hope to improve it.

The specifics of our users are:

  • an embedded platform which uses 0 as the start of the .text section.
  • a custom toolset which does not support all features yet (e.g. Split DWARF).
  • tolerance of the link-time increase.
  • a need for a useful way to share debug builds.

Sharing two files (executable and dwp) is significantly less useful than sharing one file?

Probably not significantly, but yes, it looks less useful compared to D74169.
Having only two files (executable and .dwp) looks significantly better than having an executable and multiple .dwo files.
Having only one file (the executable) with minimal size looks better than having two files with a bigger total size.

clang compiled with -gsplit-dwarf takes 0.9G for the executable and 0.9G for the .dwp.
clang compiled with --gc-debuginfo takes only 0.76G for a single executable.
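Putting the two figures side by side (these are only the numbers quoted above, not new measurements):

```python
# Sizes of a clang build as quoted above, in GB.
split_exe = 0.9    # executable built with -gsplit-dwarf
split_dwp = 0.9    # accompanying .dwp package
gc_exe = 0.76      # single executable linked with --gc-debuginfo

split_total = split_exe + split_dwp
saving_vs_total = 1 - gc_exe / split_total
saving_vs_exe = 1 - gc_exe / split_exe

print(f"split DWARF total: {split_total:.2f} GB")                  # 1.80 GB
print(f"--gc-debuginfo:    {gc_exe:.2f} GB")                        # 0.76 GB
print(f"smaller than exe + .dwp by {saving_vs_total:.1%}")          # 57.8%
print(f"smaller than the split exe alone by {saving_vs_exe:.1%}")   # 15.6%
```

So the single --gc-debuginfo executable is smaller than the split-DWARF executable alone, and less than half the size of the executable plus .dwp.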

For the first point: we have the problem “Overlapping address ranges starting from 0” (D59553).

We use a custom solution, but a general solution like D74169 would be better here.

If CU ranges are the only ones that need fixing, then I think the above solution might be as
good/better - if more than CU ranges need fixing, then I think we might want to start talking about
how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a
specific blessed value that linkers would need to implement - because with Split DWARF there’s
no way to solve the non-CU addresses at the linker.

I think a worthwhile choice for that signal value would be LowPC > HighPC.
It does not require additional bits in DWARF.
It would be natural to skip such address ranges since they are explicitly marked as invalid.
It could be implemented in a linker very easily. It would probably make sense to describe that
usage in the DWARF standard.
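A minimal sketch of how the proposed LowPC > HighPC marker would behave on a platform where .text starts at 0. It assumes both bounds are stored as absolute addresses (as in range-list pairs, or DW_AT_high_pc in address form); when DW_AT_high_pc is encoded as an unsigned length relative to DW_AT_low_pc, a low > high pair cannot be expressed. The unmarked case models a discarded address resolved against 0, which is what produces the overlap described above. The marker value (1, 0) and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (low_pc, high_pc), both absolute addresses

# Hypothetical marker for discarded code: any pair with low > high would do.
DEAD_PAIR: Pair = (1, 0)


def resolve_pair(section: str, length: int, section_addr: Dict[str, int],
                 mark_dead: bool) -> Pair:
    """Resolve the (low_pc, high_pc) pair describing code in `section`."""
    if section in section_addr:
        low = section_addr[section]
        return (low, low + length)
    # Discarded section: either resolve against 0 or write the marker.
    return DEAD_PAIR if mark_dead else (0, length)


def subprograms_at(addr: int, pairs: List[Tuple[str, Pair]]) -> List[str]:
    """Names whose range contains addr; a pair with low > high never matches."""
    return [name for name, (low, high) in pairs if low <= addr < high]


section_addr = {".text.main": 0x0}  # embedded target: .text starts at 0
funcs = [("main", ".text.main", 0x20), ("unused", ".text.unused", 0x10)]


def link(mark_dead: bool) -> List[Tuple[str, Pair]]:
    return [(name, resolve_pair(sec, n, section_addr, mark_dead))
            for name, sec, n in funcs]


print(subprograms_at(0x8, link(mark_dead=False)))  # ['main', 'unused']
print(subprograms_at(0x8, link(mark_dead=True)))   # ['main']
```

The consumer needs no special case: the ordinary containment test already rejects a pair whose low bound is above its high bound.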

As to the addresses which are not seen by the linker (since they are in .dwo files) - yes,
they need another solution. Could you show an example of such a case, please?

  3. Support for type units.

That could be implemented further.

Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
linker. With a DWARF-aware linker it’d be generally desirable not to have to add that object size overhead to
get the linking improvements.

But DWARFLinker should still work adequately with type units, since they are already implemented.

Maybe - it’d be nice & all, but I don’t think it’s an outright necessity - if someone knows they’re using
a DWARF-aware linker, they’d probably not use type units in their object files. It’s possible someone
doesn’t know for sure & maybe they have pre-canned debug object files from someone else, etc.

I see.

Another thing is that the idea behind type units has the potential to help a DWARF-aware linker work faster.

Currently, DWARFLinker analyzes context to understand whether types are the same or not.

When you say “analyzes context” what do you mean? Usually I’d take that to mean
“looks at things outside the type itself - like what namespace it’s in, etc” - which, yes,
it should do that, but it doesn’t seem very expensive to do. But I guess you actually
mean something about doing structural equivalence in some way, looking at things inside the type?

I think it could be useful for both cases. Currently, dsymutil does only the first thing
(looks at the type name, namespace name, etc…) and does not do the second thing
(structural equivalence). Analyzing type names is currently quite expensive
(the string pool search alone takes ~10 sec of the 70 sec overall time).
It is expensive because many things have to be done to work with strings:
parse DWARF, search and resolve relocations, compute a hash for strings,
put data into a string pool, create a fully qualified name (like namespace::function::name).
It looks like it could be optimized to take less time, but it would still be a noticeable
part of the overall time.

If dsymutil starts to check for structural equivalence, then the process will be even slower.
So, if a single hash-id were checked instead of comparing type structures, this process
would also be faster.

Thus I think using a hash-id to compare types would make the current implementation faster and would
also allow DWARFLinker to handle incomplete types without massive performance degradation.
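To illustrate the difference, here is a toy Python model contrasting two ways of deciding that type DIEs from different objects describe the same type: rebuilding a fully qualified name from the parent chain in the linker (the context analysis described above, minus relocation handling and the string pool), versus reading a hash-id that the compiler computed while the context was already at hand. The `TypeDIE` class and the choice of MD5 over the qualified name as the compile-time hash are assumptions for this sketch only.

```python
import hashlib
from typing import Dict, List, Optional


class TypeDIE:
    """Toy stand-in for a type DIE: a name, its enclosing scope, an optional hash-id."""

    def __init__(self, name: str, parent: Optional["TypeDIE"] = None):
        self.name = name
        self.parent = parent
        self.hash_id: Optional[int] = None


def qualified_name(die: Optional[TypeDIE]) -> str:
    """Context analysis: walk the parent chain to build e.g. 'ns::Foo'."""
    parts: List[str] = []
    while die is not None:
        parts.append(die.name)
        die = die.parent
    return "::".join(reversed(parts))


def compiler_hash_id(die: TypeDIE) -> int:
    """Assumed compile-time signature: low 64 bits of MD5 of the qualified name."""
    digest = hashlib.md5(qualified_name(die).encode()).digest()
    return int.from_bytes(digest[:8], "little")


def dedup_by_context(types: List[TypeDIE]) -> Dict[str, TypeDIE]:
    """Linker side: rebuild (and hash) a string for every type it sees."""
    seen: Dict[str, TypeDIE] = {}
    for t in types:
        seen.setdefault(qualified_name(t), t)
    return seen


def dedup_by_hash_id(types: List[TypeDIE]) -> Dict[int, TypeDIE]:
    """Linker side: one integer lookup per type, no context walk."""
    seen: Dict[int, TypeDIE] = {}
    for t in types:
        if t.hash_id is None:
            raise ValueError("hash-id must be emitted by the compiler")
        seen.setdefault(t.hash_id, t)
    return seen


# Two object files each describe ns::Foo; one of them also has ns::Bar.
obj1_ns, obj2_ns = TypeDIE("ns"), TypeDIE("ns")
types = [TypeDIE("Foo", obj1_ns), TypeDIE("Bar", obj1_ns), TypeDIE("Foo", obj2_ns)]
for t in types:
    t.hash_id = compiler_hash_id(t)  # done once, at compile time

print(sorted(dedup_by_context(types)))  # ['ns::Bar', 'ns::Foo']
print(len(dedup_by_hash_id(types)))     # 2
```

Both give the same answer here; the point is where the work happens. The context walk runs in the linker for every type in every object, while the hash is computed once per type by the compiler.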

But the context is known when types are generated. So there is no need to spend time analyzing it.

If types could be compared without analyzing context, then a DWARF-aware linker would work faster.

That is just an idea (not for immediate implementation): if types were stored in some “type table”
(instead of COMDAT section groups) and could be accessed through a hash-id (like type units),
then it would be a solution requiring fewer bits to store while still allowing types to be compared
by hash-id (without analysing context).
In this case, the size increase would be small, and processing could be faster.

This is just an idea and could be discussed separately from the problem of integrating D74169.

  4. -flto=thin

That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in the
current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could
probably be fixed by avoiding the generation of such incomplete declarations during ThinLTO,

It would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
definition is known to exist in another Module, etc).
I don’t know if it’s a problem since that patch was reverted.

Yes. That patch was reverted, but this patch(D74169) has the same problem.

If D74169 were applied and --gc-debuginfo used, then the structure type
definition would be removed.

DWARFLinker could handle that case - “removing definitions from some llvm Modules if the type
definition is known to exist in another Module”.
i.e. DWARFLinker could replace the declaration with the definition.

But that problem could be more easily resolved when debug info is generated (probably without
a significant increase in debug info size):

Here we have:

DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function “f”.
DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function “f”.
DW_TAG_compile_unit(0x000000c1) - compile unit containing function “f” definition.

Code for function “f” was deleted. --gc-debuginfo deletes the compile unit DW_TAG_compile_unit(0x000000c1)
containing the “f” definition (since there is no corresponding code). But that unit holds the structure “Foo” definition
DW_TAG_structure_type(0x0000011e), which is referenced from DW_TAG_compile_unit(0x00000073)
by the declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case where the definition
was removed by ThinLTO and replaced with a declaration.
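The situation can be modelled in a few lines of Python using the three unit offsets from the example above. The rule for which units survive (keep units with live code, plus any unit they reference, e.g. via DW_AT_abstract_origin) is a simplifying assumption for illustration, not DWARFLinker's actual algorithm; `CU`, `kept_units` and `dangling_declarations` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CU:
    offset: int
    has_live_code: bool
    refs: Set[int] = field(default_factory=set)      # units referenced (abstract_origin)
    defines: Set[str] = field(default_factory=set)   # full type definitions
    declares: Set[str] = field(default_factory=set)  # declaration-only types


def kept_units(cus: List[CU]) -> Set[int]:
    """Units with live code, plus every unit reachable from them."""
    by_offset = {cu.offset: cu for cu in cus}
    kept = {cu.offset for cu in cus if cu.has_live_code}
    work = list(kept)
    while work:
        for ref in by_offset[work.pop()].refs:
            if ref not in kept:
                kept.add(ref)
                work.append(ref)
    return kept


def dangling_declarations(cus: List[CU]) -> Dict[int, Set[str]]:
    """Declarations in kept units whose only definition was in a dropped unit."""
    kept = kept_units(cus)
    defined: Set[str] = set()
    for cu in cus:
        if cu.offset in kept:
            defined |= cu.defines
    result: Dict[int, Set[str]] = {}
    for cu in cus:
        missing = cu.declares - defined
        if cu.offset in kept and missing:
            result[cu.offset] = missing
    return result


units = [
    CU(0x0B, has_live_code=True, refs={0x73}),        # concrete (inlined) instance of f
    CU(0x73, has_live_code=False, declares={"Foo"}),  # abstract instance root of f
    CU(0xC1, has_live_code=False, defines={"Foo"}),   # out-of-line f, code removed
]
print([hex(o) for o in sorted(kept_units(units))])                 # ['0xb', '0x73']
print({hex(k): v for k, v in dangling_declarations(units).items()})  # {'0x73': {'Foo'}}
```

Keeping the definition in the abstract instance root (the first option below) corresponds to moving "Foo" from unit 0xc1's `defines` into unit 0x73's, after which the model reports no dangling declarations.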

Would it cost too much if the type definition were not replaced with a declaration for the “abstract instance root”?
The number of concrete instances is bigger than the number of abstract instance roots.
Probably, it would not be too costly to leave the definition in the abstract instance root?

Alternatively, would it cost too much if the type definition were not replaced with a declaration when
the declaration references a type from an unused function? (LTO could determine that the concrete function is not used.)

I don’t follow this example - could you provide a small concrete test case I could reproduce?

I can provide a test case if necessary. But it looks like this issue is now clear, and you have already commented on it.

Oh, I guess this is happening perhaps because ThinLTO can’t know for sure that a standalone
definition of ‘f’ won’t be needed - so it produces one in case one of the inlining opportunities
doesn’t end up inlining. Then it turns out all calls got inlined, so the external definition wasn’t needed.

Oh, you’re suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker
drops a CU without any code in it - even though… So far as I know, in LTO, LLVM directly references
types across units if the CUs are all emitted in the same object file. (and if they weren’t in the same
object file - then the abstract_origin couldn’t be pointing cross-CU).

I guess some basic things to say:

With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don’t end up
being inlined. So we know it’ll be emitted (but might not be needed by the actual linker)
Any number of inline calls might exist - but we shouldn’t put the type information into those, because
they aren’t guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
enforce the type being emitted) - and even if we forced the type information to be emitted into one
object file that has an inline copy of the function - there’s no guarantee that object file will get linked in either.

So, no, I don’t think there’s much we can do to keep the size of object files down, while guaranteeing
the type information will be emitted with the usual linker semantics.

Then dsymutil/DWARFLinker could be changed to handle that (though it would probably not be very efficient).
If ThinLTO could tell that the function is ultimately unused (and therefore must not contain the referenced type definition),
then this situation could be handled more effectively.

Thank you, Alexey.

>>Hi David, please find my comments inside:

>>>Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?

>>> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.

>>There are two general requirements:
>> 1) Remove (or clean) invalid debug info.

>
>Perhaps a simpler direct solution for your immediate needs might be a much narrower,
>and more efficient linker-DWARF-awareness feature:
>
> With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges
> without parsing the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries
> can be referenced either directly, or by index. If all rnglists are referenced by index, then
> a linker could parse only the debug_rnglists section and rewrite ranges to remove any
> address ranges that refer to optimized-out code.
>
> This would only be correct for rnglists that had no direct references to them (that only were
> referenced via the indexes) - but we could either implement it with that assumption, or could
> add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
> via rnglistx forms/indexes". If this DWARF-aware linking read the CU DIE (but not
> all the other DIEs), it /could/ also rewrite high/low_pc if the CU wasn't using ranges...
> but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
> so no need for that.
>
> Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
> ended up being laid out next to each other, the linker could coalesce their ranges together.
>
> I imagine this could be implemented with very little overhead to linking, especially compared
> to the overhead of full DWARF-aware linking.
>
>Though none of this fixes Split DWARF, where the linker doesn't get a chance to see the
> addresses being used - but if you only want/need the CU-level ranges to be correct, this
> might be a viable fix, and quite efficient.

>>Yes, we have thought about that alternative. It would resolve our problem of invalid debug info
>>and would work much faster. So, if we do not get good results with D74169, we
>>will implement it. Do you think it would be useful to have this solution upstream?

A pure rnglist rewriting - I think it'd be OK to have in upstream -
again, cost/benefit/etc would have to be weighed. I'm not sure it
would save enough space to be particularly valuable beyond the
correctness issue - and it doesn't completely solve the correctness
issue for zero-address usage or low-address usage (because you could
still have overlapping subprograms inside a CU - so if you were
symbolizing you could use the correct rnglist to filter, but then go
look inside the CU only to find two subprograms that had that address
& not know which one was the correct one and which one was the
discarded one).

rnglist rewriting might be easy enough to prototype - but it depends on what
you want to spend your time on. I know this whole issue has been a
huge investment of your time already - but maybe this recent
revitalization of the conversation around having an explicit value in
the linker might be sufficient to address everyone's needs... *fingers
crossed*

It makes me sad that the linker (via a library or otherwise) has to be “DWARF-aware” to be able to effectively handle --gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks of data kicking around.

The patching to -1 (or equivalent) is probably a good lightweight solution (though I’d love it if it could be done based on section type in the future rather than section name, but that’s probably outside the realm of DWARF), as it requires only minimal understanding in the linker, but anything beyond that seems to be complicated logic that is mostly due to the structure of DWARF. Patching to -1 does feel a bit like a sticking plaster/band-aid over the issue rather than properly solving it, too - there will still be debug data (potentially significant amounts in COMDAT-heavy objects) that the linker has to write and the debugger has to somehow know how to skip (even if it knows that -1 is a special case due to the standard being updated, it still needs to get as far as the -1), which is all wasted effort.
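The "patch to -1" approach needs very little from the linker: when applying a relocation in a .debug_* section, it only has to check whether the target section survived. A minimal Python sketch, assuming 64-bit little-endian absolute relocations given as (offset, target section, addend) triples; the names are hypothetical. One caveat: in DWARF v4 .debug_ranges and .debug_loc, an entry whose first address is the largest representable address is a base address selection entry, so a real linker would need a different value in those sections.

```python
import struct
from typing import Dict, List, Tuple

TOMBSTONE_64 = 0xFFFFFFFFFFFFFFFF  # "-1" stored as an unsigned 64-bit address

Reloc = Tuple[int, str, int]  # (offset in the debug section, target section, addend)


def apply_debug_relocs(data: bytearray, relocs: List[Reloc],
                       section_addr: Dict[str, int]) -> None:
    """Resolve absolute 64-bit relocations in a .debug_* section, in place.

    No DWARF parsing: the linker only asks whether the target section was
    kept (present in section_addr) or discarded by --gc-sections/COMDAT.
    """
    for offset, target, addend in relocs:
        if target in section_addr:
            value = section_addr[target] + addend
        else:
            value = TOMBSTONE_64
        struct.pack_into("<Q", data, offset, value)


debug_info = bytearray(16)
relocs = [(0, ".text.live", 4), (8, ".text.dead", 0)]
apply_debug_relocs(debug_info, relocs, {".text.live": 0x1000})
low_live, low_dead = struct.unpack_from("<QQ", debug_info)
print(hex(low_live), low_dead == TOMBSTONE_64)  # 0x1004 True
```

The model also shows the cost described above: the dead entry is still written out, and a consumer must read up to it before it can recognise the tombstone.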

We’ve already seen from Alexey’s prototyping, and from our own experiences with the Sony proprietary linker (which tried to rewrite .debug_line only), that deconstructing the DWARF so that it can be more optimally reassembled at link time is slow going, and probably will remain so however much effort is put into optimising it. For a start, given the current standards, it’s impossible to know how to deconstruct it without parsing vast amounts of DWARF, which typically means a lot more parsing work than the linker would normally have to deal with. Additionally, much of this parsing work is wasted effort, since it seems unlikely in many links that large amounts of the DWARF will be redundant. Having an opt-in option doesn’t help much there, since it just means the logic exists without most people using it, either because it isn’t good enough or because they don’t even know it exists.

I don’t have particularly concrete suggestions as to how to solve the structural problems with DWARF at this point. The only thing that seems obvious to me is a more “blessed” approach to fragmentation of sections, similar to what I tried with my prototype mentioned earlier in the thread, although we’d need to figure out the previously stated performance issues. Other ideas might tie into this, like somehow sharing the various table headers a bit like CIEs in .eh_frame that could be merged by the linker - each object could have separate table header sections, which are referenced by the individual .debug_* blocks, which in turn are one per function/data piece and easily discardable/merged by the linker.
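One way to picture the CIE-like idea: each per-function .debug_* fragment refers to a table header, the linker drops fragments whose function was discarded, and identical headers are emitted once, much as .eh_frame CIEs can be shared. This is a hypothetical model only; `Fragment`, `link_fragments`, and byte-for-byte header equality as the merge criterion are assumptions, not an existing format.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Fragment:
    header: bytes  # table header this fragment refers to (shared, mergeable)
    body: bytes    # one function's worth of debug data
    group: str     # link group, e.g. the function's .text section


def link_fragments(frags: List[Fragment],
                   live_groups: Set[str]) -> Tuple[List[bytes], List[Tuple[int, bytes]]]:
    """Return (merged headers, [(header index, body)]) for the surviving fragments."""
    headers: List[bytes] = []
    index: Dict[bytes, int] = {}
    out: List[Tuple[int, bytes]] = []
    for frag in frags:
        if frag.group not in live_groups:
            continue  # discarded together with its function, like a COMDAT member
        if frag.header not in index:
            # First live use of this header: emit it once.
            index[frag.header] = len(headers)
            headers.append(frag.header)
        out.append((index[frag.header], frag.body))
    return headers, out


h1, h2 = b"header-v1", b"header-v2"
frags = [
    Fragment(h1, b"f", ".text.f"),            # from a.o
    Fragment(h1, b"g", ".text.g"),            # from b.o, byte-identical header
    Fragment(h2, b"k", ".text.k"),
    Fragment(b"header-v3", b"d", ".text.d"),  # its only user is discarded
]
headers, bodies = link_fragments(frags, {".text.f", ".text.g", ".text.k"})
print(headers)  # [b'header-v1', b'header-v2']
print(bodies)   # [(0, b'f'), (0, b'g'), (1, b'k')]
```

A header whose only users were discarded never reaches the output, which is the "easily discardable/merged" property described above.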

Just some thoughts.

James

DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common, certainly not when talking about function code. The overhead of a unit occurred only once per translation unit, so that expense was reasonably amortized.

Splitting functions into their own object-file sections and making them excludable is an evolution of compiler/linker technology that DWARF has not kept up with. The linker-friendly solutions (COMDAT DWARF) would put function-related .debug_* contributions into a section-group along with the function .text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the content of each per-function DWARF section. The fully DWARF-conformant solution would create one partial_unit per function, with the corresponding overhead of unit headers (especially painful in the .debug_line section). Alternatively we fragment DWARF into sections without headers and rely on the linker to make everything look right in the linked executable; this produces .o files that are not DWARF conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers other than the linker.

Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the linker.
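A rough back-of-the-envelope estimate of the fixed costs mentioned above, in Python. The function count and the number of per-function debug sections (say .debug_info, .debug_line and .debug_rnglists) are hypothetical; the constants are sizeof(Elf64_Shdr) (64 bytes) and the size of a 32-bit-DWARF v5 unit header (12 bytes). Relocation sections, section-group (SHT_GROUP) headers, and the much larger .debug_line headers are not counted, so both figures are lower bounds.

```python
ELF64_SHDR_SIZE = 64  # sizeof(Elf64_Shdr)
# 32-bit DWARF v5 unit header: unit_length (4) + version (2) + unit_type (1)
# + address_size (1) + debug_abbrev_offset (4)
DWARF5_UNIT_HEADER_32 = 4 + 2 + 1 + 1 + 4


def comdat_dwarf_section_headers(functions: int, debug_sections_per_function: int) -> int:
    """Extra ELF section headers if every function gets its own .debug_* sections."""
    return functions * debug_sections_per_function * ELF64_SHDR_SIZE


def partial_unit_headers(functions: int) -> int:
    """Extra .debug_info unit headers if every function becomes a partial_unit."""
    return functions * DWARF5_UNIT_HEADER_32


n = 10_000  # hypothetical number of functions in one link
print(comdat_dwarf_section_headers(n, 3))  # 1920000 (bytes)
print(partial_unit_headers(n))             # 120000 (bytes)
```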

–paulr

> Splitting functions into their own object-file sections and making them excludable is an evolution of
> compiler/linker technology that DWARF has not kept up with. The linker-friendly solutions (COMDAT
> DWARF) would put function-related .debug_* contributions into a section-group along with the function
> .text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the
> content of each per-function DWARF section. The fully DWARF-conformant solution would create one
> partial_unit per function, with the corresponding overhead of unit headers (especially painful in the
> .debug_line section). Alternatively we fragment DWARF into sections without headers and rely on the
> linker to make everything look right in the linked executable; this produces .o files that are not DWARF
> conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers
> other than the linker.
>
> Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the linker.

Perhaps we could try to make DWARF easy to parse, trim, and rewrite, so that a full DWARF
parsing solution would not take too much time?

E.g. the -fdebug-types-section solution uses COMDAT sections to split and deduplicate types.
That solution works quite fast. It has the already-mentioned drawback of a big size
overhead (because of section header/type unit header sizes). But the fact that type units
can be identified just by a hash-id (without parsing type names and type hierarchies)
allows the linker to reject duplicates quickly. Another thing is that the linker drops
duplicate COMDAT sections without any additional check. After the duplicates are deleted,
the debug info is still consistent.

A DWARF-aware solution could work using the same two principles:

  1. compare types by hash-id.
  2. drop duplicates without analyzing contents.

If all types were put into a separate type table and had a hash-id, then it would be much easier to
deduplicate them. The idea is demonstrated here - https://reviews.llvm.org/P8164. (It still has open
questions: whether base types should be put into the type table, whether references into the type table
should be done by DW_AT_signature or just by offset, etc… ) While handling that separate type table,
the DWARF-aware linker would check only the hash_id and put only one type description
with a given id into the final type table. It would also allow us to solve the -flto=thin problem -
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is a dsymutil example there),
i.e., the case where a type definition gets removed would not occur.
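The two principles can be sketched as a merge of per-object type tables keyed by hash-id. This is a toy model of the idea, not of the P8164 prototype: descriptions are opaque bytes, the table layout and the returned hash-id-to-slot map are assumptions, and how references are then rewritten (DW_AT_signature vs. offset) is left open, as it is in the text.

```python
from typing import Dict, List, Tuple

TypeTable = Dict[int, bytes]  # hash-id -> encoded type description (one object file)


def merge_type_tables(tables: List[TypeTable]) -> Tuple[List[bytes], Dict[int, int]]:
    """Merge per-object type tables, keeping one description per hash-id.

    Returns the final table and a hash-id -> slot map through which
    references could be redirected.
    """
    merged: List[bytes] = []
    slot: Dict[int, int] = {}
    for table in tables:
        for hash_id, desc in table.items():
            if hash_id in slot:
                continue  # principle 2: duplicate dropped without looking at desc
            slot[hash_id] = len(merged)  # principle 1: identity is the hash-id only
            merged.append(desc)
    return merged, slot


a_o = {0xA1: b"struct Foo { int x; }", 0xB2: b"struct Bar {}"}
b_o = {0xA1: b"struct Foo { int x; }", 0xC3: b"enum E {}"}
final, slots = merge_type_tables([a_o, b_o])
print(len(final), slots[0xC3])  # 3 2
```

In this model a definition lives in the table rather than in the unit of whichever function happened to emit it, so dropping that function's unit does not drop the type, which is the -flto=thin case mentioned above.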

Thank you, Alexey.

> DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common, certainly not when talking about function code. The overhead of a unit occurred only once per translation unit, so that expense was reasonably amortized.
>
> Splitting functions into their own object-file sections and making them excludable is an evolution of compiler/linker technology that DWARF has not kept up with. The linker-friendly solutions (COMDAT DWARF) would put function-related .debug_* contributions into a section-group along with the function .text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the content of each per-function DWARF section. The fully DWARF-conformant solution would create one partial_unit per function, with the corresponding overhead of unit headers (especially painful in the .debug_line section). Alternatively we fragment DWARF into sections without headers and rely on the linker to make everything look right in the linked executable; this produces .o files that are not DWARF conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers other than the linker.

"object files don't contain DWARF, but they contain stuff that the
linker will turn into DWARF" wouldn't seem like the worst thing to me
- what sort of pre-linking parsing of DWARF use cases do you have in
mind, other than for our own compiler development uses?
(notwithstanding in-object Split DWARF (where the .dwo sections would
have to remain usable without linking) or the MachO style debug
info distribution model which is similar)

But even then, I'm not sure how viable it would be - as Fangrui
pointed out on another thread about this: ELF section overhead itself
is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
rather difficult to reconstruct header-less slice-and-diceable sections
in some cases. For type information (a reduced overhead version of
-fdebug-types-section) I could see it - but for functions, they need
to refer to addresses - preferably in the debug_addr section, and
that's accessed by index, so taking chunks out of it would break other
references to it, etc... adding the header would be expensive, and how
would the CU construct its DW_AT_ranges value if that has to be sliced
and diced? Again, some amount of linker magic might solve some of
these problems - but I think there's still a lot of overhead to making
a solution that's workable with a DWARF-agnostic linker (or even with
a DWARF aware one, but in an efficient amount of time/space where it's
not only usable for small programs, or for linking when you're
shipping a final production binary, etc)

& as always, not sure how any of this would work for Split DWARF -
just a debug_addr section that has some addresses that point to
discardable functions... if we want those addresses themselves to be
discardable (so we don't have to use a tombstone value inserted by the
linker) then they'd need to be in separate debug_addr contributions
with headers, etc - the overhead just seems too high to me in all the
ways I can look at that.
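The index problem is easy to see in a small model of a .debug_addr contribution referenced by index (as with DW_FORM_addrx). Compacting out the entries for discarded functions shifts every later index, so every reference (including those in .dwo files the linker never sees) would need rewriting; overwriting the dead slots in place keeps all indices valid. The tombstone value and the function names are assumptions for illustration.

```python
from typing import List, Optional

TOMBSTONE = 0xFFFFFFFFFFFFFFFF  # assumed marker for a dead address slot


def compact(addrs: List[Optional[int]]) -> List[int]:
    """Remove slots whose function was discarded (None): later indices shift."""
    return [a for a in addrs if a is not None]


def tombstone(addrs: List[Optional[int]]) -> List[int]:
    """Keep every slot and overwrite dead ones: existing indices stay valid."""
    return [TOMBSTONE if a is None else a for a in addrs]


# Slot 1 belonged to a function removed by --gc-sections.
addrs = [0x1000, None, 0x2000]
ref = 2  # e.g. an index in a .dwo that meant 0x2000

patched = tombstone(addrs)
print(hex(patched[ref]))      # 0x2000
compacted = compact(addrs)
print(ref < len(compacted))   # False: the reference now points past the end
```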

>DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common,
>certainly not when talking about function code. The overhead of a unit occurred only once per
>translation unit, so that expense was reasonably amortized.
>
>Splitting functions into their own object-file sections and making them excludable is an evolution of
>compiler/linker technology that DWARF has not kept up with. The linker-friendly solutions (COMDAT
>DWARF) would put function-related .debug_* contributions into a section-group along with the function
>.text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the
> content of each per-function DWARF section. The fully DWARF-conformant solution would create one
> partial_unit per function, with the corresponding overhead of unit headers (especially painful in the
> .debug_line section). Alternatively we fragment DWARF into sections without headers and rely on the
> linker to make everything look right in the linked executable; this produces .o files that are not DWARF
>conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers
>other than the linker.
>Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the linker.

Perhaps we could try to make DWARF easy to parse, trim, and rewrite, so that a full DWARF
parsing solution would not take too much time?

E.g. the -fdebug-types-section solution uses COMDAT sections to split and deduplicate types.
That solution works quite fast. It has the already-mentioned drawback of a big size
overhead (because of section header/type unit header sizes). But the fact that type units
can be identified just by a hash-id (without parsing type names and type hierarchies)
allows the linker to reject duplicates quickly. Another thing is that the linker drops
duplicate COMDAT sections without any additional check. After the duplicates are deleted,
the debug info is still consistent.
A DWARF-aware solution could work using the same two principles:
1. compare types by hash-id.
2. drop duplicates without analyzing contents.

If all types were put into a separate type table and had a hash-id, then it would be much easier to
deduplicate them. The idea is demonstrated here - https://reviews.llvm.org/P8164. (It still has open
questions: whether base types should be put into the type table, whether references into the type table
should be done by DW_AT_signature or just by offset, etc.) While handling that separate type table,
the DWARF-aware linker would check only the hash_id and put only one type description
with a given id into the final type table. It would also allow us to solve the -flto=thin problem -
http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is a dsymutil example there),
i.e., the case where a type definition gets removed would not occur.

I think there is scope for lower-overhead type deduplication,
especially now with type units being merged into the debug_info
section. Perhaps we could drop dwo_ids and use section references to
refer to types & rely on the linker to keep those referenced sections
alive - though section references are longer than CU-relative
references. (but we need the extra length - because if the linker
deduplicates a type definition - one CU may be referencing a type very
far away, so the shorter reference might be inadequate) I don't think
the indirection through the type hash is /super/ significant to the
cost - I think it's more in the duplication of many DIEs especially
for function definitions (since the type unit sig8 system only
provides a way to reference the type - not its member functions, their
parameters, etc - so all those DIEs get duplicated in any CU that
needs to provide a definition of a member function). We could
prototype cross-unit DIE references to lower the cost of that
duplication, though rumor has it that constructor based type homing
might provide enough value to obviate the need for type units (or at
least make the overhead not worthwhile - so revisiting the overhead to
reduce it might make it worthwhile again... ).

Probably wouldn't be super hard to use LLVM's existing cross-unit DIE
referencing machinery (implemented for LTO) to refer directly to DIEs
in a type unit without using the signature system... - hmm, that'd
only work if your type unit DIEs were identical? /maybe/ ? Not sure
how that'd work if you wanted to refer into a type unit, but the type
unit got deduplicated. Might be able to rely on the linker to preserve
every unique copy of the type unit that's referenced if we phrase
things carefully - so if your compiler does produce exactly identical
type units they get deduplicated and sec_refs refer to the uniquely
preserved copy - but otherwise it preserves as many distinct copies as
needed. (I don't know enough about how that works to be sure - but I
know that this linkonce/inline function deduplication does seem to
cause the DWARF to refer to the singular function if that function is
identical, and if it isn't, then you get 0 - so there's /something/ in
the linker that can adjust for deduplicating identical duplicates... )

From: David Blaikie <dblaikie@gmail.com>
Sent: Wednesday, June 3, 2020 5:31 PM
To: Robinson, Paul <paul.robinson@sony.com>
Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

>
> DWARF was designed in an era when COMDAT and ICF were not a thing, or at
> least not common, certainly not when talking about function code. The
> overhead of a unit occurred only once per translation unit, so that
> expense was reasonably amortized.
>
> Splitting functions into their own object-file sections and making them
> excludable is an evolution of compiler/linker technology that DWARF has
> not kept up with. The linker-friendly solutions (COMDAT DWARF) would put
> function-related .debug_* contributions into a section-group along with
> the function .text itself; this multiplies the total number of sections to
> deal with, regardless of the tactics used for the content of each
> per-function DWARF section. The fully DWARF-conformant solution would create
> one partial_unit per function, with the corresponding overhead of unit
> headers (especially painful in the .debug_line section). Alternatively we
> fragment DWARF into sections without headers and rely on the linker to
> make everything look right in the linked executable; this produces .o
> files that are not DWARF conformant (unless we can standardize this in
> DWARF v6) and would be a big hassle for consumers other than the linker.

> "object files don't contain DWARF, but they contain stuff that the
> linker will turn into DWARF" wouldn't seem like the worst thing to me
> - what sort of pre-linking parsing of DWARF use cases do you have in
> mind, other than for our own compiler development uses?

No, that wouldn't seem like the worst thing. Obviously llvm-dwarfdump
would want to be able to report what's actually happening, but indeed
all the other use-cases that come to mind are not looking at .o files.

> (notwithstanding in-object Split DWARF (where the .dwo sections would
> have to remain usable without linking) or the MachO style debug
> info distribution model which is similar)

I expect Split DWARF would be incompatible with fragments. I don't
know details about MachO but seems likely the same is true there.

> But even then, I'm not sure how viable it would be - as Fangrui
> pointed out on another thread about this: ELF section overhead itself
> is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> rather difficult to reconstruct header-less slice-and-diceable sections
> in some cases. For type information (a reduced overhead version of
> -fdebug-types-section) I could see it - but for functions, they need
> to refer to addresses - preferably in the debug_addr section, and
> that's accessed by index, so taking chunks out of it would break other
> references to it, etc... adding the header would be expensive, and how
> would the CU construct its DW_AT_ranges value if that has to be sliced
> and diced? Again, some amount of linker magic might solve some of
> these problems - but I think there's still a lot of overhead to making
> a solution that's workable with a DWARF-agnostic linker (or even with
> a DWARF aware one, but in an efficient amount of time/space where it's
> not only usable for small programs, or for linking when you're
> shipping a final production binary, etc)

The idea we have blue-skied internally would work something like this
(initially explicated in terms of the .debug_info section, then seeing
how that tactic applies to other sections):

There's a top fragment, containing the CU header and the CU DIE itself.
Linker magic makes this first in the output file.
Types also go here; certainly base types, and other file-scope types
can be included here or put into type units. (Type units aren't
fragmented, they are their own thing same as always.)
There's a matching bottom fragment, which is just the terminating NULL
for the CU DIE; linker magic makes this last in the output file.

Each function has its own fragment, which is in the same link-group
(COMDAT or whatever) as the function's .text section; that way, if the
function is discarded, so is the .debug_info fragment. Offhand I can't
think of any cases (other than DW_AT_specification, addressed below) of
references to a subprogram DIE from elsewhere, so it should be fine to
discard the entire function fragment as needed. Linker magic puts all
function fragments between the top and bottom fragments, in some
indeterminate order. Each function fragment is the usual complete
subtree, rooted in DW_TAG_subprogram. References to types are either
to type units as normal, or to types in the top fragment. Note that
these references do not require relocations; type units are by signature
as always, and for types in the top fragment, the offsets into the top
fragment are known at compile time.
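To make the GC and offset argument concrete, here is a minimal Python sketch of one object file's .debug_info being assembled from a top fragment, whichever function fragments survive, and a bottom fragment. The `Fragment` class, the group names and the byte contents are hypothetical placeholders, not real DIE encodings. The point is only that dropping a middle fragment never moves anything in the top fragment, so CU-relative offsets to types there can be fixed at compile time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    name: str
    data: bytes
    # Hypothetical: the section group (COMDAT) shared with a function's .text.
    # None means the fragment is not tied to a function and is always kept.
    group: Optional[str] = None


def assemble_debug_info(top, middles, bottom, live_groups):
    """Concatenate the top fragment, the surviving function fragments, then the bottom one."""
    kept = [f for f in middles if f.group is None or f.group in live_groups]
    return top.data + b"".join(f.data for f in kept) + bottom.data


# Placeholder bytes: 11 bytes standing in for the CU header and CU DIE,
# followed by a stand-in "type DIE" at CU-relative offset 11.
TYPE_OFFSET = 11
top = Fragment("top", b"\x00" * TYPE_OFFSET + b"TYPE_int")
f1 = Fragment("f1", b"<f1>", group="f1")
f2 = Fragment("f2", b"<f2>", group="f2")
bottom = Fragment("bottom", b"\x00")  # the NULL ending the CU DIE's children

all_live = assemble_debug_info(top, [f1, f2], bottom, {"f1", "f2"})
f1_gcd = assemble_debug_info(top, [f1, f2], bottom, {"f2"})  # f1's .text was discarded

# The type is at the same offset whether or not f1 was garbage-collected.
assert all_live[TYPE_OFFSET:TYPE_OFFSET + 8] == f1_gcd[TYPE_OFFSET:TYPE_OFFSET + 8]
```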

Inlined functions are described as part of the function they have been
inlined into, being children of the function DIE. DW_AT_specification
refers to the abstract declaration which is in its own fragment (or the
top fragment, but that keeps the declaration from being elided if all
references go away).

If functions are inside namespaces, each function fragment will need
to have namespace DIEs around the function DIE. This adds overhead
but it's pretty small.

I hand-wave filling in the CU header's unit length. I'd expect a
relocation with a reference to the bottom fragment should be able to
compute the correct value.
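One way to picture the unit-length fixup: once the fragments are laid out, the DWARF32 unit_length is just the distance from the byte after the 4-byte length field to the end of the bottom fragment. The Python sketch below applies that patch after layout. In a real object file it would presumably be a symbol difference or relocation against the bottom fragment, which is exactly the hand-waved part. It assumes little-endian DWARF32 with a version 4 CU header; DWARF64 and the relocation mechanics are not modeled, and `patch_unit_length` is a hypothetical helper.

```python
import struct


def patch_unit_length(unit: bytearray, unit_start: int, bottom_end: int) -> int:
    """Fill in a little-endian DWARF32 unit_length: the number of bytes that follow
    the 4-byte length field, up to the end of the bottom fragment."""
    length = bottom_end - (unit_start + 4)
    struct.pack_into("<I", unit, unit_start, length)
    return length


# Top fragment: placeholder unit_length, then version 4, .debug_abbrev offset 0 and
# address size 8 (the other 7 bytes of a DWARF v4 CU header), then a one-byte
# stand-in for the CU DIE.
top = b"\x00\x00\x00\x00" + struct.pack("<HIB", 4, 0, 8) + b"\x01"
middle = b"\x02" * 5   # stand-in for whatever function fragments survived
bottom = b"\x00"       # terminating NULL

unit = bytearray(top + middle + bottom)
length = patch_unit_length(unit, 0, len(unit))
```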

That's the story for .debug_info; what about other sections?

Sections referenced by index from .debug_info can't be fragmented;
this would be: .debug_abbrev, .debug_addr, .debug_str_offsets.

.debug_str doesn't need to be fragmented, linkers DTRT already.
.debug_macro contents are not tied to functions and won't be fragmented.

.debug_loclists and .debug_rnglists should be fragmentable the same
way as .debug_info; they exist only as extensions of .debug_info, and
the range list for the CU itself is merely a concatenated set of
contributions from each constituent function, so that should Just Work
(although it won't be optimal, adjacent ranges won't be coalesced).
I believe the same is true for .debug_loc and .debug_ranges, although
I haven't checked.
.debug_aranges is functionally equivalent to the CU rangelist.
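A small Python sketch of why the CU's range list can be a plain concatenation: each live function contributes its own ranges, dead functions contribute nothing, and the result is correct but not minimal. The `coalesce` step only illustrates what is given up; it is not part of the proposal, and the function layout and addresses are made up.

```python
Range = tuple[int, int]  # half-open [low, high)


def cu_ranges(per_function: list[list[Range]], live: list[bool]) -> list[Range]:
    """Concatenate the contributions of the live functions, in fragment order."""
    out: list[Range] = []
    for ranges, alive in zip(per_function, live):
        if alive:
            out.extend(ranges)
    return out


def coalesce(ranges: list[Range]) -> list[Range]:
    """Merge overlapping or adjacent ranges (what a DWARF-aware tool could add)."""
    merged: list[Range] = []
    for low, high in sorted(ranges):
        if merged and low <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


funcs = [[(0x1000, 0x1010)], [(0x1010, 0x1030)], [(0x2000, 0x2008)]]
concatenated = cu_ranges(funcs, [True, True, True])
minimal = coalesce(concatenated)
after_gc = cu_ranges(funcs, [True, False, True])  # second function discarded
```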

.debug_line can work the same way as .debug_info but is worth a word.
The top fragment has the header, including the directory/file lists
because those are referenced by index. DW_LNE_define_file can't be
used. Each function has a fragment containing the sequence for that
function, starting with set_address and ending with end_sequence.
The bottom fragment is empty, existing only to allow the length to
be computed.
.debug_line_str is a string section and requires nothing special.
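To show what a per-function .debug_line fragment could contain, here is a Python sketch that encodes one self-contained sequence: DW_LNE_set_address, one row, an advance past the function body, and DW_LNE_end_sequence. The opcode values are the standard DWARF ones. The rest is assumed for illustration: a little-endian 8-byte address (in a .o this would be a relocation against the function symbol), minimum_instruction_length of 1, and no line or file changes. `function_line_fragment` is a hypothetical helper. Since the state machine registers reset at the start of every sequence, the fragment depends only on the header (in the top fragment), so it can be dropped without disturbing its neighbours.

```python
import struct

DW_LNS_copy = 0x01
DW_LNS_advance_pc = 0x02
DW_LNE_end_sequence = 0x01
DW_LNE_set_address = 0x02


def uleb128(value: int) -> bytes:
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def extended_op(opcode: int, operands: bytes = b"") -> bytes:
    # Extended opcodes: 0x00, ULEB128 length of (opcode + operands), opcode, operands.
    body = bytes([opcode]) + operands
    return b"\x00" + uleb128(len(body)) + body


def function_line_fragment(address: int, size: int) -> bytes:
    """One sequence: set_address, emit a row, advance past the function, end_sequence."""
    return (extended_op(DW_LNE_set_address, struct.pack("<Q", address))
            + bytes([DW_LNS_copy])
            + bytes([DW_LNS_advance_pc]) + uleb128(size)
            + extended_op(DW_LNE_end_sequence))


frag = function_line_fragment(0x401000, 0x20)
```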

.debug_names ... haven't looked at it but I suspect either it doesn't
survive or it has to be generated post-link (or by the linker).
.debug_frame I *think* can be fragmented, but I haven't taken the
time to look at it to make sure.

Those are all the sections I see in DWARF v5 Appendix B.

So that's the blue-sky vision of linker-magic COMDAT DWARF, which
took me about an hour to write down just now. There is certainly
a non-trivial overhead in terms of ELF sections; in the general
case we would have 5 per-function fragments (for .debug_info,
.debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).
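A back-of-the-envelope Python sketch of the per-function cost, using the sizeof(Elf64_Shdr) = 64 figure quoted in this thread. The other inputs are assumptions: each fragment gets its own relocation section (most of these sections refer to code addresses), every new section also costs a 4-byte entry in its SHT_GROUP, and section-name strings and symbol-table entries are ignored. Treat the output as a rough estimate, not a measurement.

```python
ELF64_SHDR_SIZE = 64   # sizeof(Elf64_Shdr)
GROUP_ENTRY_SIZE = 4   # SHT_GROUP members are stored as Elf32_Word section indices


def per_function_overhead(fragments: int = 5, rela_per_fragment: int = 1) -> int:
    """Extra object-file bytes per function: a section header for each fragment and
    each of its relocation sections, plus one group-member entry for each of them."""
    new_sections = fragments * (1 + rela_per_fragment)
    return new_sections * (ELF64_SHDR_SIZE + GROUP_ENTRY_SIZE)


per_func = per_function_overhead()                     # 10 sections * 68 bytes
no_rela = per_function_overhead(rela_per_fragment=0)   # if fragments had no relocations
total_mib = 100_000 * per_func / 2**20                 # hypothetical 100k-function build
```

These are object-file bytes only; the linker merges the fragments into ordinary output sections, so the linked executable does not carry the per-fragment headers.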

Not small, but then other features in the works are using huge
quantities of ELF sections too (section-per-basic-block).

& as always, not sure how any of this would work for Split DWARF -
just a debug_addr section that has some addresses that point to
discardable functions... if we want those addresses themselves to be
discardable (so we don't have to use a tombstone value inserted by the
linker) then they'd need to be in separate debug_addr contributions
with headers, etc - the overhead just seems too high to me in all the
ways I can look at that.

Yeah I think .dwo sections can't take advantage of fragmenting, and
.debug_addr is referenced by index so it can't be fragmented. Although
the point is not to avoid tombstone values, but to do a more efficient
job of editing the final DWARF to omit gc'd functions; it's no problem
at all to use a tombstone value in .debug_addr IMO.
--paulr

> From: David Blaikie <dblaikie@gmail.com>
> Sent: Wednesday, June 3, 2020 5:31 PM
> To: Robinson, Paul <paul.robinson@sony.com>
> Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
> in lld.
>
> >
> > DWARF was designed in an era when COMDAT and ICF were not a thing, or at
> least not common, certainly not when talking about function code. The
> overhead of a unit occurred only once per translation unit, so that
> expense was reasonably amortized.
> >
> >
> >
> > Splitting functions into their own object-file sections and making them
> excludable is an evolution of compiler/linker technology that DWARF has
> not kept up with. The linker-friendly solutions (COMDAT DWARF) would put
> function-related .debug_* contributions into a section-group along with
> the function .text itself; this multiplies the total number of sections to
> deal with, regardless of the tactics used for the content of each per-
> function DWARF section. The fully DWARF-conformant solution would create
> one partial_unit per function, with the corresponding overhead of unit
> headers (especially painful in the .debug_line section). Alternatively we
> fragment DWARF into sections without headers and rely on the linker to
> make everything look right in the linked executable; this produces .o
> files that are not DWARF conformant (unless we can standardize this in
> DWARF v6) and would be a big hassle for consumers other than the linker.
>
> "object files don't contain DWARF, but they contain stuff that the
> linker will turn into DWARF" wouldn't seem like the worst thing to me
> - what sort of pre-linking parsing of DWARF use cases do you have in
> mind, other than for our own compiler development uses?

No, that wouldn't seem like the worst thing. Obviously llvm-dwarfdump
would want to be able to report what's actually happening, but indeed
all the other use-cases that come to mind are not looking at .o files.

> (notwithstanding in-object Split DWARF (where the .dwo sections would
> have to remain usable without linking) or the MachO style debug
> info distribution model which is similar)

I expect Split DWARF would be incompatible with fragments. I don't
know details about MachO but seems likely the same is true there.

Yep, if they're sub-contribution regions, that wouldn't play well with
Split DWARF. (& full contribution isolation has the DWARF header
overhead, etc)

I'd still be concerned about the ELF header overhead even of this
sub-contribution scheme, but could be interesting to see how it plays
out in practice.

All that said, to avoid burying the lede here, I'll splice something
from the end up here:

Although the point is not to avoid tombstone values, but to do a more efficient job of editing the final DWARF to omit gc'd functions; it's no problem at all to use a tombstone value in .debug_addr IMO.

But the tombstone values are Alexey's underlying issue (this ongoing
design discussion for over a year now) & /sort/ of mine too recently
(which, unfortunately, is what's reinvigorated this discussion -
would've been nice if I/we/someone had identified this sooner &
could've helped Alexey in a more timely manner): Alexey is dealing
with a platform where 0 is a valid address so the lld/gold strategy of
resolving relocations to dead code to "0+addend" creates ambiguous
DWARF. I'm dealing with a case of zero-length functions ("int f1() {
}" or "void f2() { __builtin_unreachable(); }") causing early
termination of DWARFv4 range lists.
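To see why a 0 tombstone truncates a DWARF v4 range list, here is a simplified Python reader for .debug_ranges entries written as (begin, end) pairs. Per DWARF v4, a (0, 0) pair ends the list, a begin equal to the largest address is a base address selection entry, and a pair with begin == end covers nothing. A dead zero-length function resolved to 0+0 on both ends therefore looks exactly like an end-of-list entry; a -2 tombstone does not, and -1 cannot be used here because it would be read as a base address selection. Addresses are made up, 8-byte addresses are assumed, and `read_ranges_v4` is a hypothetical helper.

```python
MAX_ADDR = 2**64 - 1  # largest 8-byte address


def read_ranges_v4(entries, cu_base=0):
    """Decode DWARF v4 .debug_ranges (begin, end) pairs into [low, high) ranges."""
    base = cu_base
    out = []
    for begin, end in entries:
        if begin == 0 and end == 0:   # end-of-list entry
            break
        if begin == MAX_ADDR:         # base address selection entry
            base = end
            continue
        if begin == end:              # empty range: no effect
            continue
        out.append((base + begin, base + end))
    return out


live_f1 = (0x1000, 0x1010)
live_f3 = (0x2000, 0x2040)
end_of_list = (0, 0)

# Dead zero-length f2 resolved as 0+0 on both ends: reads as end-of-list.
zero_tombstone = read_ranges_v4([live_f1, (0, 0), live_f3, end_of_list])
# The same entry with a -2 tombstone on both ends is just an empty range.
minus2 = MAX_ADDR - 1
minus2_tombstone = read_ranges_v4([live_f1, (minus2, minus2), live_f3, end_of_list])
# Where 0 is a valid address, a dead non-empty function at 0+0..0+0x20 looks real.
ambiguous = read_ranges_v4([(0, 0x20), end_of_list])
```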

The reason for the DWARF-aware linker proposal was because the "let's
choose a better tombstone" discussion didn't go anywhere & people sort
of encouraged in this direction of "what if we didn't need a
tombstone/the linker fixed up the debug info instead". So if the DWARF
redundancy elimination doesn't address the issue of zero as a valid
address, it doesn't address Alexey's needs, unfortunately. :confused:

That said, I super appreciate the time you've put into writing this up
and it is valuable & I'd love to see some (even hand-crafted assembly)
prototypes, maybe do some back-of-the-envelope numbers to see whether
the ELF header overhead would be worth it, etc.

> But even then, I'm not sure how viable it would be - as Fangrui
> pointed out on another thread about this: ELF section overhead itself
> is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> rather difficult to reconstruct header-less slice-and-dicable sections
> in some cases. For type information (a reduced overhead version of
> -fdebug-types-section) I could see it - but for functions, they need
> to refer to addresses - preferably in the debug_addr section, and
> that's accessed by index, so taking chunks out of it would break other
> references to it, etc... adding the header would be expensive, and how
> would the CU construct its DW_AT_ranges value if that has to be sliced
> and diced? Again, some amount of linker magic might solve some of
> these problems - but I think there's still a lot of overhead to making
> a solution that's workable with a DWARF-agnostic linker (or even with
> a DWARF aware one, but in an efficient amount of time/space where it's
> not only usable for small programs, or for linking when you're
> shipping a final production binary, etc)

The idea we have blue-skied internally would work something like this
(initially explicated in terms of the .debug_info section, then seeing
how that tactic applies to other sections):

There's a top fragment, containing the CU header and the CU DIE itself.
Linker magic makes this first in the output file.

Quick curiosity: Is there existing linker magic for this? What does it
look like? I'd love to know so I can play around with hand crafted
prototypes/keep it in mind for such things.

(basically the ability for an object file to say "here's the start and
end of my contribution to this section, and some bits that /can/ go in
the middle, but you can drop them if you like")

Types also go here; certainly base types, and other file-scope types
can be included here or put into type units. (Type units aren't
fragmented, they are their own thing same as always.)

Separately, it might be worth considering putting types in such a
thing - but, yes, the "How do you reference them when they might be in
your unit or someone else's unit", etc, would have to be figured out.
I guess using an external symbol might be the solution there - again,
with a better understanding of the ^ mentioned linker magic, I'd
probably play around with hand crafting some examples just to see how
this could work.

There's a matching bottom fragment, which is just the terminating NULL
for the CU DIE; linker magic makes this last in the output file.

Last of all the contributions from this object file, not last in the
whole output file, right? (please excuse the pedantry, just double
checking)

Each function has its own fragment, which is in the same link-group
(COMDAT or whatever) as the function's .text section; that way, if the
function is discarded, so is the .debug_info fragment. Offhand I can't
think of any cases (other than DW_AT_specification, addressed below) of
references to a subprogram DIE from elsewhere,

The call_site DWARF would want to refer to a subprogram DIE, but that
could be handled by (first pass) having a declaration subprogram in
the initial fragment that the call_site could refer to using the usual
assembler-resolved CU-relative offset. Of course that'd mean a bunch
(probably the bigger part) of the function's DWARF footprint
wouldn't be deduplicated, but would address this part of the address
tombstone issue (if not using debug_addr) & reduce some of the DWARF -
the addresses are pretty big (if you're not pooling them), etc.

so it should be fine to
discard the entire function fragment as needed. Linker magic puts all
function fragments between the top and bottom fragments, in some
indeterminate order. Each function fragment is the usual complete
subtree, rooted in DW_TAG_subprogram.

Rooted at the top level (well, below the DW_TAG_compile_unit) DIE, as
you mention later - namespace, or whatever else.

References to types are either
to type units as normal, or to types in the top fragment. Note that
these references do not require relocations; type units are by signature
as always, and for types in the top fragment, the offsets into the top
fragment are known at compile time.

Inlined functions are described as part of the function they have been
inlined into, being children of the function DIE. DW_AT_specification
refers to the abstract declaration which is in its own fragment (or the
top fragment, but that keeps the declaration from being elided if all
references go away).

Yep, this overlaps with the call_site stuff I mentioned earlier - same
ideas. Either top fragment, or its own fragment. Keeping its own
fragment alive, and figuring out how to reference it (depending on
fragment layout/elision) would require some work, but I think it's
do-able. Might even be do-able so it can be deduplicated across CUs
(use a sec_offset form, use a linker-resolved relocation to it) - this
infrastructure would overlap with type deduplication without type
units too.

Though linker resolved relocations add more bytes...

If functions are inside namespaces, each function fragment will need
to have namespace DIEs around the function DIE. This adds overhead
but it's pretty small.

I hand-wave filling in the CU header's unit length. I'd expect a
relocation with a reference to the bottom fragment should be able to
compute the correct value.

*nod*

That's the story for .debug_info; what about other sections?

Sections referenced by index from .debug_info can't be fragmented;
this would be: .debug_abbrev, .debug_addr, .debug_str_offsets.

.debug_str doesn't need to be fragmented, linkers DTRT already.

(linkers deduplicate debug_str - but can they be made to remove
unreferenced strings too? In that case we'd have an interesting
tradeoff of maybe using FORM_strp rather than strx - if we wanted the
linker to be able to drop strings from dropped function definitions,
etc)

.debug_macro contents are not tied to functions and won't be fragmented.

.debug_loclists and .debug_rnglists should be fragmentable the same
way as .debug_info; they exist only as extensions of .debug_info, and
the range list for the CU itself is merely a concatenated set of
contributions from each constituent function, so that should Just Work
(although it won't be optimal, adjacent ranges won't be coalesced).

At least the way we currently emit loclists and rnglists is by using
an index (the header of loclists and rnglists has an index to offset
mapping) - like strx, this would make it hard/impossible for a
DWARF-agnostic linker to see through to find out which indexes were
actually used. We could potentially not use the loclistx/rnglistx
forms/indexes from fragments - instead using sec_offsets that would
make them relocatable/removable/etc. (so long as all the index-based
referenced lists came in the debug_loclist/debug_rnglist header
fragment)
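A minimal Python sketch of why index forms defeat a DWARF-agnostic linker while sec_offset forms do not. The fragment names and sizes are hypothetical, and the offsets array is simplified to plain start-of-list offsets (real .debug_rnglists offsets are relative to the end of the header). The index table is computed by the compiler before the link, so it still assumes the dropped fragment is present; a sec_offset carries a relocation against its own fragment and is resolved against the final layout.

```python
def layout(fragments):
    """Assign output offsets to fragments in order; returns {name: offset}."""
    offsets, pos = {}, 0
    for name, size in fragments:
        offsets[name] = pos
        pos += size
    return offsets


# Hypothetical per-function .debug_rnglists fragments: (name, size in bytes).
fragments = [("f1", 12), ("f2", 20), ("f3", 8)]

# Index form: an offsets array baked into the unfragmented header at compile time;
# f3's DIE stores index 2 and cannot be rewritten by the linker.
offsets_array = list(layout(fragments).values())
f3_via_index = offsets_array[2]

# sec_offset form: the DIE has a relocation against the f3 fragment, which the
# linker resolves after f2 has been discarded.
live = [f for f in fragments if f[0] != "f2"]
f3_via_offset = layout(live)["f3"]
output_size = sum(size for _, size in live)
```

Here the stale index points past the end of the 20-byte output section, while the relocated offset lands on f3.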

I believe the same is true for .debug_loc and .debug_ranges, although
I haven't checked.

Yep, those ones are easier - there's no contribution header, they can
only be referenced via sec_offset, so slicing and dicing them is
cheap.

But the tombstone problem still exists for the CU's debug_ranges -
though /maybe/ it could be carefully constructed from fragments...
that's going to be a /lot/ of sections in the end though.

.debug_aranges is functionally equivalent to the CU rangelist.

Yup. (as we've touched on before, we don't use aranges at Google -
instead relying on CU's ranges which are just a little more expensive
to retrieve - but no need to duplicate the data in both places - if
consumers really find the aranges worthwhile to avoid parsing a few
attributes on the CU DIE, perhaps a future spec could let
debug_aranges reference a range list? so that aranges and the CU could
share the same data?)

.debug_line can work the same way as .debug_info but is worth a word.
The top fragment has the header, including the directory/file lists
because those are referenced by index. DW_LNE_define_file can't be
used. Each function has a fragment containing the sequence for that
function, starting with set_address and ending with end_sequence.
The bottom fragment is empty, existing only to allow the length to
be computed.

Yep - can't remove dead file and directory names, unfortunately - and
the line table's pretty compact, so not sure it'd be a great savings
(especially compared to the ELF section overhead - at the object file
size at least (though probably a small win for linked executable
size)). Chances are those strings (now in debug_line_str) would be
used /somewhere/ in the program, so linker string deduplication would
get most of the wins - just dead offset entries in the line table
header.

.debug_line_str is a string section and requires nothing special.

.debug_names ... haven't looked at it but I suspect either it doesn't
survive or it has to be generated post-link (or by the linker).

Generally you're going to want a DWARF-aware linker for debug_names,
same as gdb-index, etc.

.debug_frame I *think* can be fragmented, but I haven't taken the
time to look at it to make sure.

Those are all the sections I see in DWARF v5 Appendix B.

So that's the blue-sky vision of linker-magic COMDAT DWARF, which
took me about an hour to write down just now. There is certainly
a non-trivial overhead in terms of ELF sections; in the general
case we would have 5 per-function fragments (for .debug_info,
.debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).

Not small, but then other features in the works are using huge
quantities of ELF sections too (section-per-basic-block).

That work's being scoped to be fairly selective about which basic
blocks it puts in unique sections - just those that are especially
performance sensitive, so the cost isn't as high as you might
otherwise imagine. Adding 5 new sections per function would
probably be a significantly larger growth than anything else I'm aware
of, but I haven't run the numbers by any means.

Thanks again for the write up!

- Dave

It makes me sad that the linker (via a library or otherwise) has to be
"DWARF-aware" to be able to effectively handle --gc-sections, COMDATs,
--icf etc for debug info, without leaving large blocks of data kicking
around.

The patching to -1 (or equivalent) is probably a good lightweight solution
(though I'd love it if it could be done based on section type in the future
rather than section name, but that's probably outside the realm of DWARF),
as it requires only minimal understanding in the linker, but anything
beyond that seems to be complicated logic that is mostly due to the
structure of DWARF. Patching to -1 does feel a bit like a sticking
plaster/band aid to patch over the issue rather than properly solving it
too - there will still be debug data (potentially significant amounts in
COMDAT-heavy objects) that the linker has to write and the debugger has to
somehow know how to skip (even if it knows that -1 is special-case due to
the standard being updated, it needs to get as far as the -1), which is all
wasted effort.

We've already seen from Alexey's prototyping, and from our own experiences
with the Sony proprietary linker (which tried to rewrite .debug_line only)
that deconstructing the DWARF so that it can be more optimally reassembled
at link time is slow going, and will probably inevitably be however much
effort is put into optimising it. For a start, given the current standards,
it's impossible to know how to deconstruct it without having to parse vast
amounts of DWARF, which is typically going to mean a lot more parsing work
than the linker would normally have to deal with. Additionally, much of
this parsing work is wasted effort, since it seems unlikely in many links
that large amounts of the DWARF will be redundant. Having an option to
opt-in doesn't help much there, since it just means the logic exists
without most people using it, due to it not being good enough, or
potentially they don't even know it exists.

I don't have particularly concrete suggestions as to how to solve the
structural problems with DWARF at this point. The only thing that seems
obvious to me is a more "blessed" approach to fragmentation of sections,
similar to what I tried with my prototype mentioned earlier in the thread,
although we'd need to figure out the previously stated performance issues.
Other ideas might tie into this, like somehow sharing the various table
headers a bit like CIEs in .eh_frame that could be merged by the linker -
each object could have separate table header sections, which are referenced
by the individual .debug_* blocks, which in turn are one per function/data
piece and easily discardable/merged by the linker.

Just some thoughts.

James

Your proposed option --dead-reloc-addend=.debug_info=0xffffffffffffffff
seems like a good idea. (I'd expect it to support signed -1 and -2 for
convenience & consistency in some other places (we sometimes use addends
as signed values)).

LLD only supports absolute relocation types (plus R_PPC64_DTPREL64 which
can go to .debug_addr, plus R_RISCV_{ADD,SUB}*).

The computed value is S + A.
We still consider the symbolic value S as zero, but override A with the
supplied option --dead-reloc-addend=.debug_info=-1
I particularly like that `addend` is part of the option name.

My only complaint is that the relocation record is not dead, but rather
its referenced symbol is dead. However, I can't think of a better
name...

Checked with Martin Storsjö, this option may be useful for other binary
formats supporting DWARF. (binutils does not like ELF-specific options
not called -z foobar).
I think it is fine to add this option to LLD if GNU ld is also happy
with the name. I'll check with them.

"There is a danger that one community won't accept an extension that
they haven't been involved in the design process for." :slight_smile: (Courtesy of Peter)

The built-in rules of the linker are the following:

--dead-reloc-addend=.debug_loc=-2
--dead-reloc-addend=.debug_ranges=-2
--dead-reloc-addend=.debug_*=-1

They can be overridden.
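A minimal Python sketch of how the proposed --dead-reloc-addend rules could be evaluated for a relocation whose target symbol is dead: S is taken as 0 and A is replaced by the value from the first matching rule, with signed values such as -1 and -2 accepted and stored as 64-bit two's complement. First-match-wins ordering, user rules taking precedence over the built-ins, glob matching via `fnmatch`, and the helper names `parse_rule` and `dead_reloc_value` are all assumptions of this sketch, not taken from LLD.

```python
from fnmatch import fnmatchcase

MASK64 = (1 << 64) - 1

# Built-in rules from the list above, most specific first.
BUILTIN_RULES = [(".debug_loc", -2), (".debug_ranges", -2), (".debug_*", -1)]


def parse_rule(text: str) -> tuple[str, int]:
    """Parse '.debug_info=-1' or '.debug_info=0xffffffffffffffff'."""
    pattern, _, value = text.rpartition("=")
    return pattern, int(value, 0) & MASK64


def dead_reloc_value(section: str, addend: int, user_rules=()) -> int:
    """Value of S + A for a relocation in `section` whose symbol is dead (S = 0).
    Without a matching rule the original addend is kept."""
    for pattern, override in [*user_rules, *BUILTIN_RULES]:
        if fnmatchcase(section, pattern):
            addend = override
            break
    return (0 + addend) & MASK64
```

Listing `.debug_loc` before `.debug_*` is what keeps `.debug_loc` at -2 while `.debug_loclists` (which only matches the glob) gets -1.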

FWIW, I think it's probably best to at least initially frame the
discussion around non-configurable value for the sake of reducing the
scope/possible surface area of the feature/users/etc. I'd probably
only encourage adding the user-configurable flag if/when someone has a
use case for it.

+ Ben Dunbobbin, whose name I take in vain below.
He's my local expert on weird ELF features.

From: David Blaikie <dblaikie@gmail.com>
Sent: Thursday, June 4, 2020 2:43 PM
To: Robinson, Paul <paul.robinson@sony.com>
Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
in lld.

>
>
>
> > From: David Blaikie <dblaikie@gmail.com>
> > Sent: Wednesday, June 3, 2020 5:31 PM
> > To: Robinson, Paul <paul.robinson@sony.com>
> > Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
> > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
> > in lld.
> >
> > >
> > > DWARF was designed in an era when COMDAT and ICF were not a thing, or at
> > least not common, certainly not when talking about function code. The
> > overhead of a unit occurred only once per translation unit, so that
> > expense was reasonably amortized.
> > >
> > >
> > >
> > > Splitting functions into their own object-file sections and making them
> > excludable is an evolution of compiler/linker technology that DWARF has
> > not kept up with. The linker-friendly solutions (COMDAT DWARF) would put
> > function-related .debug_* contributions into a section-group along with
> > the function .text itself; this multiplies the total number of sections to
> > deal with, regardless of the tactics used for the content of each
> > per-function DWARF section. The fully DWARF-conformant solution would create
> > one partial_unit per function, with the corresponding overhead of unit
> > headers (especially painful in the .debug_line section). Alternatively we
> > fragment DWARF into sections without headers and rely on the linker to
> > make everything look right in the linked executable; this produces .o
> > files that are not DWARF conformant (unless we can standardize this in
> > DWARF v6) and would be a big hassle for consumers other than the linker.
> >
> > "object files don't contain DWARF, but they contain stuff that the
> > linker will turn into DWARF" wouldn't seem like the worst thing to me
> > - what sort of pre-linking parsing of DWARF use cases do you have in
> > mind, other than for our own compiler development uses?
>
> No, that wouldn't seem like the worst thing. Obviously llvm-dwarfdump
> would want to be able to report what's actually happening, but indeed
> all the other use-cases that come to mind are not looking at .o files.
>
> > (notwithstanding in-object Split DWARF (where the .dwo sections would
> > have to remain usable without linking) or the MachO style debug
> > info distribution model which is similar)
>
> I expect Split DWARF would be incompatible with fragments. I don't
> know details about MachO but seems likely the same is true there.

Yep, if they're sub-contribution regions, that wouldn't play well with
Split DWARF. (& full contribution isolation has the DWARF header
overhead, etc)

I'd still be concerned about the ELF header overhead even of this
sub-contribution scheme, but could be interesting to see how it plays
out in practice.

All that said, to avoid burying the lede here, I'll splice something
from the end up here:

> Although the point is not to avoid tombstone values, but to do a more
> efficient job of editing the final DWARF to omit gc'd functions; it's no
> problem at all to use a tombstone value in .debug_addr IMO.

But the tombstone values are Alexey's underlying issue (this ongoing
design discussion for over a year now) & /sort/ of mine too recently
(which, unfortunately, is what's reinvigorated this discussion -
would've been nice if I/we/someone had identified this sooner &
could've helped Alexey in a more timely manner): Alexey is dealing
with a platform where 0 is a valid address so the lld/gold strategy of
resolving relocations to dead code to "0+addend" creates ambiguous
DWARF. I'm dealing with a case of zero-length functions ("int f1() {
}" or "void f2() { __builtin_unreachable(); }") causing early
termination of DWARFv4 range lists.

The reason for the DWARF-aware linker proposal was because the "let's
choose a better tombstone" discussion didn't go anywhere & people sort
of encouraged in this direction of "what if we didn't need a
tombstone/the linker fixed up the debug info instead". So if the DWARF
redundancy elimination doesn't address the issue of zero as a valid
address, it doesn't address Alexey's needs, unfortunately. :confused:

But, upthread we had a tombstone discussion IIRC, which seemed to converge
on "-1 except .debug_loc/.debug_ranges use -2" didn't it? If we're still
going on about having the linker rewriting DWARF, then the fragmenting
idea is worth pursuing as an alternative to Alexey's current work.

That said, I super appreciate the time you've put into writing this up
and it is valuable & I'd love to see some (even hand-crafted assembly)
prototypes, maybe do some back-of-the-envelope numbers to see whether
the ELF header overhead would be worth it, etc.

It would be nice to verify that the section-fragment idea would produce
something that looked usable. Hand-written assembly... would require
research into how to specify the right section attributes, but would
likely be less effort than trying to make LLVM do something plausible.

I'll see about creating an internal task for this.

> > But even then, I'm not sure how viable it would be - as Fangrui
> > pointed out on another thread about this: ELF section overhead itself
> > is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> > rather difficult to reconstruct header-less slice-and-dicable sections
> > in some cases. For type information (a reduced overhead version of
> > -fdebug-types-section) I could see it - but for functions, they need
> > to refer to addresses - preferably in the debug_addr section, and
> > that's accessed by index, so taking chunks out of it would break other
> > references to it, etc... adding the header would be expensive, and how
> > would the CU construct its DW_AT_ranges value if that has to be sliced
> > and diced? Again, some amount of linker magic might solve some of
> > these problems - but I think there's still a lot of overhead to making
> > a solution that's workable with a DWARF-agnostic linker (or even with
> > a DWARF aware one, but in an efficient amount of time/space where it's
> > not only usable for small programs, or for linking when you're
> > shipping a final production binary, etc)
>
> The idea we have blue-skied internally would work something like this
> (initially explicated in terms of the .debug_info section, then seeing
> how that tactic applies to other sections):
>
> There's a top fragment, containing the CU header and the CU DIE itself.
> Linker magic makes this first in the output file.

Quick curiosity: Is there existing linker magic for this? What does it
look like? I'd love to know so I can play around with hand crafted
prototypes/keep it in mind for such things.

Ben Dunbobbin did research into this some time ago, under the auspices
of a "COMDAT DWARF" investigation. He's part of Sony's linker team, and
it was a discussion with that team where I became convinced that the
fragmenting idea was feasible using existing defined ELF capabilities,
although perhaps in ways nobody had really taken advantage of. It
involved section groups and/or section ordering, but somebody much more
familiar with ELF than I am would have to explain it. I've cc'd Ben.

Regarding my discussion with our linker team:
They asked me whether it was feasible to use sections to subset the
DWARF, and I described the functional need (top & bottom fragments,
arbitrary stuff in between) and they thought the ELF section-group
and/or section-ordering features would be able to provide that.

I'm not aware that anyone actually tried prototyping that. The work
that James did (mentioned upthread) IIRC was using COMDAT and full
units with unit headers. My fading memory suggests the discussion
described just above was after that.

(basically the ability for an object file to say "here's the start and
end of my contribution to this section, and some bits that /can/ go in
the middle, but you can drop them if you like")

> Types also go here; certainly base types, and other file-scope types
> can be included here or put into type units. (Type units aren't
> fragmented, they are their own thing same as always.)

Separately, it might be worth considering putting types in such a
thing - but, yes, the "How do you reference them when they might be in
your unit or someone else's unit", etc, would have to be figured out.
I guess using an external symbol might be the solution there - again,
with a better understanding of the ^ mentioned linker magic, I'd
probably play around with hand crafting some examples just to see how
this could work.

> There's a matching bottom fragment, which is just the terminating NULL
> for the CU DIE; linker magic makes this last in the output file.

Last of all the contributions from this object file, not last in the
whole output file, right? (please excuse the pedantry, just double
checking)

The object file would (loosely speaking) have a ".debug_info.first",
some number of ".debug_info.excludable-middle", and a ".debug_info.last"
which would all be glommed together in first-middle-last order in the
output .debug_info section. I believe I was told that this would be
per-object-file, otherwise yeah it wouldn't work at all.

This is why we need input from somebody who actually knows ELF. :blush:
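To make the "first / excludable-middle / last" layout concrete, here is a minimal Python model of what the linker would have to produce. The fragment kinds and the per-fragment `function` association are hypothetical stand-ins for whatever ELF mechanism (section groups and/or section ordering) would actually carry that information. The sketch only checks that per-object ordering, plus dropping the middles of discarded functions, still yields one complete unit per object file.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Fragment:
    kind: str                       # "first", "middle" or "last" (hypothetical labels)
    data: bytes
    function: Optional[str] = None  # function whose .text a "middle" fragment depends on


@dataclass
class ObjectFile:
    name: str
    fragments: List[Fragment] = field(default_factory=list)


def link_debug_info(objects: List[ObjectFile], live_functions: Set[str]) -> bytes:
    """Concatenate fragments per object file in first-middle-last order.

    Middle fragments whose function was garbage collected are dropped;
    first/last fragments are always kept, so each object still yields
    exactly one complete unit.
    """
    out = bytearray()
    for obj in objects:  # contributions from different objects are never interleaved
        first = [f for f in obj.fragments if f.kind == "first"]
        middle = [f for f in obj.fragments
                  if f.kind == "middle" and f.function in live_functions]
        last = [f for f in obj.fragments if f.kind == "last"]
        for frag in first + middle + last:
            out += frag.data
    return bytes(out)


a = ObjectFile("a.o", [Fragment("first", b"A"), Fragment("middle", b"f", "f"),
                       Fragment("middle", b"g", "g"), Fragment("last", b"\0")])
b = ObjectFile("b.o", [Fragment("first", b"B"), Fragment("middle", b"h", "h"),
                       Fragment("last", b"\0")])
print(link_debug_info([a, b], live_functions={"f", "h"}))  # b'Af\x00Bh\x00'
```

Note the "per-object-file" requirement: if the bottom fragments of all objects were gathered at the very end of the output, every unit but the last would lose its terminating NULL.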

> Each function has its own fragment, which is in the same link-group
> (COMDAT or whatever) as the function's .text section; that way, if the
> function is discarded, so is the .debug_info fragment. Offhand I can't
> think of any cases (other than DW_AT_specification, addressed below) of
> references to a subprogram DIE from elsewhere,

The call_site DWARF would want to refer to a subprogram DIE, but that
could be handled by (first pass) having a declaration subprogram in
the initial fragment that the call_site could refer to using the usual
assembler-resolved CU-relative offset. Of course that'd mean a bunch
(probably the bigger part) of the function's DWARF footprint
wouldn't be deduplicated, but would address this part of the address
tombstone issue (if not using debug_addr) & reduce some of the DWARF -
the addresses are pretty big (if you're not pooling them), etc.

Ah, forgot about call_site. Yeah referring to a declaration should work.

> so it should be fine to
> discard the entire function fragment as needed. Linker magic puts all
> function fragments between the top and bottom fragments, in some
> indeterminate order. Each function fragment is the usual complete
> subtree, rooted in DW_TAG_subprogram.

Rooted at the top level (well, below the DW_TAG_compile_unit) DIE, as
you mention later - namespace, or whatever else.

Right, each fragment would be a complete subtree that would ordinarily
be a direct child of DW_TAG_compile_unit. With whatever DIE it needed.

> References to types are either
> to type units as normal, or to types in the top fragment. Note that
> these references do not require relocations; type units are by signature
> as always, and for types in the top fragment, the offsets into the top
> fragment are known at compile time.
>
> Inlined functions are described as part of the function they have been
> inlined into, being children of the function DIE. DW_AT_specification
> refers to the abstract declaration which is in its own fragment (or the
> top fragment, but that keeps the declaration from being elided if all
> references go away).

Yep, this overlaps with the call_site stuff I mentioned earlier - same
ideas. Either top fragment, or its own fragment. Keeping its own
fragment alive, and figuring out how to reference it (depending on
fragment layout/elision) would require some work, but I think it's
do-able. Might even be do-able so it can be deduplicated across CUs
(use a sec_offset form, use a linker-resolved relocation to it) - this
infrastructure would overlap with type deduplication without type
units too.

Though linker resolved relocations add more bytes...

> If functions are inside namespaces, each function fragment will need
> to have namespace DIEs around the function DIE. This adds overhead
> but it's pretty small.
>
> I hand-wave filling in the CU header's unit length. I'd expect a
> relocation with a reference to the bottom fragment should be able to
> compute the correct value.

*nod*
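The hand-waved unit_length can be checked with a little arithmetic. This Python sketch models the value a relocation referring to the end of the bottom fragment would have to produce. It assumes 32-bit DWARF and little-endian byte order, and treats fragment contents as opaque bytes. It does not claim any particular relocation type exists for this.

```python
import struct
from typing import List


def assemble_unit(top: bytes, middles: List[bytes], bottom: bytes) -> bytes:
    """Glue one object's fragments together and fill in unit_length.

    Assumes 32-bit DWARF, little-endian, and that the first 4 bytes of the
    top fragment are a placeholder for unit_length. unit_length counts every
    byte of the unit *after* the length field itself, i.e. it is
    (end of bottom fragment) - (start of top fragment + 4).
    """
    unit = bytearray(top) + b"".join(middles) + bottom
    after_length_field = 4      # what a symbol at "top + 4" would resolve to
    end_of_bottom = len(unit)   # what a symbol at the end of "bottom" would resolve to
    struct.pack_into("<I", unit, 0, end_of_bottom - after_length_field)
    return bytes(unit)


# Placeholder length, version 5, then a few fake header/DIE bytes.
top = b"\x00\x00\x00\x00" + b"\x05\x00" + b"CU!"
unit = assemble_unit(top, [b"fn1", b"fn22"], b"\x00")
print(struct.unpack_from("<I", unit, 0)[0])  # 13
```

Because the value depends on which middle fragments survived, it can only be known after garbage collection, which is why it has to be a link-time computation rather than an assembler constant.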

> That's the story for .debug_info; what about other sections?
>
> Sections referenced by index from .debug_info can't be fragmented;
> this would be: .debug_abbrev, .debug_addr, .debug_str_offsets.
>
> .debug_str doesn't need to be fragmented, linkers DTRT already.

(linkers deduplicate debug_str - but can they be made to remove
unreferenced strings too? in that case we'd have an interesting
tradeoff of maybe using FORM_strp rather than strx - if we wanted the
linker to be able to drop strings from dropped function definitions,
etc)

Future refinements are quite possible!
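The reason index-referenced sections (.debug_addr, .debug_str_offsets, and likewise the loclistx/rnglistx offset arrays) can't simply be sliced by a DWARF-agnostic linker is that removing an entry renumbers every entry after it. A small Python model with made-up addresses shows the difference between slicing a table and overwriting dead entries in place with a tombstone.

```python
from typing import List, Set

TOMBSTONE = 0xFFFF_FFFF_FFFF_FFFF  # placeholder for a dead address (a "-1" style tombstone)


def drop_entries(table: List[int], dead: Set[int]) -> List[int]:
    """Slice dead entries out of an index-addressed table (what fragmenting would do)."""
    return [v for i, v in enumerate(table) if i not in dead]


def tombstone_entries(table: List[int], dead: Set[int]) -> List[int]:
    """Keep the table shape and overwrite dead entries instead."""
    return [TOMBSTONE if i in dead else v for i, v in enumerate(table)]


# .debug_addr contents for functions f, g, h; DIEs refer to them as addrx 0, 1, 2.
debug_addr = [0x1000, 0x2000, 0x3000]
addrx_of = {"f": 0, "g": 1, "h": 2}

sliced = drop_entries(debug_addr, dead={addrx_of["f"]})
print(hex(sliced[addrx_of["g"]]))  # 0x3000 (g's DIE now points at h)
kept = tombstone_entries(debug_addr, dead={addrx_of["f"]})
print(hex(kept[addrx_of["g"]]))    # 0x2000 (still correct)
```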

> .debug_macro contents are not tied to functions and won't be fragmented.
>
> .debug_loclists and .debug_rnglists should be fragmentable the same
> way as .debug_info; they exist only as extensions of .debug_info, and
> the range list for the CU itself is merely a concatenated set of
> contributions from each constituent function, so that should Just Work
> (although it won't be optimal, adjacent ranges won't be coalesced).

At least the way we currently emit loclists and rnglists is by using
an index (the header of loclists and rnglists has an index to offset
mapping) - like strx, this would make it hard/impossible for a
DWARF-agnostic linker to see through to find out which indexes were
actually used. We could potentially not use the loclistx/rnglistx
forms/indexes from fragments - instead using sec_offsets that would
make them relocatable/removable/etc. (so long as all the index-based
referenced lists came in the debug_loclist/debug_rnglist header
fragment)

Ah, I hadn't looked at how we do those lists. But sounds solvable.
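The claim that the CU's range list is just the concatenation of per-function contributions, correct but with adjacent ranges left un-coalesced, can be sketched in Python. Ranges are half-open `[begin, end)` pairs with invented addresses; `coalesce` stands for the extra step a DWARF-aware tool could do and a plain concatenating linker would not.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # [begin, end)


def concat_contributions(per_function: List[List[Range]]) -> List[Range]:
    """CU range list as a plain concatenation of per-function fragments."""
    return [r for contribution in per_function for r in contribution]


def coalesce(ranges: List[Range]) -> List[Range]:
    """Merge adjacent or overlapping ranges (not done by simple concatenation)."""
    merged: List[Range] = []
    for begin, end in sorted(ranges):
        if merged and begin <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


fragments = [[(0x100, 0x120)], [(0x120, 0x150)], [(0x200, 0x210)]]
cu_ranges = concat_contributions(fragments)
print(len(cu_ranges))       # 3: correct, but not minimal
print(coalesce(cu_ranges))  # [(256, 336), (512, 528)]
```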

> I believe the same is true for .debug_loc and .debug_ranges, although
> I haven't checked.

Yep, those ones are easier - there's no contribution header, they can
only be referenced via sec_offset, so slicing and dicing them is
cheap.

But the tombstone problem still exists for the CU's debug_ranges -
though /maybe/ it could be carefully constructed from fragments...
that's going to be a /lot/ of sections in the end though.

> .debug_aranges is functionally equivalent to the CU rangelist.

Yup. (as we've touched on before, we don't use aranges at Google -
instead relying on CU's ranges which are just a little more expensive
to retrieve - but no need to duplicate the data in both places - if
consumers really find the aranges worthwhile to avoid parsing a few
attributes on the CU DIE, perhaps a future spec could let
debug_aranges reference a range list? so that aranges and the CU could
share the same data?)

> .debug_line can work the same way as .debug_info but is worth a word.
> The top fragment has the header, including the directory/file lists
> because those are referenced by index. DW_LNE_define_file can't be
> used. Each function has a fragment containing the sequence for that
> function, starting with set_address and ending with end_sequence.
> The bottom fragment is empty, existing only to allow the length to
> be computed.

Yep - can't remove dead file and directory names, unfortunately - and
the line table's pretty compact, so not sure it'd be a great savings
(especially compared to the ELF section overhead - at the object file
size at least (though probably a small win for linked executable
size)). Chances are those strings (now in debug_line_str) would be
used /somewhere/ in the program, so linker string deduplication would
get most of the wins - just dead offset entries in the line table
header.

Sony does squeeze out the sequences for dead functions; I think it's
not a huge win, in terms of total debug info size, but the .debug_line
section does not let you skip dead sequences; you still have to parse
the whole thing. Our debugger guys were pleased at not having to
spend time doing something that useless. (Yeah it does mean the
linker has to parse the whole .debug_line section; but our theory is
that you probably run the debugger more than you run the linker, and
in any case you do it interactively, so debugger load time is probably
more annoying than some fractional increase in build/link time.)

The dir/file tables can't be squeezed, but one expects it's not a
huge cost with .debug_line_str having lots of deduplication
opportunities.
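Because each function's line-table sequence starts with DW_LNE_set_address and ends with DW_LNE_end_sequence, dropping a dead function's line info amounts to dropping whole sequences. This Python sketch uses symbolic opcodes rather than the real encoding. It models "the linker knows which set_address relocations point at discarded code" as a hypothetical set of dead addresses. It is not a description of Sony's implementation.

```python
from typing import List, Set, Tuple

Op = Tuple  # symbolic: ("set_address", addr), ("advance_line", n), ("copy",), ("end_sequence",)


def split_sequences(program: List[Op]) -> List[List[Op]]:
    """Split a line-number program into sequences, each ending with end_sequence."""
    sequences, current = [], []
    for op in program:
        current.append(op)
        if op[0] == "end_sequence":
            sequences.append(current)
            current = []
    if current:
        raise ValueError("line program does not end with end_sequence")
    return sequences


def drop_dead_sequences(program: List[Op], dead_addresses: Set[int]) -> List[Op]:
    """Keep only sequences whose leading set_address is not a discarded function."""
    kept: List[Op] = []
    for seq in split_sequences(program):
        if seq[0][0] != "set_address":
            raise ValueError("each per-function sequence must start with set_address")
        if seq[0][1] not in dead_addresses:
            kept.extend(seq)
    return kept


program = [("set_address", 0x1000), ("advance_line", 3), ("copy",), ("end_sequence",),
           ("set_address", 0x2000), ("copy",), ("end_sequence",)]
live_only = drop_dead_sequences(program, dead_addresses={0x2000})
print(len(live_only))  # 4
```

The header (directory/file tables) is untouched, matching the point above that those tables can't be squeezed this way.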

> .debug_line_str is a string section and requires nothing special.
>
> .debug_names ... haven't looked at it but I suspect either it doesn't
> survive or it has to be generated post-link (or by the linker).

Generally you're going to want a DWARF-aware linker for debug_names,
same as gdb-index, etc.

> .debug_frame I *think* can be fragmented, but I haven't taken the
> time to look at it to make sure.
>
> Those are all the sections I see in DWARF v5 Appendix B.
>
> So that's the blue-sky vision of linker-magic COMDAT DWARF, which
> took me about an hour to write down just now. There is certainly
> a non-trivial overhead in terms of ELF sections; in the general
> case we would have 5 per-function fragments (for .debug_info,
> .debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).
>
> Not small, but then other features in the works are using huge
> quantities of ELF sections too (section-per-basic-block).

That work's being scoped to be fairly selective about which basic
blocks it puts in unique sections - just those that are especially
performance sensitive, so the cost isn't as high as you might
otherwise imagine. Adding 5 new sections per function would be
probably a significantly larger growth than anything else I'm aware
of, but I haven't run the numbers by any means.

Doing it for *every* function would be the worst case, for when
you're trying to squeeze everything (gc + icf). We could likely
get wins if we did it just for the functions that today end up in
a COMDAT section (inline functions, template instantiations) which
previous research has found to be pretty significant (and major
motivation for the Program Repository work that we've previously
described at a Dev Meeting, https://llvm.org/devmtg/2016-11/#talk22)
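A back-of-the-envelope number for the section-header cost of five per-function fragments is easy to compute, since sizeof(Elf64_Shdr) is 64 bytes. The `with_rela` switch is a pessimistic assumption that every fragment also gets its own relocation section. String-table, symbol-table and group-section costs are deliberately not counted, and the function counts are arbitrary.

```python
ELF64_SHDR_SIZE = 64  # sizeof(Elf64_Shdr)

# Per-function fragments from the list above.
FRAGMENT_SECTIONS = [".debug_info", ".debug_line", ".debug_rnglists",
                     ".debug_loclists", ".debug_aranges"]


def section_header_bytes(num_functions: int, with_rela: bool = False) -> int:
    """Extra section-header bytes in one object file for per-function fragments.

    with_rela=True assumes every fragment also needs its own relocation
    section (one more header each).
    """
    per_function = len(FRAGMENT_SECTIONS) * (2 if with_rela else 1)
    return num_functions * per_function * ELF64_SHDR_SIZE


print(section_header_bytes(1))                  # 320
print(section_header_bytes(1, with_rela=True))  # 640
print(section_header_bytes(10_000))             # 3200000
```

Limiting fragmentation to functions that are already in COMDATs, as suggested above, shrinks `num_functions` accordingly.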

Thanks again for the write up!

NP, it was fun to trot out this stuff.
--paulr

+ Ben Dunbobbin, whose name I take in vain below.
He's my local expert on weird ELF features.

Hey, I have read
https://groups.google.com/forum/#!msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
"monolithic input section handling" from Ben:)

From: David Blaikie <dblaikie@gmail.com>
Sent: Thursday, June 4, 2020 2:43 PM
To: Robinson, Paul <paul.robinson@sony.com>
Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
in lld.

>
> > From: David Blaikie <dblaikie@gmail.com>
> > Sent: Wednesday, June 3, 2020 5:31 PM
> > To: Robinson, Paul <paul.robinson@sony.com>
> > Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
> > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug
> > info in lld.
> >
> > >
> > > DWARF was designed in an era when COMDAT and ICF were not a thing, or at
> > > least not common, certainly not when talking about function code. The
> > > overhead of a unit occurred only once per translation unit, so that
> > > expense was reasonably amortized.
> > >
> > > Splitting functions into their own object-file sections and making them
> > > excludable is an evolution of compiler/linker technology that DWARF has
> > > not kept up with. The linker-friendly solutions (COMDAT DWARF) would put
> > > function-related .debug_* contributions into a section-group along with
> > > the function .text itself; this multiplies the total number of sections
> > > to deal with, regardless of the tactics used for the content of each
> > > per-function DWARF section. The fully DWARF-conformant solution would
> > > create one partial_unit per function, with the corresponding overhead of
> > > unit headers (especially painful in the .debug_line section).
> > > Alternatively we fragment DWARF into sections without headers and rely on
> > > the linker to make everything look right in the linked executable; this
> > > produces .o files that are not DWARF conformant (unless we can
> > > standardize this in DWARF v6) and would be a big hassle for consumers
> > > other than the linker.
> >
> > "object files don't contain DWARF, but they contain stuff that the
> > linker will turn into DWARF" wouldn't seem like the worst thing to me
> > - what sort of pre-linking parsing of DWARF use cases do you have in
> > mind, other than for our own compiler development uses?
>
> No, that wouldn't seem like the worst thing. Obviously llvm-dwarfdump
> would want to be able to report what's actually happening, but indeed
> all the other use-cases that come to mind are not looking at .o files.
>
> > (notwithstanding in-object Split DWARF (where the .dwo sections would
> > have to remain usable without linking) or the MachO style debug
> > info distribution model which is similar)
>
> I expect Split DWARF would be incompatible with fragments. I don't
> know details about MachO but seems likely the same is true there.

Yep, if they're sub-contribution regions, that wouldn't play well with
Split DWARF. (& full contribution isolation has the DWARF header
overhead, etc)

I'd still be concerned about the ELF header overhead even of this
sub-contribution scheme, but could be interesting to see how it plays
out in practice.

All that said, to avoid burying the lede here, I'll splice something
from the end up here:

> Although the point is not to avoid tombstone values, but to do a more
> efficient job of editing the final DWARF to omit gc'd functions; it's no
> problem at all to use a tombstone value in .debug_addr IMO.

But the tombstone values are Alexey's underlying issue (this ongoing
design discussion for over a year now) & /sort/ of mine too recently
(which, unfortunately, is what's reinvigorated this discussion -
would've been nice if I/we/someone had identified this sooner &
could've helped Alexey in a more timely manner): Alexey is dealing
with a platform where 0 is a valid address so the lld/gold strategy of
resolving relocations to dead code to "0+addend" creates ambiguous
DWARF. I'm dealing with a case of zero-length functions ("int f1() {
}" or "void f2() { __builtin_unreachable(); }") causing early
termination of DWARFv4 range lists.
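One way the zero-length-function problem shows up: in a DWARF v4 .debug_ranges list a (0, 0) pair is the end-of-list entry, so a gc'd zero-length function whose two relocations both resolve to "0 + 0" truncates the CU's list. This Python decoder is a minimal sketch for 64-bit addresses with invented values; it also handles the base address selection entry (begin equal to the maximum address).

```python
from typing import List, Tuple

MAX_ADDR = 0xFFFF_FFFF_FFFF_FFFF  # 64-bit targets


def parse_v4_range_list(words: List[int], base: int = 0) -> List[Tuple[int, int]]:
    """Decode a DWARF v4 .debug_ranges list given as (begin, end) address pairs.

    (0, 0) ends the list; begin == MAX_ADDR is a base address selection entry.
    """
    ranges = []
    for i in range(0, len(words) - 1, 2):
        begin, end = words[i], words[i + 1]
        if begin == 0 and end == 0:
            break              # end-of-list entry
        if begin == MAX_ADDR:
            base = end         # base address selection entry
            continue
        ranges.append((base + begin, base + end))
    return ranges


# f and h are live; g is a zero-length function whose section was gc'd,
# so both of its relocations resolved to 0 + 0 (addend 0).
words = [0x1000, 0x1010,   # f
         0, 0,             # g: indistinguishable from end-of-list
         0x2000, 0x2020,   # h: never reached
         0, 0]             # the real terminator
print([(hex(b), hex(e)) for b, e in parse_v4_range_list(words)])
# [('0x1000', '0x1010')]
```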

The reason for the DWARF-aware linker proposal was because the "let's
choose a better tombstone" discussion didn't go anywhere & people sort
of encouraged in this direction of "what if we didn't need a
tombstone/the linker fixed up the debug info instead". So if the DWARF
redundancy elimination doesn't address the issue of zero as a valid
address, it doesn't address Alexey's needs, unfortunately. :confused:

But, upthread we had a tombstone discussion IIRC, which seemed to converge
on "-1 except .debug_loc/.debug_ranges use -2" didn't it? If we're still
going on about having the linker rewriting DWARF, then the fragmenting
idea is worth pursuing as an alternative to Alexey's current work.

+1 for "-1 except .debug_loc/.debug_ranges use -2"
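The reason for the "-2" exception is that in DWARF v4 .debug_loc/.debug_ranges an entry whose begin address is the maximum address (-1) is a base address selection entry, and (0, 0) ends the list. A minimal Python sketch of the proposed rule follows. Ignoring the addend is an assumption of this sketch: adding it to a value near the top of the address space could wrap around to a small, valid-looking address.

```python
def max_address(address_size: int = 8) -> int:
    return (1 << (8 * address_size)) - 1


def dead_reloc_value(section: str, addend: int, address_size: int = 8) -> int:
    """Value written for a relocation whose target section was discarded.

    Follows the "-1, except .debug_loc/.debug_ranges use -2" proposal.
    The addend is intentionally unused (see the note above).
    """
    if section in (".debug_loc", ".debug_ranges"):
        return max_address(address_size) - 1  # -2
    return max_address(address_size)         # -1


def classify_v4_entry(begin: int, end: int, address_size: int = 8) -> str:
    """How a DWARF v4 .debug_loc/.debug_ranges consumer reads an address pair."""
    if begin == 0 and end == 0:
        return "end-of-list"
    if begin == max_address(address_size):
        return "base-address-selection"
    return "range"


t = dead_reloc_value(".debug_ranges", addend=0x10)
print(hex(t), classify_v4_entry(t, t))  # 0xfffffffffffffffe range
m = dead_reloc_value(".debug_info", addend=0x10)
print(hex(m), classify_v4_entry(m, m))  # 0xffffffffffffffff base-address-selection
```

The second line shows what a -1 tombstone would turn into if it were written into .debug_ranges; the -2 pair is just an empty range a consumer can skip.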

That said, I super appreciate the time you've put into writing this up
and it is valuable & I'd love to see some (even hand-crafted assembly)
prototypes, maybe do some back-of-the-envelope numbers to see whether
the ELF header overhead would be worth it, etc.

It would be nice to verify that the section-fragment idea would produce
something that looked usable. Hand-written assembly... would require
research into how to specify the right section attributes, but would
likely be less effort than trying to make LLVM do something plausible.

I'll see about creating an internal task for this.

According to Peter Smith, Arm Compiler 5 splits up DWARF v3 debugging
information and puts these sections into comdat groups:

"This approach did produce significantly more debug information than gcc
  did. For small microcontroller projects this wasn't a problem. For
  larger feature phone problems we had to put a lot of work into keeping
  the linker's memory usage down as many of our customers at the time were
  using 32-bit Windows machines with a default maximum virtual memory of 2Gb."

I'd also love to see some examples (even hand-crafted assembly).

> > But even then, I'm not sure how viable it would be - as Fangrui
> > pointed out on another thread about this: ELF section overhead itself
> > is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
> > rather difficult to reconstruct header-less slice-and-dicable sections
> > in some cases. For type information (a reduced overhead version of
> > -fdebug-types-section) I could see it - but for functions, they need
> > to refer to addresses - preferably in the debug_addr section, and
> > that's accessed by index, so taking chunks out of it would break other
> > references to it, etc... adding the header would be expensive, and how
> > would the CU construct its DW_AT_ranges value if that has to be sliced
> > and diced? Again, some amount of linker magic might solve some of
> > these problems - but I think there's still a lot of overhead to making
> > a solution that's workable with a DWARF-agnostic linker (or even with
> > a DWARF aware one, but in an efficient amount of time/space where it's
> > not only usable for small programs, or for linking when you're
> > shipping a final production binary, etc)
>
> The idea we have blue-skied internally would work something like this
> (initially explicated in terms of the .debug_info section, then seeing
> how that tactic applies to other sections):
>
> There's a top fragment, containing the CU header and the CU DIE itself.
> Linker magic makes this first in the output file.

Quick curiosity: Is there existing linker magic for this? What does it
look like? I'd love to know so I can play around with hand crafted
prototypes/keep it in mind for such things.

Ben Dunbobbin did research into this some time ago, under the auspices
of a "COMDAT DWARF" investigation. He's part of Sony's linker team, and
it was a discussion with that team where I became convinced that the
fragmenting idea was feasible using existing defined ELF capabilities,
although perhaps in ways nobody had really taken advantage of. It
involved section groups and/or section ordering, but somebody much more
familiar with ELF than I am would have to explain it. I've cc'd Ben.

Regarding my discussion with our linker team:
They asked me whether it was feasible to use sections to subset the
DWARF, and I described the functional need (top & bottom fragments,
arbitrary stuff in between) and they thought the ELF section-group
and/or section-ordering features would be able to provide that.

I'm not aware that anyone actually tried prototyping that. The work
that James did (mentioned upthread) IIRC was using COMDAT and full
units with unit headers. My fading memory suggests the discussion
described just above was after that.

(basically the ability for an object file to say "here's the start and
end of my contribution to this section, and some bits that /can/ go in
the middle, but you can drop them if you like")

> Types also go here; certainly base types, and other file-scope types
> can be included here or put into type units. (Type units aren't
> fragmented, they are their own thing same as always.)

Separately, it might be worth considering putting types in such a
thing - but, yes, the "How do you reference them when they might be in
your unit or someone else's unit", etc, would have to be figured out.
I guess using an external symbol might be the solution there - again,
with a better understanding of the ^ mentioned linker magic, I'd
probably play around with hand crafting some examples just to see how
this could work.

> There's a matching bottom fragment, which is just the terminating NULL
> for the CU DIE; linker magic makes this last in the output file.

Last of all the contributions from this object file, not last in the
whole output file, right? (please excuse the pedantry, just double
checking)

The object file would (loosely speaking) have a ".debug_info.first",
some number of ".debug_info.excludable-middle", and a ".debug_info.last"
which would all be glommed together in first-middle-last order in the
output .debug_info section. I believe I was told that this would be
per-object-file, otherwise yeah it wouldn't work at all.

This is why we need input from somebody who actually knows ELF. :blush:

We probably have to reuse the ".debug_info" string (in assembly this requires
unique linkage, which has been implemented in LLVM for a while but relatively
new in binutils (future 2.35)) which is already an entry in .strtab, otherwise
the string itself can cost quite a lot.
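To put a rough number on the string cost: if every fragment had a distinct name, each name would need its own NUL-terminated string-table entry, whereas reusing ".debug_info" (distinguished with `unique,N`) costs one entry in total. The per-function names below are hypothetical, and the sketch ignores suffix sharing that real string-table builders may do.

```python
from typing import Iterable


def string_table_bytes(section_names: Iterable[str]) -> int:
    """Bytes needed to store section names: each distinct name once, NUL-terminated."""
    return sum(len(name) + 1 for name in set(section_names))


functions = ["foo", "bar", "baz"]
distinct = [".debug_info." + f for f in functions]  # hypothetical per-function names
reused = [".debug_info"] * len(functions)            # same name, told apart via unique,N
print(string_table_bytes(distinct))  # 48
print(string_table_bytes(reused))    # 12
```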

(Mostly Range lists, zero-length functions, linker gc)

+ Ben Dunbobbin, whose name I take in vain below.
He’s my local expert on weird ELF features.

Hey, I have read
https://groups.google.com/forum/#!msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
“monolithic input section handling” from Ben:)

Just for full clarity - I’m one of Ben’s team-mates on the linker and binutils team, so hopefully my ELF knowledge is also up to scratch! Ben and I have bounced a number of these ideas off of each other, so should have a roughly equivalent understanding of the topic. I believe he’s got today and next week off, so I don’t know if he’ll answer anything on here for the next week or two.

From: David Blaikie <dblaikie@gmail.com>
Sent: Thursday, June 4, 2020 2:43 PM
To: Robinson, Paul <paul.robinson@sony.com>
Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
in lld.

From: David Blaikie <dblaikie@gmail.com>
Sent: Wednesday, June 3, 2020 5:31 PM
To: Robinson, Paul <paul.robinson@sony.com>
Cc: jh7370.2008@my.bristol.ac.uk; llvm-dev@lists.llvm.org
Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug
info in lld.

DWARF was designed in an era when COMDAT and ICF were not a thing, or at
least not common, certainly not when talking about function code. The
overhead of a unit occurred only once per translation unit, so that
expense was reasonably amortized.

Splitting functions into their own object-file sections and making them
excludable is an evolution of compiler/linker technology that DWARF has
not kept up with. The linker-friendly solutions (COMDAT DWARF) would put
function-related .debug_* contributions into a section-group along with
the function .text itself; this multiplies the total number of sections
to deal with, regardless of the tactics used for the content of each
per-function DWARF section. The fully DWARF-conformant solution would
create one partial_unit per function, with the corresponding overhead of
unit headers (especially painful in the .debug_line section).
Alternatively we fragment DWARF into sections without headers and rely on
the linker to make everything look right in the linked executable; this
produces .o files that are not DWARF conformant (unless we can
standardize this in DWARF v6) and would be a big hassle for consumers
other than the linker.

“object files don’t contain DWARF, but they contain stuff that the
linker will turn into DWARF” wouldn’t seem like the worst thing to me -
what sort of pre-linking parsing of DWARF use cases do you have in mind,
other than for our own compiler development uses?

No, that wouldn’t seem like the worst thing. Obviously llvm-dwarfdump
would want to be able to report what’s actually happening, but indeed
all the other use-cases that come to mind are not looking at .o files.

I think it should be fairly straightforward for dumping tools that know about fragmented DWARF to just glue it all together before dumping. In an ideal world, it would be something in the section or DWARF header that told the tool that this needed to be done, although I’m not entirely sure what.

(notwithstanding in-object Split DWARF (where the .dwo sections would
have to remain usable without linking) or the MachO style debug
info distribution model which is similar)

I expect Split DWARF would be incompatible with fragments. I don’t
know details about MachO but seems likely the same is true there.

Yep, if they’re sub-contribution regions, that wouldn’t play well with
Split DWARF. (& full contribution isolation has the DWARF header
overhead, etc)

I’d still be concerned about the ELF header overhead even of this
sub-contribution scheme, but could be interesting to see how it plays
out in practice.

All that said, to avoid burying the lede here, I’ll splice something
from the end up here:

Although the point is not to avoid tombstone values, but to do a more
efficient job of editing the final DWARF to omit gc’d functions; it’s no
problem at all to use a tombstone value in .debug_addr IMO.

But the tombstone values are Alexey’s underlying issue (this ongoing
design discussion for over a year now) & /sort/ of mine too recently
(which, unfortunately, is what’s reinvigorated this discussion -
would’ve been nice if I/we/someone had identified this sooner &
could’ve helped Alexey in a more timely manner): Alexey is dealing
with a platform where 0 is a valid address so the lld/gold strategy of
resolving relocations to dead code to “0+addend” creates ambiguous
DWARF. I’m dealing with a case of zero-length functions (“int f1() {
}” or “void f2() { __builtin_unreachable(); }”) causing early
termination of DWARFv4 range lists.

The reason for the DWARF-aware linker proposal was because the “let’s
choose a better tombstone” discussion didn’t go anywhere & people sort
of encouraged in this direction of “what if we didn’t need a
tombstone/the linker fixed up the debug info instead”. So if the DWARF
redundancy elimination doesn’t address the issue of zero as a valid
address, it doesn’t address Alexey’s needs, unfortunately. :confused:

But, upthread we had a tombstone discussion IIRC, which seemed to converge
on “-1 except .debug_loc/.debug_ranges use -2” didn’t it? If we’re still
going on about having the linker rewriting DWARF, then the fragmenting
idea is worth pursuing as an alternative to Alexey’s current work.

+1 for “-1 except .debug_loc/.debug_ranges use -2”

Also +1. I’m happy for this approach to go ahead for current DWARF versions, since we already actually do this in our downstream port.

That said, I super appreciate the time you’ve put into writing this up
and it is valuable & I’d love to see some (even hand-crafted assembly)
prototypes, maybe do some back-of-the-envelope numbers to see whether
the ELF header overhead would be worth it, etc.

It would be nice to verify that the section-fragment idea would produce
something that looked usable. Hand-written assembly… would require
research into how to specify the right section attributes, but would
likely be less effort than trying to make LLVM do something plausible.

I’ll see about creating an internal task for this.

According to Peter Smith, Arm Compiler 5 splits up DWARF v3 debugging
information and puts these sections into comdat groups:

“This approach did produce significantly more debug information than gcc
did. For small microcontroller projects this wasn’t a problem. For
larger feature phone problems we had to put a lot of work into keeping
the linker’s memory usage down as many of our customers at the time were
using 32-bit Windows machines with a default maximum virtual memory of 2Gb.”

I’d also love to see some examples (even hand-crafted assembly).

But even then, I’m not sure how viable it would be - as Fangrui
pointed out on another thread about this: ELF section overhead itself
is non-trivial (“sizeof(Elf64_Shdr) = 64.”) & it would probably be
rather difficult to reconstruct header-less slice-and-dicable sections
in some cases. For type information (a reduced overhead version of
-fdebug-types-section) I could see it - but for functions, they need
to refer to addresses - preferably in the debug_addr section, and
that’s accessed by index, so taking chunks out of it would break other
references to it, etc… adding the header would be expensive, and how
would the CU construct its DW_AT_ranges value if that has to be sliced
and diced? Again, some amount of linker magic might solve some of
these problems - but I think there’s still a lot of overhead to making
a solution that’s workable with a DWARF-agnostic linker (or even with
a DWARF aware one, but in an efficient amount of time/space where it’s
not only usable for small programs, or for linking when you’re
shipping a final production binary, etc)

The idea we have blue-skied internally would work something like this
(initially explicated in terms of the .debug_info section, then seeing
how that tactic applies to other sections):

There’s a top fragment, containing the CU header and the CU DIE itself.
Linker magic makes this first in the output file.

Quick curiosity: Is there existing linker magic for this? What does it
look like? I’d love to know so I can play around with hand crafted
prototypes/keep it in mind for such things.

Ben Dunbobbin did research into this some time ago, under the auspices
of a “COMDAT DWARF” investigation. He’s part of Sony’s linker team, and
it was a discussion with that team where I became convinced that the
fragmenting idea was feasible using existing defined ELF capabilities,
although perhaps in ways nobody had really taken advantage of. It
involved section groups and/or section ordering, but somebody much more
familiar with ELF than I am would have to explain it. I’ve cc’d Ben.

Regarding my discussion with our linker team:
They asked me whether it was feasible to use sections to subset the
DWARF, and I described the functional need (top & bottom fragments,
arbitrary stuff in between) and they thought the ELF section-group
and/or section-ordering features would be able to provide that.

I’m not aware that anyone actually tried prototyping that. The work
that James did (mentioned upthread) IIRC was using COMDAT and full
units with unit headers. My fading memory suggests the discussion
described just above was after that.

I definitely looked at this myself at some point, and IIRC, the prototype performance figures I posted earlier actually had very minimal linker work required to get this to work, but I might be getting myself mixed up with a different experiment! Anyway, LLD does already have sufficient support to do most, maybe all of this, I believe. Linkers not using linker scripts automatically group sections with the same name into a single output section. Sections with the same name within the same object are grouped consecutively, in order according to their input order, so a series of ,,, sections within the same CU would end up as a single cohesive section, as long as they were all named the same thing (strictly speaking, I don’t think anything in the ELF standard requires this, but every linker that I know of behaves this way). It’s possible to do different things using linker scripts (e.g. grouping sections with different names into a single output section), but I don’t think that’s needed for this approach.

When it comes to making the discarding happen naturally, there are three approaches. One is to use COMDATs. The idea is that the header and footer sections would not be in a group, but the other fragments would be in the same group as their corresponding function section. The problem with this approach is that the function sections must be COMDATs themselves, so this wouldn’t help with functions that are not in COMDATs for semantic reasons, even if they are in their own sections.

A second approach is similar to the first, with the addition that functions don’t have to be COMDATs. However, it would require linker and assembler changes to support “non-COMDAT” groups. From the ELF spec, it’s technically not a requirement that all section groups are COMDATs. Groups merely are kept or discarded all at once, whilst COMDAT groups are a special case that say there can be more than one group with the same signature, of which only one is kept in the end. Last time I checked (it was a while ago), LLD didn’t support section groups other than COMDAT groups, so this probably won’t fly. Similarly, the assembler only provided syntax to support COMDAT groups, although I did experiment with a version that supported non-COMDAT groups too. I don’t think I have any performance numbers or similar for the results, though.
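The difference between plain and COMDAT groups described above can be modelled in a few lines of Python. This is a simplified sketch: a group is all-or-nothing and is kept only if some member is reachable (a rough stand-in for --gc-sections), and for COMDAT groups the first group with a given signature is selected before any reachability check, so later copies are always discarded. Section names are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class SectionGroup:
    signature: str
    members: List[str]  # e.g. a function's .text plus its debug fragments
    comdat: bool        # GRP_COMDAT set or not


def resolve_groups(groups: List[SectionGroup], reachable: Set[str]) -> List[str]:
    """Return the sections that survive, group by group, in input order."""
    kept: List[str] = []
    seen: Set[str] = set()
    for group in groups:
        if group.comdat:
            if group.signature in seen:
                continue  # duplicate COMDAT copy: discard the whole group
            seen.add(group.signature)
        if any(m in reachable for m in group.members):
            kept.extend(group.members)  # all-or-nothing
    return kept


groups = [
    SectionGroup("inl", ["a.o:.text.inl", "a.o:.debug_info.inl"], comdat=True),
    SectionGroup("inl", ["b.o:.text.inl", "b.o:.debug_info.inl"], comdat=True),
    SectionGroup("g", ["a.o:.text.g", "a.o:.debug_info.g"], comdat=False),
    SectionGroup("h", ["b.o:.text.h", "b.o:.debug_info.h"], comdat=False),
]
reachable = {"a.o:.text.inl", "b.o:.text.inl", "a.o:.text.g"}
print(resolve_groups(groups, reachable))
# ['a.o:.text.inl', 'a.o:.debug_info.inl', 'a.o:.text.g', 'a.o:.debug_info.g']
```

The non-COMDAT groups `g` and `h` are the "second approach": the debug fragment lives or dies with its function without the function itself having to be a COMDAT.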

The third approach, which probably makes the second approach redundant, and maybe also the first, uses the ELF SHF_LINK_ORDER flag to achieve the goal of discarding debug information. The SHF_LINK_ORDER section flag causes a set of sections to be concatenated at link time in the same order as their linked-to sections, and if a linked-to section is discarded, so is the section referencing it. Thus, if there were text sections .text.1, .text.2, etc. in that order in the object, each with a corresponding debug data fragment, all called .debug_info (the same applies to the other sections), the fragment for .text.1 would appear first, then that for .text.2, etc. The problem is what to do with the header and footer fragments. In these cases, I believe they both end up at one end, which is obviously no good. I circumvented this in my prototype by using linker scripts, IIRC. The ELF spec doesn't say what to do with sections without the flag if they end up in the same output section as those with the flag. In an ideal world, they'd be preserved in the output in the same relative order (i.e. a section before the ordered ones would appear first, and one after the ordered ones would appear last), but I don't know how viable that is in a general sense.
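The SHF_LINK_ORDER behaviour, and the header/footer problem, can be modelled the same way. This is a hypothetical sketch, not LLD's implementation: `layout_link_order` sorts fragments by the output position of their linked-to text section and drops those whose text section was discarded. With `keep_slots=False` the unordered header and footer are moved to one end (the front, an arbitrary choice), which reproduces the problem; with `keep_slots=True` the ordered fragments are only shuffled among the slots they already occupy, which is one way to get the "same relative order" behaviour described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sec:
    label: str
    link: Optional[str] = None  # linked-to text section, or None (no SHF_LINK_ORDER)


def layout_link_order(secs, text_order, keep_slots):
    """Lay out one output section containing SHF_LINK_ORDER fragments.

    text_order: names of the surviving text sections, in output order.
    Fragments whose linked-to section was discarded are dropped; the rest
    are sorted by the output position of their linked-to section.
    keep_slots=False: unordered sections are moved in front of the ordered ones.
    keep_slots=True:  ordered fragments are sorted within the slots they
                      already occupy, so unordered sections keep their place.
    """
    pos = {name: i for i, name in enumerate(text_order)}
    live = [s for s in secs if s.link is None or s.link in pos]
    ordered = sorted((s for s in live if s.link is not None),
                     key=lambda s: pos[s.link])
    if not keep_slots:
        return [s.label for s in live if s.link is None] + [s.label for s in ordered]
    it = iter(ordered)
    return [s.label if s.link is None else next(it).label for s in live]


secs = [Sec("header"), Sec("fragment1", ".text.1"), Sec("fragment2", ".text.2"),
        Sec("fragment3", ".text.3"), Sec("footer")]
survivors = [".text.1", ".text.3"]  # .text.2 was garbage collected
print(layout_link_order(secs, survivors, keep_slots=False))
print(layout_link_order(secs, survivors, keep_slots=True))
```

Only the slot-preserving layout leaves a well-formed CU (header, surviving fragments, footer); the other one puts the footer before the fragments it is supposed to close.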

.stack_sizes is a section that already follows a combination of approaches 1 and 3: 1 for .stack_sizes contributions related to COMDAT sections, and 3 for those that aren't COMDATs. However, that section doesn't need a header/footer, so it doesn't quite get us the whole way. Here are example assembly snippets for the first and third approaches using .stack_sizes, but the section name could easily be switched for .debug_info, .debug_line, etc.:

Non-COMDAT pair. This .stack_sizes is linked via SHF_LINK_ORDER.

.section .text.main,"ax",@progbits
.section .stack_sizes,"o",@progbits,.text.main,unique,0

COMDAT pair. This .stack_sizes is linked via SHF_LINK_ORDER and a group. The SHF_LINK_ORDER ensures it is ordered the same as the non-COMDAT versions.

.section .text.bar,"axG",@progbits,bar,comdat
.section .stack_sizes,"Go",@progbits,bar,comdat,.text.bar,unique,1

The “o” in the flags string indicates the SHF_LINK_ORDER flag, and the name before the “unique” bit is the associated (linked-to) section. The “,comdat” bit and “G” make the section part of a COMDAT group with the specified symbol as the identifier for that group, whilst “unique,0” and “unique,1” are simply the way to create distinct sections with the same name.
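For reference, the flag letters in the directives above map onto ELF sh_flags bits. The values below are the standard ones from the ELF gABI (as defined in `<elf.h>`); the helper functions are only an illustrative sketch covering the letters used in this thread, not part of any tool.

```python
# Standard ELF section flag values (ELF gABI, as defined in <elf.h>).
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_LINK_ORDER = 0x80
SHF_GROUP = 0x200

# Assembler .section flag letters used in the snippets above (plus "w").
LETTER_TO_FLAG = {
    "w": SHF_WRITE,
    "a": SHF_ALLOC,
    "x": SHF_EXECINSTR,
    "o": SHF_LINK_ORDER,
    "G": SHF_GROUP,
}

FLAG_NAMES = [
    (SHF_WRITE, "SHF_WRITE"),
    (SHF_ALLOC, "SHF_ALLOC"),
    (SHF_EXECINSTR, "SHF_EXECINSTR"),
    (SHF_LINK_ORDER, "SHF_LINK_ORDER"),
    (SHF_GROUP, "SHF_GROUP"),
]


def sh_flags(letters):
    """Return the sh_flags bits corresponding to a .section flag string."""
    flags = 0
    for ch in letters:
        flags |= LETTER_TO_FLAG[ch]
    return flags


def describe(flags):
    """List the names of the known flag bits set in flags."""
    return [name for bit, name in FLAG_NAMES if flags & bit]


for letters in ("ax", "o", "axG", "Go"):
    print(f'"{letters}" -> {sh_flags(letters):#x} {describe(sh_flags(letters))}')
```

So the COMDAT .stack_sizes contribution carries both SHF_GROUP and SHF_LINK_ORDER, while the non-COMDAT one carries only SHF_LINK_ORDER.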