[LLD] How to get rid of debug info of sections deleted by garbage collector

Hi,

After compiling an example.cpp file with "-c -ffunction-sections" and linking with "--gc-sections" (using ld.lld), I still see debug info for the sections deleted by the garbage collector in the generated executable.

Are there any compiler/linker options and/or other tools in LLVM to get rid of the above mentioned unneeded debug info?

If such options do not exist, what needs to be changed in the linker (lld)?

Thanks,
Ramana

It's not easy. It's also format dependent. I assume you're talking
about ELF here. To a first approximation, the linker does not GC
sections that lack the SHF_ALLOC flag. At some point we did an analysis,
and in practice it turns out most of those sections are debug info.
I seem to recall that Cary Coutant had a proposal for ld.gold on how
to reclaim them without breaking anything, but I can't find it easily
(cc:ing him directly).
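
To make the SHF_ALLOC point concrete, here is a minimal Python sketch of a mark-style section GC. It is a toy model, not lld's code: the dictionary layout, the section names and the `live_sections` helper are all made up for illustration. Only the flag value (SHF_ALLOC is 0x2 in ELF) comes from the format itself.

```python
SHF_ALLOC = 0x2  # ELF flag: section occupies memory in the running process


def live_sections(sections, roots):
    """Return the names of sections that survive a toy --gc-sections pass.

    sections: name -> {"flags": int, "refs": [names of referenced sections]}
    roots:    names of sections that are always kept (e.g. the entry point)
    """
    live = set()
    worklist = list(roots)
    # Mark phase: follow references starting from the roots only.
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        worklist.extend(sections[name]["refs"])
    # Sections without SHF_ALLOC are never collected, but they are not
    # roots either: a reference from debug info does not keep code alive.
    for name, sec in sections.items():
        if not sec["flags"] & SHF_ALLOC:
            live.add(name)
    return live


# Hypothetical input: three function sections plus one monolithic debug section.
sections = {
    ".text.main": {"flags": 0x6, "refs": [".text.used"]},
    ".text.used": {"flags": 0x6, "refs": []},
    ".text.unused": {"flags": 0x6, "refs": []},
    ".debug_info": {
        "flags": 0x0,
        "refs": [".text.main", ".text.used", ".text.unused"],
    },
}
live = live_sections(sections, roots=[".text.main"])
print(sorted(live))
```

In this model `.text.unused` is dropped while `.debug_info` survives whole, so it still describes the function that was collected.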

Thanks,

From: llvm-dev [mailto:llvm-dev-bounces@lists.llvm.org] On Behalf Of
Davide Italiano via llvm-dev
Sent: Thursday, September 20, 2018 10:55 AM
To: ramana.venkat83@gmail.com; Cary Coutant
Cc: llvm-dev; LLDB
Subject: Re: [llvm-dev] [lldb-dev] [LLD] How to get rid of debug info of
sections deleted by garbage collector

The short answer is: Nothing you can do currently.

I had a chat with some of the Sony linker guys last week about this.
Currently .debug_info is monolithic; we'd have to break it up in some
fashion that would correspond with the way .text is broken up with
-ffunction-sections, in such a way that the linker would automatically
paste the right pieces back together to form a syntactically correct
.debug_info section in the final executable. There are some gotchas
that would need to be designed correctly (e.g. reference from an
inlined-subprogram to its abstract instance) but it didn't seem like
the problems were insurmountable.
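
As a rough illustration of the "paste the right pieces back together" idea, the Python sketch below assumes each object file provides a unit header plus one debug fragment per function section. The fragment layout, the 4-byte length field and the `link_debug_info` name are assumptions made for this example; no such ELF or DWARF convention exists today.

```python
def link_debug_info(unit_header, fragments, live_text_sections):
    """Concatenate only the debug fragments whose function section survived GC.

    unit_header:        bytes that open the unit (toy stand-in for a CU header)
    fragments:          list of (text_section_name, debug_bytes), in input order
    live_text_sections: set of function sections kept by --gc-sections
    """
    kept = [data for sec, data in fragments if sec in live_text_sections]
    body = b"".join(kept)
    # The unit must stay syntactically correct, so its length is recomputed
    # after dead fragments are dropped (toy 4-byte little-endian length).
    return unit_header + len(body).to_bytes(4, "little") + body


# Hypothetical fragments, one per -ffunction-sections text section.
fragments = [
    (".text.main", b"DIE(main)"),
    (".text.unused", b"DIE(unused)"),
    (".text.used", b"DIE(used)"),
]
unit = link_debug_info(b"CU", fragments, {".text.main", ".text.used"})
print(unit)
```

The sketch ignores the gotchas mentioned above, such as a reference from an inlined subprogram to an abstract instance that lives in a different fragment.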

The ultimate design almost certainly requires agreement about what the
ELF pieces should look like, and a description in the DWARF spec so
that consumers (e.g. dumpers) of the .o files would understand about
the fragmented sections. And then the linkers and dumpers have to
be modified to implement it all. :)

Even without gc-sections, there is duplicate info to get rid of:
everything that ends up in a COMDAT, like template instantiations
and inline functions. This is actually a much bigger win than
anything you'd see left behind by GC.
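
A small Python model of the COMDAT case, under the simplifying assumption that each object's debug info is one monolithic blob that sits outside the COMDAT group. The object layout and the `link` helper are hypothetical.

```python
def link(objects):
    """Toy link step: COMDAT groups are deduplicated, debug info is not.

    objects: list of {"name": str, "comdat": [group names],
                      "debug": [functions described by the debug info]}
    """
    seen_groups = set()
    text = []   # (object, group) pairs whose code is kept
    debug = []  # (object, function) pairs described in the output
    for obj in objects:
        for group in obj["comdat"]:
            if group not in seen_groups:  # first definition wins
                seen_groups.add(group)
                text.append((obj["name"], group))
        # The debug blob is monolithic in this model, so it is kept whole.
        debug.extend((obj["name"], fn) for fn in obj["debug"])
    return text, debug


# Two hypothetical objects that both instantiate the same template.
objects = [
    {"name": "a.o", "comdat": ["max<int>"], "debug": ["max<int>", "f"]},
    {"name": "b.o", "comdat": ["max<int>"], "debug": ["max<int>", "g"]},
]
text, debug = link(objects)
copies = [entry for entry in debug if entry[1] == "max<int>"]
print(len(text), len(copies))
```

One copy of the code survives, but the output still carries a description of `max<int>` from every object that instantiated it.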
--paulr

Right. Technically we can get rid of debug info that corresponds to dead sections, but in order to do that, the linker has to scan the entire debug info. Debug info is already one of the few kinds of input that the linker needs special logic to merge, and that merging is already slow. IIUC, debug info for dead sections doesn’t do any harm, so spending time in the linker to get rid of it probably isn’t worth the cost. If we really need to do it, I’d want a new mechanism like the one Paul described.

Thank you all for your time in responding to my query.

My understanding was similar to what you all mentioned here, but I wanted to check whether there have been any recent developments in solving this problem.

Thanks,
Ramana

Yep, in theory maybe “partial units” could be used to address this - though any solution that’s linker-agnostic will have some size overhead most likely (like type units) & I’ve never looked at them closely enough to know if just saying “partial units” is enough to describe the solution in detail or whether there’s lots of other unknowns/options to pick between.

The alternative is full DWARF-aware merging, which would be much more expensive - and then you’d really want to only do this in something like a DWP tool, not in the hot path of the linker. This is what dsymutil already does. It would be great to generalize its DWARF-aware merging logic and see what it’d be like to use it in a DWP tool, and maybe in the linker for folks where adding work to the hot path isn’t such a concern. We might even find that it doesn’t have a very bad effect there, especially if it parallelizes well: I think LLD doesn’t scale beyond a few cores, so if we have cores to spare and can use one or more of them for DWARF merging, that might be totally fine.
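
For a sense of what a DWARF-aware pass involves, here is a minimal Python sketch that walks a parsed DIE tree and drops subprograms whose code was discarded. The dictionary-based DIE representation, the `prune_unit` helper and the live address range are assumptions for illustration. This is not how dsymutil is implemented, and a real tool would also have to rewrite offsets, cross-DIE references and the other debug sections.

```python
def prune_unit(die, is_live):
    """Return a copy of `die` without subprograms whose code was discarded.

    die:     {"tag": str, "name": str, "low_pc": int or None, "children": [...]}
    is_live: callable taking an address, True if that code is in the output
    """
    low_pc = die.get("low_pc")
    if die["tag"] == "DW_TAG_subprogram" and low_pc is not None:
        if not is_live(low_pc):
            return None  # the function was garbage collected
    # Every DIE is visited: this is the full scan that makes the pass costly.
    children = []
    for child in die.get("children", []):
        pruned = prune_unit(child, is_live)
        if pruned is not None:
            children.append(pruned)
    return {**die, "children": children}


# Hypothetical unit: two concrete functions and one abstract instance (no low_pc).
unit = {
    "tag": "DW_TAG_compile_unit", "name": "example.cpp", "low_pc": None,
    "children": [
        {"tag": "DW_TAG_subprogram", "name": "main", "low_pc": 0x1000, "children": []},
        {"tag": "DW_TAG_subprogram", "name": "unused", "low_pc": 0x2000, "children": []},
        {"tag": "DW_TAG_subprogram", "name": "inlined", "low_pc": None, "children": []},
    ],
}
pruned = prune_unit(unit, is_live=lambda addr: 0x1000 <= addr < 0x1800)
names = [child["name"] for child in pruned["children"]]
print(names)
```

The abstract instance is kept because it has no address of its own, which is one reason the decision cannot be made from addresses alone.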

lld is essentially a sequential program that uses parallel-for loops in certain places to make use of multiple cores. We parallelize ICF and string merging, which are computationally intensive tasks, so if you are using these features, lld actually scales pretty well. That said, I think that DWARF merging is mostly a sequential task, which is perhaps hard to parallelize.
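
The "sequential program with parallel-for loops" structure can be sketched in Python as follows, using string merging as the example. The sharding by hash, the shard count and the `merge_strings` name are assumptions for illustration rather than lld's actual code. Python threads also do not run this kind of work truly in parallel, so the sketch only shows the shape of the computation.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def merge_strings(strings, num_shards=4):
    """Deduplicate strings by sharding on a hash and merging shards independently."""
    # Sequential step: assign each string to a shard by its hash, so equal
    # strings always land in the same shard.
    shards = [[] for _ in range(num_shards)]
    for s in strings:
        shards[zlib.crc32(s) % num_shards].append(s)

    def dedup(shard):
        # dict.fromkeys keeps the first occurrence and preserves order.
        return list(dict.fromkeys(shard))

    # Parallel-for step: shards share no data, so they need no locking.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        results = list(pool.map(dedup, shards))

    # Sequential step: lay the merged shards out one after another.
    return [s for shard in results for s in shard]


strings = [b"main", b"used", b"main", b"unused", b"used"]
merged = merge_strings(strings)
print(sorted(merged))
```

String merging splits cleanly because each shard is independent. Debug info has references that cross between entries, which is part of why merging it is harder to split this way.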