[RFC] - Deduplication of debug information in linkers (LLD)

At least one proprietary linker put a lot of effort into deduplicating and rewriting debug information. That work took up the majority of the link time despite serious engineering effort spent on performance optimisation. For example, the linker wrote some sections from scratch because that proved faster than parsing the input. Teaching LLD to dedup DWARF should be expected to slow it down dramatically when the feature is enabled (and ideally not at all when it is disabled).

Is a more incremental approach viable? In particular, are there IR passes that fold debug strings and the like, which could be run before feeding everything into a linker?


I think what George suggested is different from making lld parse, deduplicate and rewrite the DWARF debug info. What he suggested is to make the compiler emit multiple debug sections so that the linker can eliminate the duplicates just as it already does for, say, inline functions. The elimination is done by (essentially) section name, so it should be quite fast. Parsing all the debug info and reconstructing it is a completely different thing IMO.
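
To make the distinction concrete, here is a minimal sketch of the "eliminate by key" idea. It is not LLD code: the `Section` type, its `key` field (a stand-in for a COMDAT-style signature or section name) and `dedup_sections` are all hypothetical names made up for illustration. The point is only that the payload is treated as opaque bytes and never parsed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Section:
    """Hypothetical model of an input section; not an LLD data structure."""
    name: str            # e.g. ".debug_info"
    key: Optional[str]   # dedup key (assumed stand-in for a COMDAT signature); None = always keep
    data: bytes          # opaque payload, never inspected below


def dedup_sections(object_files: Iterable[List[Section]]) -> List[Section]:
    """Keep the first section seen for each key and drop later copies.

    Each section costs one set lookup on its key, regardless of how
    large or complex its contents are.
    """
    seen = set()
    kept = []
    for sections in object_files:
        for sec in sections:
            if sec.key is not None:
                if sec.key in seen:
                    continue  # duplicate: discard without reading sec.data
                seen.add(sec.key)
            kept.append(sec)
    return kept
```

Two object files that both carry a debug section keyed `"type:Foo"` would contribute only the first copy to the output. Contrast this with a real DWARF rewrite, which has to decode the entries, fix up cross-references and re-emit the sections.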

And, IMO, that is a good job for a post-processing tool similar to dsymutil or dwz.