LTO, ThinLTO, and Split DWARF

@ayermolo @clayborg - guess you fb/meta folks are about the only other ones who care about Split DWARF+LTO, so maybe you’ve got some thoughts on this issue
(@pogo59 @jmorse - in case this overlaps with anything you care about)
(@adrian.prantl @JDevlieghere just in case you’re curious)

So, I have some questions about the intersection of (Thin and thick) LTO and Split DWARF (with a bonus helping of Split DWARF inlining).

I guess some terminology:

DWARF compile unit: The description of one original compilation action.
Split DWARF: Where the compiler, instead of putting all the DWARF in the .o file, puts a little bit (the “skeleton”) in the .o file and the rest in a .dwo file.
Split DWARF inlining: In the interests of allowing symbolizing with inline frames without needing the .dwo files, there’s a mode LLVM can use where it puts a bit of extra debug info (just enough to describe inlining) in the skeleton unit in the .o file.
LTO: Link time optimization (basically cramming multiple compilation actions into a single optimization step and generating a single object file).
ThinLTO: In an effort to scale LTO up (smooshing the whole program into a single, single-threaded, compilation doesn’t scale well) ThinLTO does a “thin link” step to discover important cross-references and then suggests narrower cross-module importing before resuming the compilation down to object files - so you still get many separate .o files, but they have pieces of other modules/compilations imported into them for optimization purposes.
DWP (DWarf Package) file: This is the equivalent to a linked binary, but for .dwo files.

Observations:

  1. DWP files certainly, and dwo files to a slightly lesser extent, aren’t really designed to cope with LTO (each CU is separate and there’s no allowance for how to reference between CUs, so there’s no way to describe cross-CU inlining, which is basically the main thing in LTO).

  2. DWARF in general doesn’t have a good answer for ThinLTO - where fragments of one CU are imported into a given compilation. So there’s no way to emit the CU in its entirety in a given compilation, and no way to stitch the fragments of CUs emitted by backend compilation back into a full CU.

  3. We’ve worked around some of this, to varying degrees. I basically ignored full LTO+Split DWARF: Split DWARF is less valuable with full LTO anyway, since you’re only emitting one monolithic object file and probably not distributing that one action, so you don’t need to worry about shipping all the DWARF from compilation to linking. And owing to ThinLTO+Split DWARF being not ideal anyway, combined with (1), we just mash all the imported code into the same compilation unit. (Technically you can opt out of this mashing functionality with -split-dwarf-cross-cu-references, but the resulting DWARF probably isn’t great/terribly usable by DWARF consumers - they might not expect multiple CUs in a single .dwo file. Maybe they’d just parse it twice, maybe they’d fail to find the CU in the dwo because they only expect one, etc. It’s worse in a dwp file, where the cross-CU references wouldn’t be usable.)
    Another aspect of threading this needle, as implied by the flag name: this only accounted for cross-CU references due to inlining, since that was the only expected way to reference the imported code in ThinLTO. So this avoided ever trying to emit a second CU in a ThinLTO+Split DWARF build.
    One extra wrinkle is that, in a maybe unnecessary attempt to provide more accurate functionality, the Split DWARF inlining info could still be associated with the right CU (or a fragment of it), since that was emitted in the .o, not in the .dwo, so it doesn’t have these problems. Though CU fragments aren’t ideal and have a fair bit of overhead (having way more CUs might hurt DWARF consumer performance since consumers aren’t expecting lots of tiny CUs, and it hurts DWARF size because the CU header is not insignificant, etc).

Instigating incident:
Corrupted strings in a dwp.

Cause:
The dwp tool reprocessed some string offsets twice because there were two compilation units in one input .dwo file, which corrupted the string offsets.
Multiple CUs came from the recent introduction of the Function Specialization pass - this led to imported functions that could get code generated into the .o file even though they’re a separate function, not just an inlining. So the existing “don’t do cross-cu references” mitigation didn’t catch this case - it wasn’t a cross-cu reference, just a standalone function from another CU in a ThinLTO backend compile.
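To make the failure mode concrete, here is a toy Python model of why rewriting string offsets twice corrupts them. It is only a sketch under loud assumptions: it pretends the packager simply appends a .dwo's string table to the package's table and rebases offsets by a fixed amount. The helper names (`make_table`, `rebase`, `read_string`) and the sample strings are invented for illustration and are not how llvm-dwp is actually implemented.

```python
def make_table(strings):
    """Build a NUL-terminated string table; return (bytes, list of offsets)."""
    table = b""
    offsets = []
    for s in strings:
        offsets.append(len(table))
        table += s.encode() + b"\0"
    return table, offsets


def read_string(table, offset):
    """Read the NUL-terminated string starting at `offset`."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode()


def rebase(offsets, base):
    """Shift offsets so they point into the combined package table."""
    return [o + base for o in offsets]


# Hypothetical package that already holds strings from another .dwo.
package, _ = make_table(["main", "int"])

# Hypothetical input .dwo whose string offsets are shared by two CUs.
dwo_table, dwo_offsets = make_table(["foo", "bar", "specialized_fn"])
base = len(package)
package += dwo_table

once = rebase(dwo_offsets, base)   # correct: processed a single time
twice = rebase(once, base)         # bug: processed again for the second CU

once_strings = [read_string(package, o) for o in once]
twice_strings = [read_string(package, o) for o in twice]
print(once_strings)
print(twice_strings)
```

In this model the first pass resolves to the original strings, while the second pass lands in the middle of other strings, which is the same flavor of corruption as the one reported.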

Obvious fix:
Pull the specialized function definition into the same CU.

Wrinkles:

  1. Detecting which is the authoritative CU to smoosh everything into isn’t obvious/unambiguous - for the previous/existing cases it was relatively easy to use the same CU as the enclosing/calling function to put the inlined function into. But now, in theory, if the imported function were code generated first, its CU might be the first CU we create, and so we wouldn’t know which CU to put the function into. (I don’t think this happens in reality - the nature of importing is that the imported functions are put at the end of the module, and we wouldn’t import into a module that had no functions - but that seems a bit subtle to depend on.)

  2. Split DWARF inlining gets more complicated, because then we still need to create the extra CUs, but only for this inlining info…

  3. When doing ThinLTO importing, we don’t know if we’re going to use Split DWARF, otherwise we could potentially do the mashing-into-one-CU at that point, rather than trying to figure it out in the backend.

So, questions:

  1. Is it worth keeping any options to emit multiple CUs (either -split-dwarf-cross-cu-references or the Split DWARF inlining info) when using Split DWARF?

  2. Any ideas how we should figure out the canonical CU to use in the cases (possibly all cases, if the answer to (1) is “no”) where we do have multiple CUs in a (Thin)LTO compile with Split DWARF?

(“DWARF Fission + ThinLTO” was where some of this was discussed/designed many a year ago.)

You do keep running into weird corner cases. My guess is that Sony licensees probably don’t use split DWARF but my only data is that I can’t recall ever seeing an external bug report about it. LTO is moderately popular, though, both thin and full, so we probably would have heard about it being combined with split DWARF.

I haven’t looked at function specialization at all. From a DIE tree perspective, the original function and a specialization would look like duplicates, although they’d (presumably) have different linkage names, and (surely) have different ranges and whatnot. Does the specialization end up in a separate CU? The other representation that comes to mind would imitate how inlined instances work, with an abstract DIE and each specialization (along with the original) being concrete DIEs that refer back to the same abstract DIE. They just wouldn’t actually be inlined instances. I’d guess this would be a little more consumer-friendly than partial units or whatever, at least in the same-CU case.

Modeling specialization on how inlining gets described makes these equivalent problems as far as the cross-CU (LTO) cases are concerned. I don’t have any brilliant ideas there, unfortunately.
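The abstract/concrete idea above can be pictured with a small Python data model. This is a sketch of the suggestion only, not of anything LLVM emits today: the `DIE` class, the `compute` function, its linkage names, and the address ranges are all made up for illustration. Only the DWARF names `DW_TAG_subprogram` and the notion of an abstract-origin link come from DWARF itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DIE:
    tag: str
    name: Optional[str] = None                  # source-level name
    linkage_name: Optional[str] = None          # symbol name in the object
    pc_range: Optional[Tuple[int, int]] = None  # (low_pc, high_pc)
    abstract_origin: Optional["DIE"] = None     # link back to the abstract DIE


# One abstract DIE carries the shared, source-level description.
abstract = DIE(tag="DW_TAG_subprogram", name="compute")

# The original and the specialization are both concrete DIEs that refer
# back to it; names and addresses here are hypothetical.
original = DIE("DW_TAG_subprogram", linkage_name="compute",
               pc_range=(0x1000, 0x1040), abstract_origin=abstract)
specialized = DIE("DW_TAG_subprogram", linkage_name="compute.specialized.1",
                  pc_range=(0x2000, 0x2020), abstract_origin=abstract)


def source_name(die):
    """Follow abstract_origin links until a DIE with a name is found."""
    while die.name is None and die.abstract_origin is not None:
        die = die.abstract_origin
    return die.name


print(source_name(original), source_name(specialized))
```

Both concrete DIEs resolve to the same source-level function while keeping their own linkage names and ranges, which is the property that makes this resemble the existing inlining representation.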

'preciate you chiming in @pogo59
I ended up going with broadening -split-dwarf-cross-cu-references to apply to any multiple-CU-in-a-single-dwo situation (so by default we’ll never produce more than one CU in a dwo): llvm/llvm-project@e731a26, “[DebugInfo][Split DWARF][LTO]: Ensure only a single CU is emitted”.

Because I basically don’t trust the ecosystem to handle multiple CUs in a dwo, even if they don’t have cross-CU references.

Hopefully we’ll address that formally enough in the future to be trustworthy. If someone’s got tools that can handle this situation, they can use the backend flag to experiment with it.
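As a rough picture of that default (one CU per dwo unless someone opts in), here is a minimal Python sketch. Everything in it is an assumption for illustration: the function `assign_units`, the `(function, origin CU)` tuples, and especially the "fold into the first CU seen" rule are not taken from the commit.

```python
def assign_units(functions, split_dwarf, allow_multiple_cus=False):
    """Map each function to the CU its debug info would be emitted in.

    functions: list of (function_name, origin_cu) in code generation order.
    """
    if not functions:
        return {}
    if not split_dwarf or allow_multiple_cus:
        # Keep every function in the CU it originally came from.
        return {name: cu for name, cu in functions}
    # Hypothetical rule: fold everything into the first CU encountered.
    canonical = functions[0][1]
    return {name: canonical for name, _ in functions}


# A ThinLTO backend module: two local functions plus one imported,
# specialized function that originally came from another CU.
module = [("main", "a.cpp"), ("helper", "a.cpp"), ("imported_spec", "b.cpp")]

default_units = assign_units(module, split_dwarf=True)
opt_out_units = assign_units(module, split_dwarf=True, allow_multiple_cus=True)
reordered_units = assign_units(list(reversed(module)), split_dwarf=True)
print(default_units)
print(opt_out_units)
print(reordered_units)
```

The reordered case shows why picking the canonical CU is order-sensitive under a naive rule like this one: if the imported function happened to be emitted first, everything would be folded into the imported function's CU instead.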

Still not sure what I’d want to do, ideally, for ThinLTO imported code.

I haven’t looked at function specialization at all. From a DIE tree perspective, the original function and a specialization would look like duplicates, although they’d (presumably) have different linkage names, and (surely) have different ranges and whatnot.

Yep - as for how to best represent this in DWARF, check Jakub Jelinek’s recent new issue proposal on dwarf-workgroup entitled “New issue proposal: Outlined subroutines” - I think there’s sufficient overlap between that and function specialization.

@dblaikie
Sorry somehow I missed this post.
Internally we do not use -fsplit-dwarf-inlining. In some of the bigger builds it tends to blow up debug information. At this point all of our internal tools (most of them LLVM-based) can handle Split DWARF on its own.

As for -split-dwarf-cross-cu-references, I am like 90% sure BOLT won’t handle multiple CUs in the same dwo file either.

So my understanding is that llvm/llvm-project@e731a26 (“[DebugInfo][Split DWARF][LTO]: Ensure only a single CU is emitted”) fixes the issue where ThinLTO + Split DWARF (after the Function Specialization pass was introduced) would result in multiple CUs in the .debug_info.dwo, correct?

I think it’s the right way to go. Thanks for working on it. :smiley:

Good to know the details, @ayermolo - thanks for chiming in!