@ayermolo @clayborg - guess you fb/meta folks are about the only other ones who care about Split DWARF+LTO, so maybe you’ve got some thoughts on this issue
(@pogo59 @jmorse - in case this overlaps with anything you care about)
(@adrian.prantl @JDevlieghere just in case you’re curious)
So, I have some questions about the intersection of (Thin and thick) LTO and Split DWARF (with a bonus helping of Split DWARF inlining).
I guess some terminology:
DWARF compile unit: The description of one original compilation action.
Split DWARF: Instead of putting all the DWARF in the .o file, the compiler puts a little bit (the “skeleton”) in the .o file and the rest in a .dwo file.
Split DWARF inlining: In the interests of allowing symbolizing with inline frames without needing the .dwo files, there’s a mode LLVM can use where it puts a bit of extra debug info (just enough to describe inlining) in the skeleton unit in the .o file.
LTO: Link time optimization (basically cramming multiple compilation actions into a single optimization step and generating a single object file).
ThinLTO: In an effort to scale LTO up (smooshing the whole program into a single, single-threaded, compilation doesn’t scale well) ThinLTO does a “thin link” step to discover important cross-references and then suggests narrower cross-module importing before resuming the compilation down to object files - so you still get many separate .o files, but they have pieces of other modules/compilations imported into them for optimization purposes.
DWP (DWarf Package) file: This is the equivalent to a linked binary, but for .dwo files.
DWP files certainly, and dwo files to a slightly lesser extent, aren’t really designed to cope with LTO (each CU is separate and there’s no allowance for how to reference between CUs - so no way to describe cross-CU inlining, which is basically the main thing in LTO).
DWARF in general doesn’t have a good answer for ThinLTO - where fragments of one CU are imported into a given compilation. So there’s no way to emit the CU in its entirety in a given compilation, and no way to stitch the fragments of CUs emitted by backend compilation back into a full CU.
We’ve worked around some of this, to varying degrees - basically:
(1) I ignored LTO+Split DWARF. Split DWARF is less valuable with full LTO anyway: when you’re only emitting one monolithic object file you’re probably not distributing that one action, so you don’t need to worry about shipping all the DWARF from compilation to linking.
(2) Owing to ThinLTO+Split DWARF being not ideal anyway, combined with (1), we just mash all the imported code into the same compilation unit. (Technically you can opt out of this mashing functionality with -split-dwarf-cross-cu-references, but the resulting DWARF probably isn’t great/terribly usable by DWARF consumers - they might not expect multiple CUs in a single .dwo file: maybe they’d just parse it twice, maybe they’d fail to find the CU in the dwo because they only expect one, etc. It’s worse in a dwp file, where the cross-CU references wouldn’t be usable.)
Another aspect of threading this needle, as implied by the flag name: this only accounted for cross-CU references due to inlining, since that was the only expected way to reference the imported code in ThinLTO. So this avoided ever trying to emit a second CU in a ThinLTO+Split DWARF build.
One extra wrinkle: in a maybe-unnecessary attempt to provide more accurate functionality, the Split DWARF inlining info can still be associated with the right CU (or a fragment of it), since it is emitted in the .o, not in the .dwo, and so doesn’t have these problems. CU fragments aren’t ideal, though, and have a fair bit of overhead: having way more CUs might hurt DWARF consumer performance (consumers aren’t expecting lots of tiny CUs) and hurts DWARF size (the CU header is not insignificant), etc.
Corrupted strings in a dwp.
When building the dwp file, some string offsets were reprocessed twice because one input .dwo file contained two compilation units, and this corrupted the string offsets.
The multiple CUs came from the recent introduction of the Function Specialization pass - this led to imported functions that could get code generated into the .o file even though they’re a separate function, not just an inlining. So the existing “don’t do cross-cu references” mitigation didn’t catch this case - it wasn’t a cross-cu reference, just a standalone function from another CU in a ThinLTO backend compile.
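To make the double-processing failure concrete, here’s a toy sketch in Python. It is not the actual dwp tool’s code: the StringPool class, the helper names, and the sample strings are all made up for illustration, and it assumes the merge step works by reading each string at its old offset and interning it into a merged pool. Running that remapping once gives correct offsets; running it a second time over the already-remapped offsets reads from the middle of unrelated strings.

```python
class StringPool:
    """Hypothetical merged string section: NUL-terminated strings, deduplicated."""

    def __init__(self):
        self.data = bytearray()
        self.offsets = {}

    def intern(self, s: bytes) -> int:
        # Append the string only if it is new; return its offset in the pool.
        if s not in self.offsets:
            self.offsets[s] = len(self.data)
            self.data += s + b"\0"
        return self.offsets[s]


def read_cstr(data, offset: int) -> bytes:
    """Read a NUL-terminated string starting at `offset`."""
    end = data.index(b"\0", offset)
    return bytes(data[offset:end])


def remap_offsets(str_offsets, old_strings, pool):
    """Rewrite offsets into `old_strings` as offsets into the merged pool."""
    return [pool.intern(read_cstr(old_strings, off)) for off in str_offsets]


def resolve(str_offsets, pool):
    """Look up what each merged-pool offset actually points at."""
    return [read_cstr(pool.data, off).decode() for off in str_offsets]


# One input .dwo: its string section and its string offsets table.
old_strings = b"main\0int\0foo\0"
old_offsets = [0, 5, 9]  # "main", "int", "foo"

pool = StringPool()
pool.intern(b"a")  # something already merged from an earlier input

once = remap_offsets(old_offsets, old_strings, pool)
# The bug: the same table is processed again (once per CU in the .dwo),
# so already-remapped offsets are treated as offsets into the old section.
twice = remap_offsets(once, old_strings, pool)

print(resolve(once, pool))
print(resolve(twice, pool))
```

In this model the first pass still names the original strings, while the second pass ends up pointing at string tails, which is the kind of corruption you’d see when a tool assumes one CU per .dwo.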
Pull the specialized function definition into the same CU.
Detecting which is the authoritative CU to smoosh everything into isn’t obvious/unambiguous. For the previous/existing cases it was relatively easy: put the inlined function into the same CU as the enclosing/calling function. But now, in theory, if the imported function were code generated first, its CU might be the first CU we create, and so we wouldn’t know which CU to put the function into. (I don’t think this happens in reality - the nature of importing is that the imported functions are put at the end of the module, and we wouldn’t import into a module that had no functions - but that seems a bit subtle to depend on.)
The split-dwarf-inlining gets more complicated, because then we still need to create the extra CUs, but only for this inlining info…
When doing ThinLTO importing, we don’t know if we’re going to use Split DWARF; if we did, we could potentially do the mashing-into-one-CU at that point, rather than trying to figure it out in the backend.
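The ordering concern about picking the authoritative CU can be shown with a very small Python model. This is only a sketch of the “use the first CU we create” idea, not how the backend works: the function names, the CU names, and the helper itself are hypothetical.

```python
# Toy model: a module is a list of (function name, originating CU) pairs in
# the order they would be code generated. All names are hypothetical.

def canonical_cu_first_created(functions):
    """Pick the CU of whichever function is code generated first."""
    _, cu = functions[0]
    return cu


# Expected layout: the module's own function first, imported code at the end.
home_first = [("main", "a.c"), ("helper.specialized", "b.c")]
# The subtle case: the imported function happens to come first.
imported_first = [("helper.specialized", "b.c"), ("main", "a.c")]

cu_home_first = canonical_cu_first_created(home_first)
cu_imported_first = canonical_cu_first_created(imported_first)

print(cu_home_first, cu_imported_first)
```

The same two functions end up in a different CU depending purely on emission order, which is why relying on “imported functions are always at the end of the module” feels fragile.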
(1) Is it worth keeping any options to emit multiple CUs (either -split-dwarf-cross-cu-references or the Split DWARF inlining info) when using Split DWARF?
(2) Any ideas how we should figure out the canonical CU to use in the cases (possibly all cases, if the answer to (1) is “no”) where we do have multiple CUs in a (Thin)LTO compile with Split DWARF?
(DWARF Fission + ThinLTO was where some of this was discussed/designed many a year ago)