So I’ve been looking at a particular performance problem with LLVM’s symbolizer due to the use of ThinLTO, split DWARF, and split DWARF inlining info.
This combination has a couple of problems:
1) It means multiple CUs in a single DWO, which isn't well defined/specified and is best avoided, so I'm working on fixing that here (this won't fix split DWARF + Full LTO). We already don't use cross-CU references in the split units, because there's no supported way to express that in DWARF. Instead we clone/move any DIEs (like subprograms) referenced cross-CU into the CU that references them (e.g. cross-CU inlining places the abstract subprogram definition for the inlined subroutine into the CU that has the inlining, rather than cross-CU referencing into the other CU). In ThinLTO the only reason other units exist is to cross-CU optimize/inline; no code for imported CUs is ever emitted (except where it's been inlined). So a ThinLTO compile has one primary unit and some other units it inlines from, and those other units never emit anything in the split unit, just a few DIEs in the skeleton unit if you're using split DWARF inlining (or no unit at all if you aren't using that feature). So I'm working on making those units non-split (rather than having a degenerate/empty split unit).
2) Symbolizer performance is hurt because whenever it sees a unit without ranges at the unit DIE, it assumes the producer just skipped those, and goes searching through the implementation DIEs (which may mean going over to the .dwo, or loading a whole .dwp) to see where their addresses are.
It’s this second step that’s a bit painfully unnecessary, especially for a large DWP on a remote filesystem, etc.
So, anyone have opinions on whether we should
a) decide that a unit without ranges covers no ranges - and don’t do the search
b) emit zero-length ranges on any unit that has no code ranges (low/high pc zero? Could pick anything, but that seems the most obvious)
Thanks,
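To make the two options concrete, here is a minimal sketch of the lookup being discussed. It is a toy model, not llvm-symbolizer's actual code: `ToyUnit`, `MissingRangesPolicy`, and `collectRanges` are hypothetical names invented for illustration, and the "child walk" counter just stands in for the cost of opening a .dwo/.dwp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a DWARF unit; not LLVM's real classes.
struct AddressRange {
  uint64_t Low;
  uint64_t High; // exclusive; Low == High is a zero-length range
};

struct ToyUnit {
  // True if the unit DIE itself carries low/high pc or a range list.
  bool HasUnitDieRanges = false;
  std::vector<AddressRange> UnitDieRanges;
  // Ranges found on child DIEs (e.g. subprograms). With split DWARF,
  // reading these would mean going to the .dwo or loading the .dwp.
  std::vector<AddressRange> ChildDieRanges;
  // Counts how often the expensive child walk was performed.
  mutable unsigned ChildWalks = 0;
};

// What to do when the unit DIE has no ranges at all.
enum class MissingRangesPolicy {
  SearchChildren, // behavior described in problem 2
  TreatAsEmpty    // option (a): no ranges means the unit covers nothing
};

std::vector<AddressRange> collectRanges(const ToyUnit &U,
                                        MissingRangesPolicy Policy) {
  std::vector<AddressRange> Result;
  if (U.HasUnitDieRanges) {
    for (const AddressRange &R : U.UnitDieRanges) {
      assert(R.Low <= R.High && "malformed range");
      // Option (b): a zero-length range is present but covers no address,
      // so it is dropped and no child walk is needed.
      if (R.Low != R.High)
        Result.push_back(R);
    }
    return Result;
  }
  if (Policy == MissingRangesPolicy::TreatAsEmpty)
    return Result; // option (a): skip the search entirely
  // Fallback: assume the producer omitted unit ranges and walk the children.
  ++U.ChildWalks;
  for (const AddressRange &R : U.ChildDieRanges)
    if (R.Low != R.High)
      Result.push_back(R);
  return Result;
}
```

In this model, an imported ThinLTO unit with no ranges triggers one child walk under `SearchChildren` and none under `TreatAsEmpty`; a unit that carries a single `{0, 0}` range avoids the walk under either policy, which is the point of option (b).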
Are there compilers that do this ("forget" to emit ranges) that we care to support with llvm-symbolizer?
-- adrian
I’m not specifically aware of any, though haven’t gone looking.
Just in case this wasn’t obvious in the sub-text:
I think we should figure out whether this assumption in llvm-symbolizer is actually needed to support a compiler we care about and then potentially remove it, or enforce it only when the CU is < DWARF 5 or something like that.
-- adrian
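The version-gated variant suggested above could be sketched roughly like this. The names `NoRangesBehavior` and `behaviorForUnitVersion` are hypothetical, and the DWARF 5 cutoff is only the "or something like that" threshold floated in the message, not a settled choice.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not part of llvm-symbolizer.
enum class NoRangesBehavior {
  SearchChildren, // keep the fallback walk over child DIEs
  TreatAsEmpty    // trust that a unit without ranges covers no addresses
};

// Keep the fallback only for units older than DWARF 5, on the assumption
// that newer producers reliably emit ranges on the unit DIE.
NoRangesBehavior behaviorForUnitVersion(uint16_t DwarfVersion) {
  return DwarfVersion < 5 ? NoRangesBehavior::SearchChildren
                          : NoRangesBehavior::TreatAsEmpty;
}
```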
Yeah, fair - I’ll give it a week or something, see if Paul or anyone else has ideas about why the existing behavior might be useful before I remove it.
Looks like I argued (& then tested) previously for support for the case where the CU has no ranges, but sub-DIEs do: http://lists.llvm.org/pipermail/llvm-dev/2017-November/119131.html
(Just for the record, LLVM gained support for CU ranges in r197776, December 2013, and shortly after that it became the default in r203968, March 2014, in the 3.5 release. It looks like GCC got this somewhere between GCC 4.1 and GCC 4.4 according to godbolt testing, so on/before March 2012 I think.)
So I’ve gone ahead and committed this change in r349333 - open to further discussion, reverting it, etc.