llvm-symbolizer has gotten extremely slow

Sometime recently-ish, llvm-symbolizer has gotten unacceptably slow for me. Any test which crashes (or hits report_fatal_error, which for some reason produces a backtrace and is treated as a crash) ends up hanging for an extended period. For example:

./bin/llvm-lit -v  ~/src/llvm-project/llvm/test/CodeGen/X86/pr49087.ll 
llvm-lit: /home/matt/src/llvm-project/llvm/utils/lit/lit/llvm/subst.py:122: note: Did not find llvm-debuginfod in /home/matt/src/llvm-project/build_debug/bin
-- Testing: 1 tests, 1 workers --
XFAIL: LLVM :: CodeGen/X86/pr49087.ll (1 of 1)

Testing Time: 21.57s
  Expectedly Failed: 1

I see 20-40 second test times for cases that crash instantly. This is greatly slowing down ninja check times. Does anyone have any idea what broke? This used to be completely unnoticeable.

I haven’t noticed slow-downs, but some time ago llvm-symbolizer became unreliable for me,
the LLVM backtraces symbolized by it are downright wrong. I don’t have a reproducer, though.

Works fine for me, too. Same time as a week ago (less than two seconds with the supplied reproducer).

The description of D86170 (PrintStackTrace: don't symbolize if LLVM_DISABLE_SYMBOLIZATION is set) may be useful. You may set LLVM_DISABLE_SYMBOLIZATION=1 when running ninja check-$something, especially with an -O0-built llvm-symbolizer.

I was always using debug builds before and never noticed this; just turning it off is an ugly workaround. I extracted a sample output file, and something seems to have gotten worse between llvm 14 and 15:

On one sample, the packaged release build of llvm-symbolizer-14 takes ~0.8s; llvm-symbolizer-15 takes ~2.4s, so something between those two releases made it 3x slower.

I bisected this to [symbolizer] Parse DW_TAG_variable DIs to show line info for globals · llvm/llvm-project@cead4ec · GitHub

After this commit, my sample takes about 3x longer with a release build and more than 4x longer with a debug build. Specifically, if I comment out this part, most of the slowdown is avoided. compile_units has about 1800 entries. It’s still taking about 5 seconds in a debug build, which is a lot for something that happens in most XFAILed tests. Can we do better than the std::map lookups here?

Some caching mechanism could help. E.g. once updateVariableDieMap is called for a CU, store the min/max address somewhere, so that the unit can be skipped if the address is out of bounds.
This should give log2(N) for finding the CU containing the address plus log2(M) for the query into that CU, instead of the current N * log2(M).
The same could be achieved by storing one VariableDieMap in DWARFContext rather than N maps, one in each DWARFUnit.

The link does not show the exact location.

This small patch should give an approximation of what could be achieved:

  // Skip the lookup entirely when Address falls outside [min start, max end).
  if (VariableDieMap.empty() || Address < VariableDieMap.begin()->first ||
      Address >= VariableDieMap.rbegin()->second.first)
    return DWARFDie();
  auto R = VariableDieMap.upper_bound(Address);

It gains nothing for me, but my times are different (2 seconds rather than 20 in a debug build, 0.16 seconds in a release build). Do you happen to have a static build?


It’s supposed to be DWARFContext.cpp:1053, the loop over compile_units in getCompileUnitForAddress.

Here is the flame graph

So it basically spends 66.6% of the symbolization time reading (perhaps all of the) debug info and 33.3% of the time updating maps; the lookup itself takes ~0%.
This happens when it tries to symbolize the symbol _start, which could not be found via the usual function lookup. If I remove _start from the symbolizer input, the flame graph is different and the total time drops from 10s to 3.5s.

It would be nice if the author of the patch could comment on this.

Simple workaround:
Don’t_try_to_symbolize_entry_point.patch (712 Bytes)

Hi, I’m the author of the patch. @dblaikie and I have had various discussions about this, but the problem is basically that global variable addresses can fall outside the address ranges of their DWARF CU. Practically, this means that when symbolizing DATA addresses, we have to walk the entire DWARF tree, O(n), to find all the symbols.

The right solution (IMHO) is to expand the CU ranges to also contain the symbols that are in that CU. This would allow us to walk just the CUs instead and use the regular range-checking logic. AIUI, this would require a change to DWARF (or at least to regular practice); however, I don’t see this being a problem, as two CUs shouldn’t be overlapping in address ranges (right?).

Maybe David can help me out here as he’s the DWARF expert, I’m just a person who wanted llvm-symbolizer to be able to give line numbers for symbols, like addr2line does :slight_smile:


Does addr2line have the same performance problems?

Personally I think an immediate issue we probably should’ve fixed even before this is to provide a way to disable LLVM symbolizing for googletest death tests - I wouldn’t necessarily want to disable symbolization for my whole llvm build (I frequently use it to debug clang crashes, so symbolizing is useful) - but we shouldn’t be symbolizing stack traces we never render anyway because we /expected/ the crash.

Is anyone interested in implementing that? I’ve looked at it before but didn’t manage to actually implement it. Not sure if there’s actual discourse/email/etc history of my previous attempts.

As for fixing the symbolizer performance - if it’s already similar to addr2line for the same functionality, I guess there might not be much hope in pure symbolizer improvements. (please confirm, @hctim)

Is there any way for the symbolizer to be told whether it’s symbolizing a code or data address? If it knows it’s symbolizing a code address (true when doing crash backtrace symbolizing) we shouldn’t be paying the cost for global variable searching?

The not tool sets the LLVM_DISABLE_SYMBOLIZATION environment variable.

Maybe lit could do the same for XFAIL tests?

So that API loops over all CUs, trolling through DIEs for the variable associated with an address, and then… throws it away? Returning only the CU? So the caller has to re-parse the DWARF looking for the variable again? That seems sub-optimal.


It searches for debug info corresponding to the address of _start: first as a subprogram, which fails quickly and is not even visible on the first graph; after that as a variable, which is where it spends 5.5 seconds. No debug info is found, of course.

Okay, so there’s what the symbolizer is doing, which (at least sometimes) is looking for a symbol that doesn’t exist; and as I learned long ago, it takes longer to find something that isn’t there. :slight_smile: But if we have DWARF v5 then there’s a name index and we can look it up there, and discover it doesn’t exist fairly quickly.

Then there’s what the API mentioned above is doing, which seems terribly inefficient, and might be an entirely separate question.

Ah, don’t mind me. I misread that as it’s searching for the symbol _start when in fact you said it’s searching for something with the address of _start.

I indeed said “symbol _start” in one comment and “address of _start” in the other. Sorry for the confusion.

FWIW, I manually run gdb-add-index on the clang binary and it speeds it up A LOT. Maybe CMake should apply it (or some equivalent debug flags) for everyone doing debug builds?

Interestingly we do use -Wl,--gdb-index if LLVM_USE_SPLIT_DWARF is enabled, but not otherwise (D108776: [CMake] Add `--gdb-index` option to linker if split dwarf compiling). That should probably be done for any build with debug info, unless it makes linking with BFD even more atrocious than it already is?