llvm-symbolizer has gotten extremely slow

Sometime recently-ish, llvm-symbolizer has gotten unacceptably slow for me. Any test which crashes (or hits report_fatal_error, which for some reason produces a backtrace and is treated as a crash) ends up hanging for an extended period. For example:

./bin/llvm-lit -v  ~/src/llvm-project/llvm/test/CodeGen/X86/pr49087.ll 
llvm-lit: /home/matt/src/llvm-project/llvm/utils/lit/lit/llvm/subst.py:122: note: Did not find llvm-debuginfod in /home/matt/src/llvm-project/build_debug/bin
-- Testing: 1 tests, 1 workers --
XFAIL: LLVM :: CodeGen/X86/pr49087.ll (1 of 1)

Testing Time: 21.57s
  Expectedly Failed: 1

I see 20-40 second test times for cases that crash instantly. This is greatly slowing down ninja check times. Does anyone have any idea what broke? This used to be completely unnoticeable.

I haven’t noticed slow-downs, but some time ago llvm-symbolizer became unreliable for me,
the LLVM backtraces symbolized by it are downright wrong. I don’t have a reproducer, though.

Works fine for me, too. Same time as a week ago (less than two seconds with the supplied reproducer).

The description of D86170 (PrintStackTrace: don't symbolize if LLVM_DISABLE_SYMBOLIZATION is set) may be useful. You may set LLVM_DISABLE_SYMBOLIZATION=1 when running ninja check-$something, especially with an -O0-built llvm-symbolizer.

I was always using debug builds before and never noticed this; just turning it off is an ugly workaround. I extracted a sample output file, and something seems to have gotten worse between llvm 14 and 15:

On one sample, the packaged release build of llvm-symbolizer-14 takes ~0.8s; llvm-symbolizer-15 takes ~2.4s, so something between those two releases made it 3x slower.

I bisected this to [symbolizer] Parse DW_TAG_variable DIs to show line info for globals · llvm/llvm-project@cead4ec · GitHub

After this commit, my sample takes about 3x longer with a release build and more than 4x longer with a debug build. Specifically, if I comment out this part, most of the slowdown is avoided. compile_units has about 1800 entries. It’s still taking about 5 seconds in a debug build, which is a lot for something that happens in most XFAILed tests. Can we do better than the std::map lookups here?

Some caching mechanism could help. E.g. once updateVariableDieMap is called for a CU, store the min/max address somewhere, so that the unit can be skipped if the address is out of bounds.
This should give log2(N) for finding the CU containing the address plus log2(M) for the query into that CU, instead of the current N * log2(M).
The same could be achieved by storing one VariableDieMap in DWARFContext rather than N maps, one in each DWARFUnit.

The link does not show the exact location.

This small patch should give an approximation of what could be achieved:

  // Skip the lookup entirely when Address falls outside [min start, max end).
  if (VariableDieMap.empty() || Address < VariableDieMap.begin()->first ||
      Address >= VariableDieMap.rbegin()->second.first)
    return DWARFDie();
  auto R = VariableDieMap.upper_bound(Address);

It gains nothing for me, but my times are different (2 seconds rather than 20 in a debug build, 0.16 seconds in a release build). Do you happen to have a static build?


It’s supposed to be DWARFContext.cpp:1053, the loop over compile_units in getCompileUnitForAddress.

Here is the flame graph

So it basically spends 66.6% of the symbolization time reading (perhaps all of the) debug info and 33.3% of the time updating maps; the lookup itself takes ~0%.
This happens when it tries to symbolize the symbol _start, which could not be found via the usual function lookup. If I remove _start from the symbolizer input, the flame graph is different and the total time drops from 10s to 3.5s.

It would be nice if the author of the patch could comment on this.

Simple workaround:
Don’t_try_to_symbolize_entry_point.patch (712 Bytes)

Hi, I’m the author of the patch. @dblaikie and I have had various discussions about this, but the problem is basically that global variable addresses can fall outside the address ranges of their DWARF CU. Practically, this means that when symbolizing DATA addresses, we have to walk the entire DWARF tree, O(n), to find all the symbols.

The right solution (IMHO) is to expand the CU ranges to also contain the symbols that are in that CU. This would allow us to walk just the CUs instead and use the regular range-checking logic. AIUI, this would require a change to DWARF (or at least to regular practice); however, I don’t see this being a problem, as two CUs shouldn’t be overlapping in address ranges (right?).

Maybe David can help me out here as he’s the DWARF expert, I’m just a person who wanted llvm-symbolizer to be able to give line numbers for symbols, like addr2line does :slight_smile:


Does addr2line have the same performance problems?

Personally I think an immediate issue we probably should’ve fixed even before this is to provide a way to disable LLVM symbolizing for googletest death tests - I wouldn’t necessarily want to disable symbolization for my whole llvm build (I frequently use it to debug clang crashes, so symbolizing is useful) - but we shouldn’t be symbolizing stack traces we never render anyway because we /expected/ the crash.

Is anyone interested in implementing that? I’ve looked at it before but didn’t manage to actually implement it. Not sure if there’s actual discourse/email/etc history of my previous attempts.

As for fixing the symbolizer performance - if it’s already similar to addr2line for the same functionality, I guess there might not be much hope in pure symbolizer improvements. (please confirm, @hctim)

Is there any way for the symbolizer to be told whether it’s symbolizing a code or data address? If it knows it’s symbolizing a code address (true when doing crash backtrace symbolizing) we shouldn’t be paying the cost for global variable searching?

The not tool sets the LLVM_DISABLE_SYMBOLIZATION environment variable.

Maybe lit could do the same for XFAIL tests?

So that API loops over all CUs, trolling through DIEs for the variable associated with an address, and then… throws it away? Returning only the CU? So the caller has to re-parse the DWARF looking for the variable again? That seems sub-optimal.


It searches for debug info corresponding to the address of _start: first as a subprogram, which fails quickly and is not even visible on the first graph; after that as a variable, which is where it spends 5.5 seconds. No debug info is found, of course.

Okay, so there’s what the symbolizer is doing, which (at least sometimes) is looking for a symbol that doesn’t exist; and as I learned long ago, it takes longer to find something that isn’t there. :slight_smile: But if we have DWARF v5 then there’s a name index and we can look it up there, and discover it doesn’t exist fairly quickly.

Then there’s what the API mentioned above is doing, which seems terribly inefficient, and might be an entirely separate question.

Ah, don’t mind me. I misread that as it’s searching for the symbol _start when in fact you said it’s searching for something with the address of _start.

I indeed said “symbol _start” in one comment and “address of _start” in the other. Sorry for the confusion.

FWIW, I manually run gdb-add-index on the clang binary and it speeds it up A LOT. Maybe CMake should apply it (or some equivalent debug flags) for everyone doing debug builds?

Interestingly we do use -Wl,--gdb-index if LLVM_USE_SPLIT_DWARF is enabled, but not otherwise (D108776: [CMake] Add `--gdb-index` option to linker if split dwarf compiling). That should probably be done for any build with debug info, unless it makes linking with BFD even more atrocious than it already is?