Sometime recently-ish, llvm-symbolizer has gotten unacceptably slow for me. Any test which crashes (or hits report_fatal_error, which for some reason produces a backtrace and is treated as a crash) ends up hanging for an extended period. For example:
./bin/llvm-lit -v ~/src/llvm-project/llvm/test/CodeGen/X86/pr49087.ll
llvm-lit: /home/matt/src/llvm-project/llvm/utils/lit/lit/llvm/subst.py:122: note: Did not find llvm-debuginfod in /home/matt/src/llvm-project/build_debug/bin
-- Testing: 1 tests, 1 workers --
XFAIL: LLVM :: CodeGen/X86/pr49087.ll (1 of 1)
Testing Time: 21.57s
Expectedly Failed: 1
I see 20-40 second test times for cases that crash instantly. This is greatly slowing down ninja check times. Does anyone have any idea what broke? This used to be completely unnoticeable.
I was always using debug builds before and never noticed this; just turning it off is an ugly workaround. I extracted a sample output file, and something seems to have gotten worse between llvm 14 and 15:
On one sample, the packaged release build of llvm-symbolizer-14 takes ~0.8s. llvm-symbolizer-15 takes ~2.4, so something between those 2 releases made it 3x slower
After this commit, my sample takes about 3x longer with a release build and >4x longer with a debug build. Specifically if I comment out this part, most of the slowdown is avoided. compile_units has about 1800 entries. It’s still taking about 5 seconds in a debug build, which is a lot for something happening in most XFAILed tests. Can we do better than the std::map lookups here?
Some caching mechanism could help. E.g. once updateVariableDieMap is called for a CU, store the min/max address somewhere, so that this unit can be skipped if the address is out of bounds.
This should give log2(N) for finding the CU containing the address plus log2(M) for query into the CU, instead of current N * log2(M).
Same could be achieved by storing one VariableDieMap in DWARFContext rather than N maps one in each DWARFUnit.
So it basically spends 66.6% of symbolization time reading [perhaps all] debug info and 33.3% of time updating maps. Lookup takes 0%.
This is happening when it tries to symbolize symbol _start, which could not be found via usual function lookup. If remove the _start from the symbolizer input, the flame graph is different and the total time reduces from 10s to 3.5s.
Hi, I’m the author of the patch. @dblaikie and I have had various discussions about this, but the problem is basically that global variable addresses can fall outside the range in a DWARF CU. Practically, this means when symbolizing DATA addresses, we have to walk the entire DWARF tree, O(n), to find all the symbols.
The right solution (IMHO) is to expand the CU ranges to also contain symbols that are in that CU. This would allow us to walk just the CU’s instead, and use the regular range-checking logic. AIUI, this would require a change to DWARF (or at least to the regular practice), however I don’t see this being a problem as two CUs shouldn’t be overlapping in address ranges (right?).
Maybe David can help me out here as he’s the DWARF expert, I’m just a person who wanted llvm-symbolizer to be able to give line numbers for symbols, like addr2line does
Does addr2line have the same performance problems?
Personally I think an immediate issue we probably should’ve fixed even before this is to provide a way to disable LLVM symbolizing for GUnit death tests - I wouldn’t necessarily want to disable symbolization for my whole llvm build (I frequently use it to debug clang crashes, so symbolizing is useful) - but we shouldn’t be symbolizing stack traces we never render anyway because we /expected/ the crash.
Is anyone interested in implementing that? I’ve looked at it before but didn’t manage to actually implement it. Not sure if there’s actual discourse/email/etc history of my previous attempts.
As for fixing the symbolizer performance - if it’s already similar to addr2line for the same functionality, I guess there might not be much hope in pure symbolizer improvements. (please confirm, @hctim)
Is there any way for the symbolizer to be told whether it’s symbolizing a code or data address? If it knows it’s symbolizing a code address (true when doing crash backtrace symbolizing) we shouldn’t be paying the cost for global variable searching?
So that API loops over all CUs, trolling through DIEs for the variable associated with an address, and then… throws it away? Returning only the CU? So the caller has to re-parse the DWARF looking for the variable again? That seems sub-optimal.
It searches for debug info corresponding to address of _start. First as a subprogram; this fails quickly and not even visible on the first graph. After that it searches it as a variable, and this is where it spends 5.5 seconds. No debug info is found, of course.
Okay, so there’s what the symbolizer is doing, which (at least sometimes) is looking for a symbol that doesn’t exist; and as I learned long ago, it takes longer to find something that isn’t there. But if we have DWARF v5 then there’s a name index and we can look it up there, and discover it doesn’t exist fairly quickly.
Then there’s what the API mentioned above is doing, which seems terribly inefficient, and might be an entirely separate question.