RFC: DWARF reduction tool

Proposal

To add a new tool, similar to llvm-reduce, that works on object files containing DWARF debug info. The tool should progressively remove the parts of the debug info that are not essential to reproducing a bug, while checking that the result still passes the user-provided test program.

Motivation

Recently I’ve been debugging Unreal Engine with LLDB, and it was quite tricky to catch some of the issues. The debug info is ~11GB combined, and the engine takes 20 minutes to start under the debugger, which makes troubleshooting LLDB issues especially fun.

In some cases it is possible to narrow down the problem by reducing the source code, by disabling debug info for part of the project, or by reducing at the LLVM-IR level. These methods are difficult to apply to UE due to the scale of the codebase and the complexity of the build system, and they don’t work at all when a bug happens with pre-built binaries.

However, we can attempt to reduce the binaries instead, specifically their debug info sections. Only a small portion of the debug info may be relevant to the problem, so by cutting away everything else we can get a much smaller reproducer.

I’ve tried this method with Unreal Engine and cobbled together a naive reduction program using the LLVM DWARFLinker (taking llvm-dwarfutil as an example) and parts of llvm-reduce. It required a few changes to DWARFLinker to override which DWARF elements we want to discard or keep. I’m not sure it is a good idea to use DWARFLinker, since we don’t really need any “linking”, just filtering. Perhaps a separate implementation would be simpler.

The algorithm that decides which parts of the DWARF are not necessary is probably the most important piece here. As a quick solution I used an algorithm that removes DWARF elements at random. Restricting it to only some tags (compile units, subprograms, etc.) helps it converge faster.
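
To make this concrete, here is a minimal sketch of such a randomized reduction loop, written from my description above rather than taken from the actual prototype. `DieHandle` and `TryRemove` are hypothetical stand-ins: `TryRemove` would rewrite the object file without the given DIEs and rerun the test program.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

using DieHandle = uint64_t; // e.g. a DIE offset in .debug_info

// Candidates would be restricted to a few tags (compile units,
// subprograms, ...) to converge faster.
void reduce(std::vector<DieHandle> Candidates,
            const std::function<bool(const std::vector<DieHandle> &)>
                &TryRemove) {
  std::mt19937 Rng(std::random_device{}());
  // Start by discarding a random half of the candidates; halve the
  // chunk size whenever a removal makes the bug disappear.
  for (size_t Chunk = Candidates.size() / 2; Chunk > 0;) {
    std::shuffle(Candidates.begin(), Candidates.end(), Rng);
    std::vector<DieHandle> Removed(Candidates.begin(),
                                   Candidates.begin() + Chunk);
    if (TryRemove(Removed)) {
      // Commit: these DIEs stay removed; keep trying at the same scale.
      Candidates.erase(Candidates.begin(), Candidates.begin() + Chunk);
      Chunk = std::min(Chunk, Candidates.size());
    } else {
      Chunk /= 2; // Too aggressive; retry with a smaller random subset.
    }
  }
}
```

The chunked search has the same shape as the delta-debugging loop in llvm-reduce: attempt large removals first, and shrink the step size when the bug stops reproducing.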

With this approach I was able to reduce one of the UE libraries from ~34MB of debug info down to 1KB, and converted it into a LIT test (bug report: LLDB “forcefully completed” types cause an assert in clang, llvm/llvm-project issue #90938 on GitHub). It took about 3 hours to reduce with this simple algorithm.

Questions

I wonder how people troubleshoot DWARF issues without a reduction tool like this? Do you think such a tool is useful enough to have in LLVM? Is there a better way of doing this?

The code is not published yet, though I plan to make it public eventually and contribute it to LLVM mainline. I wanted to get some feedback first, and maybe change the design accordingly.

11 gigs of debug info? And I thought lldb’s 1.5 gigs was a lot!

How do people troubleshoot issues on large codebases (not just DWARF issues!)?

  • Suffer
  • Use DWO/split DWARF (-gsplit-dwarf)
  • Selectively strip debug info from objects/libraries that don’t matter for the issue

As an occasional LLDB contributor, I see the value in such a tool.

My experience with creduce suggests that users of such a tool would benefit from exposing additional knobs. For instance, creduce allows individual passes to be disabled. Not something a first iteration needs, but useful to keep in mind.

@asavonic, is your target using static linking or dynamic linking? If most of the debug info is evenly distributed across many shared libraries (dynamic linking), you can try enabling the on-demand symbol loading feature, which lazily parses debug info only when it is required by a breakpoint, a backtrace, etc.:
https://lldb.llvm.org/use/ondemand.html

Generally on Linux, large debug info can cause slow startup because lldb has to manually index symbols. If that is the bottleneck, you should consider enabling the DWARF v5 .debug_names accelerator table, which brings indexing time to near zero.

As some example data, 2/3 of the targets in our company have more than 10GB of debug info, and some targets have more than 70GB. We commonly have to enable split DWARF + -gsimple-template-names just to build those targets successfully. The two features above are the key ones enabling our customers to debug targets at this scale.

Btw: a quick way to identify the cost distribution is running the “statistics dump” command.

I’d worry that DWARF consumers aren’t that robust to arbitrary DWARF - so such a tool is likely to trip over situations the consumer would never see in practice, and has limited interest in supporting. This is why I’ve generally advocated for source reduction/source fuzzing to make the debugger more robust to the sort of things it’s likely to see in practice.

but I realize that isn’t always practical, as you say, with situations of prebuilt code (though prebuilt code with debug info? if you’ve got debug info presumably you’ve also got the source, or you may be better off just stripping out the debug info?)

I’d generally encourage more work into automated tools for the techniques you mentioned, like selectively disabling debug info, selectively disabling LTO, etc.

If we were going to go for DWARF reduction whose output looks more like the DWARF generated by existing compilers - I’d consider a graph analysis of the DWARF (starting at function definitions and global/namespace-scoped variables) and then trimming things in certain ways - eg: remove one of these top-level entities and discard any referenced entities that aren’t otherwise referenced - and demote type definitions to declarations, removing anything that becomes unreferenced by that transformation.

(that latter one is more likely to trip up debuggers, it could create some more novel/unrealistic situations - but should be closer to common idioms/reality)

(also, I’d love a tool to render the DWARF reference graph - to help identify pinch-points in the graph that might be broken to reduce debug info size… I’d love such a tool to be really generic - able to be reused for the linker dependence/gc graph, the build dependence graph, etc.)
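
The reference graph itself can be extracted with LLVM’s existing DWARF parser. A rough sketch (untested, assuming a recent LLVM API; error handling and cross-unit subtleties are omitted): every attribute that resolves to another DIE becomes an edge, keyed by DIE offset, which is enough for reachability analysis or for feeding a graph renderer.

```cpp
#include "llvm/DebugInfo/DWARF/DWARFContext.h"
#include "llvm/DebugInfo/DWARF/DWARFDie.h"
#include <cstdint>
#include <map>
#include <vector>

using Graph = std::map<uint64_t, std::vector<uint64_t>>;

static void addEdges(const llvm::DWARFDie &Die, Graph &G) {
  // Record an edge for every DIE-reference attribute (DW_AT_type,
  // DW_AT_specification, DW_AT_abstract_origin, ...).
  for (const llvm::DWARFAttribute &A : Die.attributes())
    if (llvm::DWARFDie Target = Die.getAttributeValueAsReferencedDie(A.Attr))
      G[Die.getOffset()].push_back(Target.getOffset());
  for (llvm::DWARFDie Child : Die.children())
    addEdges(Child, G);
}

Graph buildReferenceGraph(llvm::DWARFContext &DICtx) {
  Graph G;
  for (const auto &CU : DICtx.compile_units())
    addEdges(CU->getUnitDIE(/*ExtractUnitDIEOnly=*/false), G);
  return G;
}
```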

Good point. Split DWARF is very helpful, and it makes it possible to reduce debug info without affecting the libraries in any way.

Selectively removing debug info from libraries can be the first step of reduction. Very coarse-grained, but effective at cutting away completely irrelevant libraries.

Agreed, the tool must be adjustable. An algorithm that performs reduction on DWARF is not trivial, so if it gets stuck somewhere we need a way to guide it.

The target uses dynamic linking. There are ~830 shared libraries in total, but most of them are small (maybe not even loaded at the same time, not sure).

Thank you, I’ll try the “on demand” feature. It sounds great for actually using LLDB, but it may change things too much and hide some of the LLDB issues that I’m trying to catch. It’s easy to check, though, and to fall back to the regular mode if it doesn’t work.

Great to see these optimizations. For now I’m stuck with prebuilt binaries that don’t have this section, but it sure is helpful.

So if LLDB crashes on one of these monsters, how do you reduce the problem down to something that can be used as a LIT test, for example?

Oh, I didn’t know about this option. Thank you!

My assumption is that if we start from valid DWARF and remove the parts of it that are not relevant, we should still get valid DWARF, just incomplete. Granted, if we remove fields of structures, for example, then the types will not have the same memory layout, but that should be OK for troubleshooting LLDB itself.

The goal for a reduced DWARF is to execute the same code path in LLDB as the original does, and trigger the same bug. Anything else should be interpreted as a “bad” attempt at reduction, and the tool should try again.
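
For example, the interestingness test can require the exact failure signature instead of accepting any crash. A minimal sketch, with a placeholder lldb invocation and a placeholder signature string rather than the actual setup:

```cpp
#include <array>
#include <cstdio> // popen/pclose (POSIX)
#include <string>

int main() {
  // Hypothetical reproduction command; replace with the real one.
  const char *Cmd = "lldb --batch -o 'b main' -o run ./test.bin 2>&1";
  // The exact signature of the original failure, not merely "it crashed".
  const std::string Signature =
      "<exact assertion text from the original crash>";

  std::string Output;
  if (FILE *Pipe = popen(Cmd, "r")) {
    std::array<char, 4096> Buf;
    while (size_t N = fread(Buf.data(), 1, Buf.size(), Pipe))
      Output.append(Buf.data(), N);
    pclose(Pipe);
  }
  // Exit 0 = still interesting; nonzero = bad reduction, try again.
  return Output.find(Signature) != std::string::npos ? 0 : 1;
}
```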

Build systems can be tricky to set up. It can be a customer’s code that they are not willing to share. Or, as you’ve mentioned, there may not be an automated way to perform source code reduction. Actually, I don’t know if anything like this is available; it has always been a manual process for me.

Therefore if a DWARF reduction tool can bypass all of this, and just give me a snippet of DWARF that triggers the issue - great.

Thank you for summarizing this! Some of this is how the DWARFLinker-based prototype works - DWARFLinker has to pull in all dependencies (like fields for classes, or argument types for functions), and with a bit of hacking we can trim them all at once. The order of operations matters, to keep the original bug present and not erase everything.

Building a graph doesn’t seem too difficult, and perhaps it is already done in LLVM tools to some degree. What I couldn’t find is the ability to build a graph, modify it without breaking anything, and write it back.

So perhaps having this as a separate library could also enable the tooling that you mentioned.

so I have quite a bit of experience reducing things and what David is saying here is absolutely the way to go. focus on the most coarse-grained possible transformations, and aim specifically at those that do not seem likely to create weirdly broken DWARF. test input reducers like C-Reduce often end up accidentally being highly capable fuzzers. if that’s what you want, then great. but if you just need to get a specific bug fixed, then that’s not very useful.

the specific bad thing that can happen here is something we’ve called hijacking: you start out reducing a trigger for one bug and end up reducing a trigger for a totally different bug. this is extremely likely to happen when:

  1. the software under test has a high latent bug rate
  2. the reducer does all sorts of crazy transformations
  3. the interestingness test is somewhat non-specific, for example it simply looks for a crash

item 1 isn’t really under our control. item 2 can be addressed by toning down the reducer. item 3 is the user’s responsibility, but it’s tricky in practice.
