I’ve been the maintainer of dsymutil since I joined Apple, so I think I’m probably best positioned to provide some context.
The debugging story on Apple platforms was purposely designed to make the compile-link-debug cycle as fast as possible. Not having to deal with DWARF during linking (which used to be the least parallelizable step) played a key role in that.
For those who aren’t familiar: the idea is that when you’re debugging locally, we don’t link DWARF at all, and the debugger finds the unlinked DWARF in the object files. However, that doesn’t work if you don’t have the object files around, so for releases you need a way to archive the debug info. That’s where dsymutil comes in. But it does more than just link the DWARF together; it’s an optimizing linker that uses the One-Definition Rule to unique the linked debug info. That’s where most of the time is spent. The result is fast development builds, and slower archive builds with smaller DWARF.
Another design goal of the original dsymutil implementation was to stream the DWARF, both on input and output. The input DWARF can easily exceed your available memory. I don’t think the output is realistically a concern anymore, and despite its name, MCStreamer isn’t truly streaming anyway. The DWARF linking algorithm used to be single-threaded and is now done by two threads working in lockstep. We also process different architectures in parallel.
The biggest challenge with dsymutil is its qualification. When Fred upstreamed our original internal implementation, we qualified it by generating bug-for-bug identical DWARF. I did the same thing when implementing the lockstep algorithm mentioned above. We have no good way to assess the quality of the generated DWARF. It’s relatively easy to spot-check small things, like we do in our tests, but the really tricky issues only show up at debug time, when the debugger starts misbehaving.
I see two options to speed up dsymutil:
- Stick to binary compatibility of the generated DWARF. Over the years, I’ve spent quite some time thinking about this, and I can’t come up with a plan that would move the needle significantly enough to be worth the investment. I brainstormed with the folks who worked on the linker, and we had some ideas, but nothing that would really help unless you change the generated output or are willing to keep potentially very large amounts of DWARF in memory.
- Find a way to qualify the generated DWARF by semantically comparing the debug info. Alexey’s parallel linker doesn’t rely on binary compatibility. However, I don’t have a plan for this mythical debug info semantic diffing tool; a very rough sketch of the idea follows this list.
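To make that slightly more concrete, here is a minimal sketch (not a proposal) of what a first-order semantic comparison could look like: reduce every DIE to a normalized key and compare multisets, ignoring offsets and ordering. The normalization key, the parsing of llvm-dwarfdump’s textual output, and the input paths are all assumptions for illustration; a real tool would have to understand type references, cross-CU links, location expressions, and so on.

```python
#!/usr/bin/env python3
# Rough sketch of a "semantic" DWARF comparison: instead of requiring
# byte-identical output, reduce each DIE to a normalized key
# (tag, name, decl line) and compare the resulting multisets.
# Offsets, attribute order, and DIE order are deliberately ignored.
# The input paths are placeholders; in practice you'd point this at the
# Mach-O inside each dSYM bundle (Contents/Resources/DWARF/<name>).
import re
import subprocess
import sys
from collections import Counter

TAG_RE = re.compile(r"(DW_TAG_\w+)")
NAME_RE = re.compile(r'DW_AT_name\s+\("(.*)"\)')
LINE_RE = re.compile(r"DW_AT_decl_line\s+\((\d+)\)")

def die_summary(path):
    """Return a multiset of (tag, name, decl_line) tuples for `path`."""
    dump = subprocess.run(
        ["llvm-dwarfdump", "--debug-info", path],
        capture_output=True, text=True, check=True).stdout
    dies = Counter()
    tag = name = line = None
    for text in dump.splitlines():
        m = TAG_RE.search(text)
        if m and "DW_AT_" not in text:
            # A new DIE starts; commit the previous one.
            if tag is not None:
                dies[(tag, name, line)] += 1
            tag, name, line = m.group(1), None, None
            continue
        if (m := NAME_RE.search(text)):
            name = m.group(1)
        if (m := LINE_RE.search(text)):
            line = int(m.group(1))
    if tag is not None:
        dies[(tag, name, line)] += 1
    return dies

a, b = die_summary(sys.argv[1]), die_summary(sys.argv[2])
for key, count in sorted((a - b).items(), key=repr):
    print(f"only in {sys.argv[1]}: {key} x{count}")
for key, count in sorted((b - a).items(), key=repr):
    print(f"only in {sys.argv[2]}: {key} x{count}")
sys.exit(1 if (a - b) or (b - a) else 0)
```

Something along these lines would already let you compare the classic and parallel linker outputs at a level above byte identity, but it’s obviously nowhere near what a real qualification tool would need.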
As Alexey mentions in the issue you linked, ODR uniquing in the parallel dwarflinker is still in an experimental state. The generated DWARF is non-deterministic, which is a show-stopper for us and for anyone who cares about reproducible builds. I vaguely remember that we had a plan to solve it, but Alexey’s contract ended before he was able to get around to it.
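For what it’s worth, the non-determinism is easy to observe: link the same binary twice and compare offset-free dumps of the results. The snippet below assumes dsymutil and llvm-dwarfdump are on your PATH and that your dsymutil build exposes the parallel linker as `--linker parallel` (the spelling may differ between releases); the binary path is a placeholder.

```python
#!/usr/bin/env python3
# Link the same binary twice with the parallel linker and compare
# offset-free textual dumps of the two results. Identical digests don't
# prove determinism, but differing digests demonstrate the problem.
import hashlib
import subprocess
import sys
import tempfile

binary = sys.argv[1]
digests = []
for _ in range(2):
    with tempfile.TemporaryDirectory() as tmp:
        dsym = f"{tmp}/out.dSYM"
        # Assumption: the parallel DWARFLinker is selected via --linker parallel.
        subprocess.run(["dsymutil", "--linker", "parallel", binary, "-o", dsym],
                       check=True)
        # --diff omits offsets and addresses, so the dump only reflects the
        # DWARF contents and their order.
        dump = subprocess.run(
            ["llvm-dwarfdump", "--debug-info", "--diff", dsym],
            capture_output=True, text=True, check=True).stdout
        digests.append(hashlib.sha256(dump.encode()).hexdigest())

print("identical output" if digests[0] == digests[1] else "non-deterministic output")
```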
Personally, I would recommend pursuing option (2). The parallel linker is passing all the existing tests, and sticking to binary compatibility won’t scale forever. I also believe that such a tool to semantically compare DWARF would benefit the whole debug info community, not just dsymutil.
I don’t have the bandwidth to invest in this in the short term. I would be more motivated to explore the qualification strategy if ODR uniquing were working, or to work on ODR uniquing if I had a qualification strategy. But without a concrete plan for either, it’s hard to prioritize this over my other responsibilities.