Rationale for the native LLDB unwinder?


What's the rationale for creating/using a native LLDB unwinder, instead of the Apple libunwind based unwinder?

It's my understanding that the libunwind unwinder is still enabled in LLDB.


Hi Björn,

Partially it was due to the restrictions of following the libunwind API, partially due to our implementation of libunwind (modified to implement the libunwind-remote API), and partially due to the fact that we paid real costs by having libunwind and lldb not share information or trust each other. There were fundamental performance issues with the way the libunwind API works.

The libunwind-remote implementation was an excellent test bed for creating an adaptive backtracer which can switch between eh_frame CFI, assembly language prologue inspection, and architectural defaults. But by the time we had a functioning unwinder, looking at the compromises imposed by the API and the changes needed to the implementation, there wasn't any clear benefit to sticking with that source base. Originally we had a few groups inside Apple who wanted a libunwind-remote API for their programs, but by the time it was finished they were all shifting their interest to using lldb to provide this service.

The libunwind-remote API was designed to only follow eh_frame information and to have the library built separately for each architecture supported. We wanted to have multiple unwind schemes in play and we wanted to have a single library for all of them. Already the libunwind-remote API has half a dozen or so API extensions, including the instruction_length() callback, which requires the driver program to include a disassembler on variable-length-instruction architectures (read: x86). The EnhancedDisassembly library from llvm is a great source for this but it's still a rather large burden for users of the library. Another simple example of an API problem: the libunwind-remote API makes no allowance for a debugger's inferior function calls, where the register contents are stored in the debugger's memory space instead of the inferior program's memory space.

libunwind and lldb don't share any of their knowledge with each other - so for instance both may read and parse an eh_frame table from an executable. For assembly language inspection it's critical that we know the start address of a function. The libunwind-remote API includes the find_proc_info() call which it could use, but on a stripped program (which may only have one visible linker symbol), a driver program that implements find_proc_info() naively will say "Oh yes, this is 2 megabytes into this symbol I saw, it's all one function". If eh_frame information is in the binary, a smarter approach is to grab the function start/end addresses from eh_frame - but a typical program using libunwind-remote isn't going to do that. So libunwind-remote tries to find the information on its own, if possible - it doesn't trust the driver program.
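To make the function-bounds problem concrete, here is a minimal sketch (not lldb or libunwind code) of preferring eh_frame address ranges over linker symbols. The names containing_range and function_bounds, and the sample addresses, are made up for illustration; the ranges are assumed to be already parsed into sorted, non-overlapping (start, end) pairs.

```python
from bisect import bisect_right


def containing_range(ranges, addr):
    """Return the (start, end) pair containing addr, or None.

    ranges must be sorted by start and non-overlapping; end is exclusive.
    """
    starts = [start for start, _ in ranges]
    i = bisect_right(starts, addr) - 1
    if i >= 0 and ranges[i][0] <= addr < ranges[i][1]:
        return ranges[i]
    return None


def function_bounds(addr, symbols, fde_ranges):
    """Prefer eh_frame FDE ranges; fall back to the symbol table."""
    hit = containing_range(fde_ranges, addr)
    if hit is not None:
        return hit
    return containing_range(symbols, addr)


# A stripped binary: one visible symbol that appears to span 2 MB.
symbols = [(0x1000, 0x201000)]
# Hypothetical FDE ranges recovered from eh_frame for the same text.
fde_ranges = [(0x1000, 0x1040), (0x1040, 0x10C0), (0x150000, 0x150080)]

bounds = function_bounds(0x150010, symbols, fde_ranges)
```

With only the symbol table, 0x150010 looks like it is deep inside one enormous function; with the FDE ranges it resolves to a function starting at 0x150000, which is what prologue inspection needs.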

(You might ask why we're not just using the eh_frame -- a rule of thumb we've come up with is to not trust eh_frame information if it's on the currently executing frame; it is only guaranteed to be valid at call sites and where an exception can be thrown. The compiler emits some information so that it *tends* to be valid at other locations as well but we've seen cases where it isn't and it confuses the debugger.)

The final issue was performance. The libunwind API assumes that you unwind every frame in its entirety - you step the "cursor" - and have full register context available at that point. On Mac OS X it is typical to see applications with a dozen threads and when execution stops in a GUI debugger, the first thing it does is backtrace all the threads so it can show function names. The speed of this backtrace is critical. The native unwinder allows us to do a "fast backtrace" if it can; the fast backtrace only knows how to restore a couple of registers (rip, rbp on x86-64) - if we need to retrieve additional registers, it knows how to do a "full" unwind which may involve reading the eh_frame information for that ObjectFile (which is phenomenally slow). libunwind-remote reads/parses the eh_frame for every shared library mentioned in a stack on startup so the first unwind is very slow. I spent a good amount of time trying to optimize things but it was hard to do with that design.
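As a rough illustration of what a "fast backtrace" restores, here is a sketch that walks the saved-rbp chain on x86-64 and recovers only rip and rbp for each frame. It is not the lldb implementation: fast_backtrace, read_u64, and the memory contents are hypothetical, and it assumes every function on the stack sets up a conventional frame pointer.

```python
def fast_backtrace(read_u64, rip, rbp, max_frames=64):
    """Collect return addresses by following the saved-rbp chain.

    read_u64(addr) returns the 8-byte value at addr, or None if the
    address is unreadable. Only rip and rbp are recovered per frame.
    """
    pcs = [rip]
    while rbp != 0 and len(pcs) < max_frames:
        saved_rbp = read_u64(rbp)        # caller's rbp is stored at [rbp]
        return_addr = read_u64(rbp + 8)  # return address is at [rbp + 8]
        if saved_rbp is None or return_addr is None or return_addr == 0:
            break
        pcs.append(return_addr)
        # The stack grows down, so a caller's frame must sit at a higher
        # address; anything else means the chain is corrupt or looping.
        if saved_rbp != 0 and saved_rbp <= rbp:
            break
        rbp = saved_rbp
    return pcs


# Simulated inferior stack memory: address -> 8-byte value.
memory = {
    0x7000: 0x7040, 0x7008: 0x400200,
    0x7040: 0x7080, 0x7048: 0x400300,
    0x7080: 0x0,    0x7088: 0x400400,
}
pcs = fast_backtrace(memory.get, rip=0x400100, rbp=0x7000)
```

Each frame costs two memory reads and no eh_frame parsing, which is why this path is attractive when a GUI debugger only needs function names for every thread.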

As an aside, there were some issues with libunwind-remote that were solvable but would have been a good amount of work to do. For instance, writing a register value in the middle of the stack requires that libunwind-remote track not just register values, but where those registers were retrieved from. (In memory? In the same register or a different one? In the debugger's address space in the case of an inferior function call?) There was also the problem that libunwind was designed with an assert style of error handling, because following the eh_frame chain for an exception throw is reliable and safe... unwinding a stack during a debug session is neither reliable nor safe. I pulled out most of the asserts but some remain, and they take down lldb every time something goes wrong on an unwind.
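The register-location bookkeeping described above might look something like the following sketch. Where, RegisterLocation, write_register, and the three stores are all hypothetical names for illustration, not part of libunwind-remote or lldb; the stores are plain dictionaries standing in for inferior memory, the live register file, and a buffer in the debugger's own address space.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Where(Enum):
    INFERIOR_MEMORY = auto()   # spilled to the inferior's stack
    LIVE_REGISTER = auto()     # still held in a (possibly different) register
    DEBUGGER_MEMORY = auto()   # saved by the debugger for an inferior call


@dataclass
class RegisterLocation:
    where: Where
    key: object  # an address, a register name, or a buffer slot


def write_register(locations, stores, name, value):
    """Write value to wherever this frame's copy of `name` actually lives."""
    loc = locations[name]
    stores[loc.where][loc.key] = value


# Where a mid-stack frame's registers were retrieved from.
locations = {
    "rbx": RegisterLocation(Where.INFERIOR_MEMORY, 0x7010),
    "r12": RegisterLocation(Where.LIVE_REGISTER, "r12"),
    "r13": RegisterLocation(Where.DEBUGGER_MEMORY, "saved_r13"),
}
stores = {
    Where.INFERIOR_MEMORY: {0x7010: 1},
    Where.LIVE_REGISTER: {"r12": 2},
    Where.DEBUGGER_MEMORY: {"saved_r13": 3},
}
write_register(locations, stores, "rbx", 42)
```

Without the location record, the unwinder would know that rbx held 1 in that frame but would have no way to know that changing it means writing to stack address 0x7010.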

The native unwinder is coming along - I'm still working on a few bugs but I expect we'll switch over to using it as the default unwinder very soon. I like the approach of having UnwindPlans contributed from different sources (eh_frame information, DWARF CFI, assembly language prologue inspection, architectural defaults, etc) that all feed into the unwinder. I'd like to see multiple UnwindPlans for attempting to unwind from frame 0 when we have no known start address. The "follow the rbp chain" technique often works but it'd be cool if we could see that this method gives us an invalid saved-rip and try a fallback UnwindPlan, maybe one that assumes the saved rip can be found via rsp and assume the stack hasn't been modified so far during the lifetime of the function.
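The fallback idea for frame 0 could be sketched as follows. This is only an illustration of trying plans in order and rejecting one that yields an implausible saved rip; rbp_chain_plan, rsp_plan, unwind_frame_zero, and is_code_address are hypothetical names, and a real UnwindPlan carries far more than three registers.

```python
def rbp_chain_plan(regs, read_u64):
    """Assume a standard frame: saved rbp at [rbp], saved rip at [rbp + 8]."""
    rbp = regs["rbp"]
    saved_rbp, saved_rip = read_u64(rbp), read_u64(rbp + 8)
    if saved_rbp is None or saved_rip is None:
        return None
    return {"rip": saved_rip, "rbp": saved_rbp, "rsp": rbp + 16}


def rsp_plan(regs, read_u64):
    """Assume the stack is untouched since entry: saved rip at [rsp]."""
    saved_rip = read_u64(regs["rsp"])
    if saved_rip is None:
        return None
    return {"rip": saved_rip, "rbp": regs["rbp"], "rsp": regs["rsp"] + 8}


def unwind_frame_zero(regs, read_u64, is_code_address, plans):
    """Return (plan name, caller registers) for the first plausible plan."""
    for plan in plans:
        caller = plan(regs, read_u64)
        if caller is not None and is_code_address(caller["rip"]):
            return plan.__name__, caller
    return None


# Stopped in a function that uses rbp as a scratch register: rbp points
# at unrelated data, but the return address is still at [rsp].
memory = {0x5000: 0x1111, 0x5008: 0x2222, 0x6FF8: 0x400200}
regs = {"rip": 0x400100, "rbp": 0x5000, "rsp": 0x6FF8}


def is_code_address(addr):
    return 0x400000 <= addr < 0x500000  # hypothetical text segment


result = unwind_frame_zero(regs, memory.get, is_code_address,
                           [rbp_chain_plan, rsp_plan])
```

Here the rbp-chain plan produces a saved rip of 0x2222, which is not in the text segment, so the rsp-based plan is used instead. A validity check like this cannot catch every bad unwind (a stale rbp can still point at a real return address), so it is a heuristic rather than a guarantee.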

In the end we could have accomplished many of these things while continuing to use libunwind, but it was clear we were imposing an unnecessary set of restrictions on ourselves, and building the unwinder into lldb as an integral subsystem was going to be a lot easier to enhance and maintain long term.


Hi Jason,

Thanks a lot for your elaborate answer!


On 29 Oct 2010, at 23:23, Jason Molenda wrote: