LLVM libunwind stack usage

I've started the process of bringing LLVM's libunwind into the FreeBSD
base system[1]. As part of that process we've tested building the
approximately 25,000 third party software packages in the FreeBSD
ports collection against a modified FreeBSD with libunwind
included[2]. Of course, I wouldn't expect much in the way of build
failures -- I'd expect any issues to be largely run-time ones.

We did observe one build failure on x86-64 though, in a software
package that builds and runs an exception-using tool at build time. The
failure turned out to be a stack overflow[3] during the forced-unwind
cleanup of a thread with a minimum-sized (4K) stack.

LLVM libunwind allows for 120 saved registers, common across all
architectures (kMaxRegisterNumber in src/DwarfParser.hpp). In contrast
the GCC unwinder has a target-dependent maximum; on x86-64 it's 18.
LLVM libunwind requires 1920 bytes for register storage, vs. 288 for
the GCC unwinder.
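
The two byte counts above are consistent with 16 bytes per saved-register
slot (1920 / 120 = 288 / 18 = 16). A minimal sketch of that arithmetic
follows; RegisterSlot and its two-field layout are hypothetical and only
chosen to give a 16-byte slot, they are not copied from either unwinder.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical saved-register slot: a "where is it saved" tag plus a value.
// The layout is an assumption made so that one slot occupies 16 bytes.
struct RegisterSlot {
  std::uint64_t where;  // e.g. unused / saved at CFA offset / in a register
  std::int64_t value;   // offset or register number, depending on `where`
};

constexpr std::size_t kSlotsCommon = 120;  // limit shared by all architectures
constexpr std::size_t kSlotsX86_64 = 18;   // GCC unwinder figure for x86-64

// Bytes needed for a table with the given number of slots.
constexpr std::size_t tableBytes(std::size_t slots) {
  return slots * sizeof(RegisterSlot);
}

static_assert(sizeof(RegisterSlot) == 16, "sketch assumes a 16-byte slot");
static_assert(tableBytes(kSlotsCommon) == 1920, "matches the figure above");
static_assert(tableBytes(kSlotsX86_64) == 288, "matches the figure above");
```

Under these assumptions the difference is 1632 bytes per register table,
a large share of a 4K stack.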

Is it reasonable to change LLVM libunwind to use an approach similar
to GCC's unwinder, and have a target-specific maximum DWARF register
number? X86 does have DWARF register numbers above the 18 that GCC
accommodates, but they're not going to be useful in the unwinder
anyhow.
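
To make the question concrete, here is a rough sketch of what a
target-specific maximum could look like. The names kMaxSavedRegisters
and hasSaveSlot are hypothetical, and only the two values (18 for
x86-64, 120 otherwise) come from the numbers above; this is not the
layout of either unwinder.

```cpp
#include <cstddef>

// Hypothetical per-target limit on the number of saved-register slots.
#if defined(__x86_64__) || defined(_M_X64)
constexpr std::size_t kMaxSavedRegisters = 18;   // GCC's x86-64 figure
#else
constexpr std::size_t kMaxSavedRegisters = 120;  // current common limit
#endif

// True if a save slot exists for this DWARF register number. Registers at
// or above the limit would have no slot and could not be restored.
constexpr bool hasSaveSlot(std::size_t dwarfRegister) {
  return dwarfRegister < kMaxSavedRegisters;
}

static_assert(hasSaveSlot(0), "lowest register always has a slot");
static_assert(!hasSaveSlot(kMaxSavedRegisters), "limit itself is out of range");
```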

[1] [base] Revision 293450
[2] 206039 – [exp-run] Enable LLVM libunwind by default on x86 and arm
[3] 206384 – llvm libunwind requires larger stack than old unwinder (segfault while building lang/polyml)

I am not 100% sure, but I was under the impression that the unwinder would restore any registers that had DWARF instructions related to them and does not necessarily assume that the default calling convention applies. If I implement a language that uses its own calling convention where all registers are callee-saved (including vector / FPU ones), for example, then the unwinder should be able to restore my stack frame correctly. The same applies if I use a calling convention like this for side exits from JIT’d code, to avoid bloating the JIT’d code with register saves. This kind of use would be blocked by the GCC approach, which sounds as if it is baking C ABI details into a language-agnostic unwinder.

David

Are you suggesting that the size of the data arrays in unw_context_t and unw_cursor_t be made smaller? If so, then I am concerned that this might break binary compatibility for shared-object builds of libunwind (assuming that's a thing). Both of those structures are mentioned and used in the libunwind.h header.

If binary compatibility turns out to be a non-issue, then I'm fine with making those data array sizes platform dependent. I would want there to be static_asserts in various locations to make sure that the size of the Registers_xxx classes stays in sync with the data arrays though.
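
Roughly the kind of check I have in mind, as a sketch only. ExampleContext,
ExampleRegisters and kExampleContextWords are hypothetical stand-ins for
unw_context_t, a Registers_xxx class and its platform-dependent array size;
the sizes are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical platform-dependent size of the public opaque data array.
constexpr std::size_t kExampleContextWords = 17;

// Stand-in for the public, opaque structure exposed in the header.
struct ExampleContext {
  std::uint64_t data[kExampleContextWords];
};

// Stand-in for the internal register class stored inside that array.
struct ExampleRegisters {
  std::uint64_t gpr[16];  // general-purpose registers
  std::uint64_t ip;       // instruction pointer
};

// Fail the build if the internal class outgrows the public array.
static_assert(sizeof(ExampleRegisters) <= sizeof(ExampleContext),
              "register class does not fit in the opaque context");
static_assert(alignof(ExampleRegisters) <= alignof(ExampleContext),
              "opaque context is not sufficiently aligned");
```

With checks like these, shrinking the array for one platform without
shrinking the matching register class becomes a compile-time error
instead of a run-time overflow.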

I don't know if you can get away with just saving the callee spilled registers or not.

Yeah, the question of unw_context_t is one of the reasons why I decided
not to include the HP interface in the system unwinder in NetBSD. For the
"normal" exception handling code path, the allocation is contained
completely within the library and only exposed via pointer references.
This makes it much easier to only require as much stack space as
necessary.

Joerg

There are a number of complications here. First, it is nowhere really
documented what registers are valid for .eh_frame use -- which is
surprisingly nasty on some platforms like PPC. There is the issue of the
HP interface wanting to use different mappings. I haven't seen any x86
code using FP registers in the EH path, which was a good enough reason
not to mess with the mapping. Further complications are questions like
whether the code should be able to deal with MMX vs FP overlap etc.

For ARM, NetBSD is using normal Itanium based unwinding with lazy FP
save/restore. That has the huge advantage of keeping most of the
instruction selection issues out of the code -- if the application is
using VFP, it can be safely assumed that VFP is available, but
otherwise, no need to bother with it.

For non-FP registers, there rarely is a point in trying to skip
saving/restoring all of them, so no platform in NetBSD does that.

Joerg

This sounds pretty reasonable. Expecting EH to work on a 4K stack seems
silly, but exceptions can be thrown from deep in a call tree and I think it
makes sense to keep stack usage down.