[RFC] Improving x86-64 compact unwind descriptors

tl;dr: I think we can get to 16B/function and I’m fairly certain that we will need format changes.


I think we can get down to ~16 bytes per function in the common case by increasing the size of a .eh_frame_hdr entry to 16 bytes and omitting FDEs.

eh_frame_hdr entry: 4B address, 4B LSDA, 8B unwind descriptor.
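To make the size arithmetic concrete, here is a minimal sketch of such a 16-byte entry. Only the three field sizes come from the line above; the little-endian byte order, the signed relative offsets, and the names `ENTRY`, `pack_entry` and `unpack_entry` are assumptions for illustration.

```python
import struct

# Assumed encoding: little-endian, signed 32-bit relative offsets for the
# function address and the LSDA, unsigned 64-bit unwind descriptor.
# Only the field sizes (4B + 4B + 8B) are taken from the proposal.
ENTRY = struct.Struct("<iiQ")


def pack_entry(addr_off, lsda_off, descriptor):
    """Serialize one hypothetical 16-byte table entry."""
    return ENTRY.pack(addr_off, lsda_off, descriptor)


def unpack_entry(raw):
    """Return (addr_off, lsda_off, descriptor) from 16 raw bytes."""
    return ENTRY.unpack(raw)
```

With this layout there is no padding between fields, so a table of N entries occupies exactly 16*N bytes.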

The unwind descriptor from the OpenVMS proposal needs to be modified as follows: bits 46..0 stay unmodified; bit 47 (formerly the epilogue indicator) gets expanded to a >=7-bit “epilogue size” field; some of the reserved bits (e.g. 4 bits) get repurposed into a “personality function ID”.

  • Epilogue size: we can roughly infer the size of the function from the next eh_frame_hdr entry, but due to padding between functions (and shrink wrapping), we can’t do so exactly. I’d expect that being able to encode 127 bytes of padding would cover most cases (leaving 127=no epilogue).

  • Personality function ID: most binaries have 1 or 2 (rarely a few more) different personality functions (counting the absence of one; typically one per language). Therefore, add a table of personality function references as a separate array to .eh_frame_hdr and use a few bits to index this table.
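The two new fields could be packed into the 64-bit descriptor roughly as sketched below. The exact bit positions are an assumption: the post only says that bit 47 grows to at least 7 bits and that about 4 reserved bits become the personality ID, so placing them at bits 53..47 and 57..54 (and the names `encode`/`decode`) is purely illustrative.

```python
LOW_MASK = (1 << 47) - 1        # bits 46..0, left unmodified
EPI_SHIFT, EPI_BITS = 47, 7     # assumed position: bits 53..47
PERS_SHIFT, PERS_BITS = 54, 4   # assumed position: bits 57..54
NO_EPILOGUE = 127               # reserved value meaning "no epilogue"


def encode(low, epilogue_size, personality_id):
    """Combine the unmodified low bits with the two new fields."""
    if not 0 <= low <= LOW_MASK:
        raise ValueError("low bits out of range")
    if not 0 <= epilogue_size < (1 << EPI_BITS):
        raise ValueError("epilogue size does not fit in 7 bits")
    if not 0 <= personality_id < (1 << PERS_BITS):
        raise ValueError("personality ID does not fit in 4 bits")
    return low | (epilogue_size << EPI_SHIFT) | (personality_id << PERS_SHIFT)


def decode(desc):
    """Return (low, epilogue_size, personality_id)."""
    return (
        desc & LOW_MASK,
        (desc >> EPI_SHIFT) & ((1 << EPI_BITS) - 1),
        (desc >> PERS_SHIFT) & ((1 << PERS_BITS) - 1),
    )
```

The personality ID would then index the separate table of personality function references, with one index presumably standing for "no personality".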

There can be multiple .eh_frame_hdr entries for a function, if one unwind descriptor is insufficient (this is needed for multiple epilogues). For entries with a compact unwinding descriptor in .eh_frame_hdr, the linker can drop the FDE. This would require minor additional effort in the linker implementation. As an additional benefit, unwinding requires no indirection.
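As a rough illustration of the "no indirection" point, a lookup could be a single binary search over the sorted table, with the descriptor read straight from the matching entry. This is a sketch under the assumption that entries are sorted by start address and that each entry covers the range up to the next one; `find_entry` and the tuple layout are hypothetical.

```python
import bisect


def find_entry(entries, pc):
    """Find the entry covering pc.

    entries: list of (start_address, lsda, descriptor) tuples sorted by
    start address. A function may contribute several consecutive entries.
    Returns None if pc lies before the first entry.
    """
    # Rebuilt on every call for clarity; a real unwinder would search the
    # table in place.
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, pc) - 1
    return entries[i] if i >= 0 else None
```

For example, a function starting at 0x1000 with a second descriptor starting at 0x1040 simply appears as two adjacent entries, and the search picks whichever one covers the current PC.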


The proposed (extended) compact unwinding format (link to mail here, the message in Discourse is truncated) will likely need some changes, in particular to handle the extremely common case of an epilogue in the middle of the function (I’d rather not change the generated code here to avoid performance losses). (Tail duplication is also costly, and every tail would require a .eh_frame_hdr entry, so we might want to do that more conservatively.) The prologue/epilogue description for RSP-based descriptors is insufficient to accurately describe the stack adjustments. It should specify exactly that the sequence must be push*-sub (prologue) or add-pop* (epilogue); the size of the pop instructions is derivable from the register encoding. The register permutation is unspecified. With the variable epilogue offset, the RET no longer needs to be specified as being part of the epilogue.
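To illustrate why the pop sizes are derivable from the register encoding, here is a small sketch that computes the byte length of an epilogue. It assumes the epilogue is exactly an optional `add rsp, imm` followed by `pop` instructions and optionally a `ret`; other forms that compilers emit (e.g. `lea` or `leave`) are not modelled, and the function names are hypothetical.

```python
def pop_size(reg):
    """Byte length of `pop r64`: registers 8..15 (r8-r15) need a REX prefix."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0..15")
    return 2 if reg >= 8 else 1


def add_rsp_size(adjust):
    """Byte length of `add rsp, imm`; no instruction if adjust is 0."""
    if adjust == 0:
        return 0
    # imm8 form (REX.W 83 /0 ib) is 4 bytes, imm32 form (REX.W 81 /0 id) is 7.
    return 4 if -128 <= adjust <= 127 else 7


def epilogue_size(adjust, popped_regs, include_ret=True):
    """Total byte length of the assumed add-pop*-ret epilogue."""
    size = add_rsp_size(adjust) + sum(pop_size(r) for r in popped_regs)
    return size + (1 if include_ret else 0)
```

For instance, `add rsp, 40; pop rbx; pop r12; pop rbp; ret` (register numbers 3, 12, 5) comes out at 4 + 1 + 2 + 1 + 1 = 9 bytes.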

(I wanted to gather data on the applicability of the format to current binaries and started to write a script that encodes DWARF CFI into compact unwind descriptors and ran into these issues. The code is very WIP, so no data yet, but I already noted that many functions required more than one unwind descriptor with the current format.)

APX instructions PUSHP/POPP/PUSH2P/POP2P need to be supported through flags and the instruction sequence needs to be specified, otherwise it is impossible to have correct async unwind info.


That’s fascinating and rather unexpected: does this mean that >25% of your functions are leaf functions? Or did you also eliminate the frame info for functions that don’t need it for unwinding? For stack-less leaf functions, I see how frame info could be omitted, but in other cases, it would break async unwinding.
