Regarding the technical review questions: Asking for concrete data on performance overhead, memory footprint, and how kernel measurements translate to userspace is a standard part of the review process. These foundational questions help ensure the proposal is well-supported by empirical evidence.
Regarding the performance data discussion: The kernel-space measurements with ORC are valuable, but there’s a critical gap in our understanding. The kernel environment (ORC+omitfp vs FP+no-omitfp) differs fundamentally from userspace, where this RFC proposes SFrame adoption. Direct userspace performance measurements would strengthen the case by avoiding the need to extrapolate kernel savings to a different environment.
On LLD robustness: We should improve testing and maintain high standards for what we upstream.
AutoFDO uses LBR, which has a limited depth (32 on Skylake). Do you use FP for other stack trace requests—such as backtrace() and non-SamplePGO profiling?
I believe there may be some misalignment between expectations and what SFrame realistically offers. A few observations:
- linux-perf is waiting for V3
- Some developers explored enabling arm64 livepatch with SFrame, but Song Liu has since implemented a frame-pointer-based alternative
- No Linux distribution has adopted SFrame.
- I am not the only one questioning the object file format design. As more linker-aware folks become aware of this format, similar concerns are being raised: GNU Tools Cauldron SFrame talk notes https://groups.google.com/g/generic-abi/c/3ZMVJDF79g8
- If SFrame is exclusively a kernel-space feature, it could be implemented entirely within objtool, similar to how objtool --link --orc generates ORC info for vmlinux.o. This approach would eliminate the need for any modifications to assemblers and linkers, while allowing SFrame to evolve in any incompatible way.
While SFrame may still have potential to replace the ORC unwinder in the kernel (ORC being a simple format that is considerably larger than .eh_frame), its viability as a stack walking mechanism for userspace programs remains an open question.
From https://maskray.me/blog/2025-10-26-stack-walking-space-and-time-trade-offs
% ~/Dev/bloaty/out/release/bloaty /tmp/out/custom-sframe/bin/clang
FILE SIZE VM SIZE
-------------- --------------
63.9% 88.0Mi 73.9% 88.0Mi .text
11.1% 15.2Mi 0.0% 0 .strtab
7.2% 9.96Mi 8.4% 9.96Mi .rodata
6.4% 8.87Mi 7.5% 8.87Mi .sframe
5.1% 7.07Mi 5.9% 7.07Mi .eh_frame
2.9% 3.96Mi 0.0% 0 .symtab
1.4% 1.98Mi 1.7% 1.98Mi .data.rel.ro
0.9% 1.23Mi 1.0% 1.23Mi [LOAD #4 [R]]
0.7% 999Ki 0.8% 999Ki .eh_frame_hdr
0.0% 0 0.5% 614Ki .bss
0.2% 294Ki 0.2% 294Ki .data
0.0% 23.1Ki 0.0% 23.1Ki .rela.dyn
0.0% 8.99Ki 0.0% 8.99Ki .dynstr
0.0% 8.77Ki 0.0% 8.77Ki .dynsym
0.0% 7.24Ki 0.0% 7.24Ki .rela.plt
0.0% 6.73Ki 0.0% 0 [Unmapped]
0.0% 6.29Ki 0.0% 3.84Ki [21 Others]
0.0% 4.84Ki 0.0% 4.84Ki .plt
0.0% 3.36Ki 0.0% 3.30Ki .init_array
0.0% 2.50Ki 0.0% 2.50Ki .hash
0.0% 2.44Ki 0.0% 2.44Ki .got.plt
100.0% 137Mi 100.0% 119Mi TOTAL
% ~/Dev/unwind-info-size-analyzer/eh_size.rb /tmp/out/custom-sframe/bin/clang
clang: sframe=9303875 eh_frame=7408976 eh_frame_hdr=1023004 eh=8431980 sframe/eh_frame=1.2558 sframe/eh=1.1034
The results show that .sframe (8.87 MiB) is approximately 10% larger than the combined size of .eh_frame and .eh_frame_hdr (7.07 + 0.98 ≈ 8.04 MiB, matching eh=8431980 bytes above).
Since .eh_frame cannot be eliminated (doing so would lose the information needed to restore callee-saved registers, as well as the LSDA and personality information), this size overhead raises significant concerns about the practical viability of this approach.
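For anyone who wants to double-check the ratios, here is a minimal sketch that recomputes them from the byte counts printed by eh_size.rb above. The variable names are mine, and the script only redoes the arithmetic; it does not parse the binary.

```python
# Byte counts copied from the eh_size.rb output for /tmp/out/custom-sframe/bin/clang.
sframe = 9303875
eh_frame = 7408976
eh_frame_hdr = 1023004

MIB = 1024 * 1024

# .eh_frame and .eh_frame_hdr together form the existing unwind metadata.
eh_total = eh_frame + eh_frame_hdr

# Ratios of .sframe to .eh_frame alone and to the combined total.
ratio_eh_frame = sframe / eh_frame
ratio_eh = sframe / eh_total

print(f".sframe       : {sframe / MIB:.2f} MiB")
print(f".eh_frame     : {eh_frame / MIB:.2f} MiB")
print(f".eh_frame_hdr : {eh_frame_hdr / MIB:.2f} MiB")
print(f"eh total      : {eh_total / MIB:.2f} MiB")
print(f"sframe/eh_frame = {ratio_eh_frame:.4f}")
print(f"sframe/eh       = {ratio_eh:.4f}")
```

Because .sframe is added on top of the existing sections rather than replacing them, the number that matters for users is the sum of all three sections, not the ratio alone.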
It’s worth noting that there are existing, battle-tested implementations of a compact unwind format in LLVM, lld/MachO, and libunwind that work with C++ exception handling (in production since 2015 or earlier).
macOS was an early adopter, and OpenVMS appears to use a variant (see “VSI OpenVMS Calling Standard” and the earlier “[RFC] Improving compact x86-64 compact unwind descriptors”).
“The Apple Compact Unwinding Format: Documented and Explained” (Faultlore) documents how this works.
This approach allows frames that cannot be described compactly to fall back to DWARF unwinding, which means most DWARF CFI entries can be removed while still maintaining full functionality.
In contrast, today’s SFrame implementation in GNU Assembler would emit many warnings when building llvm-project.
As a concrete example, in a clang executable on macOS (objdump --arch=x86_64 -h), the __text section is 0x4a55470 bytes, while the __unwind_info section is just 0x79060 bytes and __eh_frame only 0x58 bytes. This demonstrates the efficiency of the approach, even though it only covers synchronous unwinding.
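To put those hexadecimal sizes in perspective, here is a rough sketch that converts them to bytes and expresses the unwind metadata as a fraction of __text. The variable names are mine, and this is only arithmetic on the three numbers quoted above.

```python
# Section sizes quoted from the objdump output for a macOS x86_64 clang.
text_size = 0x4a55470
unwind_info_size = 0x79060
eh_frame_size = 0x58

# Compact unwind plus the residual DWARF CFI that could not be encoded compactly.
unwind_total = unwind_info_size + eh_frame_size

# Unwind metadata as a fraction of the code it describes.
ratio = unwind_total / text_size

print(f"__text        : {text_size} bytes")
print(f"__unwind_info : {unwind_info_size} bytes")
print(f"__eh_frame    : {eh_frame_size} bytes")
print(f"unwind / text : {ratio:.2%}")
```

This is not an apples-to-apples comparison with the ELF numbers earlier (different target, different build, and a different set of unwinding capabilities), so treat it as an order-of-magnitude illustration.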
Regarding AArch64: It would be valuable to gather Arm’s perspective on compact unwind for ELF (ARM-software/abi-aa issue #344, “Was any form of compact unwind information considered for AArch64?”). I’ll ask them. In the meantime, the Mach-O implementation provides a proven baseline for this architecture.