[RFC] DWARF CFI validation

Introduction

Debuggers, profilers, and language runtimes need a way to unwind the stack to determine register values within each call frame. On DWARF-based platforms, the assembler encodes the unwinding information as DWARF Call Frame Information (CFI) within the object file, using CFI directives. The compiler generates these directives automatically for high-level languages; in hand-written assembly, however, you must write them manually.

Incorrect CFI directives can disrupt some languages’ runtime exception handling (e.g. C++), and result in malformed frames that cause debuggers to display incorrect register values. Common mistakes, such as forgetting to inform the unwinder about stack movement or using an incorrect sign for an offset, are easy to miss. Distinguishing between incorrect values caused by these mistakes and those from genuine bugs is challenging and time-consuming.

To identify these issues, we are developing a CFI directive checker. This tool would be useful in the following scenarios:

  • Validating CFI directives in hand-written assembly code.
  • Checking the compatibility of hand-written CFI directives with the compiler-generated CFI of surrounding code.
  • Annotating disassembly.

Background

Call Frame Information (CFI), as detailed in the DWARF standard (section 6.4.1), is organized into tables, with a separate table for each function. Within each table, rows correspond to the program’s instructions and columns represent the machine registers and the Canonical Frame Address (CFA). Each entry in this table provides a rule that instructs the unwinder on how to determine the caller’s value for a given register (or the CFA value) using the current register values (or relative to the current CFA).

The CFI directives tell the unwinder how each row differs from the previous row in the table. We refer to each row of the table as the CFI state for that instruction of the program, and to each entry of the table as a CFI value, or a piece of unwinding information.

The CFI directives are grouped into Frame Description Entries (FDEs), which normally coincide with function regions. Prologue directives are the directives from the beginning of an FDE up to the first instruction.
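As a rough illustration of this row-by-row view, the sketch below models a CFI state as a mapping from column name to rule and applies directives as deltas on the previous row. The tuple encoding of rules and the `apply_directive` helper are hypothetical names chosen for this example; they are not LLVM or DWARF APIs, and only a handful of directives are modelled.

```python
# Hypothetical model: a CFI state maps each column ("CFA" or a register)
# to an unwinding rule. Rules are plain tuples, for illustration only.

def apply_directive(state, directive):
    """Return the next CFI state after applying one directive (a delta)."""
    new_state = dict(state)
    kind = directive[0]
    if kind == "def_cfa":                 # .cfi_def_cfa reg, offset
        _, reg, offset = directive
        new_state["CFA"] = ("reg_plus", reg, offset)
    elif kind == "adjust_cfa_offset":     # .cfi_adjust_cfa_offset delta
        _, delta = directive
        _, reg, offset = new_state["CFA"]
        new_state["CFA"] = ("reg_plus", reg, offset + delta)
    elif kind == "offset":                # .cfi_offset reg, offset
        _, reg, offset = directive
        new_state[reg] = ("at_cfa_plus", offset)
    elif kind == "same_value":            # .cfi_same_value reg
        _, reg = directive
        new_state[reg] = ("same_value",)
    else:
        raise ValueError(f"unsupported directive: {kind}")
    return new_state

# First row at a typical x86-64 function entry: CFA = %rsp + 8.
row0 = {"CFA": ("reg_plus", "rsp", 8), "r10": ("same_value",)}
# Row after "pushq %r10" with correct directives.
row1 = apply_directive(row0, ("adjust_cfa_offset", 8))
row1 = apply_directive(row1, ("offset", "r10", -16))
```

Each call produces a new row and leaves the previous one untouched, which mirrors how the table is defined as a sequence of differences.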

Proposal

We propose adding UnwindInfoChecker to the MC layer, a static analysis that validates CFI directives by comparing them against the semantic effects of their associated machine instructions.

Overview

UnwindInfoChecker analyzes and validates CFI directives within each function unit (delimited by .cfi_startproc and .cfi_endproc). It processes machine instructions and associated CFI directives linearly by program order. The analysis for a function unit begins by initializing the CFI state based on the target’s default rules and the prologue directives. For each subsequent instruction, UnwindInfoChecker performs the following steps:

  1. Abstract execution: Simulate the instruction’s effect on the current CFI state. This execution determines:
    1. A set of possible subsequent valid CFI state entries for each register and the CFA.
    2. Whether the current CFI state entry for a given register/CFA becomes invalid due to the instruction.
  2. Derive directives-based state: Calculate the program’s intended CFI state by applying the CFI directives associated with the instruction to the current state.
  3. Compare states: Compare the directive-derived CFI state entries with the results of the abstract execution for each register and the CFA.
  4. Advance state: Update the current CFI state to the directive-derived state for processing the next instruction, regardless of validation results (to allow subsequent checks).
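The four steps can be sketched as a small driver loop. This is only a toy model under stated assumptions: `check_unit` and the `toy_*` callbacks are hypothetical names, and the whole CFI state is reduced to a single number (the CFA offset from %rsp) to keep the example short.

```python
def check_unit(initial_state, steps, execute, apply_directives, compare):
    """Run the four steps over one function unit and collect diagnostics."""
    state = initial_state
    diagnostics = []
    for inst, directives in steps:
        expected = execute(state, inst)                        # 1. abstract execution
        declared = apply_directives(state, directives)         # 2. directive-derived state
        diagnostics.extend(compare(inst, expected, declared))  # 3. compare
        state = declared                                       # 4. advance, even after an error
    return state, diagnostics

# Toy model: the CFI state is just the CFA offset from %rsp.
def toy_execute(offset, inst):
    return offset + 8 if inst == "pushq" else offset

def toy_apply(offset, directives):
    return offset + sum(directives)   # each directive is an adjust_cfa_offset delta

def toy_compare(inst, expected, declared):
    if expected == declared:
        return []
    return [f"{inst}: expected CFA offset {expected}, got {declared}"]

steps = [("pushq", [7]), ("pushq", [8])]   # the first adjustment is wrong
final_state, diagnostics = check_unit(8, steps, toy_execute, toy_apply, toy_compare)
```

Because Step 4 continues from the directive-derived state, the wrong adjustment on the first push is reported once and does not cascade into a second diagnostic for the following, correctly annotated push.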

The comparison in Step 3 for each CFI state entry (register or CFA) falls into one of the following cases:

  • Match: The directive-derived CFI state entry is present in the set of possible valid entries determined by the execution. Validation succeeds for this entry; no diagnostic is emitted.
  • Invalidated: The directive-derived CFI state entry was explicitly invalidated by the execution. An error is emitted, indicating the discrepancy and suggesting possible valid entries from the execution results.
  • Structurally similar mismatch: The directive-derived CFI state entry is structurally similar (e.g., CFA + N1 vs. CFA + N2, or Reg + M1 vs. Reg + M2) but not identical to an entry in the set of possible valid entries. An error is emitted, highlighting the specific difference (e.g., offset value mismatch) and suggesting the structurally similar valid entry.
  • Uninterpretable/other mismatch: The directive-derived CFI state entry is neither validated (found in the valid set) nor explicitly invalidated by the execution and is not structurally similar to any valid entry. A warning is issued, indicating that the analysis could not interpret or validate the entry based on the execution results, and suggesting the set of possible valid entries.
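The four cases above could be classified roughly as follows. This is a sketch, not the prototype's logic: the tuple encoding of rules, the notion that "structurally similar" means "identical except for the final offset", and the reading of "invalidated" as "the directive kept the previous, now-invalid rule" are all assumptions made for illustration.

```python
def same_structure(a, b):
    """Assumed definition: two rules match in everything but their final offset."""
    return len(a) == len(b) and len(a) > 1 and a[:-1] == b[:-1]

def compare(directive_rule, previous_rule, invalidated, valid_rules):
    """Classify one directive-derived entry against the abstract-execution result.

    Returns (verdict, suggestion); valid_rules is the list of possible valid entries.
    """
    if directive_rule in valid_rules:
        return ("match", None)
    if invalidated and directive_rule == previous_rule:
        return ("error: invalidated", valid_rules)
    for candidate in valid_rules:
        if same_structure(directive_rule, candidate):
            return ("error: offset mismatch", candidate)
    return ("warning: uninterpretable", valid_rules)

# After "pushq" with CFA previously %rsp + 8, execution yields {%rsp + 16}.
previous = ("reg_plus", "rsp", 8)
valid = [("reg_plus", "rsp", 16)]
verdicts = [
    compare(("reg_plus", "rsp", 16), previous, True, valid)[0],
    compare(previous, previous, True, valid)[0],
    compare(("reg_plus", "rsp", 15), previous, True, valid)[0],
    compare(("reg_plus", "rbp", 8), previous, True, valid)[0],
]
```

The invalidation check runs before the structural-similarity check so that an unchanged but stale rule is reported as invalidated rather than as an offset mismatch.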

Abstract execution

The abstract execution step simulates the effect of each machine instruction on the current CFI state. Performed independently for each CFI state entry (register or CFA), the abstract execution determines the set of possible valid subsequent states and whether the current state entry becomes invalid.

For a given CFI state entry (unwinding rule) and instruction, the execution applies the following logic:

  • If the instruction modifies any register that the CFI state entry depends on, the current CFI state entry for that register/CFA becomes invalid.
  • If the instruction does not modify any registers the CFI state entry depends on, the current CFI state entry remains valid and is added to the set of possible valid subsequent states.
  • If the instruction modifies a register that the CFI state entry depends on by a known constant value, the constant change is applied to the CFI state entry, and the resulting entry is added to the set of possible valid subsequent states.
  • If the instruction stores a register into a memory location describable by a base register and an offset, the checker creates a new CFI value by replacing every occurrence of the stored register with the memory location and adds it to the set of possible valid subsequent states.
  • If the instruction loads a memory location describable by a base register and an offset into a register, the checker creates a new CFI value by replacing every occurrence of the memory location with the loaded register and then adds it to the set of possible valid subsequent states.

Implementing abstract execution requires semantic information about each machine instruction, specifically:

  • Which registers are read and written?
  • Does the instruction modify a register by a statically known constant? If so, what is the modification operation?
  • Does the instruction access memory (load or store)? If so, what is the base address calculation (register and offset)? What is the source or target register?
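To make the rules and the required semantic facts concrete, here is a simplified sketch of abstract execution for a single CFI state entry. The `insn` dictionary (with `writes`, `add_const`, and `store` keys) and the tuple rules are hypothetical stand-ins for whatever the MC layer would eventually provide; loads and more complex rules are left out, and a store is only tracked for a register whose rule is currently same_value.

```python
def abstract_execute(column, rule, cfa_rule, insn):
    """Return (still_valid, possible_next_rules) for one column of the CFI state.

    column   -- "CFA" or a register name
    rule     -- the column's current rule, e.g. ("reg_plus", "rsp", 8)
    cfa_rule -- the current CFA rule, used to express stack slots as CFA + N
    insn     -- illustrative semantic facts about the instruction
    """
    if rule[0] == "reg_plus":
        deps = {rule[1]}
    elif rule[0] == "same_value":
        deps = {column}
    else:
        deps = set()

    still_valid = not (deps & insn["writes"])
    possible = [rule] if still_valid else []

    # A constant change to the base register shifts the offset the other way.
    if rule[0] == "reg_plus" and rule[1] in insn["add_const"]:
        delta = insn["add_const"][rule[1]]
        possible.append(("reg_plus", rule[1], rule[2] - delta))

    # A store of this column's register creates a memory-based alternative.
    # The store offset is relative to the base register before the instruction.
    store = insn.get("store")
    if store and rule[0] == "same_value" and store[0] == column:
        _, base, offset = store
        if cfa_rule[0] == "reg_plus" and cfa_rule[1] == base:
            possible.append(("at_cfa_plus", offset - cfa_rule[2]))
    return still_valid, possible

# pushq %r10 on x86-64: %rsp -= 8, and %r10 is stored at the old %rsp - 8.
pushq_r10 = {"writes": {"rsp"}, "add_const": {"rsp": -8}, "store": ("r10", "rsp", -8)}
cfa = ("reg_plus", "rsp", 8)
cfa_result = abstract_execute("CFA", cfa, cfa, pushq_r10)
r10_result = abstract_execute("r10", ("same_value",), cfa, pushq_r10)
```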

Example

The following is a single instruction subprogram that spills register %r10, with example CFI states before and after the instruction:

...
// +----------+------------+
// |   CFA    |    %r10    |
// +----------+------------+
// | %rsp + 8 | same value |
// +----------+------------+
pushq %r10
.cfi_adjust_cfa_offset 7 // CFA becomes %rsp + 8 + 7 = %rsp + 15
.cfi_offset %r10, -16 // %r10 is at CFA - 16
// +-----------+----------+
// |    CFA    |   %r10   |
// +-----------+----------+
// | %rsp + 15 | CFA - 16 |
// +-----------+----------+
...

The UnwindInfoChecker abstractly executes this instruction upon reaching it during analysis. It observes that the instruction modifies %rsp, so it invalidates the current CFA value. Because the modification to %rsp is a known constant change (a decrease of 8), the checker considers %rsp+16 a valid CFI value for the CFA. The instruction does not change %r10’s value, so the checker keeps .cfi_same_value as a valid CFI value for %r10. The analysis also identifies the instruction as a simple store of %r10 to the memory location at %rsp-8 (relative to %rsp before the instruction), which is equivalent to CFA-16 in the current CFI state. Therefore, the checker also adds CFA-16 as a valid CFI value for %r10.

Overall the execution results in:

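+---------------------------------+-----------+---------------------------+
|                                 | CFA       | %r10                      |
+---------------------------------+-----------+---------------------------+
| Is the current CFI value valid? | no        | yes                       |
| The set of possible CFI values  | {%rsp+16} | {same_value, mem[CFA-16]} |
+---------------------------------+-----------+---------------------------+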

During the comparison, the UnwindInfoChecker sees the directive-derived value for the CFA has a similar structure to one of the valid values in the CFA’s set, but is not identical. Since it cannot reconcile the difference, it emits an error: Expected CFA offset 16, got 15. The directive-derived value for %r10 matches one of the values in %r10’s set, so the checker does not emit an error regarding %r10.
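The offsets in this example can be checked with concrete numbers. The stack address below is arbitrary and purely illustrative; only the differences matter.

```python
# Concrete check of the offsets in the example, using an arbitrary stack address.
rsp_before = 0x7FFF0000            # hypothetical %rsp before pushq
cfa = rsp_before + 8               # initial rule: CFA = %rsp + 8

rsp_after = rsp_before - 8         # pushq decrements %rsp by 8 ...
slot = rsp_after                   # ... and stores %r10 at the new top of stack

cfa_offset_needed = cfa - rsp_after    # offset that keeps the CFA unchanged
slot_relative_to_cfa = slot - cfa      # where %r10 was saved, relative to the CFA
declared_cfa = rsp_after + 15          # what .cfi_adjust_cfa_offset 7 declares
```

With these values the required CFA offset is 16 and the spill slot sits at CFA-16, while the declared rule %rsp+15 points one byte below the real CFA.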

To demonstrate the checker limitation, let’s change the directive-derived CFI state after the instruction as follows:

...
// +----------+------------+
// |   CFA    |    %r10    |
// +----------+------------+
// | %rsp + 8 | same value |
// +----------+------------+
pushq %r10
.cfi_def_cfa %rbp, 8 // CFA becomes %rbp + 8
// +----------+------------+
// |   CFA    |    %r10    |
// +----------+------------+
// | %rbp + 8 | same value |
// +----------+------------+
...

In this case, the UnwindInfoChecker warns the user that it cannot interpret the CFA’s value, because the value is neither among the possible valid values nor the invalidated value. Even if %rbp+8 and %rsp+16 denote the same address at this point, the checker cannot establish that and suggests %rsp+16 to the user as a valid CFI value for the CFA. For %r10, the checker emits nothing, which means it accepts directives that ignore the spill. This is because %r10’s CFI value remained the same as in the previous state, and the checker had not invalidated it.

Limitations

The UnwindInfoChecker’s accuracy depends on the completeness of the semantic information available to the abstract execution step for each instruction. Current limitations include:

  • Limited reasoning about aliasing relationships: The checker does not understand the dynamic relationship between registers, which can introduce memory aliases. However, by defining the relationship between the stack pointer and the frame pointer registers, most memory aliases that occur in CFI directives can be covered.
  • Complex pointer operations: The checker cannot interpret specialized pointer operations, such as the xor used for pointer mangling. To solve this problem for most cases, we can enable CFI values to be represented as complex operations.
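One way the stack-pointer/frame-pointer relationship from the first limitation might be used is to rewrite rules into a canonical register before comparing them. The sketch below assumes a hypothetical `relations` table and a hypothetical `normalize` helper; how the checker would actually learn and maintain such relations is not specified in this RFC.

```python
def normalize(rule, relations):
    """Rewrite a ("reg_plus", reg, offset) rule in terms of a canonical register.

    relations maps reg -> (canonical_reg, delta), meaning reg == canonical_reg + delta.
    """
    if rule[0] == "reg_plus" and rule[1] in relations:
        canonical, delta = relations[rule[1]]
        return ("reg_plus", canonical, rule[2] + delta)
    return rule

# Assumed for illustration: at this point %rbp == %rsp + 8.
relations = {"rbp": ("rsp", 8)}
declared = normalize(("reg_plus", "rbp", 8), relations)
expected = ("reg_plus", "rsp", 16)
```

Under that assumed relation, the %rbp+8 rule from the earlier example normalizes to %rsp+16 and would compare as a match instead of producing a warning.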

Design details

Integration

As described above, we intend to use the UnwindInfoChecker to validate CFI directives in the following scenarios:

  • Assembling hand-written assembly files
  • Compiling programs with inline assembly
  • Possible future work: annotating disassembly

This implies integration points within tools such as clang (for assembly and inline assembly) and llvm-mc (for testing).

In these scenarios, the checker operates on a stream of MCInst (machine instructions) and MCCFIInstruction (CFI directives). Regardless of its final implementation location within the LLVM project, the checker requires mechanisms to: operate sequentially on this stream, extract semantic information from MCInst, and parse MCCFIInstruction to track the CFI state.

Pipeline

The UnwindInfoChecker processes input as a sequence of function units. A function unit is defined by the scope between .cfi_startproc and .cfi_endproc directives, corresponding to a Frame Description Entry (FDE). CFIAnalysisMCStreamer breaks a stream of MCInstructions into these function units.

The analysis is implemented in class CFIAnalysis, which the checker utilizes to analyze each function unit separately. For each unit, it instantiates a new CFIAnalysis instance. The checker initializes this instance with the prologue directives (i.e. all the directives before the first instruction) and then feeds the instructions to the analysis in linear order with the CFI directives associated with the instructions.
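The splitting described above can be sketched as follows. This is not the CFIAnalysisMCStreamer code: the stream items are plain tuples, `split_function_units` is a hypothetical helper, and the sketch assumes that a directive is associated with the instruction it follows, as in the example earlier in this post.

```python
def split_function_units(stream):
    """Group a flat stream into units delimited by .cfi_startproc/.cfi_endproc.

    Each stream item is ("inst", text) or ("cfi", text). Each returned unit is
    {"prologue": [directives], "steps": [(instruction, [directives]), ...]}.
    """
    units, current = [], None
    for kind, text in stream:
        if kind == "cfi" and text == ".cfi_startproc":
            current = {"prologue": [], "steps": []}
        elif kind == "cfi" and text == ".cfi_endproc":
            if current is not None:
                units.append(current)
            current = None
        elif current is None:
            continue                                  # outside any unit: nothing to check
        elif kind == "inst":
            current["steps"].append((text, []))
        elif current["steps"]:
            current["steps"][-1][1].append(text)      # directive after an instruction
        else:
            current["prologue"].append(text)          # directive before the first instruction
    return units

stream = [
    ("inst", "nop"),
    ("cfi", ".cfi_startproc"),
    ("cfi", ".cfi_def_cfa %rsp, 8"),
    ("inst", "pushq %r10"),
    ("cfi", ".cfi_adjust_cfa_offset 8"),
    ("cfi", ".cfi_offset %r10, -16"),
    ("inst", "ret"),
    ("cfi", ".cfi_endproc"),
]
units = split_function_units(stream)
```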

Prototype

As a demonstration, we implemented the UnwindInfoChecker inside llvm-mc using BOLT’s MCPlus for semantic information. The MCPlus information we used does not depend on any analysis and works by simply checking opcodes. The checker extracts all the semantic information from MCInst through the ExtendedMCInstrAnalysis class.

Prototype links:

Challenges

Implementing the UnwindInfoChecker presents two primary technical challenges: managing CFI state representation and extracting sufficient semantic information from instructions.

CFI Directive Information

The UnwindInfoChecker must construct and update the CFI state based on MCCFIInstructions. The DebugInfo/DWARF component contains relevant structures like UnwindTable and logic for processing CFI programs. However, two obstacles exist for direct reuse:

  • Structure Conversion: In the MC layer, CFI directives are structured as MCCFIInstruction, while the DebugInfo/DWARF layer uses the CFIProgram::Instruction format; therefore, a conversion between these two representations must be implemented.
  • Layering: DebugInfo/DWARF currently has dependencies on the MC layer. Placing UnwindInfoChecker in the MC layer and making it depend on DebugInfo/DWARF would create a problematic cyclic dependency. @Sterling-Augustine is working on separating parts of DebugInfo/DWARF to address this (PR 140096, PR 139175, PR 139326, RFC). Work is also ongoing to separate UnwindTable specifically (PR 142520, PR 142521).

Instruction Semantics

Unlike CFI directives, the semantic information directly available from core MCInst and related MC helper classes is limited. While we can access operands and determine basic properties like register reads/writes, the detailed effect of an instruction is often not readily available.

UnwindInfoChecker’s abstract execution requires more semantic information. For example, the checker has to know that on x86_64 targets a pushq instruction decreases %rsp’s value by 8, and stores the argument register in the memory location %rsp-8. But with information available in MCInst, it only knows that it’s a store and it modifies %rsp.

Open questions

  • Where should the UnwindInfoChecker reside within the LLVM project? We are considering two possible places for this library:

    • Option 1: Implement within the MC layer.
      • Benefits: Using the MC layer automatically provides the checker access for all tools and libraries. This also prevents indirect dependency problems.
      • Drawbacks: It is harder to reuse existing features from other parts of LLVM, because most of them depend on MC.
    • Option 2: Implement as a separate library.
      • Benefits: In this case, UnwindInfoChecker can have an easier time depending on other libraries like DebugInfo/DWARF and the implementation is easier.
      • Drawbacks: Any tool, such as clang or llvm-mc, and any library that wants to use the checker must depend on this new library, which may introduce new dependency problems.
  • What is the best approach for extracting and representing the CFI state from MCCFIInstruction? We have two possible approaches in mind:

    • Option 1: Convert MCCFIInstruction to CFIProgram::Instruction and leverage DebugInfo/DWARF’s UnwindTable logic.
      • Benefits: Avoids duplicating logic for processing CFI directives and maintaining state.
      • Drawbacks: Requires implementing a robust conversion layer. Heavily relies on the ongoing separation work in DebugInfo/DWARF to resolve dependency issues and potentially adapt UnwindTable’s interface for analytical use.
    • Option 2: Maintain a parallel representation (like the prototype’s CFIState) and re-implement the necessary state-tracking logic.
      • Benefits: Independent of the DebugInfo/DWARF internal representation and structure, allowing more control over the data needed for validation. Can be tailored specifically for the checker’s needs.
      • Drawbacks: Significant duplication of logic already present in DebugInfo/DWARF. Requires ongoing maintenance of the parallel implementation.
  • How can the required semantic information be extracted from MCInst? Instruction semantics is the analysis’s bottleneck.

    • Option 1: Extend MCInstrAnalysis with functionalities currently in BOLT’s MCPlus.
      • Benefits: MCPlus already provides much of the needed semantic information and has a compatible design. Integrating it into the core MC layer (MCInstrAnalysis) makes it widely available.
      • Drawbacks: MCPlus’s current implementation is heavily focused on X86 and built with assumptions about compiler-generated code; extending it for general use across targets and potentially hand-written code might require significant effort. Requires refactoring in BOLT.
    • Option 2: Adapt LLDB’s instruction emulator or inspection engines.
      • Benefits: LLDB’s components already contain detailed semantic information about instructions across various targets. This also enables using the already implemented analysis in LLDB instead of re-implementing it.
      • Drawbacks: LLDB components are designed to operate on binary files and are not integrated with the MC layer. Enabling them to export semantic information via an MCInst-based interface would require deep structural changes and potentially contradict their design assumptions.
    • Option 3: Develop a new abstract instruction representation.
      • Benefits: Provides a clean, target-independent way to expose instruction semantics needed for analysis (e.g., expressing pushq %reg as mem[%rsp - 8] <- %reg; %rsp <- %rsp - 8). This could enable a wide range of future MC-level analyses.
      • Drawbacks: Designing a representation broad enough for all instructions and targets is a complex, research-intensive task. Implementation would require adding a significant amount of code (possibly generated by TableGen) to cover all instructions.
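To give a feel for what Option 3 of the last question might look like, here is a toy effect list for pushq together with a tiny interpreter. The effect encoding, `lower_pushq`, and `run` are all hypothetical and invented for this example; a real representation would need to cover far more than stores and constant additions.

```python
def lower_pushq(reg):
    """pushq %reg as: mem[%rsp - 8] <- %reg; %rsp <- %rsp - 8."""
    return [("store", ("rsp", -8), reg), ("add", "rsp", -8)]

def run(effects, regs, mem):
    """Apply effects to a concrete machine state.

    All operands are read from the register state before the instruction,
    so the order of effects within one instruction does not matter.
    """
    before = dict(regs)
    for effect in effects:
        if effect[0] == "store":
            _, (base, offset), src = effect
            mem[before[base] + offset] = before[src]
        elif effect[0] == "add":
            _, reg, delta = effect
            regs[reg] = before[reg] + delta
    return regs, mem

regs, mem = run(lower_pushq("r10"), {"rsp": 0x1000, "r10": 42}, {})
```

The same effect list that drives this concrete interpreter could also feed an abstract execution step, since it exposes the constant %rsp change and the base-plus-offset store directly.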

Future work

CFI directive generation

We’ve discussed evolving the UnwindInfoChecker into a CFI generator for assembly code. By improving the checker to propose valid CFI state changes (derived from abstract execution) when errors are detected, the tool could assist or even automate the generation of directives. Once prologue directives establish the initial state, the validator-turned-generator could synthesize subsequent directives. This would significantly ease the burden of writing CFI for hand-written assembly, particularly for complex code or non-standard environments like OS kernels, and could aid in generating debug information for binaries lacking it.

Object file CFI validation and generation

A further step is to explore validating the CFI in object files to ensure it doesn’t break the debugger’s unwinding process. This validation, when combined with CFI generation, could also allow for adding this information to object files that lack it, improving their debuggability.

Prior Art

CFI generation

Generating CFI is not a new problem in the ecosystem. Existing features in LLDB and binutils provide functionalities for CFI generation.

CFI Generation in LLDB

LLDB includes functionality to generate unwinding information when CFI is absent. This feature operates on binary files and infers CFI by emulating execution, but does not require actual program execution. Its design is tightly coupled to operating on binaries and its assumption of on-the-fly assembling/disassembling, making integration with the MC layer or use with raw assembly instruction streams challenging. Furthermore, it is focused on generating CFI where none exists, rather than validating existing CFI directives, which is crucial for complex hand-written assembly that might deviate from standard compiler patterns (e.g., non-standard ABIs).

CFI Generation in Binutils

Binutils provides SCFI (synthesizing CFI), a feature in gas capable of generating CFI directives for assembly input. SCFI has known limitations, such as often assuming the System V AMD64 ABI and that the CFA is always relative to SP or FP. Like LLDB’s feature, its primary function is generation, not validation of potentially erroneous hand-written CFI.

This sounds great! Hand-writing CFI directives is easy to get wrong, so a check that I’ve done it right would be welcome.

One word of warning: I know of at least one case where it’s not possible to write correct CFI information for hand-written assembly. This comes up in 32-bit Arm, due to conditionalization. I might write something like this:

cmp   this, that
popne {some registers}
bne   label_which_will_expect_those_regs_to_have_been_popped

So if the compare instruction says ‘not equal’, then we pop some registers and then branch to a label elsewhere, and between the popne and the bne, those registers aren’t on the stack any more. But if the compare result is ‘equal’, then neither instruction is executed, so between the popne and the bne those registers are still on the stack.

There’s no way in DWARF to express this situation, in which the current state of the stack frame depends on the PSR flags. So it’s impossible to write correct CFI directives for it! The best you can hope for is to write CFI directives that make sense for one of the two cases – probably the one where the branch isn’t taken and control flow continues past the bne.

Thanks for pointing it out.

For now, I don’t have a solution in mind for dealing with control flow. As you said, the current proposal does not take branches into account and validates instructions in program order.

It sounds like your proposal is for asynchronous unwind tables, as it suggests reporting all discrepancies between CFI directives and machine instructions. Is that correct? Are you considering supporting synchronous tables as well?

Interesting proposal and project. Yes, lldb has a similar feature for when CFI is missing for a function, or CFI may not be asynchronous and the function is currently-executing so lldb cannot rely on it.

(“synchronous unwind” CFI are unwind directives only guaranteed at locations that can throw, or that call another function which may throw an exception. “asynchronous unwind” instructions are valid at every instruction in a function, including the prologue and epilogue. Debuggers need async unwind for accurate unwinding from a currently-executing function – once we’ve moved up the stack, we’re at a throwable location with most ABIs and can rely on eh_frame etc CFI. DWARF CFI does not indicate whether the directives are synchronous or asynchronous. Some unwind formats like the ARM index, or the Darwin compact unwind format, are limited to synchronous unwind exclusively, and are very compact and fast to index/parse because of it.)

In lldb, we have two components involved with creating CFI from instructions. In lldb/source/Plugins/Instruction there are target-specific plugins that manually decode only the instructions relevant to unwinding. These plugins generate an intermediate representation of the side effects, and pass that to UnwindAssemblyInstEmulation in lldb/source/Plugins/UnwindAssembly/InstEmulation, which creates the “UnwindPlan” description of the unwind state at all relevant locations in the function. In this scenario, the target-specific EmulateInstruction plugins are doing what you’re describing the MCCFIInstruction layer could contain (we do it in a very lldb-specific way, but it’s a very similar task. And our instruction decoding is not above reproach, with varying quality/completeness for different targets.)

(We also use InstEmulation for other things in lldb, like disambiguating a watchpoint hit on MIPS where the low 3 bits are not reported, so we emulate the load/store instruction to understand what address was accessed. Or on RISCV the InstEmulation is currently only used to do instruction stepping on cores where there is no such capability, to determine where the current instruction may branch to next, if I remember correctly. We don’t have CFI generation from our RISCV instruction emulation yet.)

I’m not completely clear on whether it will be a problem for this checker if DWARF CFI is not exhaustive (asynchronous). For exception handling (eh_frame), unwind state is only required at throwable locations, so it’s fine to omit it in prologues and epilogues altogether; debug_frame can end up being the same content. This breaks debuggers like gdb that rely exclusively on DWARF CFI for their unwinds, but there’s nothing required about it in the format.

Even with mostly-asynchronous unwind from the compiler, you can see sequences where the prologue and epilogue are described, but on a target like i386, pc-relative addressing is usually done mid-function with a sequence like

   call Lnext
Lnext:
   pop $eax    ;;; at this point $esp has been modified for this one insn

it’s a common shortcoming that CFI won’t describe the $esp change at the POP instruction.

clang’s AArch64 codegen often implements early returns by a mid-function epilogue. There is a test & branch forward in the function, then the usual epilogue instructions ending in a ret or what have you. The next instruction is the target of that branch from earlier, and a new basic block where the unwind state hasn’t been modified at all yet. lldb does a one-pass forward-only code flow graph where it “forwards” the unwind state to any branches further into the function. So as we emulate the instructions of the mid-function epilogue we’re unwinding the CFI state, and then we get past the ret and the old unwind state reappears.

A recent issue Felipe de Azevedo Piovezan found with this scheme is that the Swift compiler generates sequences that branch forward to the end of the function, then back again into the middle, just after a mid-function epilogue. We don’t forward unwind state backwards (which meant we never needed to worry about getting stuck in a CFG loop as we continuously reevaluate unwind state), so lldb doesn’t currently handle this one correctly. Tricky stuff!
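The forward-only scheme described above can be modelled in a few lines. This is a deliberately simplified sketch, not lldb’s implementation: `forward_pass` is a hypothetical name, the unwind state is reduced to a single CFA offset, and branch targets are given as instruction indices.

```python
def forward_pass(insns, initial_state, step):
    """Return the unwind state in effect before each instruction.

    insns -- list of dicts with an optional "target" (index of a branch target)
    step  -- function (state, insn) -> state after the instruction
    """
    forwarded = {}                    # target index -> state saved at an earlier branch
    state = initial_state
    states = []
    for i, insn in enumerate(insns):
        if i in forwarded:
            state = forwarded[i]      # the forwarded state replaces the fall-through state
        states.append(state)
        target = insn.get("target")
        if target is not None and target > i:
            forwarded.setdefault(target, state)   # forward branches only
        state = step(state, insn)
    return states

def step(offset, insn):
    return offset + insn.get("delta", 0)

insns = [
    {"op": "push", "delta": 8},
    {"op": "cbz", "target": 4},       # early-return test, branches past the epilogue
    {"op": "pop", "delta": -8},       # mid-function epilogue
    {"op": "ret"},
    {"op": "nop"},                    # branch target: the pre-epilogue state reappears
    {"op": "pop", "delta": -8},
    {"op": "ret"},
]
states = forward_pass(insns, 8, step)
```

Backward branches are simply ignored in this model, which is why a jump from the end of the function back to just after a mid-function epilogue would inherit the wrong state.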

Anyway, interesting project and a great idea for helping flag bugs in hand-written CFI. Just wanted to throw in my two cents from the lldb side where we’ve done similar things for a while, and rely on it a lot.

One other small aside is that there are some special functions that are not called in a normal ABI way. On Unix userland, when an asynchronous signal is received (e.g. SIGHUP), the kernel will save the entire register context to the stack and call a special function (_sigtramp on Darwin), which should have hand-written .cfi directives that describe the location of all the registers in the saved register context for the debugger. This _sigtramp function was not called in a normal ABI way, it does not store the registers to the stack itself, and its “caller” (the return address once it completes) may not be in the normal ABI location like $lr on AArch64 or $rsp on Intel.

In a more unusual environment, firmware/bareboard programs will often use exception/interrupts as part of their programming, for timers and multi threading, and the function that receives the interrupt/exception is also not called in a normal ABI way. The address of the code that faulted/trapped/system called will not be in the normal ABI call location; it needs to retrieve it in a special way and the hand-written .cfi directive about this will not make sense by instruction inspection & assumption that this was an ABI called function.

(I imagine this CFI checker would have a decoration that would be added to these functions indicating that they’re not to be checked, not that big of a deal. just mentioning the issue.)

For now, we are focusing only on asynchronous unwind tables, as our primary focus is hand-written assembly. However, synchronous tables could be supported by adding a mechanism that tells the checker what to ignore, which we have not yet specified.

In these cases, can’t the CFI checker be informed of the situation by a couple of CFI directives in the prologue?
The proposed checker in this RFC does not depend on any ABI conventions and makes all its assumptions based on the prologue directives. If there can be a set of directives that sets up the starting point clearly enough for the checker, I guess the checker would be able to carry on the analysis by looking into instruction execution.

I implemented a minimal version of the DWARF CFI checker in this PR#145633.

I would appreciate any feedback on the PR, even as small as styling. It’s my first non-NFC PR.

Thanks.

I pushed the prototype as a draft.

In the PR message, it is explained what the prototype is capable of and how the current DWARFCFIChecker can be improved.