Introduction
Debuggers, profilers, and language runtimes need a way to unwind the stack and recover the register values of each call frame. On DWARF-based platforms, the assembler encodes the unwinding information as DWARF Call Frame Information (CFI) in the object file, using CFI directives. For high-level languages, the compiler generates these directives automatically; in hand-written assembly, you must write them manually.
Incorrect CFI directives can break runtime exception handling in some languages (e.g., C++) and produce malformed frames that cause debuggers to display incorrect register values. Common mistakes, such as forgetting to inform the unwinder about stack movement or using the wrong sign for an offset, are easy to miss. Distinguishing incorrect values caused by these mistakes from those caused by genuine bugs is challenging and time-consuming.
To identify these issues, we are developing a CFI directive checker. This tool would be useful in the following scenarios:
- Validating CFI directives in hand-written assembly code.
- Checking the compatibility of hand-written CFI directives with the compiler-generated CFI of surrounding code.
- Annotating disassembly.
Background
Call Frame Information (CFI), as detailed in the DWARF standard (section 6.4.1), is organized into tables, with a separate table for each function. Within each table, rows correspond to the program's instructions and columns represent the machine registers and the Canonical Frame Address (CFA). Each entry in this table provides a rule that instructs the unwinder on how to determine the caller's value for a given register (or the CFA value) using the current register values (or relative to the current CFA).
The CFI directives tell the unwinder how each row of the table differs from the previous row. We refer to each row of the table as the CFI state for that line of the program, and to each entry of the table as a CFI value or a piece of unwinding information.
The CFI directives are grouped into Frame Description Entries (FDEs), which normally coincide with function regions. Prologue directives are the directives that appear between the beginning of an FDE and its first instruction.
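As a minimal sketch of this row-and-diff model (not the prototype's code), the Python below represents one table row and applies a few directives to it. The `CFIRow` class, the tuple encoding of directives, and the `apply_directive` helper are hypothetical names introduced for illustration; only the meaning of the individual `.cfi_*` directives follows the assembler's documented behavior.

```python
from dataclasses import dataclass, field


@dataclass
class CFIRow:
    """One row of the CFI table: the CFA rule plus a rule per register."""
    cfa_reg: str
    cfa_offset: int
    # Register rules keyed by register name; an absent key means "same value".
    regs: dict = field(default_factory=dict)


def apply_directive(row, directive):
    """Return the next row obtained by applying one directive to `row`."""
    name, *args = directive
    cfa_reg, cfa_offset = row.cfa_reg, row.cfa_offset
    regs = dict(row.regs)
    if name == "def_cfa":              # .cfi_def_cfa reg, offset
        cfa_reg, cfa_offset = args
    elif name == "def_cfa_offset":     # .cfi_def_cfa_offset offset
        (cfa_offset,) = args
    elif name == "adjust_cfa_offset":  # .cfi_adjust_cfa_offset delta
        cfa_offset += args[0]
    elif name == "offset":             # .cfi_offset reg, offset
        regs[args[0]] = ("at_cfa", args[1])
    elif name == "same_value":         # .cfi_same_value reg
        regs.pop(args[0], None)
    else:
        raise ValueError(f"unsupported directive: {name}")
    return CFIRow(cfa_reg, cfa_offset, regs)


# Assumed example: the row before an x86-64 push, followed by the two
# directives that would describe that push correctly.
row = CFIRow("%rsp", 8)
for d in [("adjust_cfa_offset", 8), ("offset", "%r10", -16)]:
    row = apply_directive(row, d)
print(row)
```

Each directive only describes a difference, so the full row is always the result of replaying every directive seen so far in the FDE.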
Proposal
We propose adding UnwindInfoChecker, a static analysis that validates CFI directives by comparing them against the semantic effects of their associated machine instructions, to the MC layer.
Overview
UnwindInfoChecker analyzes and validates CFI directives within each function unit (delimited by `.cfi_startproc` and `.cfi_endproc`). It processes machine instructions and their associated CFI directives linearly, in program order. The analysis for a function unit begins by initializing the CFI state from the target's default rules and the prologue directives. For each subsequent instruction, UnwindInfoChecker performs the following steps:
1. Abstract execution: Simulate the instruction's effect on the current CFI state. This execution determines:
   - A set of possible subsequent valid CFI state entries for each register and the CFA.
   - Whether the current CFI state entry for a given register/CFA becomes invalid due to the instruction.
2. Derive directives-based state: Calculate the program's intended CFI state by applying the CFI directives associated with the instruction to the current state.
3. Compare states: Compare the directive-derived CFI state entries with the results of the abstract execution for each register and the CFA.
4. Advance state: Update the current CFI state to the directive-derived state for processing the next instruction, regardless of validation results (to allow subsequent checks).
The comparison in Step 3 for each CFI state entry (register or CFA) falls into one of the following cases:
- Match: The directive-derived CFI state entry is present in the set of possible valid entries determined by the execution. Validation succeeds for this entry; no diagnostic is emitted.
- Invalidated: The directive-derived CFI state entry was explicitly invalidated by the execution. An error is emitted, indicating the discrepancy and suggesting possible valid entries from the execution results.
- Structurally similar mismatch: The directive-derived CFI state entry is structurally similar (e.g., `CFA + N1` vs. `CFA + N2`, or `Reg + M1` vs. `Reg + M2`) but not identical to an entry in the set of possible valid entries. An error is emitted, highlighting the specific difference (e.g., offset value mismatch) and suggesting the structurally similar valid entry.
- Uninterpretable/other mismatch: The directive-derived CFI state entry is neither validated (found in the valid set) nor explicitly invalidated by the execution and is not structurally similar to any valid entry. A warning is issued, indicating that the analysis could not interpret or validate the entry based on the execution results, and suggesting the set of possible valid entries.
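The four cases above can be sketched as a small classification function. This is an illustrative model rather than the prototype's logic: the `(kind, base, offset)` tuple encoding, the `classify` name, and the reading of "invalidated" as "the directives kept the previous entry although the execution invalidated it" are all assumptions.

```python
def classify(derived, previous, previous_invalidated, valid):
    """Classify one directive-derived entry against abstract-execution results.

    Entries are (kind, base, offset) tuples, e.g. ("reg", "%rsp", 16).
    `valid` is the set of entries the abstract execution considers valid.
    """
    if derived in valid:
        return "match"
    if previous_invalidated and derived == previous:
        return "invalidated"
    # Same kind and base register, only the offset differs.
    if any(v[:2] == derived[:2] for v in valid):
        return "similar"
    return "unknown"


# Severity attached to each case, as described in the list above.
SEVERITY = {
    "match": None,
    "invalidated": "error",
    "similar": "error",
    "unknown": "warning",
}

case = classify(("reg", "%rsp", 15), ("reg", "%rsp", 8), True,
                {("reg", "%rsp", 16)})
print(case, SEVERITY[case])
```

The checks run in the order the cases are listed, so an unchanged entry that the execution invalidated is reported as such before any structural comparison is attempted.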
Abstract execution
The abstract execution step simulates the effect of each machine instruction on the current CFI state. Performed independently for each CFI state entry (register or CFA), the abstract execution determines the set of possible valid subsequent states and whether the current state entry becomes invalid.
For a given CFI state entry (unwinding rule) and instruction, the execution applies the following logic:
- If the instruction modifies any register that the CFI state entry depends on, the current CFI state entry for that register/CFA becomes invalid.
- If the instruction does not modify any registers the CFI state entry depends on, the current CFI state entry remains valid and is added to the set of possible valid subsequent states.
- If the instruction modifies a register that the CFI state entry depends on by a known constant value, the constant change is applied to the CFI state entry, and the resulting entry is added to the set of possible valid subsequent states.
- If the instruction stores a register into a memory location describable by a base register and an offset, the checker creates a new CFI value by replacing every occurrence of the stored register with the memory location and adds it to the set of possible valid subsequent states.
- If the instruction loads a memory location describable by a base register and an offset into a register, the checker creates a new CFI value by replacing every occurrence of the memory location with the loaded register and then adds it to the set of possible valid subsequent states.
Implementing abstract execution requires semantic information about each machine instruction, specifically:
- Which registers are read and written?
- Does the instruction modify a register by a statically known constant? If so, what is the modification operation?
- Does the instruction access memory (load or store)? If so, what is the base address calculation (register and offset)? What is the source or target register?
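The rules and the required semantic information can be sketched together as follows. This is a simplified model under stated assumptions, not the prototype: the `Inst` record, the `step_cfa` and `step_reg` helpers, and the rule encodings `("same",)` and `("at_cfa", offset)` are hypothetical, and memory offsets are expressed in terms of pre-instruction register values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Inst:
    """Hypothetical per-instruction semantics needed by the checker."""
    writes: frozenset                             # registers modified
    const_add: Optional[Tuple[str, int]] = None   # (reg, delta): reg += delta
    store: Optional[Tuple[str, str, int]] = None  # (src, base, off): mem[base+off] <- src
    load: Optional[Tuple[str, str, int]] = None   # (dst, base, off): dst <- mem[base+off]


def step_cfa(cfa, inst):
    """cfa is (base_reg, offset). Returns (still_valid, valid_rules)."""
    base, offset = cfa
    if base not in inst.writes:
        return True, {cfa}
    valid = set()
    if inst.const_add and inst.const_add[0] == base:
        # The base moved by delta, so the offset must absorb -delta.
        valid.add((base, offset - inst.const_add[1]))
    return False, valid


def step_reg(reg, rule, cfa, inst):
    """rule is ("same",) or ("at_cfa", offset). Returns (still_valid, valid_rules)."""
    cfa_base, cfa_off = cfa
    valid = set()
    invalidated = rule == ("same",) and reg in inst.writes
    if not invalidated:
        valid.add(rule)
    if rule == ("same",) and inst.store:
        src, base, off = inst.store
        if src == reg and base == cfa_base:
            valid.add(("at_cfa", off - cfa_off))  # register spilled to memory
    if rule[0] == "at_cfa" and inst.load:
        dst, base, off = inst.load
        if dst == reg and base == cfa_base and off - cfa_off == rule[1]:
            valid.add(("same",))                  # register reloaded from its slot
    return not invalidated, valid


# x86-64 `pushq %r10`: %rsp -= 8 and mem[%rsp - 8] <- %r10 (pre-instruction %rsp).
push = Inst(writes=frozenset({"%rsp"}), const_add=("%rsp", -8),
            store=("%r10", "%rsp", -8))
print(step_cfa(("%rsp", 8), push))
print(step_reg("%r10", ("same",), ("%rsp", 8), push))
```

Keeping the semantics in a plain record separates what the checker needs to know about an instruction from how that information is obtained.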
Example
The following is a single-instruction subprogram that spills register `%r10`, with example CFI states before and after the instruction:
...
// +----------+------------+
// | CFA | %r10 |
// +----------+------------+
// | %rsp + 8 | same value |
// +----------+------------+
pushq %r10
.cfi_adjust_cfa_offset 7 // CFA becomes %rsp + 8 + 7 = %rsp + 15
.cfi_offset %r10, -16 // %r10 is at CFA - 16
// +-----------+----------+
// | CFA | %r10 |
// +-----------+----------+
// | %rsp + 15 | CFA - 16 |
// +-----------+----------+
...
The UnwindInfoChecker abstractly executes this instruction upon reaching it during analysis. During execution, it observes that the instruction modifies the value of `%rsp`, so it invalidates the current CFA value. The instruction does not change the value of `%r10`, so the checker keeps `.cfi_same_value` as a valid CFI value for `%r10`. The modification to `%rsp` is a constant change (a decrement of 8); as a result, the checker considers `%rsp+16` a valid CFI value for the CFA. The analysis also identifies the instruction as a simple store to the memory location at `%rsp-8` (which is equivalent to `CFA-16` in the current CFI state). Therefore, the checker also considers `CFA-16` a valid CFI value for `%r10`.
Overall, the execution results in:

| | CFA | %r10 |
|---|---|---|
| Is the current CFI value valid? | no | yes |
| The set of possible CFI values | {%rsp+16} | {same_value, mem[CFA-16]} |
During the comparison, the UnwindInfoChecker sees that the directive-derived value for the CFA has a similar structure to one of the valid values in the CFA's set, but is not identical. Since it cannot reconcile the difference, it emits an error: `Expected CFA offset 16, got 15`. The directive-derived value for `%r10` matches one of the values in the set for `%r10`, so the checker does not emit an error regarding `%r10`.
To demonstrate a limitation of the checker, let's change the directive-derived CFI state after the instruction as follows:
...
// +----------+------------+
// | CFA | %r10 |
// +----------+------------+
// | %rsp + 8 | same value |
// +----------+------------+
pushq %r10
.cfi_def_cfa %rbp, 8 // CFA becomes %rbp + 8
// +----------+------------+
// | CFA | %r10 |
// +----------+------------+
// | %rbp + 8 | same value |
// +----------+------------+
...
In this case, the UnwindInfoChecker warns the user that it does not understand the CFA's value, because that value is neither among the possible valid values nor equal to the invalidated value. Even when `%rbp+8` and `%rsp+16` denote the same address, the checker cannot establish that, so it suggests `%rsp+16` to the user as a valid CFI value for the CFA. For `%r10`, the checker emits nothing, which means it accepts directives that ignore the spill. This is because the CFI value of `%r10` is unchanged from the previous state and the checker has not invalidated it.
Limitations
The UnwindInfoChecker's accuracy depends on the completeness of the semantic information available to the abstract execution step for each instruction. Current limitations include:
- Limited reasoning about aliasing relationships: The checker does not understand the dynamic relationship between registers, which can introduce memory aliases. However, by defining the relationship between the stack pointer and the frame pointer registers, most memory aliases that occur in CFI directives can be covered.
- Complex pointer operations: The checker cannot interpret specialized pointer operations, such as the xor used for pointer mangling. To solve this problem for most cases, we can enable CFI values to be represented as complex operations.
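To illustrate the first limitation and the mitigation it mentions, the sketch below shows how a known relationship between the frame pointer and the stack pointer would let two differently written CFA rules be recognized as equal. The `relations` mapping, the `normalize` and `same_location` helpers, and the specific relation `%rbp == %rsp + 8` are assumptions chosen for illustration; the checker as described does not track such relations today.

```python
def normalize(entry, relations):
    """Rewrite (base, offset) in terms of a canonical register when possible.

    `relations` maps a register to (canonical_reg, delta), meaning
    reg == canonical_reg + delta at the current program point.
    """
    base, offset = entry
    if base in relations:
        canon, delta = relations[base]
        return (canon, offset + delta)
    return entry


def same_location(a, b, relations):
    """True if two (base, offset) rules provably denote the same address."""
    return normalize(a, relations) == normalize(b, relations)


# Assumed relation for illustration: %rbp == %rsp + 8 at this point.
relations = {"%rbp": ("%rsp", 8)}
print(same_location(("%rbp", 8), ("%rsp", 16), relations))
print(same_location(("%rbp", 8), ("%rsp", 16), {}))
```

Without the relation the two rules stay unrelated, which corresponds to the warning described in the example section.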
Design details
Integration
As described above, we intend to use the UnwindInfoChecker to validate CFI directives in the following scenarios:
- Assembling hand-written assembly files
- Compiling programs with inline assembly
- Possible future work: annotating disassembly
This implies integration points within tools such as `clang` (for assembly and inline assembly) and `llvm-mc` (for testing).
In these scenarios, the checker operates on a stream of `MCInst` (machine instructions) and `MCCFIInstruction` (CFI directives). Regardless of its final implementation location within the LLVM project, the checker requires mechanisms to: operate sequentially on this stream, extract semantic information from `MCInst`, and parse `MCCFIInstruction` to track the CFI state.
Pipeline
The UnwindInfoChecker processes input as a sequence of function units. A function unit is defined by the scope between the `.cfi_startproc` and `.cfi_endproc` directives, corresponding to a Frame Description Entry (FDE). `CFIAnalysisMCStreamer` breaks a stream of `MCInstructions` into these function units.
The analysis is implemented in class `CFIAnalysis`, which the checker utilizes to analyze each function unit separately. For each unit, it instantiates a new `CFIAnalysis` instance. The checker initializes this instance with the prologue directives (i.e., all the directives before the first instruction) and then feeds the instructions to the analysis in linear order, each with its associated CFI directives.
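The splitting step can be sketched as below. This models only the grouping described above, not the actual `CFIAnalysisMCStreamer`; the `split_function_units` helper, the `("cfi", text)` / `("inst", text)` stream encoding, and the convention that a directive belongs to the instruction preceding it are assumptions for illustration.

```python
def split_function_units(stream):
    """Group a flat stream into function units.

    `stream` yields ("cfi", text) or ("inst", text) items. Each unit is a dict
    with the prologue directives and a list of (instruction, directives) pairs.
    """
    units, unit = [], None
    for kind, text in stream:
        if kind == "cfi" and text == ".cfi_startproc":
            unit = {"prologue": [], "body": []}
        elif unit is None:
            continue  # ignore anything outside a function unit
        elif kind == "cfi" and text == ".cfi_endproc":
            units.append(unit)
            unit = None
        elif kind == "inst":
            unit["body"].append((text, []))
        elif unit["body"]:
            unit["body"][-1][1].append(text)  # directive after an instruction
        else:
            unit["prologue"].append(text)     # directive before the first instruction
    return units


stream = [
    ("inst", "nop"),
    ("cfi", ".cfi_startproc"),
    ("cfi", ".cfi_def_cfa %rsp, 8"),
    ("inst", "pushq %r10"),
    ("cfi", ".cfi_adjust_cfa_offset 8"),
    ("cfi", ".cfi_offset %r10, -16"),
    ("inst", "popq %r10"),
    ("cfi", ".cfi_adjust_cfa_offset -8"),
    ("cfi", ".cfi_restore %r10"),
    ("inst", "retq"),
    ("cfi", ".cfi_endproc"),
]
units = split_function_units(stream)
print(units)
```

Each resulting unit can then be handed to a fresh analysis instance: the prologue list initializes the state, and the body pairs are fed in order.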
Prototype
As a demonstration, we implemented the UnwindInfoChecker inside `llvm-mc` using BOLT's `MCPlus` for semantic information. The `MCPlus` information we used does not depend on any analysis and works by simply checking opcodes. The checker extracts all the semantic information from `MCInst` through the ExtendedMCInstrAnalysis class.
Prototype links:
- Prototype code
- Tests (e.g., tracking CFA: single-func-cfa-mistake.s; tracking spilled registers: spill-two-reg-reversed.s).
Challenges
Implementing the UnwindInfoChecker presents two primary technical challenges: managing CFI state representation and extracting sufficient semantic information from instructions.
CFI Directive Information
The UnwindInfoChecker must construct and update the CFI state based on `MCCFIInstruction`s. The `DebugInfo/DWARF` component contains relevant structures like `UnwindTable` and logic for processing CFI programs. However, two obstacles exist for direct reuse:
- Structure Conversion: In the MC layer, CFI directives are structured as `MCCFIInstruction`, while the `DebugInfo/DWARF` layer uses the `CFIProgram::Instruction` format; therefore, a conversion between these two representations must be implemented.
- Layering: `DebugInfo/DWARF` currently has dependencies on the MC layer. Placing UnwindInfoChecker in the MC layer and making it depend on `DebugInfo/DWARF` would create a problematic cyclic dependency. @Sterling-Augustine is working on separating parts of `DebugInfo/DWARF` to address this (PR 140096, PR 139175, PR 139326, RFC). Work is also ongoing to separate `UnwindTable` specifically (PR 142520, PR 142521).
Instruction Semantics
In contrast to the CFI directive information, the semantic information directly available from core `MCInst` and related `MC` helper classes is limited. While we can access operands and determine basic properties like register reads/writes, the detailed effect of an instruction is often not readily available.
UnwindInfoChecker's abstract execution requires more semantic information. For example, the checker has to know that on x86_64 targets a `pushq` instruction decreases the value of `%rsp` by `8` and stores the argument register in the memory location `%rsp-8`. But with the information available in `MCInst`, it only knows that the instruction is a store and that it modifies `%rsp`.
Open questions
- Where should the UnwindInfoChecker reside within the LLVM project? We are considering two possible places for this library:
  - Option 1: Implement within the MC layer.
    - Benefits: Living in the MC layer automatically makes the checker available to all tools and libraries. This also prevents indirect dependency problems.
    - Drawbacks: It is harder to use existing features from other parts of LLVM because most of those parts depend on MC.
  - Option 2: Implement as a separate library.
    - Benefits: In this case, UnwindInfoChecker can more easily depend on other libraries like `DebugInfo/DWARF`, and the implementation is easier.
    - Drawbacks: Any tool, such as `clang` or `llvm-mc`, and any library that wants to use the checker must depend on this new library, which may introduce new dependency problems.
- What is the best approach for extracting and representing the CFI state from `MCCFIInstruction`? We have two possible approaches in mind:
  - Option 1: Convert `MCCFIInstruction` to `CFIProgram::Instruction` and leverage `DebugInfo/DWARF`'s `UnwindTable` logic.
    - Benefits: Avoids duplicating logic for processing CFI directives and maintaining state.
    - Drawbacks: Requires implementing a robust conversion layer. Heavily relies on the ongoing separation work in `DebugInfo/DWARF` to resolve dependency issues and potentially adapt `UnwindTable`'s interface for analytical use.
  - Option 2: Maintain a parallel representation (like the prototype's `CFIState`) and re-implement the necessary state-tracking logic.
    - Benefits: Independent of the `DebugInfo/DWARF` internal representation and structure, allowing more control over the data needed for validation. Can be tailored specifically for the checker's needs.
    - Drawbacks: Significant duplication of logic already present in `DebugInfo/DWARF`. Requires ongoing maintenance of the parallel implementation.
- How can the required semantic information be extracted from `MCInst`? Instruction semantics is the analysis's bottleneck.
  - Option 1: Extend `MCInstrAnalysis` with functionalities currently in BOLT's `MCPlus`.
    - Benefits: `MCPlus` already provides much of the needed semantic information and has a compatible design. Integrating it into the core MC layer (`MCInstrAnalysis`) makes it widely available.
    - Drawbacks: `MCPlus`'s current implementation is heavily focused on X86 and built with assumptions about compiler-generated code; extending it for general use across targets and potentially hand-written code might require significant effort. Requires refactoring in BOLT.
  - Option 2: Adapt LLDB's instruction emulator or inspection engines.
    - Benefits: LLDB's components already contain detailed semantic information about instructions across various targets. This also enables using the already implemented analysis in LLDB instead of re-implementing it.
    - Drawbacks: LLDB components are designed to operate on binary files and are not integrated with the MC layer. Enabling them to export semantic information via an `MCInst`-based interface would require deep structural changes and potentially contradict their design assumptions.
  - Option 3: Develop a new abstract instruction representation.
    - Benefits: Provides a clean, target-independent way to expose instruction semantics needed for analysis (e.g., `pushq %reg` to `mem[%rsp - 8] <- %reg; %rsp <- %rsp - 8`). This could enable a wide range of future MC-level analyses.
    - Drawbacks: Designing a representation broad enough for all instructions and targets is a complex, research-intensive task. Implementation would require adding a significant amount of code (possibly generated by TableGen) to cover all instructions.
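As a rough sketch of what the abstract representation in Option 3 might look like, the code below encodes an instruction as a list of effects and derives the three kinds of semantic information the checker needs from it. The effect encoding and the helper names (`pushq`, `popq`, `written_regs`, `const_delta`, `stores`) are hypothetical; no such representation exists in LLVM today, and addresses are expressed in pre-instruction register values.

```python
# Effect forms (all hypothetical):
#   ("store", (base, off), src)  means  mem[base + off] <- src
#   ("load", dst, (base, off))   means  dst <- mem[base + off]
#   ("add", reg, delta)          means  reg <- reg + delta


def pushq(reg):
    """x86-64 `pushq reg`: mem[%rsp - 8] <- reg; %rsp <- %rsp - 8."""
    return [("store", ("%rsp", -8), reg), ("add", "%rsp", -8)]


def popq(reg):
    """x86-64 `popq reg`: reg <- mem[%rsp]; %rsp <- %rsp + 8."""
    return [("load", reg, ("%rsp", 0)), ("add", "%rsp", 8)]


def written_regs(effects):
    """Registers modified by the instruction."""
    return {e[1] for e in effects if e[0] in ("add", "load")}


def const_delta(effects, reg):
    """Constant change applied to `reg`, or None if there is none."""
    deltas = [e[2] for e in effects if e[0] == "add" and e[1] == reg]
    return sum(deltas) if deltas else None


def stores(effects):
    """(source register, (base, offset)) pairs for every store."""
    return [(e[2], e[1]) for e in effects if e[0] == "store"]


effects = pushq("%r10")
print(written_regs(effects), const_delta(effects, "%rsp"), stores(effects))
```

With such a list, the questions the checker asks become simple queries over data instead of per-opcode special cases.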
Future work
CFI directive generation
We've discussed evolving the UnwindInfoChecker into a CFI generator for assembly code. By improving the checker to propose valid CFI state changes (derived from abstract execution) when errors are detected, the tool could assist or even automate the generation of directives. Once prologue directives establish the initial state, the validator-turned-generator could synthesize subsequent directives. This would significantly ease the burden of writing CFI for hand-written assembly, particularly for complex code or non-standard environments like OS kernels, and could aid in generating debug information for binaries lacking it.
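A minimal sketch of this idea, assuming the abstract execution has already produced the sets of valid entries: the `synthesize` helper, its argument encoding, and its selection policy (take the first valid CFA rule in sorted order, and record a spill whenever one is available) are illustrative assumptions, not a proposed design.

```python
def synthesize(cfa, cfa_valid, reg_rules, reg_valid):
    """Propose directives that move the current state to a valid next state.

    cfa:        current CFA rule as (base, offset)
    cfa_valid:  set of valid (base, offset) CFA rules after the instruction
    reg_rules:  current register rules, e.g. {"%r10": ("same",)}
    reg_valid:  valid rules per register after the instruction
    """
    out = []
    if cfa not in cfa_valid and cfa_valid:
        base, offset = sorted(cfa_valid)[0]
        if base == cfa[0]:
            out.append(f".cfi_def_cfa_offset {offset}")
        else:
            out.append(f".cfi_def_cfa {base}, {offset}")
    for reg in sorted(reg_valid):
        saved = sorted(r[1] for r in reg_valid[reg] if r[0] == "at_cfa")
        # Assumed policy: record a spill of a so-far-unsaved register.
        if reg_rules.get(reg, ("same",)) == ("same",) and saved:
            out.append(f".cfi_offset {reg}, {saved[0]}")
    return out


# Results for `pushq %r10` with CFA = %rsp + 8 before the instruction.
directives = synthesize(
    ("%rsp", 8),
    {("%rsp", 16)},
    {"%r10": ("same",)},
    {"%r10": {("same",), ("at_cfa", -16)}},
)
print("\n".join(directives))
```

When several valid entries exist, a real generator would need a principled way to choose among them; the fixed policy here only keeps the example short.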
Object file CFI validation and generation
A further step is to explore validating the CFI in object files to ensure that it does not break the debugger's unwinding process. This validation, when combined with CFI generation, could also allow for adding this information to object files that lack it, improving their debuggability.
Prior Art
CFI generation
Generating CFI is not a new problem in the ecosystem. Existing features in LLDB and binutils provide functionalities for CFI generation.
CFI Generation in LLDB
LLDB includes functionality to generate unwinding information when CFI is absent. This feature operates on binary files and infers CFI by emulating execution, but does not require actual program execution. Its design is tightly coupled to operating on binaries and its assumption of on-the-fly assembling/disassembling, making integration with the MC layer or use with raw assembly instruction streams challenging. Furthermore, it is focused on generating CFI where none exists, rather than validating existing CFI directives, which is crucial for complex hand-written assembly that might deviate from standard compiler patterns (e.g., non-standard ABIs).
CFI Generation in Binutils
Binutils provides SCFI (Stack CFI), a feature in `gas` capable of generating CFI directives for assembly input. SCFI has known limitations, such as often assuming the System V AMD64 ABI and that the CFA is always relative to SP or FP. Like LLDB's feature, its primary function is generation, not validation of potentially erroneous hand-written CFI.