Summary
I propose two new SanitizerCoverage instrumentation modes:
-fsanitize-coverage=trace-args # capture function arguments at entry
-fsanitize-coverage=trace-ret # capture return values at exit
These emit callbacks that provide the runtime with function argument
values, return values, and automatic struct field expansion via
compile-time DICompositeType metadata.
The Problem: The Semantic Observability Gap in SanitizerCoverage
SanitizerCoverage currently provides control-flow observability:
| Mode | What it captures |
|---|---|
| `trace-pc` | Which edges executed |
| `trace-cmp` | Comparison operands |
| `trace-pc-guard` | Edge with guard variable |
| `trace-pc-entry-exit` | Function enter/exit events |
None expose values flowing through function boundaries. Two calls
traversing identical edges with different argument values are
indistinguishable:
process_request(buf, 16); // safe
process_request(buf, 4096); // triggers overflow
Coverage-guided fuzzers (libFuzzer, syzkaller) plateau on stateful
programs where security-relevant behavior depends on argument values,
not control-flow topology. A composite feedback signal
(PC, arg_hash) provides finer-grained mutation guidance.
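To make the composite signal concrete, here is a minimal sketch of how a fuzzer runtime might fold a `(PC, arg_hash)` pair into a feature map. The choice of FNV-1a as `arg_hash`, the map size, the bit-mixing constant, and the names `feature_map` and `record_feature` are all assumptions for illustration; the proposal does not prescribe them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE (1u << 16)           /* hypothetical feature-map size */
static uint8_t feature_map[MAP_SIZE]; /* nonzero = feature already seen */

/* 64-bit FNV-1a over a byte range (an assumed choice of arg_hash). */
static uint64_t fnv1a64(const void *data, size_t len) {
    const uint8_t *p = data;
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;          /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Fold (pc, arg_hash) into one map slot. Returns 1 the first time the
 * slot is hit and 0 afterwards. */
static int record_feature(uint64_t pc, const void *arg, size_t arg_size) {
    assert(arg != NULL);
    uint64_t arg_hash = fnv1a64(arg, arg_size);
    uint64_t key = (pc ^ arg_hash) * 0x9e3779b97f4a7c15ULL; /* mix bits */
    uint32_t slot = (uint32_t)(key >> 48) & (MAP_SIZE - 1);
    if (feature_map[slot])
        return 0;
    feature_map[slot] = 1;
    return 1;
}
```

With edge coverage alone, the two `process_request` calls above map to the same feature. Under this sketch they would usually land in different slots because their `len` bytes hash differently, although a fixed-size map can still collide.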
Why Existing LLVM/Clang Flags Cannot Solve This
Category A: SanitizerCoverage (trace-pc, trace-cmp)
Semantically blind to composite values. trace-cmp captures individual
comparison operands but not the full function parameter state.
trace-pc-entry-exit reports which functions executed but not with what
values.
Category B: XRay (-fxray-instrument)
Designed for latency profiling. Inserts NOP sleds at function
boundaries to record timestamps. The XRay runtime and compiler frontend
are not designed to parse DICompositeType metadata or spill struct
fields into callback buffers. Repurposing XRay for data extraction
would compromise its ultra-low-latency design goal.
Category C: -finstrument-functions
Emits __cyg_profile_func_enter(this_fn, call_site) with no argument
values. Only provides the function address and caller address.
Category D: DataFlowSanitizer (-fsanitize=dataflow)
Byte-level taint tracking on every load/store/ALU operation. 10-100x
overhead. Unsuitable for continuous fuzzing or production use. We only
need boundary snapshots, not full dataflow tracking.
Category E: -fpatchable-function-entry
Provides NOPs for runtime patching (ftrace, kprobes, eBPF). These
runtime tools depend on DWARF to parse arguments. Under aggressive
optimization (-O2), DWARF variable locations are elided, so runtime
tools cannot extract values. Our pass operates at the LLVM IR level
where arguments are explicit SSA values with type metadata intact.
The Gap
| | trace-pc | XRay | -finstrument-functions | DFSan | trace-args/ret |
|---|---|---|---|---|---|
| Argument values | No | No | No | Taint only | Yes |
| Struct field expansion | No | No | No | No | Automatic |
| Works under -O2 | N/A | N/A | N/A | Yes | Yes |
| Language-agnostic (Rust/C) | Yes | No (needs mfentry) | Partial | No | Yes |
Proposed Design
Callbacks
trace-args (one call per parameter at function entry):
void __sanitizer_cov_trace_args(
u64 pc, // function address
u32 arg_idx, // parameter index
u32 arg_size, // sizeof(arg) in bytes
void *arg_ptr, // pointer to argument value
u64 *offsets, // compile-time struct field layout (NULL for scalars)
u32 num_fields // number of struct fields (0 for scalars)
);
trace-ret (before each ReturnInst):
void __sanitizer_cov_trace_ret(
u64 pc,
u32 ret_size,
void *ret_ptr,
u64 *offsets,
u32 num_fields
);
Both compose with existing modes (e.g., trace-pc,trace-args).
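As a rough illustration of both sides of this interface, the sketch below pairs a toy runtime implementation of `__sanitizer_cov_trace_args` with a hand-instrumented `process_request`. The hand-written calls only approximate what the pass would insert. The record layout, `MAX_RECORDS`, `MAX_BYTES`, and the body of `process_request` are assumptions, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;
typedef uint32_t u32;

#define MAX_RECORDS 64 /* hypothetical log capacity */
#define MAX_BYTES   16 /* bytes of each argument kept per record */

struct arg_record {
    u64 pc;
    u32 arg_idx;
    u32 arg_size;
    unsigned char bytes[MAX_BYTES];
};

static struct arg_record records[MAX_RECORDS];
static u32 num_records;

/* Runtime side: copy the argument bytes into a fixed-size log. */
void __sanitizer_cov_trace_args(u64 pc, u32 arg_idx, u32 arg_size,
                                void *arg_ptr, u64 *offsets, u32 num_fields) {
    assert(num_fields == 0 || offsets != NULL); /* scalars pass NULL, 0 */
    if (arg_ptr == NULL || num_records == MAX_RECORDS)
        return; /* drop the event rather than overflow the log */
    struct arg_record *r = &records[num_records++];
    r->pc = pc;
    r->arg_idx = arg_idx;
    r->arg_size = arg_size;
    memset(r->bytes, 0, sizeof r->bytes);
    memcpy(r->bytes, arg_ptr, arg_size < MAX_BYTES ? arg_size : MAX_BYTES);
}

/* Compiler side, written by hand: one callback per parameter at entry. */
int process_request(char *buf, int len) {
    u64 pc = (u64)(uintptr_t)&process_request;
    __sanitizer_cov_trace_args(pc, 0, sizeof buf, &buf, NULL, 0);
    __sanitizer_cov_trace_args(pc, 1, sizeof len, &len, NULL, 0);
    return (buf != NULL && len > 0) ? len : -1; /* stand-in body */
}
```

After `process_request(buf, 16)`, the log holds two records: the pointer value of `buf` at index 0 and the four bytes of `len` at index 1.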
Struct field expansion
When a parameter resolves (via stripDITypedefs()) to DICompositeType:
- Walk member list: extract `(byte_offset, byte_size)` pairs
- Compute FNV-1a hash of type name (stored at `offsets[0]`)
- Emit module-level `ConstantArray`: `[hash, off_0, sz_0, ..., off_n, sz_n]`
- Pass `&offsets[1]` and `num_fields` to callback
For scalars: offsets = NULL, num_fields = 0.
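The layout array can be sketched by hand for a sample struct. In the example below, `struct request`, the placeholder `TYPE_NAME_HASH`, and the helper `read_field` are hypothetical. The array is built with `offsetof` to mimic what the pass would derive from debug metadata, and the reader assumes a little-endian host and fields of at most 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct request {          /* hypothetical instrumented parameter type */
    uint32_t len;
    uint16_t flags;
    uint64_t cookie;
};

#define TYPE_NAME_HASH 0x1234ULL /* placeholder, not a real FNV-1a value */
#define REQUEST_NUM_FIELDS 3

/* Mirrors the described format: [hash, off_0, sz_0, ..., off_n, sz_n]. */
static const uint64_t request_layout[] = {
    TYPE_NAME_HASH,
    offsetof(struct request, len),    sizeof(uint32_t),
    offsetof(struct request, flags),  sizeof(uint16_t),
    offsetof(struct request, cookie), sizeof(uint64_t),
};

/* Read field i as a zero-extended integer. `pairs` is what the callback
 * receives, i.e. &layout[1]. Assumes a little-endian host. */
static uint64_t read_field(const void *base, const uint64_t *pairs,
                           uint32_t num_fields, uint32_t i) {
    assert(i < num_fields);
    uint64_t off = pairs[2 * i];
    uint64_t sz = pairs[2 * i + 1];
    uint64_t value = 0;
    if (sz > sizeof value)
        sz = sizeof value; /* truncate oversized fields in this sketch */
    memcpy(&value, (const unsigned char *)base + off, (size_t)sz);
    return value;
}
```

Because the callback receives `&offsets[1]`, a runtime following this reading of the format could recover the type hash at `pairs[-1]`; whether that is intended as part of the contract is not stated in the proposal.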
Debug info and codegen semantics
This pass strictly obeys LLVM’s rule that -g must not alter code
generation semantics. If DISubprogram is absent, the function is
skipped. Debug info controls what metadata is available to the
runtime, not whether instrumentation occurs.
Key architectural property: The inserted callbacks are explicit IR
CallInst instructions with side effects. They survive all optimization
passes without requiring metadata propagation through SelectionDAG,
FastISel, or GlobalISel. Unlike !pcsections metadata (which requires
ReplaceAllUsesWith handlers and DAGUpdateListener propagation), our
approach needs no backend changes.
Edge cases
- No `DISubprogram`: function skipped
- Variadic: skipped
- `naked`: skipped
- `void` return: no `trace_ret`
- Multiple returns: `EscapeEnumerator` finds all exit points
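The return-related cases can be pictured with hand-written instrumentation. This sketch approximates the intended effect for a function with two exit points and for a `void` function. The functions `parse_len` and `bump` and the toy runtime are hypothetical, and the real pass works on IR rather than on source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;
typedef uint32_t u32;

static u32 ret_events; /* number of trace_ret callbacks observed */
static int last_ret;   /* last int-sized return value observed */

/* Toy runtime: count events and remember int-sized return values. */
void __sanitizer_cov_trace_ret(u64 pc, u32 ret_size, void *ret_ptr,
                               u64 *offsets, u32 num_fields) {
    (void)pc;
    assert(num_fields == 0 || offsets != NULL);
    ret_events++;
    if (ret_ptr != NULL && ret_size == sizeof last_ret)
        memcpy(&last_ret, ret_ptr, sizeof last_ret);
}

/* Two exit points: each return is preceded by its own callback. */
int parse_len(const char *s) {
    u64 pc = (u64)(uintptr_t)&parse_len;
    int ret;
    if (s == NULL) {
        ret = -1;
        __sanitizer_cov_trace_ret(pc, sizeof ret, &ret, NULL, 0);
        return ret;
    }
    ret = (int)strlen(s);
    __sanitizer_cov_trace_ret(pc, sizeof ret, &ret, NULL, 0);
    return ret;
}

/* void return: there is no value to capture, so no callback appears. */
void bump(int *counter) {
    if (counter != NULL)
        (*counter)++;
}
```

Calling `parse_len(NULL)` and then `parse_len("abcd")` produces two events with values -1 and 4, while `bump` produces none.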
Implementation
Purpose: What This Coverage Mode Is For
trace-args and trace-ret exist to provide runtime value observability
at function boundaries. The intended consumers are:
- Coverage-guided fuzzers that need finer-grained feedback than edge
  coverage alone. A composite signal `(PC, arg_hash)` distinguishes
  executions of the same path with different argument values, breaking
  coverage saturation on stateful programs.
- Runtime contract verifiers that check captured `(args, ret)` tuples
  against pre/postcondition predicates (e.g., “size ≤ buffer capacity”)
  to detect value-level violations that produce no crash or sanitizer
  report.
- Struct member verification at runtime: when a function receives a
  struct pointer, the expanded field values reveal whether the struct’s
  internal state matches what the function’s logic requires. Such
  conditions depend on non-deterministic runtime state (heap pressure, RCU
  epoch, concurrent modification) and cannot be validated at compile time.
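A contract verifier of the kind described in the second bullet could look roughly like the following. The `captured_call` record, the two predicates, and `check_contracts` are hypothetical names; how a runtime assembles such a record from trace-args and trace-ret events is left out of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record assembled from trace-args/trace-ret events. */
struct captured_call {
    size_t size;     /* captured size argument */
    size_t capacity; /* capacity read from an expanded struct argument */
    long   ret;      /* captured return value */
};

typedef int (*contract_fn)(const struct captured_call *);

/* Precondition: "size <= buffer capacity". */
static int pre_size_fits(const struct captured_call *c) {
    return c->size <= c->capacity;
}

/* Postcondition: a non-negative return never exceeds the requested size. */
static int post_ret_bounded(const struct captured_call *c) {
    return c->ret < 0 || (size_t)c->ret <= c->size;
}

static const contract_fn contracts[] = { pre_size_fits, post_ret_bounded };
#define NUM_CONTRACTS (sizeof contracts / sizeof contracts[0])

/* Returns a bitmask: bit i set means contracts[i] was violated. */
static unsigned check_contracts(const struct captured_call *c) {
    assert(c != NULL);
    unsigned violated = 0;
    for (size_t i = 0; i < NUM_CONTRACTS; i++)
        if (!contracts[i](c))
            violated |= 1u << i;
    return violated;
}
```

A call captured with `size = 4096` and `capacity = 64` sets bit 0 even if the program neither crashes nor triggers a sanitizer report, which is the kind of silent violation this consumer targets.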
Comparison with Existing LLVM Flags
| Flag | What it captures | Argument values? | Struct fields? | Use case |
|---|---|---|---|---|
| `-fsanitize-coverage=trace-pc` | Edge PCs | No | No | Edge coverage for fuzzers |
| `-fsanitize-coverage=trace-cmp` | CMP operands | Partial (cmp only) | No | Magic-byte discovery |
| `-fsanitize-coverage=trace-pc-guard` | Edge + guard | No | No | Custom coverage callbacks |
| `-fxray-instrument` | Entry/exit timestamps | No | No | Latency profiling |
| `-finstrument-functions` | Entry/exit addresses | No | No | Function-level profiling |
| `-fsanitize=dataflow` | Byte-level taint | Taint labels | No | Full dataflow (10-100×) |
| `-fpatchable-function-entry` | NOP sleds | No (runtime-dependent) | No | Dynamic patching (ftrace/eBPF) |
| `-fsanitize-coverage=trace-args` | Arg values + struct layout | Yes | Yes (automatic) | Value-aware fuzzing + function contract verification |
| `-fsanitize-coverage=trace-ret` | Return values + struct layout | Yes | Yes (automatic) | Postcondition checking |
The key architectural difference: existing modes capture
control-flow topology (trace-pc), individual comparison operands
(trace-cmp), or timing (XRay). None captures the complete function
parameter state, including composite type decomposition. trace-args and
trace-ret fill this gap with <10% overhead when enabled per module.
Files Changed
~270 LOC in SanitizerCoverage.cpp, ~170 LOC tests/docs.
| File | Change |
|---|---|
| `SanitizerCoverage.cpp` | InjectTraceForArgs(), InjectTraceForRet() |
| `CodeGenOptions.def` | Two CODEGENOPT flags |
| `SanitizerArgs.cpp` | Parse "trace-args" / "trace-ret" (enum 1<<20, 1<<21) |
| `BackendUtil.cpp` | Wire to SanitizerCoverageOptions |
| `Instrumentation.h` | TraceArgs / TraceRet booleans |
Tests:
- `llvm/test/Instrumentation/SanitizerCoverage/trace-args.ll`
- `llvm/test/Instrumentation/SanitizerCoverage/trace-ret.ll`
- `clang/test/CodeGen/sanitizer-coverage-trace-args-ret.c`
- `clang/test/Driver/fsanitize-coverage.c`
Impact on LLVM
- Extends `SanitizerCoverage.cpp` only (~270 LOC)
- No changes to: optimizer, codegen, linker, other sanitizers
- No new metadata kinds or propagation requirements
- Fully backward compatible: existing `trace-pc`/`trace-cmp` unchanged
- Works with any LLVM-based frontend (clang, rustc via IR pipeline)
Performance
Per-callback: ~27 ns (dominated by memory spill + indirect call).
With whole-program instrumentation: +133%.
Per-module opt-in: +8.3% on instrumented paths; zero on uninstrumented.
Current status
- 4 commits: core pass, clang tests, LLVM IR tests, documentation
- Deployed with a Linux kernel runtime consumer (KCOV backend)
- Tested with both C and Rust-generated LLVM IR
- Kernel patch series under review (demonstrates real-world viability)
Open questions
- Scalar optimization: Should primitive types use size-specific
  overloads (like `__sanitizer_cov_trace_cmp{1,2,4,8}`) instead of
  alloca spill to `void *`?
- Offsets array format: Implementation detail between compiler and
  runtime, or stable ABI?
- Field count limit: Cap to bound per-call cost for large aggregates?
- LTO: Cross-module inlining and offset global visibility?
Feedback welcome.