[RFC] SanitizerCoverage: Add `-fsanitize-coverage=trace-args,trace-ret`

ysk · June 9, 2026, 1:08pm

Summary

I propose two new SanitizerCoverage instrumentation modes:

-fsanitize-coverage=trace-args    # capture function arguments at entry
-fsanitize-coverage=trace-ret     # capture return values at exit

These emit callbacks that provide the runtime with function argument
values, return values, and automatic struct field expansion via
compile-time DICompositeType metadata.

PR: [sancov] Add -fsanitize-coverage=trace-args,trace-ret (llvm/llvm-project Pull Request #201410, by yskzalloc)

The Problem: The Semantic Observability Gap in SanitizerCoverage

SanitizerCoverage currently provides control-flow observability:

Mode	What it captures
`trace-pc`	Which edges executed
`trace-cmp`	Comparison operands
`trace-pc-guard`	Edge with guard variable
`trace-pc-entry-exit`	Function enter/exit events

None of these modes exposes the values flowing across function
boundaries. Two calls that traverse identical edges with different
argument values are indistinguishable:

process_request(buf, 16);    // safe
process_request(buf, 4096);  // triggers overflow

Coverage-guided fuzzers (libFuzzer, syzkaller) plateau on stateful
programs where security-relevant behavior depends on argument values,
not control-flow topology. A composite feedback signal
(PC, arg_hash) provides finer-grained mutation guidance.
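
A minimal sketch of how a runtime might fold (PC, arg_hash) into a single feedback slot. Everything here is an assumption for illustration: the 64-bit FNV-1a variant, the map size, the mixing constant, and the names fnv1a64, feature_index and record_feature are hypothetical and are not part of the PR.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a over a byte range (standard offset basis and prime). */
static uint64_t fnv1a64(const void *data, size_t len) {
    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

#define FEATURE_MAP_SIZE (1u << 16) /* hypothetical map size */
static uint8_t feature_map[FEATURE_MAP_SIZE];

/* Fold (pc, arg_hash) into one slot index in [0, FEATURE_MAP_SIZE). */
static uint32_t feature_index(uint64_t pc, const void *arg, size_t arg_size) {
    uint64_t arg_hash = fnv1a64(arg, arg_size);
    /* Multiply by an odd constant so nearby hashes spread out, then
     * keep the top 16 bits as the slot index. */
    uint64_t mixed = (pc ^ arg_hash) * 0x9e3779b97f4a7c15ULL;
    return (uint32_t)(mixed >> 48);
}

/* Returns 1 if this (pc, argument value) pair hit an empty slot. */
static int record_feature(uint64_t pc, const void *arg, size_t arg_size) {
    uint32_t idx = feature_index(pc, arg, arg_size);
    int is_new = (feature_map[idx] == 0);
    feature_map[idx] = 1;
    return is_new;
}
```

With a map like this, process_request(buf, 16) and process_request(buf, 4096) would very likely land in different slots even though both calls share one PC, so the fuzzer would treat the second input as new. Collisions remain possible in a 16-bit map.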

Why Existing LLVM/Clang Flags Cannot Solve This

Category A: SanitizerCoverage (trace-pc, trace-cmp)

Semantically blind to composite values. trace-cmp captures individual
comparison operands but not the full function parameter state.
trace-pc-entry-exit reports which functions executed but not with what
values.

Category B: XRay (`-fxray-instrument`)

Designed for latency profiling. Inserts NOP sleds at function
boundaries to record timestamps. The XRay runtime and compiler frontend
are not designed to parse DICompositeType metadata or spill struct
fields into callback buffers. Repurposing XRay for data extraction
would undermine its ultra-low-latency design goal.

Category C: `-finstrument-functions`

Emits __cyg_profile_func_enter(this_fn, call_site) with no argument
values. Only provides the function address and caller address.

Category D: DataFlowSanitizer (`-fsanitize=dataflow`)

Byte-level taint tracking on every load/store/ALU operation. 10-100x
overhead. Unsuitable for continuous fuzzing or production use. We only
need boundary snapshots, not full dataflow tracking.

Category E: `-fpatchable-function-entry`

Provides NOPs for runtime patching (ftrace, kprobes, eBPF). These
runtime tools depend on DWARF to parse arguments. Under aggressive
optimization (-O2), DWARF variable locations are elided, so runtime
tools cannot extract values. Our pass operates at the LLVM IR level
where arguments are explicit SSA values with type metadata intact.

The Gap

Capability	trace-pc	XRay	-finstrument-functions	DFSan	trace-args/ret
Argument values	No	No	No	Taint only	Yes
Struct field expansion	No	No	No	No	Automatic
Works under -O2	N/A	N/A	N/A	Yes	Yes
Language-agnostic (Rust/C)	Yes	No (needs mfentry)	Partial	No	Yes

Proposed Design

Callbacks

trace-args (one call per parameter at function entry):

void __sanitizer_cov_trace_args(
    u64 pc,           // function address
    u32 arg_idx,      // parameter index
    u32 arg_size,     // sizeof(arg) in bytes
    void *arg_ptr,    // pointer to argument value
    u64 *offsets,     // compile-time struct field layout (NULL for scalars)
    u32 num_fields    // number of struct fields (0 for scalars)
);

trace-ret (before each ReturnInst):

void __sanitizer_cov_trace_ret(
    u64 pc,
    u32 ret_size,
    void *ret_ptr,
    u64 *offsets,
    u32 num_fields
);

Both compose with existing modes (e.g., trace-pc,trace-args).
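
For illustration, a minimal user-space consumer of the two callbacks might look like the sketch below. The callback signatures follow the ones proposed above (with u64/u32 spelled as stdint typedefs). The event buffer, its capacity, the 16-byte value cap, the UINT32_MAX marker for return values, and the record helper are all assumptions made for this example, not the runtime shipped with the PR.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;
typedef uint32_t u32;

#define MAX_EVENTS 64      /* hypothetical buffer capacity */
#define MAX_VALUE_BYTES 16 /* hypothetical cap on bytes copied per value */

struct trace_event {
    u64 pc;
    u32 arg_idx; /* UINT32_MAX marks a return value (assumed convention) */
    u32 size;
    unsigned char bytes[MAX_VALUE_BYTES];
};

static struct trace_event events[MAX_EVENTS];
static u32 num_events;

/* Copy at most MAX_VALUE_BYTES of the value into the next free slot. */
static void record(u64 pc, u32 idx, u32 size, const void *ptr) {
    if (num_events == MAX_EVENTS || ptr == NULL)
        return;
    struct trace_event *e = &events[num_events++];
    u32 n = size < MAX_VALUE_BYTES ? size : MAX_VALUE_BYTES;
    e->pc = pc;
    e->arg_idx = idx;
    e->size = size;
    memset(e->bytes, 0, sizeof e->bytes);
    memcpy(e->bytes, ptr, n);
}

void __sanitizer_cov_trace_args(u64 pc, u32 arg_idx, u32 arg_size,
                                void *arg_ptr, u64 *offsets, u32 num_fields) {
    (void)offsets;    /* struct layout ignored in this sketch */
    (void)num_fields;
    record(pc, arg_idx, arg_size, arg_ptr);
}

void __sanitizer_cov_trace_ret(u64 pc, u32 ret_size, void *ret_ptr,
                               u64 *offsets, u32 num_fields) {
    (void)offsets;
    (void)num_fields;
    record(pc, UINT32_MAX, ret_size, ret_ptr);
}
```

As with other SanitizerCoverage callbacks, the file defining these functions would presumably need to be built without the instrumentation itself, so that the callbacks do not trigger further callbacks.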

Struct field expansion

When a parameter resolves (via stripDITypedefs()) to DICompositeType:

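Walk the member list and extract (byte_offset, byte_size) pairs
Compute the FNV-1a hash of the type name (stored at element 0 of the emitted array)
Emit a module-level ConstantArray: [hash, off_0, sz_0, ..., off_n, sz_n]
Pass the address of element 1 (the first offset, just past the hash) as offsets, together with num_fields, to the callback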

For scalars: offsets = NULL, num_fields = 0.
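
The layout described above can be sketched by hand as follows. This is a stand-in for what the pass would emit, under several assumptions: struct request is a made-up example type, the hash is taken to be 64-bit FNV-1a over the bare type name "request", and read_field (which assumes a little-endian host and fields of at most 8 bytes) is a hypothetical helper on the runtime side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct request { /* hypothetical example type */
    uint32_t len;
    uint16_t flags;
    void *buf;
};

/* 64-bit FNV-1a of a NUL-terminated string (assumed hash width). */
uint64_t fnv1a64_str(const char *s) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hand-written stand-in for the compiler-emitted constant array:
 * [hash, off_0, sz_0, off_1, sz_1, off_2, sz_2]. */
static uint64_t request_layout[] = {
    0, /* slot 0: type-name hash, filled in by init_request_layout() */
    offsetof(struct request, len),   sizeof(uint32_t),
    offsetof(struct request, flags), sizeof(uint16_t),
    offsetof(struct request, buf),   sizeof(void *),
};

void init_request_layout(void) {
    request_layout[0] = fnv1a64_str("request");
}

/* Read field i of obj as a zero-extended integer. `offsets` points at
 * off_0, i.e. one element past the hash, as the callback would see it. */
uint64_t read_field(const void *obj, const uint64_t *offsets, uint32_t i) {
    uint64_t off = offsets[2 * i];
    uint64_t size = offsets[2 * i + 1];
    uint64_t value = 0;
    memcpy(&value, (const unsigned char *)obj + off, size < 8 ? size : 8);
    return value;
}
```

A runtime that receives offsets = &request_layout[1] and num_fields = 3 can loop i from 0 to num_fields - 1 and call read_field for each member; under the layout described above, offsets[-1] would hold the type hash.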

Debug info and codegen semantics

This pass strictly obeys LLVM’s rule that -g must not alter code
generation semantics. If DISubprogram is absent, the function is
skipped. Debug info controls what metadata is available to the
runtime, not whether instrumentation occurs.

Key architectural property: The inserted callbacks are explicit IR
CallInst instructions with side effects. They survive all optimization
passes without requiring metadata propagation through SelectionDAG,
FastISel, or GlobalISel. Unlike !pcsections metadata (which requires
ReplaceAllUsesWith handlers and DAGUpdateListener propagation), our
approach needs no backend changes.

Edge cases

No DISubprogram: function skipped
Variadic: skipped
naked: skipped
void return: no trace_ret
Multiple returns: EscapeEnumerator finds all exit points

Implementation

Purpose: What This Coverage Mode Is For

trace-args and trace-ret exist to provide runtime value observability
at function boundaries. The intended consumers are:

Coverage-guided fuzzers that need finer-grained feedback than edge
coverage alone. A composite signal (PC, arg_hash) distinguishes
executions of the same path with different argument values, breaking
coverage saturation on stateful programs.

Runtime contract verifiers that check captured (args, ret) tuples
against pre/postcondition predicates (e.g., “size ≤ buffer capacity”)
to detect value-level violations that produce no crash or sanitizer
report.

Struct member verification at runtime: when a function receives a
struct pointer, the expanded field values reveal whether the struct’s
internal state matches what the function’s logic requires. These are
conditions that depend on non-deterministic runtime state (heap
pressure, RCU epoch, concurrent modification) and cannot be validated
at compile time.
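
As a sketch of the contract-verifier use case, the check below applies a “size ≤ buffer capacity” precondition to a captured length argument. The capacity, the argument index, the assumption that the length is a 32-bit unsigned value, and the function name check_len_precondition are all hypothetical; a real verifier would key its predicates on the pc passed to the callback.

```c
#include <stdint.h>
#include <string.h>

#define BUF_CAPACITY 1024u /* hypothetical capacity of the destination buffer */
#define LEN_ARG_INDEX 1u   /* hypothetical: the length is parameter 1 */

static unsigned violations;

/* Returns 1 if the precondition holds or does not apply to this
 * argument, 0 if the captured length exceeds the capacity. */
int check_len_precondition(uint32_t arg_idx, uint32_t arg_size,
                           const void *arg_ptr) {
    uint32_t len;

    if (arg_idx != LEN_ARG_INDEX || arg_size != sizeof len || arg_ptr == NULL)
        return 1; /* not the argument this predicate covers */

    memcpy(&len, arg_ptr, sizeof len);
    if (len <= BUF_CAPACITY)
        return 1;

    violations++; /* value-level violation: no crash needed to see it */
    return 0;
}
```

Called from a trace-args consumer, this would accept the length 16 from the earlier process_request example and flag 4096 even if the overflow never produces a crash or sanitizer report.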

Comparison with Existing LLVM Flags

Flag	What it captures	Argument values?	Struct fields?	Use case
`-fsanitize-coverage=trace-pc`	Edge PCs	No	No	Edge coverage for fuzzers
`-fsanitize-coverage=trace-cmp`	CMP operands	Partial (cmp only)	No	Magic-byte discovery
`-fsanitize-coverage=trace-pc-guard`	Edge + guard	No	No	Custom coverage callbacks
`-fxray-instrument`	Entry/exit timestamps	No	No	Latency profiling
`-finstrument-functions`	Entry/exit addresses	No	No	Function-level profiling
`-fsanitize=dataflow`	Byte-level taint	Taint labels	No	Full dataflow (10-100×)
`-fpatchable-function-entry`	NOP sleds	No (runtime-dependent)	No	Dynamic patching (ftrace/eBPF)
`-fsanitize-coverage=trace-args`	Arg values + struct layout	Yes	Yes (automatic)	Value-aware fuzzing + Function contract verification
`-fsanitize-coverage=trace-ret`	Return values + struct layout	Yes	Yes (automatic)	Postcondition checking

The key architectural difference: existing modes capture control-flow
topology (trace-pc), individual comparison operands (trace-cmp), or
timing (XRay). None captures the complete function parameter state,
including composite type decomposition. trace-args/ret fill this gap
with <10% overhead when enabled per module (see Performance below).

Files Changed

~270 LOC in SanitizerCoverage.cpp, ~170 LOC tests/docs.

File	Change
`SanitizerCoverage.cpp`	`InjectTraceForArgs()`, `InjectTraceForRet()`
`CodeGenOptions.def`	Two `CODEGENOPT` flags
`SanitizerArgs.cpp`	Parse `"trace-args"` / `"trace-ret"` (enum `1<<20`, `1<<21`)
`BackendUtil.cpp`	Wire to `SanitizerCoverageOptions`
`Instrumentation.h`	`TraceArgs` / `TraceRet` booleans

Tests:

llvm/test/Instrumentation/SanitizerCoverage/trace-args.ll
llvm/test/Instrumentation/SanitizerCoverage/trace-ret.ll
clang/test/CodeGen/sanitizer-coverage-trace-args-ret.c
clang/test/Driver/fsanitize-coverage.c

Impact on LLVM

Extends SanitizerCoverage.cpp only (~270 LOC)
No changes to: optimizer, codegen, linker, other sanitizers
No new metadata kinds or propagation requirements
Fully backward compatible: existing trace-pc/trace-cmp unchanged
Works with any LLVM-based frontend (clang, rustc via IR pipeline)

Performance

Per-callback: ~27 ns (dominated by memory spill + indirect call).
With whole-program instrumentation: +133%.
Per-module opt-in: +8.3% on instrumented paths; zero on uninstrumented.

Current status

4 commits: core pass, clang tests, LLVM IR tests, documentation
Deployed with a Linux kernel runtime consumer (KCOV backend)
Tested with both C and Rust-generated LLVM IR
Kernel patch series under review (demonstrates real-world viability)

Open questions

Scalar optimization: Should primitive types use size-specific
overloads (like __sanitizer_cov_trace_cmp{1,2,4,8}) instead of
alloca spill to void *?

Offsets array format: Implementation detail between compiler and
runtime, or stable ABI?

Field count limit: Cap to bound per-call cost for large aggregates?

LTO: Cross-module inlining and offset global visibility?
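
To make the scalar question concrete, here is one possible shape for size-specific entry points that take the value directly instead of a pointer to a spilled copy. These functions do not exist in the PR or in compiler-rt; the names __sanitizer_cov_trace_arg{1,2,4,8}, their parameter lists, and the last_scalar recorder are purely hypothetical.

```c
#include <stdint.h>

/* Most recent scalar observed, kept only so the sketch is checkable. */
struct scalar_event {
    uint64_t pc;
    uint32_t arg_idx;
    uint32_t size;
    uint64_t value;
};
static struct scalar_event last_scalar;

static void record_scalar(uint64_t pc, uint32_t arg_idx, uint32_t size,
                          uint64_t value) {
    last_scalar.pc = pc;
    last_scalar.arg_idx = arg_idx;
    last_scalar.size = size;
    last_scalar.value = value;
}

/* HYPOTHETICAL overloads: the value travels in a register, so the
 * instrumented function needs no alloca and no store before the call. */
void __sanitizer_cov_trace_arg1(uint64_t pc, uint32_t arg_idx, uint8_t v) {
    record_scalar(pc, arg_idx, 1, v);
}
void __sanitizer_cov_trace_arg2(uint64_t pc, uint32_t arg_idx, uint16_t v) {
    record_scalar(pc, arg_idx, 2, v);
}
void __sanitizer_cov_trace_arg4(uint64_t pc, uint32_t arg_idx, uint32_t v) {
    record_scalar(pc, arg_idx, 4, v);
}
void __sanitizer_cov_trace_arg8(uint64_t pc, uint32_t arg_idx, uint64_t v) {
    record_scalar(pc, arg_idx, 8, v);
}
```

The trade-off in this variant is more runtime symbols in exchange for avoiding the memory spill on the scalar path; aggregates would still go through the pointer-based callback.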

Feedback welcome.

ysk · June 12, 2026, 2:59pm

Linux kernel RFC patch series on LKML: 20260611-b4-kcov-dataflow-v2-v2-0-0a261da3987c@est.tech
Paper: https://arxiv.org/pdf/2606.00455
Workspace repository: github.com/yskzalloc/kcov-dataflow

arsenm · June 24, 2026, 2:31pm

@vitalybuka

ysk · June 24, 2026, 3:19pm

When inspecting functions with more than 6 arguments or structures with multiple fields, printf()-based debugging becomes impractical. Register-level dynamic instrumentation (e.g., ftrace, kprobes) is fundamentally limited by the calling convention: on x86_64 only 6 integer arguments live in registers, and on arm64 only 8; the rest spill to the stack with no type metadata. Struct fields passed by pointer are entirely invisible without manual offset calculation.

The approach scales where printf() and dynamic instrumentation do not. Rather than manually annotating each function of interest, per-module instrumentation captures all function boundaries with zero source modification. Combined with pattern matching on the captured data flow, this enables contract verification, invariant checking, and fuzzer feedback at a level that register-based tools cannot reach, e.g., feeding back true-condition function arguments and return values.

Tested on both Rust and C kernel modules. Since the instrumentation operates at the LLVM IR level, it extends naturally to any language targeting LLVM.

nikic · June 24, 2026, 3:40pm

How does this deal with the call ABI? There is not necessarily a simple correspondence between a source-level argument and an IR level argument. A single source argument may expand to multiple IR-level arguments. It may be direct or indirect.

Instrumenting call args / returns seems like something that should require instrumentation in the frontend to be fully correct.

I guess if the actual use case is fuzzing, you don’t really care whether there is a divergence from source arguments?
