Summary
Effective heap partitioning in a hardened memory allocator requires semantic information about allocations that is typically lost during compilation. We propose a framework for allocator-partition hint instrumentation, which provides partition ID hints derived from language-level or other static properties. The design enables different partitioning schemes, with a type-based one being our primary focus. The feature can be enabled for Clang with -fsanitize=alloc-partition.
The design’s focus is sanitizer-style instrumentation that transparently rewrites allocation calls (e.g., malloc, new) to include a partition_id. The language frontend infers source-level information and attaches !alloc_partition_hint IR metadata to allocation calls; the middle-end IR pass (AllocPartition) consumes this metadata to rewrite calls based on a configurable partitioning policy. This framework enables various heap organization strategies, with the primary motivation being type-aware hardening deployable across large codebases without requiring source modifications.
Background and Motivation
Heap memory allocators could implement stronger memory-safety hardening features if they had richer semantic information about the allocations they manage. One particularly powerful hardening technique is to partition the heap to isolate different kinds of allocations [PartitionAlloc, ChromeSecurity 2022, Erlingsson 2025].
For example, separating pointer-containing objects from pointerless data allocations can help mitigate certain classes of memory corruption exploits [XZone]: an attacker who gains a buffer overflow on a primitive char array cannot use it to directly corrupt a vtable pointer, function pointer, or other critical metadata in an object residing in a different, isolated heap region. Furthermore, heap isolation can also mitigate many data-only attacks that cannot be mitigated by control-flow mitigations.
Heap isolation strategies are best-effort and do not provide a complete security guarantee, although they are achievable at relatively low performance cost. Their effectiveness varies across libraries and binaries, and depends on the properties of the given allocator implementation.
The fundamental blocker to implementing such strategies is that standard allocators are blind to source-level semantics. A call to malloc(), __builtin_operator_new(), or any of the numerous standard untyped memory-allocation functions provides no information about whether the memory is intended for an array of integers or for a critical object containing pointers.
To apply heap partitioning to large, existing C/C++ codebases, a transparent approach is required. The proposed solution is transparent to source code, meaning no modifications to allocation call sites are needed. This model is directly analogous to other sanitizers, which also pair compiler instrumentation with a runtime library—in this case, a compatible, partition-aware memory allocator (this RFC only discusses the compiler support).
Related Features
Several existing or proposed mechanisms provide allocators with type information, but none are suitable for the goal of transparent, binary-wide heap partitioning.
Language Extensions such as C++ P2719 and the proposed C typed_memory_operation (TMO) attribute allow libraries to define and use type-aware allocation functions. However, they require modification of allocation APIs, and explicit inclusion of these APIs across a codebase. This makes language extensions unsuitable for transparent deployment across large unmodified codebases. The sanitizer-style deployment model is a better fit to provide the option for heap partitioning, without the upfront risk (and cost) of a wholesale conversion.
Hardening vs. Performance Instrumentation. The MemProf framework uses profile data to guide allocation placement for performance. AllocPartition is fundamentally different: it aims to provide deterministic, policy-driven partitioning. Hardening techniques require consistent, predictable behavior that works for all code paths, which is orthogonal to the goals of non-deterministic, profile-based partitioning.
Design
One of our observations is that for heap-partition based hardening, there is no “one size fits all”. The design aims to provide a configurable and extensible framework. While heap hardening is the initial motivation, the design can support other static partitioning schemes. By adopting a sanitizer-style approach (-fsanitize=alloc-partition, no_sanitize attribute, and ignorelists support), large-scale deployment mirrors the experience of other sanitizers.
Hint Generation and Instrumentation
The feature requires frontend cooperation, while the majority of the heavy lifting is done in a middle-end IR pass. This provides greater flexibility and enables coherent multi-language support for heap partitioning—which is becoming especially relevant, as code generated from different LLVM-based languages is linked into the same binary sharing a heap allocator.
Metadata. The language frontend is responsible for attaching !alloc_partition_hint metadata to allocation call instructions. This metadata currently captures source-level type information that the middle-end IR pass cannot trivially recover otherwise. The !alloc_partition_hint metadata is an MDNode with the following format: !{<type-name>, <contains-pointer-bool>}, where <type-name> is the fully qualified name of the inferred type and <contains-pointer-bool> is an i1/boolean constant indicating if the type (recursively) contains a pointer.
Frontend Hint Generation (Clang). The integration with Clang emits !alloc_partition_hint for allocation calls derived from the allocated type. The allocated type is inferred as follows:
- For C++ new T and new T[N] expressions, the allocated type T is known syntactically.
- For untyped allocation calls to functions with the malloc or alloc_size attributes, the type is inferred from a sizeof() expression used in an argument. In other words, for calls to functions such as malloc(), __builtin_operator_new(), or any of the other untyped allocation functions, the type is inferred from common idioms like malloc(sizeof(T)) or calloc(N, sizeof(T)).
The sizeof()-based type inference is similar to the earlier proposed C-extension typed_memory_operation (TMO). We expect that the core algorithm to infer types based on sizeof-expressions can be shared between the TMO language extension and !alloc_partition_hint generation used for -fsanitize=alloc-partition.
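The inference rules above can be illustrated with a few allocation sites. This is a minimal sketch: the types Pixel and Node and the helper function names are hypothetical, and the comments describe the hint one would expect from the rules as stated in this RFC, not verified compiler output.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical example types (not from the RFC).
struct Pixel { unsigned char r, g, b, a; };  // contains no pointers
struct Node  { Node *next; int value; };     // contains a pointer

// new T: the allocated type is known syntactically.
// Expected hint: type name "Node", contains-pointer = true.
Node *make_node() { return new Node{nullptr, 0}; }

// new T[N]: the element type is known syntactically.
// Expected hint: type name "Pixel", contains-pointer = false.
Pixel *make_row(std::size_t n) { return new Pixel[n]; }

// malloc(sizeof(T)): the type is inferred from the sizeof() argument.
Node *make_node_c() {
  return static_cast<Node *>(std::malloc(sizeof(Node)));
}

// calloc(N, sizeof(T)): the same idiom with an element count.
Pixel *make_row_c(std::size_t n) {
  return static_cast<Pixel *>(std::calloc(n, sizeof(Pixel)));
}
```

None of these call sites needs a source change; the hint is attached by the frontend when the code is built with -fsanitize=alloc-partition.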
Type Inference Limitations and Diagnostics. The sizeof() inference is a best-effort heuristic. It is known to fail for complex patterns, such as with type-erasing containers that request an untyped bag of bytes.
- Diagnostics: To aid developers in identifying where inference fails, the pass provides optimization remarks (Clang: -Rpass=alloc-partition) to point out allocation sites that could not be associated with accurate type-hint information (missing !alloc_partition_hint metadata). This information can be used to avoid code patterns that prohibit accurate type inference, or to improve frontend hint generation.
- Fallback: In cases where the frontend cannot generate a hint, the AllocPartition pass falls back to a less-precise analysis of the pointer’s immediate IR uses, or simply assigns a default partition ID.
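The following sketch contrasts an inferable idiom with the kind of type-erased pattern described above. RawBuffer, make_buffer, and make_node_storage are hypothetical names chosen for illustration; whether a given site actually receives a hint depends on the implementation, so the comments state expectations rather than confirmed behavior.

```cpp
#include <cstddef>
#include <cstdlib>

struct Node { Node *next; int value; };  // hypothetical example type

// Inferable: sizeof(Node) appears directly in the allocation call's argument.
Node *alloc_direct() {
  return static_cast<Node *>(std::malloc(sizeof(Node)));
}

// Hypothetical type-erased buffer that only ever sees a byte count.
struct RawBuffer {
  void *data;
  std::size_t bytes;
};

// Likely not inferable: the malloc call has no sizeof() expression in its
// argument, so the frontend has nothing to derive a type from here.
RawBuffer make_buffer(std::size_t bytes) {
  return RawBuffer{std::malloc(bytes), bytes};
}

// The element type is known only to the caller. Because make_buffer is a
// plain function rather than a recognized allocation function, the
// sizeof(Node) here is not part of an allocation call.
RawBuffer make_node_storage(std::size_t count) {
  return make_buffer(count * sizeof(Node));
}
```

Sites like the malloc call in make_buffer are the ones the optimization remarks are meant to surface.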
Instrumentation Pass. The AllocPartition middle-end IR pass, which runs late in the middle-end optimization pipeline, consumes the hints to rewrite allocation calls. By default only known libcalls are covered, but coverage can be extended to custom allocation functions with the -alloc-partition-extended option (Clang: -fsanitize-alloc-partition-extended). Indirect calls to allocation functions (including standard ones) are not covered.
Partitioning Modes. The AllocPartition pass is designed to be extensible with different policies for computing a static partition_id. Initial modes include:
- TypeHashPointerSplit (default): Our initial hardening-focused policy. The partition ID space is split, with one half reserved for types that (recursively) contain pointers and the other half for non-pointer types, avoiding partition ID collisions between the pointer and non-pointer containing categories.
- TypeHash: Partitions based on a hash of the canonical type name (typedefs resolve to underlying).
- Random / Increment: Simpler modes for testing and other use cases.
The Clang default mode is TypeHashPointerSplit. We are not yet making other modes available via a frontend option, as these may be subject to removal or change. Experimentally, users can change the mode via -mllvm -alloc-partition-mode=<mode>.
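To make the two hash-based modes concrete, here is a minimal sketch of how a partition ID could be derived from a hint. Several things are assumptions and not taken from the implementation: FNV-1a stands in for the hash function purely to keep the example dependency-free, the reduction by modulo is illustrative, and the choice of which half of the ID space holds pointer-containing types is arbitrary.

```cpp
#include <cstdint>
#include <string_view>

// Stand-in hash (64-bit FNV-1a). Assumption: used here only so the sketch
// has no external dependencies; it is not the hash used by the pass.
constexpr std::uint64_t fnv1a(std::string_view s) {
  std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;  // FNV prime
  }
  return h;
}

// TypeHash-style mode: the ID depends only on the canonical type name.
// max_partitions must be at least 1.
constexpr std::uint64_t type_hash_id(std::string_view type_name,
                                     std::uint64_t max_partitions) {
  return fnv1a(type_name) % max_partitions;
}

// TypeHashPointerSplit-style mode: the ID space is split in two halves.
// Assumption: the lower half is used for pointer-free types and the upper
// half for pointer-containing types. max_partitions must be at least 2.
constexpr std::uint64_t pointer_split_id(std::string_view type_name,
                                         bool contains_pointer,
                                         std::uint64_t max_partitions) {
  const std::uint64_t half = max_partitions / 2;
  const std::uint64_t slot = fnv1a(type_name) % half;
  return contains_pointer ? half + slot : slot;
}
```

With this split, two types can still collide within one half, but a pointer-containing type never shares an ID with a pointer-free one, which is the property the default mode is after.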
Runtime Interface and ABI. The interface between the compiler and the partition-aware allocator is designed to be simple and efficient. The instrumentation rewrites allocation function (<func>) calls to __alloc_partition_<func>(<func args>..., uint64_t partition_id), where partition_id is a compile-time computed constant. All memory builtin libcalls (isAllocationFunction()) along with custom allocation functions (see “Instrumenting Non-Standard Allocation Functions” below) are supported.
The choice of an opaque uint64_t partition ID deliberately abstracts semantic information, enabling future enhancements to partitioning modes transparently. One goal was to avoid additional runtime cost and the complexity of parsing structured type information.
The instrumentation provides a default ABI which appends a partition ID function argument, and a “fast ABI”. The latter is more performant with a very small ID space, controlled with -alloc-partition-max=<max-partitions> (Clang: -fsanitize-alloc-partition-max=<max-partitions>).
| ABI | Clang Flag | <func>(<size>) | Partition ID Argument |
|---|---|---|---|
| Default | (none) | __alloc_partition_<func>(<size>, <id>) | Passed as final function argument |
| Fast | -fsanitize-alloc-partition-fast-abi | __alloc_partition_<id>_<func>(<size>) | Encoded in function name |
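On the runtime side, a partition-aware allocator would need to export the entry points listed in the table. The sketch below shows what that could look like for malloc under both ABIs. It is a toy: kMaxPartitions, g_requests, and partition_alloc are hypothetical, the "allocator" only counts requests per partition and forwards to malloc, and the fast-ABI symbol names are an assumption based on reading the table's pattern literally with a decimal ID.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical toy backend: counts requests per partition and forwards to
// malloc. A real allocator would place each partition in its own region.
constexpr std::uint64_t kMaxPartitions = 4;
static std::size_t g_requests[kMaxPartitions];

static void *partition_alloc(std::size_t size, std::uint64_t id) {
  ++g_requests[id % kMaxPartitions];
  return std::malloc(size);
}

// Default ABI: the partition ID is passed as the final argument.
extern "C" void *__alloc_partition_malloc(std::size_t size,
                                          std::uint64_t partition_id) {
  return partition_alloc(size, partition_id);
}

// Fast ABI: one entry point per ID, with the ID encoded in the symbol name.
// Assumption: names follow the table's pattern with a decimal ID.
extern "C" void *__alloc_partition_0_malloc(std::size_t size) {
  return partition_alloc(size, 0);
}
extern "C" void *__alloc_partition_1_malloc(std::size_t size) {
  return partition_alloc(size, 1);
}
```

The fast ABI requires one symbol per partition for every allocation function, which is why it is only practical with a very small ID space.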
Instrumenting Non-Standard Allocation Functions. To support environments using non-standard allocation functions (e.g. in OS kernels), AllocPartition can instrument such functions with the Clang -fsanitize-alloc-partition-extended option. This enables instrumentation of any call marked with the !alloc_partition_hint metadata. When used with Clang, all calls to functions marked with __attribute__((malloc)) or __attribute__((alloc_size(..))) will therefore be instrumented.
Future Enhancements
__builtin_alloc_partition_id(<type>): For Clang, introduce a builtin helper to query the partition ID (based on the current mode) of a given type. To implement, a new LLVM intrinsic would be introduced, which is substituted with a constant in the AllocPartition pass. This would make the partition ID no longer opaque, and code may start depending on particular properties of partition IDs that we do not (yet) want to guarantee longer-term to allow for improvements to partitioning modes.
Implementation
The current implementation can be found at: GitHub - melver/llvm-project at alloc-partition
Frequently Asked Questions
How can I deal with allocation wrapper functions? One strategy would be to mark wrappers with __attribute__((alloc_size(..))) and compile with -fsanitize-alloc-partition-extended; this requires providing a partition-aware allocation wrapper function.
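As a minimal sketch of that strategy, assume a hypothetical project wrapper my_alloc. The wrapper is annotated so that calls to it are treated as allocation calls, and a partition-aware counterpart is provided under the name the default ABI would produce. The counterpart's name is an assumption derived from the __alloc_partition_<func> pattern, C linkage is used to sidestep C++ name mangling, and g_last_partition exists only to make the behavior observable.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

extern "C" {
// Hypothetical wrapper: alloc_size(1) marks the first parameter as the size.
__attribute__((malloc, alloc_size(1))) void *my_alloc(std::size_t size);

// Hypothetical partition-aware counterpart that instrumented call sites
// would be rewritten to call (default ABI: ID appended as final argument).
void *__alloc_partition_my_alloc(std::size_t size, std::uint64_t partition_id);
}

// Records the most recent partition ID, for demonstration only.
static std::uint64_t g_last_partition = 0;

void *my_alloc(std::size_t size) {
  void *p = std::malloc(size);
  if (p == nullptr) std::abort();  // wrapper policy: never return null
  return p;
}

void *__alloc_partition_my_alloc(std::size_t size,
                                 std::uint64_t partition_id) {
  g_last_partition = partition_id;
  // A real implementation would forward the ID to the partition-aware
  // allocator here instead of discarding it.
  return my_alloc(size);
}
```

A call such as my_alloc(sizeof(Node)) then follows the same sizeof()-based inference as a direct malloc call.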
For the TypeHash* partitioning modes, is the partition ID stable? Yes, it relies on xxHash as the hash function.
Do the TypeHash* modes distinguish between e.g. uint8_t and unsigned char? No, the innermost underlying type of a typedef is looked up. A notable exception is uintptr_t (see below).
Does the TypeHashPointerSplit mode consider uintptr_t a pointer? Yes, it does. Syntactically, uintptr_t is not a pointer; semantically, however, it is very likely used as one.
Are indirect function calls to allocation functions covered? No.
Cc: @vitalybuka @pcc @kees @tsan @Glider @waffles_the_dog @rcvalle