[RFC] Typed allocator support

Summary

Type based segregation of allocations has been increasingly adopted as a low cost mitigation for memory errors in unsafe languages. To support systemic adoption of such allocators, we are proposing a new attribute. This attribute allows the developer to annotate C APIs to specify that a type-segregating version of an allocation function is available, and to redirect calls to that typed variant, passing an inferred type tag so the allocator can perform automatic type based segregation.

Motivation

As unsafe languages, C and C++ are prone to a variety of memory error classes with severe security implications. Reducing and mitigating these issues without requiring large scale rewriting of existing code, and without breaking existing ABI, requires many different approaches working in concert. The existing -fbounds-safety proposal[1] works to prevent security problems caused by logic errors resulting in out of bounds access to memory. Another major attack vector is logic errors resulting in temporal memory safety (lifetime) bugs, most typically use-after-free, and this proposal aims to provide tools to mitigate attacks built on these errors.

One class of systemic mitigation that can achieve this is type based segregation of dynamic allocations, an approach that Apple has already used to great benefit on its platforms[2], as have other projects[3]. The core problem with existing approaches to type segregation is that they require significant source adoption work, do not support pure C, or have limited/ad-hoc type segregation (often based on call stack introspection) that limits their applicability to general platform wide adoption. The attribute we are proposing enables automatic exposure of type information in C APIs, where the language otherwise limits the ability to communicate type information automatically.

The increased flexibility and type awareness of C++ provides options for more general solutions, and to that end we are proposing extensions for operator new and operator delete to the C++ language committee that expose the actual type being allocated to the relevant allocation APIs.[4]

Apple’s experience has found that type based segregation is a generally effective and low cost mitigation for many common attacks on C and C++ code, and we expect the use of such segregation to increase over time.

The intent of this proposal is to allow libraries and platforms to provide type segregating allocation APIs, and have those APIs be adopted automatically and transparently without requiring any adoption effort by downstream consumers to switch to the new APIs or to provide explicit type information.

Programming model

General Usage

To adopt this feature an author providing an allocation API uses this attribute to specify the typed variant of the method to call, as well as the parameter to perform type inference over.

e.g.

void *malloc(size_t sz);

is updated to

void *typed_malloc(size_t sz, uint64_t type_descriptor);
void *malloc(size_t sz) __attribute__((typed_memory_operation(typed_malloc, 1)));

The result of these declarations and annotations is transparent redirection of calls to annotated allocation functions, functionally equivalent to rewriting

ptr = malloc(sizeof(SomeType));

to

ptr = typed_malloc(sizeof(SomeType), /* type descriptor for SomeType */);

For a developer using such annotated APIs this redirection is largely transparent. The major caveat is that maintaining source compatibility requires that this redirection only occur for direct calls: the type of the type segregating function is necessarily distinct from that of the original, so indirect calls must use the unsegregated interface.
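For example, with the annotated malloc declaration from above, only the direct call in the following sketch would be retargeted; the call through the function pointer necessarily keeps the original untyped interface (a minimal sketch, not normative):

#include <stdlib.h>

struct Foo { int id; char *name; };

void example(void) {
    /* Direct call: rewritten to typed_malloc(sizeof(struct Foo), descriptor for struct Foo). */
    struct Foo *a = malloc(sizeof(struct Foo));

    /* Indirect call: the pointer has the original type void *(*)(size_t),
       so no rewrite occurs and the untyped malloc entry point is used. */
    void *(*alloc_fn)(size_t) = malloc;
    struct Foo *b = alloc_fn(sizeof(struct Foo));

    free(a);
    free(b);
}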

Attribute semantics

The typed_memory_operation attribute takes two parameters: the first is the type segregating function to be used as the new target for typed calls, and the second is the index of the argument to perform type inference over. Earlier implementations allowed the target function to be an entirely opaque symbol, which avoided the need to actually declare the typed interface; in practice, however, it proved beneficial to expose the typed interface explicitly, as doing so allows semantic checks that prevent errors due to silent API mismatches.

Type inference and type descriptors

By design this proposal does not require developers to explicitly specify the types being allocated; rather, it introduces an inference step performed over the call expression to determine what type[s] are being allocated. This inference is based on local analysis of the call site, assuming idiomatic coding practices, to determine the set of C types being allocated and whether the allocation is fixed size.

To reduce the performance burden on the allocator from explosive growth of “distinct” types, the type descriptors in this proposal are produced by first coalescing the relevant C types to unique structural types based on the type[s] of data in each byte of a data type rather than the type name, point of declaration, or similar source level properties.
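As an illustration (not taken from the proposal text), two types with identical byte-level layouts would coalesce to the same structural type, while their names and declaration sites are ignored:

/* Both types are laid out as two ints, so under structural coalescing they
   map to the same structural type and hence the same descriptor hash. */
struct Point { int x; int y; };
struct Size  { int width; int height; };

/* A type with a different byte-level layout (an int followed by a pointer)
   produces a distinct structural type. */
struct Node  { int id; struct Node *next; };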

Having developed the structural type for an allocation, we need the descriptor that is actually used to be sufficiently compact that it does not impact code size or call performance. To that end this proposal does not use or provide any complex type metadata, but rather uses a single 64bit type descriptor that contains flags to indicate core properties of the type (whether the object contains pointers, vtables, etc), and a hash of the structural type. The flags are necessary to allow allocators to adjust the segregation policies according to the data contained by those types, and the hash provides the core mechanism to segregate distinct types.

As this proposal supports existing code, and performs heuristic based type inference it is possible for the inference to fail. In such a case the redirection is still performed, however the type descriptor in this case is set to indicate that inference failed, and the descriptor hash is derived from the call location, so that the allocator is able to segregate allocations from different call sites even when inference fails.
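A minimal sketch of when the fallback applies, assuming the annotated malloc from the General Usage section:

#include <stdlib.h>

struct Foo { int id; };

void *example(size_t bytes_needed) {
    /* Inference succeeds: the argument is an idiomatic sizeof expression, so the
       descriptor encodes the structural type of struct Foo. */
    struct Foo *f = malloc(sizeof(struct Foo));
    free(f);

    /* Inference fails: the size is an opaque runtime value carrying no type
       information. The call is still redirected to typed_malloc, but the descriptor
       is flagged as failed inference and its hash is derived from this call site. */
    return malloc(bytes_needed);
}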

Rewrite target ABI and semantics

The rewrite target for an annotated function logically acquires an additional type descriptor parameter, which is required to be declared immediately following the parameter targeted for inference. In other words, an annotated API such as the one below

void *allocator_function(T1 Arg1, T2 Arg2, ..., TN ArgN, TN1 ArgN1, ...) __attribute__((typed_memory_operation(typed_allocator_function, N)));

requires that the typed_allocator_function target function be declared as

void *typed_allocator_function(T1 Arg1, T2 Arg2, ..., TN ArgN, uint64_t Descriptor, TN1 ArgN1, ...);

The type descriptor value passed as Descriptor is determined entirely statically and does not involve any runtime evaluation, and the call rewrite does not impact the evaluation order or side effects of any argument expression. As the number and position of arguments to the target function differ from those of the original, the rewrite necessarily changes the register and/or stack locations of parameters; this should not impact any existing code, as by definition the new target function is aware of this ABI from the time of initial adoption.

Portability with toolchains that do not support the extension

If a toolchain does not recognize this extension, either the attribute will be ignored, or API providers will need to ensure appropriate macro guards around the declarations to prevent breakage due to -Werror -Wunknown-attributes and similar configurations. As the adoption of the type segregating APIs is an automatic translation from the original call, the end users of these APIs do not need to maintain different code paths for platforms supporting the type segregation.
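A minimal sketch of such a guard, using the common __has_attribute pattern (the TYPED_ALLOC macro name is purely illustrative, not part of the proposal):

#include <stddef.h>
#include <stdint.h>

#if defined(__has_attribute)
#  if __has_attribute(typed_memory_operation)
#    define TYPED_ALLOC(target, index) __attribute__((typed_memory_operation(target, index)))
#  endif
#endif
#ifndef TYPED_ALLOC
#  define TYPED_ALLOC(target, index) /* toolchain without support: expands to nothing */
#endif

void *typed_malloc(size_t sz, uint64_t type_descriptor);
void *malloc(size_t sz) TYPED_ALLOC(typed_malloc, 1);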

Implementation

We have implemented this proposed extension, and have deployed it on the codebases for multiple large consumer operating systems with no meaningful source compatibility impact or code size regressions. Runtime overhead is dependent on design decisions and trade offs made by the allocator; however, in our deployment we were able to adopt trade offs that resulted in no overall runtime or memory regression while providing the segregation properties we felt were necessary.

Our implementation of this extension performs the call retargeting during the CodeGen pass in Clang, as this means that any compiler passes, warnings, or other analysis over the AST or during Sema will produce feedback to the user that matches the call site as written rather than the implicitly rewritten call that they are not necessarily aware of.

ABI

In addition to the typed target ABI, it is also necessary to specify the ABI for the type descriptor that is exposed to the platform or allocator library vending the type entrypoints. In our current implementation this information is passed via the following structure.

enum tmo_layout_semantics : uint16_t {
    tmo_layout_none = 0,
    tmo_layout_data_pointer = 1 << 0,
    tmo_layout_struct_pointer = 1 << 1,
    tmo_layout_immutable_pointer = 1 << 2,
    tmo_layout_anonymous_pointer = 1 << 3,
    tmo_layout_reference_count = 1 << 4,
    tmo_layout_resource_handle = 1 << 5,
    tmo_layout_spatial_bounds = 1 << 6,
    tmo_layout_tainted_data = 1 << 7,
    tmo_layout_generic_data = 1 << 8,
};

enum tmo_type_semantics : uint8_t {
    tmo_type_semantics_none = 0,
    tmo_type_semantics_is_polymorphic = 1 << 0,
    tmo_type_semantics_has_mixed_unions = 1 << 1,
};

enum tmo_type_kind : uint8_t {
    tmo_type_kind_c = 0,
    tmo_type_kind_objc = 1,
    tmo_type_kind_swift = 2,
    tmo_type_kind_cxx = 3
};

enum tmo_callsite_semantics : uint8_t {
    tmo_callsite_semantics_none = 0,
    tmo_callsite_semantics_fixed_size = 1 << 0,
    tmo_callsite_semantics_array = 1 << 1,
    tmo_callsite_semantics_header_prefixed_array = 1 << 2,
};

struct tmo_type_descriptor {
    tmo_layout_semantics layout_semantics : 16;
    tmo_type_semantics type_semantics : 4;
    tmo_type_kind kind : 2;
    tmo_callsite_semantics callsite_semantics : 4;
    unsigned unused : 4;
    unsigned version : 2;
    uint32_t hash : 32;
};

This structure is then flattened to a 64 bit integer as [layout:16][type:4][kind:2][callsite:4][unused:4][version:2][hash:32].
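As a sketch, an allocator could unpack a flattened descriptor with shifts and masks like the following; note that the exact bit positions (least significant bit first) and the helper names are assumptions for illustration, not a specification:

#include <stdint.h>

/* Assumes the fields are packed from the least significant bit upward:
   layout (bits 0-15), type (16-19), kind (20-21), callsite (22-25),
   unused (26-29), version (30-31), hash (32-63). */
static inline uint16_t tmo_descriptor_layout(uint64_t d)   { return (uint16_t)(d & 0xFFFFu); }
static inline uint8_t  tmo_descriptor_type(uint64_t d)     { return (uint8_t)((d >> 16) & 0xFu); }
static inline uint8_t  tmo_descriptor_kind(uint64_t d)     { return (uint8_t)((d >> 20) & 0x3u); }
static inline uint8_t  tmo_descriptor_callsite(uint64_t d) { return (uint8_t)((d >> 22) & 0xFu); }
static inline uint8_t  tmo_descriptor_version(uint64_t d)  { return (uint8_t)((d >> 30) & 0x3u); }
static inline uint32_t tmo_descriptor_hash(uint64_t d)     { return (uint32_t)(d >> 32); }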

Builtin support

To support interfaces where explicit type information is available (as can occur in wrappers and template allocation functions) the __builtin_tmo_get_type_descriptor(type or expression) builtin is provided that produces a type descriptor for the type of the given expression, without performing any heuristic driven inference. This can be used to support explicit adoption in environments where exact types can be known (for example, macro and C++ template based allocators).
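For instance, a macro based wrapper could bypass inference entirely and pass an explicit descriptor (a sketch; the ALLOC_TYPED macro is hypothetical, typed_malloc is the annotated target from earlier):

#include <stddef.h>
#include <stdint.h>

void *typed_malloc(size_t sz, uint64_t type_descriptor);

/* Allocate one object of the named type, computing the descriptor explicitly
   with the builtin rather than relying on call site inference. */
#define ALLOC_TYPED(T) ((T *)typed_malloc(sizeof(T), __builtin_tmo_get_type_descriptor(T)))

struct Foo { int id; char *name; };

struct Foo *make_foo(void) {
    return ALLOC_TYPED(struct Foo);
}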

Limitations

As neither C nor C++ provide full object introspection, and C does not support C++ style compile time code execution, it is not possible to specify this extension in a manner that allows developers to customise construction of the type descriptor. We have endeavoured to define the descriptor in a manner that makes it generically usable, however doing so necessarily loses some granularity.

Allocation wrappers are another common idiom in normal code, and frequently separate the expression that contains type information from the allocator call site. As a result such wrappers make allocation types opaque and coalesce the allocator call sites, so the site based hash fallback also fails to provide information that can be used to support allocation segregation. The solution for this specific issue is for wrapper authors to use this attribute to provide typed wrappers that explicitly forward the inferred type descriptor to the underlying typed allocation APIs.
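A sketch of what that adoption could look like for a hypothetical xmalloc style wrapper (all names other than the attribute are illustrative):

#include <stdint.h>
#include <stdlib.h>

void *typed_malloc(size_t sz, uint64_t type_descriptor);

/* Typed variant of the wrapper: receives the descriptor inferred at the wrapper's
   call sites and forwards it directly to the typed allocator. */
void *typed_xmalloc(size_t sz, uint64_t type_descriptor) {
    void *p = typed_malloc(sz, type_descriptor);
    if (!p)
        abort();
    return p;
}

/* The untyped wrapper is annotated, so a direct call such as
   xmalloc(sizeof(struct Foo)) is rewritten to typed_xmalloc(sizeof(struct Foo), ...). */
void *xmalloc(size_t sz) __attribute__((typed_memory_operation(typed_xmalloc, 1)));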

Future directions

The current inference model is derived from local analysis of idiomatic use of the sizeof operator and similar constructions, and as a result misses cases where local inference could still be performed - casting, out parameters, etc.

The semantic information currently provided in the type descriptor is limited by the extensive use of opaque types (untyped pointers and intptr_ts) in objects, so adding a mechanism that allows such data and types to be annotated with semantic information would potentially be beneficial, though doing so in a way that is ergonomic and compatible with the required constraints may prove challenging.

Citations

[1] RFC: Enforcing Bounds Safety in C (-fbounds-safety)
[2] Towards the next generation of XNU memory safety: kalloc_type
[3] Efficient And Safe Allocations Everywhere!
[4] P2719R0: Type-aware allocation and deallocation functions

:white_check_mark: Clang consensus called in this message.


Thanks to @akorobeynikov for pointing out that the import of this clobbered the style :smiley:

Am I following correctly that:

  • This is only proposing to change malloc(). This doesn’t affect free(), and new/delete is a separate proposal.
  • The compiler looks for expressions specifically of the form malloc(sizeof(T)).

What’s the benefit of unifying the descriptors for different callsites allocating the same type? Does reducing the total number of descriptors make the mitigation more effective?

Sorry for the delayed response, I was out for ISO C++ meeting and then some chaos happened.

The proposal is intended to apply to any allocation function or allocation wrapper - since wrappers remove any accurately inferable type expression at the point the underlying allocator is invoked, the intended model of use is that allocation wrappers would themselves have typed variants, using this feature to handle the call site rewriting automatically, and then directly invoke the typed allocation API as appropriate.

In principle it works for deallocation (the attribute applies, type inference is performed, and the call site rewrite occurs), though the expression inference that occurs is likely suboptimal (e.g. distinguishing free(singleValuePointer) from free(arrayOfValues) isn’t possible in C but does impact what the type identifier should be).

To clarify, the function does not need to be malloc nor any a priori known function, just a call to a function with this annotation.

Answering the core question: generally yes, the inference matches many common idioms (sizeof(T), sizeof(T)*N, sizeof(T)+sizeof(U), etc). There is no specific reason this could not be expanded over time.
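For illustration, call sites like the following (a sketch assuming an annotated malloc) all match that kind of inference:

#include <stdlib.h>

struct Header { int count; };
struct Item   { int id; double value; };

void examples(size_t n) {
    struct Item *one   = malloc(sizeof(struct Item));                         /* sizeof(T) */
    struct Item *many  = malloc(sizeof(struct Item) * n);                     /* sizeof(T) * N */
    void        *mixed = malloc(sizeof(struct Header) + sizeof(struct Item)); /* sizeof(T) + sizeof(U) */
    free(one);
    free(many);
    free(mixed);
}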

If the question was about how expansive the inference model is, it is currently entirely local to the expression passed through the specified parameter - no flow analysis of the surrounding code occurs - it could be performed but there are compile time and complexity costs to doing so.

Largely performance - under a model where different types need to be sequestered, every “unique” type has significant overhead, so giving every allocation site its own allocation pool makes the overhead much higher than is necessary, and is simply non-viable for system allocators.

There’s also security value, though this applies more to the C++ language proposal (P2719), in that the allocator can verify when performing a deallocation that the object being deallocated is correctly typed for the deallocation site (circling back to applying this attribute to free(), you’d need the type inferred at a shared free site to match the type inferred at the allocation site, which does not semantically work if every allocation site gets a unique type).

I guess this would require rewriting your code so the type of the pointer passed to free() matches the type inferred by malloc? In general, the argument type of free is just void*, so you can’t usefully compute the type. Could be interesting for codebases that are willing to do such a rewrite, I guess.

Hmm. The choice of heuristics here could be an entire proposal itself.

Yeah, the cost of adoption - requiring that free(x) and x = alloc(...) always agree on the exact type of x - is challenging given idiomatic C. Again C++ can do better because subtyping can always ensure the correct type is involved (sans triggering UB), but C style allocators will always be at a disadvantage here.

There are additional issues where ABI stability is a concern: if the “type” passed to the deallocator must match that inferred at the allocation site, then the mechanism of inference, the internal structure, and even “opaque” properties could become ABI breaks.

This proposal only provides a mechanism to convey type information to the allocator. Determining how to use that information to segregate types is a decision to be made by the allocator.

Thank you for this RFC, it’s very interesting work! A few things caught my attention:

The increased flexibility and type awareness of C++ provides options for more general solutions, and to that end we are proposing extensions for operator new and operator delete to the C++ language committee that expose the actual type being allocated to the relevant allocation APIs.[4]

Our experience with adding new overloads to operator new has not been bad, but new overloads of operator delete have been… very unpleasant. After literal years of trying, we’ve almost got sized deallocation deployed everywhere… from C++14. I’m very concerned we’ll wind up in a similar situation here with deployment. Have you considered ways to address this?

Apple’s experience has found that type based segregation is a generally effective and low cost mitigation for many common attacks on C and C++ code, and we expect the use of such segregation to increase over time.

Can you speak to those costs? What kind of performance changes do you see in terms of compile times and run times?

For a developer using such annotated APIs this redirection is largely transparent. The major caveat is that maintaining source compatibility requires that this redirection only occur for direct calls: the type of the type segregating function is necessarily distinct from that of the original, so indirect calls must use the unsegregated interface.

Is this thunk happening at the compiler or at the runtime library? I ask because of function pointers – if this is happening at compile time, then it seems like calls to malloc through a function pointer perhaps won’t (always) be rewritten. That in and of itself may be reasonable, but it makes for an awkward mismatch if we ever need to have a matching rewrite for free. e.g., the user uses a function pointer to make the allocation but calls free directly because that’s the only deallocation function in C. IOW, code like this:

void *(*allocator)(size_t) = malloc;
void *ptr = allocator(sizeof(int));
free(ptr);

To that end this proposal does not use or provide any complex type metadata, but rather uses a single 64bit type descriptor that contains flags to indicate core properties of the type (whether the object contains pointers, vtables, etc), and a hash of the structural type.

Will this feature be allowed in freestanding mode when malloc() is supported there? If so, is 64-bits appropriate for that situation?

If a toolchain does not recognize this extension,

I’m wondering about C as the lingua franca for a lot of other languages. If someone wants to use this from, say, Visual Basic 6 (or any other language that allows C FFIs), can they still do so? (I presume that for them, the call to malloc still resolves to the real malloc implementation, but they can call typed_malloc or whatever if the symbol is exposed, same as any other C function.)

Our implementation of this extension performs the call retargeting during the CodeGen pass in Clang

Ah, so this is not going through thunks but actual rewrites. What happens for code like the function pointer case? (For the C++ proposal, another question is: how does this impact constexpr memory allocations? Does the constexpr engine need to understand how to perform this type inference or is the expectation that this feature wouldn’t be involved in constant expression evaluation?) Another C specific question would be, what happens with this:

static_assert(_Generic(malloc, void *(*)(size_t) : 1, default : 0));

I presume that because we do the rewrite at codegen time, this code passes the static_assert. But then you have the question about how this behaves:

void *ptr = _Generic(malloc, void *(*)(size_t) : malloc)(12);
struct tmo_type_descriptor {
    tmo_layout_semantics layout_semantics : 16;
    tmo_type_semantics type_semantics : 4;
    tmo_type_kind kind : 2;
    tmo_callsite_semantics callsite_semantics : 4;
    unsigned unused : 4;
    unsigned version : 2;
    uint32_t hash : 32;
};

The one thing that worries me about this structure is forwards compatibility and ABI breaks. For example, tmo_callsite_semantics and tmo_type_semantics both have four bits available and we’re already defining three of them. We do have four bits of unused space available, but I wonder if that unused space should be spread around to the fields we think may need extension in the future.

This structure is then flattened to a 64 bit integer

With the machine endianness? Or always with a particular endianness?

(cc’ing @ldionne)

Overloading operator new and operator delete are both filled with a lot of excitement :smiley:

We discussed briefly out of band, but I’ll detail the issues here as a public record.

For the C++ proposal (P2719) there is no new implicit global new or delete, just the ability to add a new/delete definition that receives concrete type information. That limits (does not remove) the risks we get from new new+delete operators. In principle it also allows people to implement aligned, sized, unaligned, etc without requiring the language to add yet more new/delete APIs.

The real issue is that the way the C++ specification determines the correct delete to call is based on the scope and the parameters used for operator new, because of course :smiley:

If we consider the intended usage of the C++ proposal, a developer would be able to specify

template <std::derived_from<MyBase> T> void *operator new(type_identity_t<T>, size_t) {
...
}

For allocations made through such an operator new, deletion via the general global operator delete is not suitable. C++ as currently specified (maybe there’s a way to fix things?) finds the correct delete operator by looking at the definition scope of the operator new, taking the same parameters (sans size vs pointer).

No noticeable impact on compile time (I imagine you could construct a pathological scenario that would show compile time degradation - something like thousands/millions of calls to malloc with different large/complex types might do it).

No impact on performance of generated code (though as above I’m sure you could construct a case that has a measurable delta).

I’ll look into what concrete information I can provide, but the purpose of this design is to allow the allocator to make decisions appropriate to its usage. The overall performance of an allocator is determined by the tradeoffs that are appropriate for the particular usage. At a very high level: greater granularity of type segregation necessarily increases runtime memory usage; reducing the degree of segregation reduces the memory cost but also reduces the security advantage, so your logic for choosing how to segregate may become more complex, etc.

You asked about thunks but later on reached the description of how the retargeting works. Obviously that answers the thunk question; the remainder of the questions in this block are variations on how this impacts what the type of the malloc symbol is, to which the answer is - by design - that it does not. This attribute only impacts direct calls to an attributed function; it simply isn’t possible to change the behavior in any other context without causing myriad problems.

Code that uses function pointers for allocation would need to explicitly add logic to allow a developer to provide a pointer to a typed allocation function. e.g. imagine

typedef struct AllocatorContext {
    int version;
    void* (*alloc)(size_t);
} AllocatorContext;
void * AllocatorAllocate(AllocatorContext* allocator, size_t size);

Could be updated as

typedef struct AllocatorContext {
    int version;
    void* (*alloc)(size_t);
    void* (*typed_alloc)(size_t, uint64_t);
} AllocatorContext;
void * AllocatorTypedAllocate(AllocatorContext* allocator, size_t size, uint64_t descriptor);
void * AllocatorAllocate(AllocatorContext* allocator, size_t size) __attribute__((typed_memory_operation(AllocatorTypedAllocate, 2)));

This attribute is not specifically tied to malloc - it’s more correct to think of it as a type inferrer + type parameter rewriter that is primarily of value for allocator functions like malloc (which influences what information it provides), but I’m sure people can invent new use cases. There is no implicit adoption logic either - a freestanding build would not see any change unless the freestanding context’s malloc declaration was explicitly annotated to do so.

Regarding the 64-bit descriptor: this was the smallest encoding we could collapse the type information into that provides explicit type meta information while also being able to distinguish between most types.

Correct: this attribute does not remove the annotated function, the only impact is that the expression annotated_function(..., expression, ...) gets logically replaced with annotated_function_target(..., expression, __builtin_tmo_get_type_descriptor(inferred type of expression), ...).

I actually had to check this. It results in the behavior that you’d want from a security pov in Po4/5, but not necessarily what you might expect?

Agreed this is super concerning. I’ve tried to make it as close as possible to “do not assume your understanding of the descriptor is stable”, e.g. this is intended to be a mitigation, not a 100% fix, and so the allocator runtime is assumed to be making near random choices about object placement and the concern is just removing the kind of i

I know you already know that, but just to clarify and make sure that nobody gets the wrong impression, this RFC does not propose adding any operator new or operator delete to the standard library. This is only about malloc & friends.

While we could take the same approach for operator new and operator delete (and in fact we did at Apple), we are pursuing a different approach for solving this problem for C++ as described by Type-aware allocation and deallocation functions. We believe that approach allows both implementing the typed allocator model described here but also much more, and so we are targeting standardization of that core language feature as opposed to a compiler vendor extension.

That being said, it is still 100% valid to be concerned about deployment challenges for doing this with malloc & friends. In my experience, there are a few things that made new operator new & operator delete really challenging to deploy. Also, it is worth mentioning that we did have trouble with both: we first had trouble with deploying operator new(size_t, align_val_t) ~5+ years ago, and more recently we also had similar trouble with deploying operator delete(void*, size_t).

The first thing that made new allocation/deallocation functions tricky to deploy in C++ is the fact that the deployment target provided to the compiler must be strictly respected by users. Indeed, when we e.g. add a new typed_malloc entry point in a system library, we also teach clang to know in what versions of the system it can assume the entry points. Clang will then generate calls to those entry points only if the system supports that – nothing new so far. However, people tend to lie about their deployment target. For example, one thing that happens a lot is folks specifying (all versions are fictitious) --target arm64-apple-macos13.0 but then trying to run the produced binary on macOS 12. Normally, that’s an incorrect but fairly benign thing to do. However, if the compiler generated calls to APIs that are not available in macOS 12 based on the user’s promise to run on at least macOS 13, this becomes a more serious violation and your binary will basically fail to load on the older OS. This is technically not specific to allocation functions. Introducing anything new that depends on system support has the same properties, it’s just that allocation is so omnipresent that the risk of actually hitting this failure is very high if you were doing things wrong in the first place.

Second, operator new and operator delete in C++ are specified in a way that they are replaceable by users at link time. So let’s say a user was previously replacing new(size_t), but not new(size_t, align_val_t) because it didn’t exist yet. Now, if the compiler suddenly starts calling new(size_t, align_val_t) in a place where it previously called new(size_t) because you’re allocating an overaligned type, this is a noticeable change in behavior. Whereas your user-provided new(size_t) would have been called previously, the system’s new(size_t, align_val_t) will now be called instead. The user needs to go and override new(size_t, align_val_t) as well if they want to control that allocation now. That is effectively a breaking change that was done with operator new and operator delete in the past. Fortunately, malloc does not suffer from the same problem nearly as much because malloc is not overridable. People who intercept malloc will need to e.g. disable the typed allocation feature or intercept both malloc and typed_malloc, but that’s a really small impact in comparison to the C++ story.

So all in all, I am not super concerned about the deployment challenges for something like this in pure C. While it certainly wouldn’t be trivial, I don’t think there’s anything fundamentally difficult here. Doing this for C++ is definitely harder, but we have also done it successfully (hint: we disable the whole machinery if we detect that users have overridden operator new).


Excellent, that’s good to hear!

Thanks! So how likely is it that you’re going to want to extend this concept to free()? That’s where my primary concern comes in.

Excellent, thank you!

Which behavior is that though? I’m guessing that under your proposal, the association matches the controlling operand, resulting in malloc(12) which is then rewritten? Or is this closer to the function pointer case because malloc decays from a function designator into a function pointer, and thus the call is not rewritten?

[note]Sigh, apparently I did not hit the reply button so this was sitting as a draft[/note]

It matches the controlling operand and performs a rewrite; however, that’s not intentional explicit design, but rather a byproduct of the rewrite occurring at (clang) codegen time, which is after _Generic has resolved. For security purposes we want to maximize the number of allocations that get rewritten to the typed variant of the allocation.

One thing I did strongly consider was pushing the rewriting even further down (all the way to IR), which would allow optimization passes that can flatten or devirtualize an allocation routine to then be retargeted, but the problem is that would mean needing to be able to go from the RValue argument in IR back to the original source expression, which is just not feasible in any real way. My real thinking with such an approach was not so much defeating opaque calls to the allocators but rather trying to increase our ability to automatically handle wrappers, a la

void *my_array_alloc(size_t elementsize, size_t count);
...
Foo* array = my_array_alloc(sizeof(Foo), count);

which currently requires a dev to manually add the typed variant, which given the prevalence of allocator wrapper functions is somewhat irksome from the angle of “we want the maximum coverage for minimum manual work”
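For reference, the manual adoption for a wrapper like that would look roughly like this (the typed name is hypothetical; per the rewrite target ABI, the descriptor follows the inferred parameter):

#include <stddef.h>
#include <stdint.h>

void *my_typed_array_alloc(size_t elementsize, uint64_t type_descriptor, size_t count);
void *my_array_alloc(size_t elementsize, size_t count)
    __attribute__((typed_memory_operation(my_typed_array_alloc, 1)));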


Thanks for the explanation, that seems like reasonable behavior to me (the only oddity is that the final call doesn’t actually have the type as specified in _Generic, but that’s true of any call to the allocation function, so not entirely unexpected, just more in-your-face).

So far, I’ve not seen a whole lot of community engagement on this RFC, which makes it hard for me to determine whether this does or does not have consensus.

If you have opinions on whether Clang should support this or not, please share them!

It would be nice if someone can describe how this RFC interacts with P2719R0, which was supported in EWG in St. Louis, so it might be something we’d need to implement anyway.

It would help me to form an opinion, and potentially others who have been watching this discussion from the sidelines, too.

If you wanted a single type aware allocator implementation, then assuming this proposal was present, you would use the supporting builtins from this proposal (and the design of P2719 is intended to support this) to construct an appropriate descriptor and call the typed allocation directly, something along the lines of

template <typename T> void *operator new(type_identity_t<T>, size_t sz) {
   return my_type_aware_alloc(sz, __builtin_tmo_get_type_descriptor(T));
}
// (and so on for the exponentially increasing number of new/delete functions)
...

This is functionally what apple clang does on Darwin today.

In an ideal world people would be able to define their own type metadata, and that’s why P2719R0 is designed as it is, but in C there’s just not the ability to meaningfully operate on types, so it’s not currently feasible to have the user/library specify construction of type descriptors.


A little late to the party, but reading the proposal I wondered why you didn’t leverage the type metadata used for things like CFI, since it already has extensive support? It seems like a natural fit for doing this type of work.

If we could use a shared type model that would be great (multiple implementations of the same thing in a single codebase is silly :D), but my recollection of the CFI type model is that it is focused on the type system’s idea of what types are; for the purposes of CFI, struct S { int i; } and struct T { int i; } are different types. As this proposal is targeted at general allocators the structural type is more important than the named type - e.g. two structurally different types that happen to share a name must still be treated as distinct, and for practical reasons you also want to be able to unify structurally equivalent types with different names.

It’s important to note that we’re absolutely dealing with trade offs here - in an ideal world every distinct type would be completely separated from any other type, but for practical reasons that’s not possible for a general allocator. From a language feature point of view you really want the developer to be able to choose how to encode/separate types, and that’s what we’re proposing in C++ (P2719), but for C that’s not really an option.

What we’ve done in this proposal is to attempt to strike a balance that makes sense for general allocators (the system’s malloc, etc), it provides flags to indicate properties of the type being allocated (see the _semantics enums) and a precomputed hash over the structural type. The semantic flags allow the allocator to make decisions about how to handle an allocation, and the type hash provides a trivial mechanism to separate different types without incurring any runtime overhead.

As an example a malloc implementation could say “I’m mostly concerned about UAF involving vtables, so I’m willing to use more memory to segregate those allocations”, and do something akin to:

void *my_cool_malloc(size_t sz, type_descriptor descriptor) {
   // pseudo code, let's just pretend descriptor is a trivially accessible struct right now
   if ((descriptor.type_semantics & tmo_type_semantics_is_polymorphic) == 0)
     return internal_alloc(default_zone, sz);
   alloc_zone *zone = polymorphic_zones[descriptor.hash % ZONE_COUNT];
   return internal_alloc(zone, sz);
}

This kind of logic is not really possible with type name based segregation, and for performance reasons it’s also not really feasible to pass a general metadata object (std::type_info_extreme_addition?) and have the allocator build the information at runtime, at least in the general case.

First, let me start by saying that I’m a proponent of having configurable support for memory segregation, whether that’s via the type system or another kind of analysis. However, I think that there’s a very wide set of security tradeoffs that different projects will need to make, and I’m not totally convinced that the proposal you’re putting forth is configurable enough to support those needs.

I’m also not sure that I agree that the structural type is more important than the named type, at least if your motivation is security. While the example you’ve provided demonstrates something you can’t do w/ type metadata, you also can’t support users who do want every type segregated. You’re right that this is a tradeoff, but it’s hard to make any kind of judgment about such a tradeoff without a security model to compare it against. Note: I’m not suggesting that you formalize one here. Ultimately, I think we can all agree that it’s always going to be hard to define a format/protocol that can work with everyone’s constraints, particularly in a field as sensitive as security.

One question I have, though, is why it’s better to push this information into the allocator, and not just have the compiler decide which buckets to use (e.g. pick an ID/type for the allocation)? The compiler has access to far richer data than can be conveyed via the ABI you’ve described, and could handle mapping equivalent types into the same bucket, right? That would allow you to very easily extend the allocator, without worrying about any complex logic.

But the approach I’d actually prefer is to make the ABI configurable and use different codegen/ABI switches to choose the encoding, based on your project’s needs. I think this is rather similar to something like the Shadow Call Stack, in that you need some level of cooperation between the Toolchain and runtime to have it work. For allocators that want a fine grained approach, they can communicate that with backend flags(perhaps reusing the CFI metadata), but for other allocators they can choose something like the ABI you’ve described, or just an opaque ID to bucket things. For this last approach, I’d hazard that no matter the encoding, you could use the CFI metadata until you actually need to generate the encoding. There’s probably some gotchas here I haven’t considered, but I think there’s some value in considering whether these are good tradeoffs.

[sigh, I apologize for the lack of response, I was waiting for a reply from you and only just realized that I had once again left an unposted draft]

However, I think that there’s a very wide set of security tradeoffs that different projects will need to make, and I’m not totally convinced that the proposal you’re putting forth is configurable enough to support those needs.

Agreed that there are trade offs, and it is kind of irksome just how fixed those trade offs are. That’s why the C++ proposal just provides the type as a template parameter rather than a magic “descriptor”.

I’m also not sure that I agree that the structural type is more important than the named type, at least if your motivation is security. While the example you’ve provided demonstrates something you can’t do w/ type metadata, you also can’t support users who do want every type segregated.

I agree here, there are cases where complete segregation makes sense, but as we are focused on system level allocation the structural vs. named type distinction matters for both security and performance (in essence a system allocator cannot afford to separate every single type due to resource overheads, so some degree of bucketing will happen; for that reason the structural type and its properties are more important, as you want to avoid collisions between critical types). However I recognize the issue you’re raising in that what matters for a system allocator (malloc, etc) is different from what would matter to a specific project (say a special purpose daemon, where the number of distinct types being allocated is low, but the security ramifications are high).

One question I have, though, is why it’s better to push this information into the allocator, and not just have the compiler decide which buckets to use (e.g. pick an ID/type for the allocation)? The compiler has access to far richer data than can be conveyed via the ABI you’ve described, and could handle mapping equivalent types into the same bucket, right?

The compiler does not have this information, because the compiler does not know the constraints of the system or the process that it is actually running under. The number and degree of bucketing (or even segregation at all) under this model is entirely at the discretion of the allocator, and it can be different from process to process. There’s also an element of how many unique types there are: for instance the webkit typed segregating allocator will change how much it segregates different types on the fly based on allocation patterns because otherwise the memory overhead is too high.

At compile time the compiler cannot know the exact allocation behavior of a program, and cannot know all the types being allocated (because we only see a single TU at a time, and even if we were compiling all the sources as a single TU we can’t see what libraries are doing).

But the approach I’d actually prefer is to make the ABI configurable and use different codegen/ABI switches to choose the encoding, based on your project’s needs. I think this is rather similar to something like the Shadow Call Stack, in that you need some level of cooperation between the Toolchain and runtime to have it work. For allocators that want a fine grained approach, they can communicate that with backend flags(perhaps reusing the CFI metadata), but for other allocators they can choose something like the ABI you’ve described, or just an opaque ID to bucket things.

I think the issue here is that the proposal as currently presented assumes that, because a developer cannot actually provide code to operate over the allocated types, there has to be a single definition of what the type descriptor looks like.

I’m wondering if we could do something a little more customizable by permitting a “schema” be specified, something like

typed_memory_operation(typed_malloc, 1, {descriptor,cfi,type_info,etc})

which would allow a bit more developer control over what information they were being provided with, while still working in the context of C’s inability to operate over types. There would still be questions of how to convey information about compound and VLA types in such a world but at least the choice would be available.

I agree here, there are cases where complete segregation makes sense, but as we are focused on system level allocation the structural vs. named type distinction matters for both security and performance (in essence a system allocator cannot afford to separate every single type due to resource overheads, so some degree of bucketing will happen; for that reason the structural type and its properties are more important, as you want to avoid collisions between critical types). However I recognize the issue you’re raising in that what matters for a system allocator (malloc, etc) is different from what would matter to a specific project (say a special purpose daemon, where the number of distinct types being allocated is low, but the security ramifications are high).

I’m not sure I agree w/ your view on what’s appropriate for a system allocator, but my concern is baking in an inflexible ABI w/o a way to opt into a different one that is more appropriate for the use case at hand.

The compiler does not have this information, because the compiler does not know the constraints of the system or the process that it is actually running under. The number and degree of bucketing (or even segregation at all) under this model is entirely at the discretion of the allocator, and it can be different from process to process. There’s also an element of how many unique types there are: for instance the webkit typed segregating allocator will change how much it segregates different types on the fly based on allocation patterns because otherwise the memory overhead is too high.

The compiler can know constraints and relevant information (like type info) that the allocator cannot. Typically, the systems I’ve seen have used the compiler to perform some analysis based on dataflow or reachability, and partitioned memory that way. We’ve seen this a lot in the SFI space in particular. In these systems the allocator is providing some guarantees about how its going to behave (e.g. providing some secure memory regions, or using hardware protections), but it just trusts the compiler to identify how to segregate things and does precisely that. The allocator isn’t really in a position to know any better than the compiler about the runtime constraints of the system its running on, and if there is some kind of cooperation and room for the allocator to optimize layout or memory usage, I’d view that as an orthogonal constraint.

If you want type based segregation based on layout vs. the language’s view of types, then it is simple for the compiler to solve by using a different algorithm/ABI to assign the IDs. If the allocator may not honor the segregation, then I don’t even see the point of this proposal. I can’t imagine choosing an allocator that’s touting a security strategy that may – at its discretion – introduce a vulnerability because memory pressure was high. That may sound a bit strong, but I don’t know a lot of security engineers who like parts of the system potentially changing a property they rely on. Having written all that, I feel like this part of the discussion is getting out of scope, so while I’m sure we could debate this point at length, I think we can just leave this bit as: “we disagree on the role of the allocator and properties”. I’m happy to discuss this more w/ you in another forum, but I think this bit of the discussion isn’t on topic enough to keep going here, where it’s likely to distract from your proposal. :slight_smile:

At compile time the compiler cannot know the exact allocation behavior of a program, and cannot know all the types being allocated (because we only see a single TU at a time, and even if we were compiling all the sources as a single TU we can’t see what libraries are doing).

The separate translation unit issue could be solved with metadata or through LTO. The library limitation exists in all scenarios: your secure allocator has the same issue, and precompiled libraries are unlikely to even use the special allocator API with additional type info. On systems that will ship libraries that wish to cooperate w/ the allocation strategy, metadata embedded in the object can solve the problem, perhaps with additional linker support for fixing up/generating the type IDs.

I think the issue here is that the proposal as currently presented assumes that, because a developer cannot actually provide code to operate over the allocated types, there has to be a single definition of what the type descriptor looks like.

I’m wondering if we could do something a little more customizable by permitting a “schema” be specified, something like

typed_memory_operation(typed_malloc, 1, {descriptor,cfi,type_info,etc})

which would allow a bit more developer control over what information they were being provided with, while still working in the context of C’s inability to operate over types. There would still be questions of how to convey information about compound and VLA types in such a world but at least the choice would be available.

This is more or less what I’m getting at: there needs to be room to choose different strategies and tradeoffs in the ABI. Whether that comes from a more complex descriptor, an ELF attribute, a compiler flag, or some other mechanism kind of doesn’t matter. What matters is that the community of people using this across a variety of platforms are going to have to maintain and work around this new ABI, and it will be a shame if it’s too inflexible to accommodate the wide variety of requirements. At the same time, I understand there’s a risk of making it too general, and it losing functionality/performance/other key properties.

To sum up: you’re proposing a potentially long lived ABI change, and I’m concerned that the current plan is too narrowly focused. There’s a lot of folks in this community that can attest to missteps in the ABI for several languages and even hardware specifications, and we probably want to hear from them in more depth before moving forward.

But I don’t want my critique to discourage you. I’m pretty supportive of the general direction of the proposal, and with a few tweaks I think a lot more folks would feel like this is something they’re willing to get behind, too. I haven’t seen too much community engagement in the thread, so it’s a bit hard for me to judge if they’re opposed, supportive, or just indifferent. I’d hazard it’s the last, and I hope we can get at least a few more folks engaged here to discuss this a bit more.