(I previously brought this up on the mailing list, with some good comments, but things have evolved enough since that thread I’m starting a fresh conversation)
Background
Rust uses LLVM for codegen, and has its own allocator functions. In order for
LLVM to correctly optimize out allocations we have to tell the optimizer about
the allocation/deallocation functions used by Rust, currently in a patch[0] that
Rust upstream is uncomfortable landing (more on that below). In particular, we
need to:
-
Support replacement of allocation with an SSA value when the address of an
allocation doesn’t escape (other than to the deallocator). This requires
being able to recognize the allocation call (and its kind, alloc or
realloc), know what argument to a deallocator call is the freed pointer, and
know that the allocation and deallocation functions match. We also have to
know that the allocation is one that is permissible to remove (explicit
calls to C++'s new operator are not removable, as an example). If we can
know the initial contents of the allocation (e.g. calloc) this optimization
can apply even when the pointer may be read from prior to being freed. We
need to know all the properites of the allocator function and its arguments
except allocation alignment (as the address ceases to exist) to perform this
optimization. -
Heap-to-stack conversion: when the allocation’s address does escape but
we’re certain that it is freed in this function, rather than by a callee or
caller we can convert the allocation into a stack allocation. In this case
we need to know everything above, plus the alignment of the allocation. If
the alignment is specified as a parameter, we can mark that parameter with
an attribute and permit this optimization. Note that for a function with an
implied alignment like malloc we can’t determine the alignment, nor can we
trust thealign
attribute because that is semantically an _under_estimate
of the alignment returned by the function. Rather, we should be using an
_over_estimate of the alignment promised to users instead. This bug is
present in the existing heap-to-stack logic in the Attributor, and is not
addressed or affected by this proposal. Fortunately, the heap2stack logic in
the Attributor is not included in the set of default passes (though it does
run for OpenMP code.) -
Heap-to-global conversion: this has no additional requirements beyond what
we’ve enumerated above for heap-to-stack. Note that the alignment defect described in (2) for H2S also appears to happen for H2G, but as best I can tell this hasn’t been a problem because of luck. Fixing the defect is not planned in my work, but the additional information in the IR may make such future work easier in the future.
Rust has tests to ensure it benefits from (1), and I don’t think the rest of the optimizations are significant for them. The other two constraints are so that we don’t paint ourselves into a corner with the current types of relevant optimizations. Languages supported by Clang, such as C and C++, have stable symbol names for their allocation functions, which are hardcoded in LLVM[1][2]. Unfortunately, this strategy does not work for Rust, where developers don’t want to commit to a particular symbol name and calling convention yet.
Proposal
We add several attributes to LLVM IR:
-
allockind(KIND)
: Marks a function as being a specific allocation function
kind with various behaviors.KIND
is a bit mask that can describe a
function as returning a new allocation, a new zeroed allocation, resizing an
allocation, or freeing an allocation. Specifically, the entries in the bit
mask would be:- new: the function returns a new allocation
- resize: the function resizes the given allocation
- free: the function frees the given allocation
- uninit: newly allocated memory is uninitialized
- zeroed: newly allocated memory is zeroed
- aligned: newly allocated memory is aligned based on the
allocalign
parameter
One of the first three should be specified, and then the remaining values
are flags that can apply to the function types in the first part of the
mask. -
"alloc-family"="FAMILY"
: Marks a function as part of an allocator family,
named by the “primary” allocation function (e.g."alloc-family"=“malloc”
,
"alloc-family"=“_Znwm”
, or"alloc-family"=“__rust_alloc”
). -
allocalign
: Marks a function argument as specifying the minimum alignment
of the returned allocation. -
allocatedptr
: Marks a function argument as the pointer that the allocator
will manipulate. The behavior is dependent on the next attribute.
These attributes, combined with the existing allocsize(n[, m])
attribute lets
us annotate alloc, realloc, and free type functions in LLVM IR, which relieves
Rust of the need to carry a patch to describe its allocator functions to LLVM’s
optimizer. Some example IR of what this might look like:
; Function Attrs: nounwind ssp
define i8* @test5(i32 %n) #4 {
entry:
%0 = tail call noalias dereferenceable_or_null(20) i8* @malloc(i32 20) #8
%1 = load i8*, i8** @s, align 8
call void @llvm.memcpy.p0i8.p0i8.i32(i8* noundef nonnull align 1 dereferenceable(10) %0, i8* noundef nonnull align 1 dereferenceable(10) %1, i32 10, i1 false) #0
ret i8* %0
}
attributes #8 = { nounwind allocsize(0) "alloc-family"="malloc" allockind(new,uninit) }
Similarly, the call free(foo)
would get the attributes free(allocatedptr foo)
/alloc-family=”malloc” allockind(free)
and realloc(foo, N)
gets
releaseptr(allocatedptr foo, N)
/"alloc-family"=”malloc” allocsize(1) allockind(resize)
.
Alternatives
Hard-code the Rust function names in LLVM
We don’t really want to do this because the function signatures have changed on
the Rust side a couple of times, and since these are emitted by the compiler
there’s no requirements that they be versioned today. Adding them to LLVM as a
permanent thing would necessitate versioning the symbol names. In fact, when I
started this researching this line of work, it was noticed that __rust_realloc’s
signature had changed and due to lack of test coverage nobody had yet noticed.
Don’t Add allockind()
This was the original approach, but without allockind, we were attempting to
detect whether a call was an alloc, aligned_alloc, realloc, or free kind based
on whether e.g. allocatedptr or allocalign appeared anywhere on the call. As
argument attributes – or entire arguments – may be removed by optimization
passes like dead-arg elimination, this was not correct."
Further, we need to express some extra information on return values (eg memory
is uninitialized or zeroed) that we can’t express without additional
information. Today that information is implicitly contained in the hard-coded
list in MemoryBuiltins.cpp
but if we start using attributes to communicate
these behaviors we need to be more circumspect to avoid requiring all allocators
to behave exactly like the standard C allocator functions.
Benefits
In addition to the benefits for Rust, the LLVM optimizer could also be improved
to not optimize away defects like
{
auto *foo = new Thing();
free(foo);
}
which would then correctly crash instead of silently “working” until something
actually uses the allocation. Similarly, there’s a potential defect when only
one side of an overridden operator::new and operator::delete is visible to the
optimizer and inlineable, which can look indistinguishable from the above after
inlining.
We can also remove most knowledge of C++ allocator functions from LLVM entirely,
and instead rely on the attributes to communicate that information to LLVM.
Status
I have mostly implemented this already: some of the dead-ends mentioned in the alternatives section are a direct result of previous failed attempts. I’ll be mailing out the changes in reasonably-sized batches as they’re ready, unless I hear objections.
[0]
https://github.com/rust-lang/llvm-project/commit/b1f55f7159540862c407a2d89d49434ce65892e5
[1]
https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L73
[2]
https://github.com/llvm/llvm-project/blob/cd5f582c3dd747ab97b57df37642b0dffba398ee/llvm/lib/Analysis/MemoryBuiltins.cpp#L433