TL;DR:
Defining memory functions in C/C++ results in a chicken-and-egg problem: Clang can rewrite the code into semantically equivalent calls to libc, including a call to the very function being defined. None of -fno-builtin-memcpy, -ffreestanding, or -nostdlib provides a satisfactory answer to the problem.
Goal
Create libc’s memory functions (memcpy, memset, memcmp, …) in C++ to benefit from the compiler’s knowledge and profile-guided optimizations.
Current state
LLVM is allowed to replace a piece of code that looks like a memcpy with an IR intrinsic that implements the same semantics, namely call void @llvm.memcpy.p0i8.p0i8.i64 (e.g. https://godbolt.org/z/0y1Yqh).
This is a problem when designing a libc memory function, as the compiler may choose to replace the implementation with a call to itself (e.g. https://godbolt.org/z/eg0p_E).
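To make the hazard concrete, here is a minimal sketch of a byte-wise copy loop of the kind the optimizer may recognize as a memory copy. The name `naive_memcpy` is hypothetical and chosen for illustration; whether the loop is actually rewritten depends on the compiler version, target, and optimization level.

```cpp
#include <cstddef>

// Hypothetical name, for illustration only. If this function were libc's own
// memcpy, an optimizer that recognizes the loop idiom could rewrite the body
// into a call to memcpy, that is, a call to the function being defined.
extern "C" void *naive_memcpy(void *__restrict dst, const void *__restrict src,
                              std::size_t count) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  // Plain byte-by-byte copy: semantically a memcpy of `count` bytes.
  for (std::size_t i = 0; i < count; ++i)
    d[i] = s[i];
  return dst;
}
```

Under its hypothetical name the function behaves as an ordinary copy routine; the self-call only becomes a problem once the symbol is the real memcpy.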
Using -fno-builtin-memcpy prevents the compiler from understanding that an expression has memory copy semantics, effectively removing @llvm.memcpy at the IR level: https://godbolt.org/z/lnCIIh. In this specific example, the vectorizer kicks in and the generated code is quite good. Unfortunately this is not always the case: https://godbolt.org/z/mHpAYe.
In addition, -fno-builtin-memcpy prevents the compiler from recognizing that a piece of code has memory copy semantics, but it does not prevent the compiler from generating calls to libc’s memcpy, for instance:
- Using __builtin_memcpy: https://godbolt.org/z/O0sjIl
- Passing big structs by value: https://godbolt.org/z/4BUDc0
In both cases, the generated @llvm.memcpy IR intrinsic is lowered into a libc memcpy call.
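The two cases can be sketched as follows. The type `Big` and the function names are hypothetical, and the 4096-byte size is an arbitrary choice meant to be large enough that the copy is unlikely to be expanded inline. Whether a libc call is actually emitted depends on the target and optimization level, and the by-value copy may disappear entirely if the callee is inlined.

```cpp
#include <cstddef>

// Hypothetical type, for illustration only.
struct Big {
  unsigned char bytes[4096];
};

// Case 1: explicit builtin. The front end emits the memcpy intrinsic directly;
// with a size unknown at compile time it is typically lowered to a call to
// memcpy, even under -fno-builtin-memcpy.
void copy_with_builtin(void *dst, const void *src, std::size_t count) {
  __builtin_memcpy(dst, src, count);
}

// Case 2: implicit copy. Passing a large struct by value requires the caller
// to materialize a copy of the argument, which may also become a memcpy call.
unsigned char first_byte(Big b) { return b.bytes[0]; }

unsigned char pass_big_by_value(const Big &b) { return first_byte(b); }
```

Neither function mentions memcpy by name in a way that -fno-builtin-memcpy can suppress, which is why the flag alone is not enough.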
We would like to use __builtin_memcpy to communicate the semantics to the compiler but prevent it from generating calls to libc.
One could argue that this is the purpose of -ffreestanding, but the standard leaves many freestanding requirements implementation-defined (see https://en.cppreference.com/w/cpp/freestanding).
In practice, making sure that -ffreestanding never calls libc memory functions would probably do more harm than good. People using -ffreestanding currently expect the compiler to call these functions, and the code bloat from inlining can be problematic for the embedded world (see comments in https://reviews.llvm.org/D60719).
Proposals
We envision two approaches: an attribute to prevent the compiler from synthesizing calls or a set of builtins to communicate the intent more precisely to the compiler.
- A function/module attribute to disable synthesis of calls
1.1 A specific attribute to disable the synthesis of a single call
__attribute__((disable_call_synthesis("memcpy")))
Question: Is it possible to specify the attribute several times on a function to disable many calls?
1.2 A specific attribute to disable synthesis of all libc calls
__attribute__((disable_libc_call_synthesis))
With this one we lose precision and may inline too much. There is also the question of what is considered a libc function: LLVM mainly defines target library calls.
1.3 Stretch - a specific attribute to redirect a single synthesizable function.
This one would help explore the impact of replacing a synthesized function call with another function but is not strictly required to solve the problem at hand.
__attribute__((redirect_synthesized_calls("memcpy", "my_memcpy")))
- A set of builtins in clang to communicate the intent clearly
__builtin_memcpy_alwaysinline(…)
__builtin_memmove_alwaysinline(…)
__builtin_memset_alwaysinline(…)
To achieve this we may have to provide new IR builtins (e.g. @llvm.alwaysinline_memcpy), which can be a lot of work.
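As a rough illustration of the first approach (1.1), a libc-style implementation might be annotated as sketched below. The attribute `disable_call_synthesis` is only the spelling proposed here and is assumed not to exist in any released compiler; the macro `NO_MEMCPY_SYNTHESIS` and the function `sketch_memcpy` are hypothetical names. The guard lets the code compile today by expanding the macro to nothing when the attribute is unknown.

```cpp
#include <cstddef>

// `disable_call_synthesis` is the PROPOSED spelling, not an existing
// attribute. Use it only if the compiler reports support for it.
#if defined(__has_attribute)
#if __has_attribute(disable_call_synthesis)
#define NO_MEMCPY_SYNTHESIS __attribute__((disable_call_synthesis("memcpy")))
#endif
#endif
#ifndef NO_MEMCPY_SYNTHESIS
#define NO_MEMCPY_SYNTHESIS // attribute unavailable: expands to nothing
#endif

// Hypothetical implementation. With the proposed attribute, the compiler
// would still be free to optimize the loop but would not be allowed to turn
// it into a call to memcpy.
NO_MEMCPY_SYNTHESIS
void *sketch_memcpy(void *dst, const void *src, std::size_t count) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < count; ++i)
    d[i] = s[i];
  return dst;
}
```

The appeal of a per-function attribute over a command-line flag is that the restriction stays attached to the one definition that needs it, rather than to the whole translation unit.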