Defining memory functions in C / C++ results in a chicken-and-egg problem: Clang can rewrite the code into semantically equivalent calls to libc. None of -fno-builtin-memcpy, -ffreestanding, or -nostdlib provides a satisfactory answer to the problem.
The goal is to write libc’s memory functions (e.g. memcmp, …) in C++ to benefit from the compiler’s knowledge and profile-guided optimizations.
LLVM is allowed to replace a piece of code that looks like a memcpy with an IR intrinsic that implements the same semantics, namely call void @llvm.memcpy.p0i8.p0i8.i64 (e.g. https://godbolt.org/z/0y1Yqh).
This is a problem when implementing a libc memory function, as the compiler may replace the implementation with a call to the very function being defined (e.g. https://godbolt.org/z/eg0p_E).
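To make the self-call hazard concrete, here is a minimal sketch of the kind of loop the optimizer may recognize as a memory copy. The name naive_copy is hypothetical and chosen on purpose: if the loop is rewritten into a call to memcpy, that call reaches libc instead of recursing. Whether the rewrite happens depends on the compiler version and optimization level.

```cpp
#include <cstddef>

// Hypothetical helper, deliberately NOT named memcpy. At higher optimization
// levels the loop below may be recognized as a memory copy and replaced by
// @llvm.memcpy, which in turn may be lowered into a call to libc's memcpy.
// If this function were itself the definition of memcpy, that call would
// target the function being defined.
void *naive_copy(void *__restrict dst, const void *__restrict src,
                 std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i]; // byte-by-byte copy: the pattern the optimizer looks for
  return dst;
}
```

Comparing the optimized IR of this function with and without -fno-builtin-memcpy is one way to observe the difference described in this section.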
-fno-builtin-memcpy prevents the compiler from recognizing that an expression has memory-copy semantics, effectively removing @llvm.memcpy at the IR level: https://godbolt.org/z/lnCIIh. In this specific example, the vectorizer kicks in and the generated code is quite good. Unfortunately, this is not always the case: https://godbolt.org/z/mHpAYe.
-fno-builtin-memcpy prevents the compiler from recognizing that a piece of code has memory-copy semantics, but it does not prevent the compiler from generating calls to libc’s memcpy, for instance:
- Passing big structs by value: https://godbolt.org/z/4BUDc0
In such cases, the generated @llvm.memcpy IR intrinsic is lowered into a call to libc’s memcpy.
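The struct-by-value case can be sketched as follows. The type and function names are hypothetical, and the 4096-byte size is an arbitrary assumption meant to be large enough that the compiler prefers a bulk copy over member-wise moves. No memcpy appears in the source, yet the argument copy may end up as a call to libc's memcpy.

```cpp
// Hypothetical type: large enough that the compiler is likely to copy it
// with a bulk memory copy instead of individual loads and stores.
struct BigBuffer {
  unsigned char bytes[4096];
};

// Taking the struct by value requires the caller to materialize a copy.
unsigned sum_bytes(BigBuffer b) {
  unsigned total = 0;
  for (unsigned char byte : b.bytes)
    total += byte;
  return total;
}

// Each call below copies the whole struct; that implicit copy is typically
// emitted as @llvm.memcpy and may be lowered into a call to memcpy, even
// though the source never mentions it.
unsigned sum_twice(const BigBuffer &b) {
  return sum_bytes(b) + sum_bytes(b);
}
```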
We would like to use __builtin_memcpy to communicate the semantics to the compiler while preventing it from generating calls to libc.
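As an illustration of why __builtin_memcpy alone is not enough today, the sketch below uses it with a fixed size and with a runtime size. The function names are hypothetical. The comments describe commonly observed behavior, not a guarantee: how each call is lowered depends on the target and the optimization level.

```cpp
#include <cstddef>

// Fixed, small size: compilers commonly expand this into a few loads and
// stores, with no library call.
inline void copy16(void *dst, const void *src) {
  __builtin_memcpy(dst, src, 16);
}

// Runtime size: the intent is stated just as clearly, but the intrinsic is
// commonly lowered into a call to libc's memcpy, which is what a libc
// implementation needs to avoid.
inline void copy_n(void *dst, const void *src, std::size_t n) {
  __builtin_memcpy(dst, src, n);
}
```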
One could argue that this is the purpose of -ffreestanding, but the standard leaves many freestanding requirements implementation-defined (see https://en.cppreference.com/w/cpp/freestanding).
In practice, making sure that -ffreestanding never calls libc memory functions will probably do more harm than good. People using -ffreestanding now expect the compiler to call these functions, and the code bloat from inlining can be problematic for the embedded world (see comments in https://reviews.llvm.org/D60719).
We envision two approaches: an attribute to prevent the compiler from synthesizing calls or a set of builtins to communicate the intent more precisely to the compiler.
- A function/module attribute to disable synthesis of calls
1.1 A specific attribute to disable the synthesis of a single call
Question: Is it possible to specify the attribute several times on a function to disable many calls?
1.2 A specific attribute to disable synthesis of all libc calls
With this one we lose precision and may inline too much. There is also the question of what counts as a libc function; LLVM mainly defines target library calls.
1.3 Stretch - a specific attribute to redirect a single synthesizable function.
This one would help explore the impact of replacing a synthesized function call with another function but is not strictly required to solve the problem at hand.
- A set of builtins in clang to communicate the intent clearly
To achieve this we may have to provide new IR builtins (e.g. @llvm.alwaysinline_memcpy), which can be a lot of work.
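As a rough source-level sketch of option 1.1, the following assumes a function attribute spelled no_builtin that takes the name of the call to suppress. Both the spelling and the macro NO_BUILTIN_MEMCPY are assumptions made for illustration; the proposal does not fix a name. The guard makes the macro expand to nothing on compilers that do not know the attribute, so the code still builds, just without the protection.

```cpp
#include <cstddef>

// Assumed attribute spelling, for illustration only. Where the compiler does
// not report support for it, the macro expands to nothing.
#if defined(__has_attribute)
#  if __has_attribute(no_builtin)
#    define NO_BUILTIN_MEMCPY __attribute__((no_builtin("memcpy")))
#  endif
#endif
#ifndef NO_BUILTIN_MEMCPY
#  define NO_BUILTIN_MEMCPY
#endif

// Hypothetical function: with the attribute in effect, the compiler would be
// asked not to synthesize calls to memcpy inside this body, so the loop has
// to be compiled as written (or vectorized) instead of becoming a call.
NO_BUILTIN_MEMCPY
void *copy_bytes(void *dst, const void *src, std::size_t n) {
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
  return dst;
}
```

A per-function attribute keeps the restriction local to the function that needs it, which is the precision that option 1.2 gives up.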