I’m starting this thread in response to Monthly LLVM libc meeting - #6 by michaelrj-google (I can’t attend the meeting so let’s discuss this in an RFC)
- -ffreestanding
- Need to talk to Guillaume (gchatelet)
- The original reason for the flag was to avoid the compiler calling builtin memcpy inside memcpy
- -fno-builtin prevents the memcpy inlining, but it also prevents inlining later on and causes issues with LTO
- LTO and inlining are very important in GPU builds, but also for anyone else who is building from source
- Original patch: D74162 [Inliner] Inlining should honor nobuiltin attributes
- It would be nice to have a reproducer so it can be checked if the problem is fixed
I totally agree that these options (-ffreestanding / -fno-builtin) prevent almost all inlining possibilities between application code and the libc. This is not desirable in the long run.
Now, the concern about the compiler turning memcpy code into a call to memcpy is not restricted to this particular function. The compiler is allowed to turn any code that looks like a libc function into a call to that function, e.g. GCC turning a custom mystrlen into a call to strlen.
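To make the concern concrete, here is a minimal sketch of the kind of code involved. The function body is my own illustration, not taken from any real code base. An optimizing compiler may recognize the loop as a string-length computation and replace it with a call to strlen; whether that actually happens depends on the compiler, its version and the optimization level. If this were the libc's own strlen, the result would be a function that calls itself.

```c
#include <stddef.h>

/* Hypothetical user-written function. The loop below has exactly the
 * semantics of strlen, so a compiler that recognizes the idiom is allowed
 * to replace it with a call to strlen. */
size_t mystrlen(const char *s) {
  size_t n = 0;
  while (s[n] != '\0')
    ++n;
  return n;
}
```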
We can improve on the current situation though: -ffreestanding implies -fno-builtin, which really prevents all of these optimizations.
For the production version of libc we can remove -ffreestanding and use the finer-grained -fno-builtin-function flags. That is, use -fno-builtin-memcpy when compiling libc.src.string.memcpy, use -fno-builtin-strlen when compiling libc.src.string.strlen, and so on. This still prevents a lot of inlining possibilities but it’s a first step.
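As a rough sketch of what the per-function approach looks like for a single translation unit: the file name, the function name libc_memcpy and the command line in the comment are assumptions for illustration, not the actual LLVM libc build setup. Whether the flag alone is enough to suppress the rewrite depends on the compiler being used.

```c
/* memcpy.c: hypothetical stand-alone file. It would be built with only the
 * one builtin disabled, for example:
 *
 *   cc -O2 -fno-builtin-memcpy -c memcpy.c
 *
 * Every other builtin stays available to the optimizer in this file. */
#include <stddef.h>

void *libc_memcpy(void *dst, const void *src, size_t size) {
  unsigned char *d = dst;
  const unsigned char *s = src;
  /* Byte-by-byte copy: this is the loop that must not be turned back
   * into a call to memcpy. */
  for (size_t i = 0; i < size; ++i)
    d[i] = s[i];
  return dst;
}
```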
When the compiler is clang, we can be even more specific and apply an attribute to individual functions when we know their body is subject to libc delegation. This would only work for clang, but it’s a much more precise tool, effectively preventing inlining only for a handful of problematic functions.
Let’s look at a contrived example to see what I mean. The following function would need to be compiled with -fno-builtin-memset:
void libc_memset(char* ptr, char value, size_t size) {
  for (size_t i = 0; i < size; ++i)
    ptr[i] = value;
}
But this could be rewritten into:
// clang-only: keep this loop from being rewritten as a call to memset.
__attribute__((no_builtin("memset")))
void libc_memset_loop(char* ptr, char value, size_t size) {
  for (size_t i = 0; i < size; ++i)
    ptr[i] = value;
}
void libc_memset(char* ptr, char value, size_t size) {
  if (size == 0) return;
  libc_memset_loop(ptr, value, size);
}
With this second version, PGO and LTO would be able to inline the zero-size shortcut at the call site if deemed necessary.
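Since the attribute is clang-only, one possible way to keep such a source file building with other compilers is to hide it behind a macro. The sketch below assumes a macro name, LIBC_NO_BUILTIN, that I made up for illustration; it is not taken from the LLVM libc sources. With a compiler that lacks the attribute, the macro expands to nothing and the file would still need the -fno-builtin-memset flag.

```c
#include <stddef.h>

/* LIBC_NO_BUILTIN is a hypothetical macro name used only in this sketch. */
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(no_builtin)
#    define LIBC_NO_BUILTIN(name) __attribute__((no_builtin(name)))
#  endif
#endif
#ifndef LIBC_NO_BUILTIN
/* Attribute not available: expand to nothing and rely on the
 * -fno-builtin-<function> command-line flag instead. */
#  define LIBC_NO_BUILTIN(name)
#endif

/* Only the loop carries the restriction. */
LIBC_NO_BUILTIN("memset")
void libc_memset_loop(char *ptr, char value, size_t size) {
  for (size_t i = 0; i < size; ++i)
    ptr[i] = value;
}

/* The thin wrapper has no restriction, so it stays a candidate for inlining. */
void libc_memset(char *ptr, char value, size_t size) {
  if (size == 0)
    return;
  libc_memset_loop(ptr, value, size);
}
```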
Additionally, for the specific case of memcpy, when __builtin_memcpy_inline is available, the compiler will not be able to recognize the memcpy semantics, so we could drop -fno-builtin-memcpy completely.
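A minimal sketch of the idea, assuming a hypothetical helper copy_block_16 that is not part of LLVM libc: when the builtin is present, a fixed-size copy is written with __builtin_memcpy_inline, which takes a compile-time constant size and is expanded inline rather than emitted as a call to memcpy. The fallback branch is a plain loop and would still need -fno-builtin-memcpy or an equivalent.

```c
#include <stddef.h>

/* Compilers without __has_builtin are treated as not having the builtin. */
#ifndef __has_builtin
#  define __has_builtin(x) 0
#endif

/* Hypothetical building block: copies exactly 16 bytes. */
static inline void copy_block_16(char *dst, const char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  /* The size must be a constant; no call to memcpy is generated. */
  __builtin_memcpy_inline(dst, src, 16);
#else
  /* Fallback loop: still subject to being recognized as memcpy. */
  for (size_t i = 0; i < 16; ++i)
    dst[i] = src[i];
#endif
}
```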
Now, for integration tests I think it’s still important to use -ffreestanding to make sure we don’t accidentally depend on hosted features.