I noticed that compiler-rt's implementation of __aeabi_memcpy simply
branches to memcpy.
The implementation of memcpy is not provided. So an externally provided
memcpy () has to be used.
(also applies to memmove, memset, memclr)
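
As a rough C sketch of what such forwarding stubs amount to (the real
ones are described above as a plain branch; the sketch_ names below are
hypothetical so they do not clash with the real symbols, and the
__aeabi_memset argument order is taken from the ARM run-time ABI):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the __aeabi_* helpers, not the real symbols. */

/* Forwards to the C library's memcpy and drops its return value. */
void sketch_aeabi_memcpy(void *dest, const void *src, size_t n) {
    memcpy(dest, src, n);
}

/* The run-time ABI passes the length before the fill value, so the
   arguments are reordered before calling memset. */
void sketch_aeabi_memset(void *dest, size_t n, int c) {
    memset(dest, c, n);
}

/* memclr is memset with a zero fill value. */
void sketch_aeabi_memclr(void *dest, size_t n) {
    memset(dest, 0, n);
}

/* Small self-check that the stubs behave like the libc calls they wrap. */
void sketch_stubs_check(void) {
    char src[8] = "abcdefg";
    char dst[8];

    sketch_aeabi_memcpy(dst, src, sizeof src);
    assert(memcmp(dst, src, sizeof src) == 0);

    sketch_aeabi_memset(dst, 4, 'x');
    assert(memcmp(dst, "xxxxefg", 8) == 0);

    sketch_aeabi_memclr(dst, sizeof dst);
    assert(dst[0] == 0 && dst[7] == 0);
}
```

Every stub here depends on an externally provided memcpy or memset,
which is the dependency on a C library discussed in this thread.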
In a nutshell, Compiler-RT may assume there is a C library underneath.
This is the same for LibGCC (see the "Libgcc" chapter of
GNU Compiler Collection (GCC) Internals):
"GCC will also generate calls to C library routines, such as
memcpy and memset, in some cases."
This also works in free-standing environments (e.g. the Linux kernel)
because those environments assume the compiler library will do so, and
thus implement "memcpy", "memset", etc.
On ARM I have seen implementations of memcpy () using floating-point
registers (if compiled with NEON support). This is perfectly
legal, as memcpy () only needs to comply with the Procedure Call Standard.
According to this, d0-d7 and d16-d31 are scratch registers.
RT *could* have some optimised versions of those functions, but as you
can imagine, this is not a trivial pursuit. One would need to take
into account all the variations, alignment, ISA support and
micro-architecture differences, and that's a very large project, even
if we just consider ARM targets.
Given that most C libraries have had that consideration already, and
have done more tests (conformance and performance) than we have, it's
safe to assume that whatever the C libraries did is probably better
(average case) than any "smart" thing we can conjure in a weekend.
The situation is slightly different for __aeabi_memcpy (). The ABI spec
explicitly states: "In general, implementations of these
functions are allowed to corrupt only the integer core registers permitted
to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)."
This is standard AAPCS; I truly hope that none of the glibc, newlib or
musl implementations corrupts anything more.
newlib addresses this by explicitly providing a separate implementation for
__aeabi_memcpy () and memcpy ().
But things can get messed up if compiler-rt's __aeabi_memcpy () is actually
Indeed, this is the real problem (not the FP clobbering, but the two
libc versions). This will be a linker nightmare, and could break the
assumptions the programmer makes by choosing a specific C library.
Furthermore, EABI's implementation returns void, while the C standard
version returns the pointer, so they're not completely replaceable for
So, it's safe to use memcpy for all calls to __aeabi_memcpy (modulo
AAPCS bugs, performance), but the other way around is not.
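
A minimal C sketch of that asymmetry, assuming only the two signatures
mentioned above (the sketch_ names are hypothetical stand-ins, not real
library symbols):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical void-returning copy with the __aeabi_memcpy signature. */
void sketch_void_copy(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* Direction 1: memcpy can serve a void-returning caller as is,
   because the caller never looks at the return value. */
void sketch_aeabi_from_memcpy(void *dest, const void *src, size_t n) {
    (void)memcpy(dest, src, n);
}

/* Direction 2: a void-returning routine cannot simply be aliased to
   memcpy; a wrapper has to supply the destination pointer itself. */
void *sketch_memcpy_from_void(void *dest, const void *src, size_t n) {
    sketch_void_copy(dest, src, n);
    return dest;
}

/* A caller that relies on the C-standard return value. */
size_t sketch_copy_and_measure(char *dest, const char *src) {
    char *ret = sketch_memcpy_from_void(dest, src, strlen(src) + 1);
    assert(ret == dest);
    return strlen(ret);
}
```

Without the explicit "return dest" in the second wrapper, a caller such
as sketch_copy_and_measure would read whatever happened to be left in
the return register.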
The implementation of __aeabi_memcpy () in ARM's compiler 5 tool-chain also
uses floating-point registers - but preserves the contents.
So this one is ABI compliant.
Any function that is AAPCS compliant *must* preserve all but the AAPCS
scratch registers. If one doesn't, Compiler-RT is not at fault for
assuming that it does.
My conclusion is that the current implementation of compiler-rt can
potentially introduce difficult-to-track-down problems.
Unless, of course, LLVM can always handle corrupted floating-point registers
for all calls to __aeabi_memcpy ().
Either the aeabi variants of memcpy, memmove, memset, memclr should be fixed
or the stubs should be removed from compiler-rt.
I agree. But it's not that simple.
Some environments (e.g. Linux kernel, Android, FreeBSD?, OSX?) depend
on this behaviour, and as much as I agree with you that this is a
broken assumption, we can't just remove it from RT.
To make matters more complicated, different environments will update
RT at different times, and any change we make will have an impact for
years to come, and will be hard to fix in the best way for everyone, quickly.
I believe one quick way to "fix" the multiple-versions problem is to mark
RT's version as weak (or equivalent), so that the C library's (or
free-standing's) version always gets picked first. But it may be an
oversimplification on my part...
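
To illustrate what marking a definition as weak could look like, here
is a minimal sketch using the GCC/Clang "weak" function attribute. The
name sketch_rt_memcpy is hypothetical, and this is not a proposed patch
for RT:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fallback, marked weak (a GCC/Clang extension). If another
   object file in the same link provides a non-weak sketch_rt_memcpy, the
   linker uses that definition and this one is ignored. */
__attribute__((weak))
void sketch_rt_memcpy(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
}

/* With no strong definition in the link, the weak fallback is called. */
void sketch_weak_check(void) {
    char src[4] = "abc";
    char dst[4];

    sketch_rt_memcpy(dst, src, sizeof src);
    assert(dst[0] == 'a' && dst[2] == 'c' && dst[3] == '\0');
}
```

How this interacts with static archives and link order would still need
to be checked per toolchain, since a weak definition that has already
been picked up does not necessarily make the linker go looking for a
strong one in a later archive.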
I may also be underestimating the register clobbering problem, so if
you could give me a concrete example where it's ok for the library to
corrupt non-AAPCS registers, we could further discuss it.
The only case I remember in GCC was that it was spilling to VFP
registers (before the stack) when they existed in the platform, but this
was a bad idea and was reverted promptly, because it was breaking
stack unwinding, context-switching, etc.