compler-rt, __aeabi_memcpy () possibly broken (ARM)

Hello,

I recognized that compiler-rt's the implementation of __aeabi_memcpy simply branches to memcpy.
The implementation of memcpy is not provided. So an externally provided memcpy () has to be used.
(also applies to memmove, memset, memclr)

On ARM I have seen implementations of memcpy () using floating-point registers (if compiled with NEON support). The is perfectly
legal, as memcpy () only needs to comply with the Procedure Call Standard. According to this d0-d7, d16-d31 are scratch registers.

The situation is slightly different for __aeabi_memcpy (). The ABI spec explicitly states: "In general, implementations of these
functions are allowed to corrupt only the integer core registers permitted to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)."

newlib addresses this by explicitly providing a separate implementation for __aeabi_memcpy () and memcpy ().

But things can get messed up if compiler-rt's __aeabi_memcpy () is actually used. (with any memcpy () that clobbers flating-point
registers.)

The implementation of __aeabi_memcpy () in ARM's compiler 5 tool-chain also uses floating-point registers - but preserves the contents.
So this one is ABI compliant.

My conclusion is that the current implementation of compiler-rt can potentially introduce difficult to track down problems.
Unless of course LLVM always can handle corrupted floating-point registers for all calls to __aeabi_memcpy ().

Either the aeabi variants of memcpy, memmove, memset, memclr should be fixed or the stubs should removed from compiler-rt.

Or am I getting it all wrong?

Peter

I recognized that compiler-rt's the implementation of __aeabi_memcpy simply
branches to memcpy.
The implementation of memcpy is not provided. So an externally provided
memcpy () has to be used.
(also applies to memmove, memset, memclr)

Hi Peter,

In a nutshell, Compiler-RT may assume there is a C library underneath.
This is the same for LibGCC
(Libgcc (GNU Compiler Collection (GCC) Internals)):

    "GCC will also generate calls to C library routines, such as
memcpy and memset, in some cases."

This also works on free-standing environments (ex. the Linux kernel)
because those environments assume the compiler library will do so, and
thus implement "memcpy", "memset", etc.

On ARM I have seen implementations of memcpy () using floating-point
registers (if compiled with NEON support). The is perfectly
legal, as memcpy () only needs to comply with the Procedure Call Standard.
According to this d0-d7, d16-d31 are scratch registers.

RT *could* have some optimised versions of those functions, but as you
can imagine, this is not a trivial pursuit. One would need to take
into account all the variations, alignment, ISA support and
micro-architecture differences, and that's a very large project, even
if we just consider ARM targets.

Given that most C libraries have had that consideration already, and
have done more tests (conformance and performance) than we have, it's
safe to assume that whatever the C libraries did is probably better
(average case) than any "smart" thing we can conjure in a weekend.

The situation is slightly different for __aeabi_memcpy (). The ABI spec
explicitly states: "In general, implementations of these
functions are allowed to corrupt only the integer core registers permitted
to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)."

This is standard AAPCS, I trully hope neither glibc, newlib or musl's
implementations will corrupt anything more.

newlib addresses this by explicitly providing a separate implementation for
__aeabi_memcpy () and memcpy ().
But things can get messed up if compiler-rt's __aeabi_memcpy () is actually
used.

Indeed, this is the real problem (not the FP clobbering, the two
libc's versions). This will be a linker nightmare, and could break the
programmers assumption by using a specific C library.

Furthermore, EABI's implementation returns void, while the C standard
version returns the pointer, so they're not completely replaceable for
every case.

So, it's safe to use memcpy for all calls to __eabi_memcpy (modulo
AAPCS bugs, performance), but the other way around is not.

The implementation of __aeabi_memcpy () in ARM's compiler 5 tool-chain also
uses floating-point registers - but preserves the contents.
So this one is ABI compliant.

Any function that is AAPCS compliant *must* preserve all but the AAPCS
registers. If one doesn't, it's not Compiler-RT's fault to assume so.

My conclusion is that the current implementation of compiler-rt can
potentially introduce difficult to track down problems.
Unless of course LLVM always can handle corrupted floating-point registers
for all calls to __aeabi_memcpy ().

Either the aeabi variants of memcpy, memmove, memset, memclr should be fixed
or the stubs should removed from compiler-rt.

I agree. But it's not that simple.

Some environments (ex. Linux kernel, Android, FreeBSD?, OSX?) depend
on this behaviour, and as much as I agree with you that this is a
broken assumption, we can't just remove it from RT.

To make matters more complicated, different environments will update
RT at different times, and any change we do will have impact for years
to come, and will be hard to fix in the best way for everyone, quickly
enough.

I believe one quick way to "fix" the multiple-versions is to mark RT's
version as weak (or equivalent), so that the C library's (or
free-standing's) version always gets picked first. But it may be an
over simplification from my part...

I may also be underestimating the register clobbering problem, so if
you could give me a concrete example where it's ok for the library to
corrupt non-AAPCS registers, we could further discuss it.

The only case I remember in GCC, was that it was spilling to VFP
registers (before stack) when they existed in the platform, but this
was a bad idea and was reverted promptly, because it was breaking
stack unwinding, context-switching, etc.

cheers,
--renato

Normal AAPCS functions are allowed to corrupt d0-d7 and d16-d31, but
the RTABI seems pretty explicit that these functions aren't. I think
we've got a real problem here.

Cheers.

Tim.

Hum... (went to read the AAPCS again), right.

I'm guessing we haven't got that problem yet because no one uses those
registers or they save/restore. But this is a potential problem, yes.

On free-standing environments (Linux), Vinicius fixed the need for
__aeabi_mempcy to point to memcpy, so we could remove RT's copy if
it's only about Linux (and possibly Android's bionic). But I don't
know about BSD or OSX.

However, implementing a crude version of those functions in RT is not
a solution, as we're bound to get many things wrong for a long time.

cheers,
--renato

I'm not using it in NetBSD at least, __aeabi_memcpy and co are provided
by libc. That said, it is just an alias for memcpy in libc either and
not using FP.

Joerg

Renato,

thanks for your reply. However, I'm not sure whether I agree to your conclusions.

Let me try to show why:

The situation is slightly different for __aeabi_memcpy (). The ABI spec
explicitly states: "In general, implementations of these
functions are allowed to corrupt only the integer core registers permitted
to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)."

This is standard AAPCS, I trully hope neither glibc, newlib or musl's
implementations will corrupt anything more.

No, this is *not* standard AAPCS. "integer core registers" makes the difference here!
It is rather the base standard AAPCS in ARM's nomenclature.

A normal AAPCS compliant C-function (like, let's say memcpy) can also corrupt the
floating-point registers D0-D7, D16-D31.
(see: Procedure Call Standard for the ARM Architecture, section 5.1.2.1)
And that's exactly what some implementations of memcpy do.
These scratch floating-point registers just come in handy to transfer 32 or 64 bytes at once!

__aeaby_memcpy () however does not have this freedom. It is bound by the "Run-time ABI
for the ARM Architecture". And this restricts it to corrupt core integer registers only.
It must preserve the contents of all floating-point registers. Therefor it can not simply
branch to an externally provided memcpy, without knowing how that is implemented.

That means it is safe to use __eabi_memcpy () for calls to memcpy (), but not the other
way around. (contrary to what you concluded)

That's why I think we have a problem here.

Now I'm gonna open a can of worms:

There are more instances of this restriction for other functions, like long long helper
functions, the one-time construction API and such.
In case of ldivmod and udivmod it is explicitly stated that these *have* full [AAPCS] privileges.

Functions written in assembly should be safe, but c functions are a risk.
compiler-rt seems to use __attribute__((pcs("aapcs"))) for functions that are implemented in C.
That still does not guarantee that no floating-point registers are corrupted!

Just compile this function and you will see what I mean:

   __attribute__((pcs("aapcs"))) double fn_aapcs (double a, double b) {return (a+b);}

There are quite some candidates that might violate the Run-Time ABI rules.

The only solution that I see is that LLVM must treat all eabi functions as normal functions.
It should assume floating-point scratch registers could be corrupted even by functions that
are defined not to do so.
And LLVM must never try to be smart and use float registers for no-float code, and be it
as handy scratch registers for inlined copy operations and such.

Even if compiler-rt did not have a problem, I have seen too many libraries implement these
functions in C. The one-time construction API is an example (__cxa_guard_acquire and friends).
These functions are typically written in C and call external functions. (mutexes, sleeping)
If any of these functions messes any float registers, then things are gonna be fucked up.

Hopefully LLVM simply treats eabi functions as normal C-functions regardless of the Run-time
ABI restrictions. And LLVM never touches floating-point registers - not even the scratch registers
unless float operations are performed.
Then things might work. Someone should mark the relevant code in LLVM, so that this is not
changed in the future - ever.

What really makes we sweat is that if there is a problem, then it will only bite under obscure
circumstances and will be very hard to identify.

Please, please tell me I'm wrong and overlooked something completely.

Peter

I'm guessing we haven't got that problem yet because no one uses those
registers or they save/restore. But this is a potential problem, yes.

The reason we probably don't have a problem is that LLVM does not seem to
assume the float registers remain unchanged when it generates a call to
__aeabi_memcpy. Even though it could.

That way it doesn't matter whether memcyp is called (and corrupts these registers).

As long as we can guarantee this is always the case, there won't be a problem
with LLVM, compiler-rt. Can we guarantee?

BTW, newlib's memcpy () is an example that corrupts float registers. So we can't
assume these do not exist.

Cheers,

Peter

No, this is *not* standard AAPCS. "integer core registers" makes the
difference here! It is rather the base standard AAPCS in ARM's nomenclature.

Hi Peter,

I replied to Tim earlier today, you are correct. AAPCS protects some,
not all D registers.

For some reason, I assumed AAPCS would protect them all, and AAPCS_VFP
would allow you to scratch D0~D7. My bad.

As long as we can guarantee this is always the case, there won't be a
problem with LLVM, compiler-rt. Can we guarantee?

I don't think we can (nor should) guarantee this, but we can keep it
that way until whatever fix is accepted across all platforms.

BTW, newlib's memcpy () is an example that corrupts float registers. So we
can't assume these do not exist.

I didn't assume they don't exist, just that they wouldn't break the
current (bad) model.

It seems that my assumption that systems used them is outdated by at
least 2 years. Also, with Vinicius patch last year, we made sure that
they're no longer needed. So I think the proposal to remove them is
looking more and more attractive.

Maybe Anton could comment on why he added, so we can extend our search
for whomever is using and fix it in the source.

cheers,
--renato