compiler-rt non-neon builds are still broken

Hi Renato,

With latest trunk when I build with -mfpu=vfpv3-d16 I get:

…/projects/libcxxabi/src/Unwind/UnwindRegistersSave.S:373:8: error: invalid operand for instruction
stcl p11, cr0, [r0], {0x20} @ vstm r0, {d16-d31}
^

…/projects/libcxxabi/src/Unwind/UnwindRegistersRestore.S:394:8: error: invalid operand for instruction
ldcl p11, cr0, [r0], {0x20} @ vldm r0, {d16-d31}
^

Build works when neon is enabled but this won’t work for Marvell CPUs. Could you please have a look at this?

Thanks!

Of course $subject meant to say s/compiler-rt/libcxxabi

sigh... :slight_smile:

I'm looking into it...

cheers,
--renato

Hi again,

Build works when neon is enabled but this won't work for Marvell CPUs.
Could you please have a look at this?

sigh... :slight_smile:

I'm looking into it...

I looked around a little and converting

stcl p11, cr0, [r0], {0x20} @ vstm r0, {d16-d31}

into

stcl p11, cr0, [r0], {0x20}

fixes the error and other unwind implementations are doing something like

stcl p11, cr0, [r0], {0x20} /* vstm r0, {d16-d31} */

so this looks safe. What do you think?

Scratch that, doesn't seem to work for my cross-build setup. But seems
to work on native ARM builder. Weird.

Hi Ismail,

@ is the comment character in ARM assembly, but not always recognised? Not sure.

Anyway, we should not even try to unwind D16~D31 when you compile the
library with VFPv3-D16. If we had push/pop for .cpu and .fpu for ARM,
we could fix that one pretty quickly, but we don't.

My guess is that we should not be emitting that line at all, so
replacing __ARM_FP by __ARM_NEON would be the right thing, and only
emitting stcl when ARCH < 7 and vstmia otherwise.

If that works, it'll depend on how you compile your library, and it'd
be good to know how people do that and what do they expect to see. For
instance, I don't expect to see that function be called at all when
ARCH < 7 && !NEON.

cheers,
--renato

Hi Renato,

Scratch that, doesn't seem to work for my cross-build setup. But seems
to work on native ARM builder. Weird.

Hi Ismail,

@ is the comment character in ARM assembly, but not always recognised? Not sure.

Nope it was just bad testing. It failed on native build too.

Anyway, we should not even try to unwind D16~D31 when you compile the
library with VFPv3-D16. If we had push/pop for .cpu and .fpu for ARM,
we could fix that one pretty quickly, but we don't.

My guess is that we should not be emitting that line at all, so
replacing __ARM_FP by __ARM_NEON would be the right thing, and only
emitting stcl when ARCH < 7 and vstmia otherwise.
If that works, it'll depend on how you compile your library, and it'd
be good to know how people do that and what do they expect to see. For
instance, I don't expect to see that function be called at all when
ARCH < 7 && !NEON.

This might be true but gcc compiles this fine so at least there might
be an ASM parser bug.

Thanks,
ismail

It's not, that change was intentional... ARM is deprecating
co-processor handling where VFP/NEON instructions should be used.

That's why we need the ifdef. We just need to get that right...

--renato

Something like this ?

Index: src/Unwind/UnwindRegistersRestore.S

Scratch that, doesn't seem to work for my cross-build setup. But seems
to work on native ARM builder. Weird.

Hi Ismail,

@ is the comment character in ARM assembly, but not always recognised? Not sure.

Anyway, we should not even try to unwind D16~D31 when you compile the
library with VFPv3-D16. If we had push/pop for .cpu and .fpu for ARM,
we could fix that one pretty quickly, but we don't.

That's an interesting point. Really we should split this function in two, so the demand-saving logic saves the high regs only when an EHT entry requests it. That way you can build an unwinder that will work on both VFPv3-D16 processors, and ones with D16~D31.

My guess is that we should not be emitting that line at all, so
replacing __ARM_FP by __ARM_NEON would be the right thing, and only
emitting stcl when ARCH < 7 and vstmia otherwise.

If that works, it'll depend on how you compile your library, and it'd
be good to know how people do that and what do they expect to see. For
instance, I don't expect to see that function be called at all when
ARCH < 7 && !NEON.

It'd be nice if we could eliminate those preprocessor checks so you could link a -march=armv5t unwinder against an -mfpu=neon binary and have it work.

I added the original ones so I could build -march=armv4t, -march=thumbv4t, -march=thumbv6-m unwinders, because I couldn't figure out how to get the assembler shut up about unsupported coprocessor instructions. Is there a way to tell the assembler "yes, I really really know what I'm doing"? If so, that would be a much better solution than what we've got now.

Cheers,

Jon

That's an interesting point. Really we should split this function in two, so
the demand-saving logic saves the high regs only when an EHT entry requests
it. That way you can build an unwinder that will work on both VFPv3-D16
processors, and ones with D16~D31.

The function is already split in two, one for 0~15 and another for
16~31, but the latter still needs to be compiled successfully on all
arches.

One solution, with macros, is to not even compile for non-NEON
binaries, but people want to compile to VFPv2 and still link the
library on NEON code and work.

I particularly find that kind of programming very lazy, and coping
with the demands of lazy people is impossible when different people
are lazy for different things.

It'd be nice if we could eliminate those preprocessor checks so you could
link a -march=armv5t unwinder against an -mfpu=neon binary and have it work.

Would it? I can't see the purpose in this...

I added the original ones so I could build -march=armv4t, -march=thumbv4t,
-march=thumbv6-m unwinders, because I couldn't figure out how to get the
assembler shut up about unsupported coprocessor instructions. Is there a way
to tell the assembler "yes, I really really know what I'm doing"? If so,
that would be a much better solution than what we've got now.

There is, by using the .fpu directive, but that's biding for the rest
of the file, so you could end up writing NEON code out of the space
where the .fpu directive was meant for and the compiler will never
find the bug. You'd get a run time bug when running on a machine that
has no NEON.

The alternative, using push/pop like MIPS or PPC, we could make the
FPU change local to that function, but there isn't support for that in
ARM yet. I'm planning to add it in addition to the horribly broken
.fpu directive in the next few months. This will be required for IFUNC
implementations.

The other alternative is to split those functions on a separate file,
that has .fpu NEON, so that we're sure that only when these functions
are called that we actually unwind VFP/NEON registers. Since we
already guarantee that in the library (by only marking the registers
tainted when we *use* them), this should do ok for the time being.

It'd also be a way to compile libc++abi with GCC, as I doubt they will
implement push/pop too soon for ARM...

cheers,
--renato

That's an interesting point. Really we should split this function in two, so
the demand-saving logic saves the high regs only when an EHT entry requests
it. That way you can build an unwinder that will work on both VFPv3-D16
processors, and ones with D16~D31.

The function is already split in two, one for 0~15 and another for
16~31, but the latter still needs to be compiled successfully on all
arches.

One solution, with macros, is to not even compile for non-NEON
binaries, but people want to compile to VFPv2 and still link the
library on NEON code and work.

I particularly find that kind of programming very lazy, and coping
with the demands of lazy people is impossible when different people
are lazy for different things.

It'd be nice if we could eliminate those preprocessor checks so you could
link a -march=armv5t unwinder against an -mfpu=neon binary and have it work.

Would it? I can't see the purpose in this...

Think about it from the perspective of a toolchain provider that wants to ship a multilib that will work on chips with and without neon registers. The particular example I gave may be bogus, but I think the principle stands.

I added the original ones so I could build -march=armv4t, -march=thumbv4t,
-march=thumbv6-m unwinders, because I couldn't figure out how to get the
assembler shut up about unsupported coprocessor instructions. Is there a way
to tell the assembler "yes, I really really know what I'm doing"? If so,
that would be a much better solution than what we've got now.

There is, by using the .fpu directive, but that's biding for the rest
of the file, so you could end up writing NEON code out of the space
where the .fpu directive was meant for and the compiler will never
find the bug. You'd get a run time bug when running on a machine that
has no NEON.

The alternative, using push/pop like MIPS or PPC, we could make the
FPU change local to that function, but there isn't support for that in
ARM yet. I'm planning to add it in addition to the horribly broken
.fpu directive in the next few months. This will be required for IFUNC
implementations.

The other alternative is to split those functions on a separate file,
that has .fpu NEON, so that we're sure that only when these functions
are called that we actually unwind VFP/NEON registers. Since we
already guarantee that in the library (by only marking the registers
tainted when we *use* them), this should do ok for the time being.

That sounds like the right solution.

Yeah, I know... I'm just a bit frustrated with how toolchains have
being abused for so long that people don't know any more what's
sensible and what's just laziness... Ignore me... :slight_smile:

cheers,
--renato

Ismail,

I haven't tested much, but can you try the attached patch? I think it
should work under *any* condition, since it uses the .fpu directive. I
split it in a separate file to not taint the rest.

If that solves your problem, I'll do a last check and will submit to
phab review.

cheers,
--renato

libcxx-unwind-vfp.patch (19.8 KB)

Thanks Renato!

Is there a .fpu directive for wmmx so we could get rid of these too?

#if (!defined(__ARM_ARCH_6M__) && !defined(__ARM_ARCH_6SM__)) || __ARM_WMMX

Jon

Hi,

Ismail,

I haven't tested much, but can you try the attached patch? I think it
should work under *any* condition, since it uses the .fpu directive. I
split it in a separate file to not taint the rest.

Doesn't seem to work

../projects/libcxxabi/src/Unwind/UnwindRegistersSaveARM.S:152:15:
error: register expected
  vstmia r0, {d16-d31}
              ^

../projects/libcxxabi/src/Unwind/UnwindRegistersRestoreARM.S:145:15:
error: register expected
  vldmia r0, {d16-d31}
              ^

Using -mfpu=vfpv3-d16 with r224370

Is this gcc or clang? I'm using both and I can't see that error...

$ arm-none-eabi-gcc -mcpu=cortex-a8 -mfpu=vfpv3-d16
UnwindRegistersRestoreARM.S -S -o -
$ clang -target armv7 -mfpu=vfpv3-d16 UnwindRegistersRestoreARM.S -S -o -

produce the vldmia alternative and

$ arm-none-eabi-gcc -mfpu=vfpv3-d16 UnwindRegistersRestoreARM.S -S -o -
$ clang -target armv6 -mfpu=vfpv3-d16 UnwindRegistersRestoreARM.S -S -o -

produce the fldmiad version.

cheers,
--renato

Hi,

../projects/libcxxabi/src/Unwind/UnwindRegistersRestoreARM.S:145:15:
error: register expected
  vldmia r0, {d16-d31}
              ^
Using -mfpu=vfpv3-d16 with r224370

Is this gcc or clang? I'm using both and I can't see that error...

$ arm-none-eabi-gcc -mcpu=cortex-a8 -mfpu=vfpv3-d16
UnwindRegistersRestoreARM.S -S -o -
$ clang -target armv7 -mfpu=vfpv3-d16 UnwindRegistersRestoreARM.S -S -o -

produce the vldmia alternative and

$ arm-none-eabi-gcc -mfpu=vfpv3-d16 UnwindRegistersRestoreARM.S -S -o -
$ clang -target armv6 -mfpu=vfpv3-d16 UnwindRegistersRestoreARM.S -S -o -

clang here:

/opt/clang/bin/clang -target armv7a-unknown-linux-gnueabihf -mfpu=vfpv3-d16 -c UnwindRegistersRestore.S

./UnwindRegistersRestoreARM.S:145:15: error: register expected
  vldmia r0, {d16-d31}
              ^

/opt/clang/bin/clang -v

clang version 3.6.0 (trunk 224333)
Target: x86_64-suse-linux
Thread model: posix
Found candidate GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
Selected GCC installation: /usr/lib64/gcc/x86_64-suse-linux/4.8
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64

My GCC is 4.9.2, and I can't reproduce your error. But where GCC is
4.8, I can. I'm not sure why is that, since we're using Clang, but it
deserves some investigation.

Basically, the error message is saying that "d16" is not a valid
register name. If you change to -mfpu=vfpv3 everything works, so it's
definitely a run time issue.

cheers,
--renato

/opt/clang/bin/clang -target armv7a-unknown-linux-gnueabihf -mfpu=vfpv3-d16 -c UnwindRegistersRestore.S

./UnwindRegistersRestoreARM.S:145:15: error: register expected
   vldmia r0, {d16-d31}
               ^

My GCC is 4.9.2, and I can't reproduce your error. But where GCC is
4.8, I can. I'm not sure why is that, since we're using Clang, but it
deserves some investigation.

Basically, the error message is saying that "d16" is not a valid
register name. If you change to -mfpu=vfpv3 everything works, so it's
definitely a run time issue.

Is integrated-as the default on solaris? If not, which assembler is being picked up?

Ismail, can you run the above command again with '-v'?