Exceptions not working when cross compiling for ARM Cortex M4 with clang and precompiled libraries from ARM GNU GCC Toolchain

Hello!

I have been struggling with enabling exceptions for the Clang build, when cross compiling using precompiled libraries from ARM GNU GCC Toolchain.

LLVM version: 13.0.0
CMake version: 3.21.3
targeted CPU: STM32L432KC, ARM Cortex M4

The C++ flags:

-mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16
-nodefaultlibs
--sysroot=${ARM_GNU_TOOLCHAIN_PATH}/arm-none-eabi
-flto
-fdata-sections -ffunction-sections
# For <iostream>, <string>, ...
-isystem "${ARM_GNU_TOOLCHAIN_PATH}/arm-none-eabi/include/c++/${ARM_GNU_TOOLCHAIN_GCC_VERSION}/" 
# For <bits/*>, ...
-isystem "${ARM_GNU_TOOLCHAIN_PATH}/arm-none-eabi/include/c++/${ARM_GNU_TOOLCHAIN_GCC_VERSION}/arm-none-eabi/thumb/v7e-m+fp/hard/"
-fexceptions
-g

ARM_GNU_TOOLCHAIN_PATH is the root path to the mentioned ARM GNU GCC Toolchain. ARM_GNU_TOOLCHAIN_GCC_VERSION is equal to 10.3.1. One can download it here: https://developer.arm.com/tools-and-software/open-source-software/developer-tools/gnu-toolchain/gnu-rm/downloads

The linker flags:

-mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16
-nodefaultlibs
--sysroot=${ARM_GNU_TOOLCHAIN_PATH}/arm-none-eabi
-flto
-fdata-sections -ffunction-sections
-Wl,--gc-sections
-g
-fexceptions
# Path to standard libraries: libc, libm, ...
-L"${ARM_GNU_TOOLCHAIN_PATH}/arm-none-eabi/lib/thumb/v7e-m+fp/hard/"
# Path to libgcc
-L"${ARM_GNU_TOOLCHAIN_PATH}/lib/gcc/arm-none-eabi/${ARM_GNU_TOOLCHAIN_GCC_VERSION}/thumb/v7e-m+fp/hard/"
-lc -lm -lnosys -lstdc++ -lgcc

I use a custom linker script and a custom startup file to build the firmware properly. I attach them both.

This code:

try
{
    throw std::runtime_error{"Some error!"};
} catch (const std::exception&e)
{
    printf("Error: %s\r\n", e.what());
}

does not work in the Clang build: it crashes. It works normally in the GCC build.

The problem is that the Clang build produces a “.got” section, which is placed in RAM automatically, without my asking for it. I have included it in the “.data” section, because otherwise it breaks global variable initialization in the startup code. A GOT section is normally related to dynamic symbol resolution. Since I link everything statically, I don’t understand why this section shows up.

The “.got” section contains a single entry, which is an address pointing somewhere in “rodata”. Unfortunately, this address is wrong. It points to some seemingly random rodata where ASCII strings are held. It should point to the symbol:
vtable for __cxxabiv1::__class_type_info + 8 in section .rodata
I know that because I took the value from the GCC build (made with the downloaded toolchain), for which exceptions work properly.

The code crashes in the __cxa_type_match:
0x080096d6 in __cxa_type_match ()
─── Assembly ─────────────
0x080096d6 ? ldr r3, [r4, #0] # r4 holds the address of “got”
0x080096d8 ? mov r1, r6
0x080096da ? mov r0, r4
0x080096dc ? ldr r6, [r3, #16] # Loaded from the address pointed by (r3 + 16)
0x080096de ? add r2, sp, #4
0x080096e0 ? movs r3, #1
0x080096e2 ? blx r6 # This line crashes

According to the comments in the assembly:
r4 contains the address to the got section, which is 0x20000000. Under 0x20000000 lies the address to the random ASCII rodata. It means that:

ldr r6, [r3, #16]

Loads the ASCII characters to r6. Then we perform the blx jump with contents of r6, which is wrong, because ASCII data is no address.

Using gdb I am able to write data to RAM by hand. I found the address of the symbol vtable for __cxxabiv1::__class_type_info + 8 in section .rodata, which is 0x8019004, and wrote it to 0x20000000 (the “got” section). Unfortunately, the program did not end up in the “catch” block as expected. Instead it went to “_exit” through “std::terminate”, which means that no suitable handler was found during stack unwinding. The stack trace from that attempt:

[0] from 0x08008df0 in _exit
[1] from 0x08008d42 in abort
[2] from 0x08009d08 in __gnu_cxx::__verbose_terminate_handler()
[3] from 0x08008cc2 in __cxxabiv1::__terminate(void (*)())
[4] from 0x08008cf2 in std::terminate()
[5] from 0x08009c08 in __cxa_throw
[6] from 0x0800fedc in test_throws_uncaught_exception()+52 at /home/kacper/Workspace/Aura/tests/device/test_serial_logger/test_logs_uncaught_exception.cpp:11
[7] from 0x08014ff0 in run_test(void (*)(), char const*, unsigned int)+92 at test_logs_uncaught_exception_runner.cpp:62
[8] from 0x08014f8e in test_main()+12 at test_logs_uncaught_exception_runner.cpp:79
[9] from 0x08014440 in app_main::$_0::operator()() const+16 at /home/kacper/Workspace/Aura/tests/device/setup/test_runner.cpp:24

I attach section dump and disassembly of the program as well.

Am I missing some compiler or linker flag?

I have been fighting with this for at least a week. I was hoping to make the Clang build the main build of my project, but I guess I will have to fall back to GCC instead.

Any help would be appreciated.

Kind regards,
Kacper Kowalski

startup_stm32l432xx.s (11.7 KB)

STM32L432KCUx_FLASH.ld (6.03 KB)

test_logs_uncaught_exception.objdump (1.61 MB)

test_logs_uncaught_exception.disass (2.29 MB)

Some small suggestions:

LTO makes everything more complicated; make sure things work without -flto before trying LTO.

A “.got” section shows up when an object file compiled with -fPIC/-fPIE accesses an external global variable or constant. You can check if an object file contains a relocation like this with objdump -r (something like R_ARM_GOT_PREL). For a static link, the linker should fix it to point to the right address. If the linker isn’t computing the correct address, my best guess is that something is wrong with the linker script. Are you setting the address of flash/RAM correctly? Is the difference between the address in the GOT and the actual address related to the size of some section? (The consequences of getting this wrong might not be immediately obvious if the code is mostly using pc-relative addressing.)

If you have a working build with gcc, you might want to compare to see what, exactly, is different. Check the link line with “-###”; are you using the same linker? The same libraries?

-Eli

One more small suggestion. Eli’s suggestions are well worth trying first.

If you are linking statically, the linker should have written the relocated value into the .got section, leaving no dynamic relocation. Can you use the linker map file to find that section in the ELF file (presumably it will be in FLASH) and check that the value stored in flash is correct? If it is correct in FLASH but wrong after copying to RAM, then the problem could be in how the GOT is copied to its address in RAM; if that has gone wrong, other parts of the program could be wrong too.

I’d also look at the map file to see if other orphan sections (those that haven’t been explicitly placed by your script) have missed out on being copied into RAM. To be sure your script hasn’t missed anything, I recommend setting --orphan-handling=warn or --orphan-handling=error: https://sourceware.org/binutils/docs/ld/Options.html#index-_002d_002dorphan_002dhandling_003dMODE

Peter

Hello Eli, Peter!

Thank you for the tips.

Peter,

  1. I guess that the default option for orphan handling is “place”, because the “got” section, which was initially not in the linker script, was put by the linker into the “data” section. Once I had figured that out, I added this section to the linker script:

.got :
{
. = ALIGN(8);
*(.got)
*(.got*)
. = ALIGN(8);
} >FLASH

As you can see, I put it in ROM, so there is no way for the value to be invalid, because those values are written by the programmer/flasher device. No “data” section is involved here, so it is not initialized by the startup code.
In a Debug build, setting orphan handling to “warn” only warns me about some “debug*” and “comment” sections. I guess that is irrelevant?

Eli,
2. The default option for the compiler was to compile it with “-fno-pic”. When I added that flag explicitly to the compiler invocation, no difference in the executable layout, nor in the object files or static libraries, has been observed.

  3. I have checked the object files and static libraries for entries in the “got” section (using “objdump -r …”) and couldn’t find any. It seems that the linker is inserting the “got” section on its own.

  4. The FLASH and RAM addresses are definitely OK. See the linker script: https://drive.google.com/file/d/1G2oBmN574LTOE9chbBgtkYZRdjcfjluR/view?usp=sharing

  5. I have compared the value in the “got” section with the value it should have and couldn’t find any pattern, unfortunately. There might be one, but it would be hard to work out.

  6. “-flto” affects the size of the binary only a little, and the error recurs in the same way when the flag is not used.

  7. One more thing: I have compiled libcxx, libunwind, libcxxabi and compiler-rt with clang, using the very same flags as in the first post, and linked them with my binary. It worked! The exceptions were caught properly. I had to disable some functions in libcxx and libcxxabi and provide some stubs with no implementation (e.g. aligned (de)allocation). I guess this may break other functionality in the app. The binary size also increases too much, so I don’t think that using libcxx and libcxxabi is a long-term solution. The problem recurs with libstdc++ combined with libunwind and compiler-rt (when libgcc is replaced with libunwind + compiler-rt). It seems that using lld along with libstdc++ breaks the build. Also, lld generates a much larger “got” section, with multiple values, when libcxx is involved.

  8. I haven’t compared the linker flags for the gcc vs clang build yet. I will do it soon and get back with the results.

Kind regards,
Kacper

IIUC then LLD with libc++, libc++abi, and libunwind works, but no other combination of c++ libraries has worked?

Although I don’t have a source for it, my understanding is that it is not recommended to mix runtime support libraries. For example libc++abi needs to be matched with llvm libunwind and possibly compiler-rt. The same goes for the GNU equivalents.

Splitting things up the problem is likely to be one or more of:

  • Code generation of exceptions in clang not being understood by GNU exception handling code
  • LLD is unable to link GNU exception handling code correctly
  • The combination of runtime libraries that you’ve chosen isn’t working together

One possible experiment to see if it is LLD is to try compiling with clang but linking with the GNU linker. If that starts to work, can you look to see what the differences are? LLD has a --reproduce option whose output you can attach to a bug report, although you’ll have to wait for the Bugzilla migration to finish before doing that. If that doesn’t work, then it is either some problem/incompatibility with the GNU runtime libraries or an incompatible set of libraries.

There are some things that can be done to reduce the size of libc++ and libc++abi. The demangler in libc++abi is large and only used in error messages so it can be omitted with a few tweaks. Anything that avoids bringing in locale, which is often not needed in embedded systems will also help.

I’m still puzzled at why you are seeing so many GOT entries being created. As Eli mentioned the linker will create a GOT entry in response to specific relocation types like R_ARM_GOT_PREL. The compiler shouldn’t be generating these for a non-PIC and non-PIE build. One possibility is that some later flag on the command line is overriding the -fno-pic, it could be the use of LTO as the code-generator may change the PIC setting from what has been used to make the bitcode objects.

That’s probably the best I can do without seeing the objects.

Peter

Hello Peter, Eli,

I have linked the binary using the verbose option and could see that I was missing the “crti.o”, “crtbegin.o”, “crt0.o”, “crtend.o” and “crtn.o” object files. GCC from the ARM GNU GCC Toolchain picks them up on its own for the specific architecture, even if the “-nostdlib” option is used. “-nostartfiles” prevents it from linking those object files on its own.

After linking the “crt*” objects into the Clang build, the error recurred: I still get the “got” section with one entry, and exceptions do not work properly. I am able to use the same set of arguments for GCC, though, except for the “--target” option. Linking the final executable with Clang:

/home/kacper/Workspace/Aura/device/…/build/common_dependencies/llvm-src/bin/clang++ --target=armv7em-none-eabi -Wall -Wextra -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -nodefaultlibs --sysroot=/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi -fno-pic -fdata-sections -ffunction-sections -isystem /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/include/c++/10.3.1/ -isystem /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/include/c++/10.3.1//arm-none-eabi/thumb/v7e-m+fp/hard/ -g -Wall -Wextra -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -nodefaultlibs --sysroot=/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi -fno-pic -Wl,--sysroot=/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi -Wl,--gc-sections -Wl,-Map=output.map -Wl,-nostdlib -L/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/lib/thumb/v7e-m+fp/hard/ -L/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard/ -o test_logs_uncaught_exception /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crti.o /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/lib/thumb/v7e-m+fp/hard/crt0.o /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crtbegin.o CMakeFiles/test_logs_uncaught_exception.dir/test_logs_uncaught_exception.cpp.obj …/…/…/libdevice_specific.a …/…/…/cube/libcube.a …/setup/libdevice_test_setup.a …/…/…/_deps/unity_project-build/libunity.a libtest_logs_uncaught_exception_runner.a …/…/…/_deps/unity_project-build/libunity.a …/setup/libunity_putchar.a …/…/…/serial_logger/libserial_logger.a …/…/…/cube/libcube.a …/…/…/libfreertos.a …/…/…/_deps/printf_library-build/lib/libprintf.a -lstdc++ -lm -lc -Wl,--start-group -lgcc -lg -lc -Wl,--end-group -Wl,--start-group -lgcc -lc -lnosys -Wl,--end-group /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crtend.o /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crtn.o -T/home/kacper/Workspace/Aura/device/cube/Aura/STM32L432KCUx_FLASH.ld

With GCC:

/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/bin/arm-none-eabi-g++ --target=armv7em-none-eabi -Wall -Wextra -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -nodefaultlibs --sysroot=/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi -fno-pic -fdata-sections -ffunction-sections -isystem /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/include/c++/10.3.1/ -isystem /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/include/c++/10.3.1//arm-none-eabi/thumb/v7e-m+fp/hard/ -g -Wall -Wextra -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -nodefaultlibs --sysroot=/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi -fno-pic -Wl,--sysroot=/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi -Wl,--gc-sections -Wl,-Map=output.map -Wl,-nostdlib -L/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/lib/thumb/v7e-m+fp/hard/ -L/home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard/ -o test_logs_uncaught_exception /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crti.o /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/arm-none-eabi/lib/thumb/v7e-m+fp/hard/crt0.o /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crtbegin.o CMakeFiles/test_logs_uncaught_exception.dir/test_logs_uncaught_exception.cpp.obj …/…/…/libdevice_specific.a …/…/…/cube/libcube.a …/setup/libdevice_test_setup.a …/…/…/_deps/unity_project-build/libunity.a libtest_logs_uncaught_exception_runner.a …/…/…/_deps/unity_project-build/libunity.a …/setup/libunity_putchar.a …/…/…/serial_logger/libserial_logger.a …/…/…/cube/libcube.a …/…/…/libfreertos.a …/…/…/_deps/printf_library-build/lib/libprintf.a -lstdc++ -lm -lc -Wl,--start-group -lgcc -lg -lc -Wl,--end-group -Wl,--start-group -lgcc -lc -lnosys -Wl,--end-group /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crtend.o /home/kacper/Workspace/Aura/device/…/build/common_dependencies/armgnutoolchain-src/lib/gcc/arm-none-eabi/10.3.1/thumb/v7e-m+fp/hard//crtn.o -T/home/kacper/Workspace/Aura/device/cube/Aura/STM32L432KCUx_FLASH.ld -nostartfiles

With GCC everything works as expected; with Clang it doesn’t. Using “-Wl,--reproduce” generates a “-Bstatic” file, which can be found here:
https://drive.google.com/file/d/1Wvx4JPXpPWefV1AZCp4czWOqCXV3Rukx/view?usp=sharing

I will try to compile libcxx and libcxxabi to be tinier according to Peter’s tips and will try to figure out whether aligned allocation could be omitted somehow.

Kind regards,
Kacper

Thanks for the reproducer.

I think I’ve worked out where your GOT entry is coming from. There is a R_ARM_TARGET2 relocation which takes on a different value based on the target. For Linux targets this defaults to R_ARM_GOT_PREL. For embedded systems I think R_ARM_ABS32 is what an embedded build of GNU ld will use.

Can you add -Wl,--target2=abs to your command line? (The option is supported by GNU ld and LLD; it is documented at https://sourceware.org/binutils/docs/ld/ARM.html.) That should make the GOT unnecessary. Ideally we should add that flag to the bare-metal driver for Arm targets.

There is a chance that this will make everything magically work. If the unwinder is expecting R_ARM_TARGET2 to be resolved as R_ARM_ABS32 then even if lld correctly resolves R_ARM_GOT_PREL it won’t be what the unwinder is expecting.

My apologies I don’t have a lot of spare time today to try and see if I can spot anything else.

Peter

P.S. I’ve removed the message text from before your previous reply as it exceeded the unmoderated message size limit and I don’t expect too many of them to be around today.

Hello Peter,

I truly appreciate your commitment, and thank you for the help and such quick responses. There is nothing to apologize for. :wink:

The “-Wl,--target2=rel” resolves the problem. Thank you!

Kind regards,
Kacper