LTO is generating invalid range extension thunks on baremetal

I’m using LLD for a baremetal project targeting aarch64-none-elf. I’m encountering odd behavior when compiling with LTO enabled.

The linker is emitting AArch64AbsLongThunk thunks with an address of 0x0. For example:

000000001007ea2c <__AArch64AbsLongThunk___clang_call_terminate>:
1007ea2c: 58000050     	ldr	x16, 0x1007ea34 <$d>
1007ea30: d61f0200     	br	x16

000000001007ea34 <$d>:
1007ea34: 00 00 00 00  	.word	0x00000000
1007ea38: 00 00 00 00  	.word	0x00000000

Some additional symptoms include:

  • The function a thunk is referencing is sometimes optimized out.
  • When compiling the project without LTO, no thunks are generated to begin with.

Is this a legitimate bug, or have I overlooked a requirement for fixed address ELF executables?

It looks suspicious, but it is difficult to know where the problem lies without seeing the inputs to the link. My suggestions:

  • Add --save-temps to the LLD command line. This will preserve the ELF file that comes back from LTO.
  • Check the symbol for the target of the Thunk in the LTO output, something like llvm-readobj --symbols will do. In particular, is the symbol a weak reference or an absolute symbol with address 0x0?
  • Check the linker map file and symbol table output file for the final program.

LLD has a --reproduce=<reproducer.tar> option. If you are able to share the output (it will likely need an external link due to size restrictions on GitHub), then please raise a GitHub issue.

It looks like __clang_call_terminate is a compiler generated function for cases like:

// f() could throw an exception.
extern void f();

void g() noexcept { f(); }

In the non-LTO case __clang_call_terminate is a weak definition defined within a comdat group. I can see that in some cases LTO may inline f into g so that __clang_call_terminate is no longer required. However, I would only expect the group not to be generated if there were no calls to __clang_call_terminate, and there must have been at least one call for the thunk to be generated.
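As a minimal sketch of why the compiler needs a terminate path here at all: the noexcept function below calls a function that may throw, so any exception escaping the call has to end in std::terminate. The demo namespace, the should_throw switch and the body of f are hypothetical and only serve to make the snippet self-contained.

```cpp
#include <stdexcept>

namespace demo {

// Hypothetical switch so the example can run without actually throwing.
bool should_throw = false;

// f() is potentially throwing: it has no noexcept specification.
void f() {
    if (should_throw) {
        throw std::runtime_error("f failed");
    }
}

// g() promises not to throw, so the compiler must route any exception
// that escapes f() to std::terminate. Clang does this through a small
// helper, which is the __clang_call_terminate symbol discussed above.
void g() noexcept { f(); }

static_assert(!noexcept(f()), "f is potentially throwing");
static_assert(noexcept(g()), "g is declared noexcept");

}  // namespace demo
```

If the optimiser can prove that f never throws (for instance after inlining it under LTO), the terminate path in g becomes dead code and can be dropped.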

One possible workaround for you if you don’t need exceptions is to compile with -fno-exceptions. In that case I think __clang_call_terminate is no longer required.

It’s happening with other symbols. __clang_call_terminate was just an example.

I might have narrowed it down further: the invalid thunks seem to be related to compiling with -fstack-protector and the related code generation. The simplified outline is below.

__stack_chk_fail is defined using ASM and performs some implementation specific logic to recover the stack canary. It then calls stack_protector_fail to perform the debug printout and terminate.

setup_stack_protector is a constructor function that assigns a value to the __stack_chk_guard reference value.

set_protector/get_protector are functions to control the currently installed stack protector failure function.

StackProtector.S (GAS Syntax):

.text
.global __stack_chk_fail
.extern stack_protector_fail
__stack_chk_fail:
#ifndef NDEBUG
    add     x8, x8, x9
#endif
    mov     x0, x8
    mov     x1, x9

    b      stack_protector_fail

StackProtector.h/cpp (exposition only):

extern "C" {
std::uintptr_t __stack_chk_guard;
[[noreturn]] void stack_protector_fail(std::uintptr_t lhs, std::uintptr_t rhs) noexcept;
}
[[clang::no_stack_protector, gnu::constructor]] void setup_stack_protector() noexcept;
StackProtector get_protector() noexcept;
StackProtector set_protector(StackProtector protector) noexcept; 

The current code relies on [[gnu::constructor]] to setup the canary value. When I compile the code-base as is:

  • The setup_stack_protector constructor is removed from the program entirely.
  • stack_protector_fail is emitted as a null thunk.

If a program manually references any functions within the StackProtector.cpp compilation unit (e.g. calling set_protector):

  • The setup_stack_protector constructor is no longer removed.
  • The linker step fails with ld.lld: error: undefined symbol: stack_protector_fail.

Finally, if I mark stack_protector_fail as [[gnu::used]], the link error goes away.
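For reference, a minimal sketch of what that workaround looks like. The function body is a hypothetical placeholder (the real one performs the debug printout before terminating); only the attribute placement is the point.

```cpp
#include <cstdint>
#include <cstdlib>

extern "C" {

// gnu::used asks the compiler to emit this function even when it sees no
// references to it, which also keeps it alive through LTO.
// The body is a placeholder assumption, not the real implementation.
[[noreturn, gnu::used]] void stack_protector_fail(std::uintptr_t lhs,
                                                  std::uintptr_t rhs) noexcept {
    (void)lhs;
    (void)rhs;
    std::abort();
}

}  // extern "C"
```

This only hides the symptom: the reference from the assembled __stack_chk_fail object should already be enough to keep the symbol.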

In my mind, the two pieces of buggy behaviour are:

  • [[gnu::constructor]] doesn’t seem to be respected by LTO if a compilation unit has no external references. Specifying [[gnu::constructor, gnu::used]] produces the same wrong behaviour.
  • .extern directives don’t seem to be respected correctly by LTO, at least in this instance.

I’m not 100% sure what the C++ spec and compiler extensions specify in terms of behaviour, but based off the documentation provided I would argue this is incorrect behaviour. To me, [[gnu::constructor]] would imply [[gnu::used]] in the sense that the compiler/linker wouldn’t try to optimise it away. Obviously the actual implementation would be subject to optimisations.
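A minimal hosted sketch of the behaviour I would expect, assuming a GCC or Clang toolchain. All names in the demo namespace are hypothetical stand-ins (guard for __stack_chk_guard, setup_guard for setup_stack_protector), and the fixed seed is a placeholder for a real entropy source.

```cpp
#include <cassert>
#include <cstdint>

namespace demo {

// Hypothetical stand-in for __stack_chk_guard; zero until the constructor runs.
std::uintptr_t guard = 0;

// Hypothetical fixed seed, used only to make the example checkable.
constexpr std::uintptr_t kSeed = 0x5A5A5A5Au;

// Runs before main() on hosted GCC/Clang targets. Nothing else in the
// program refers to this function.
[[gnu::constructor, gnu::used]] void setup_guard() noexcept { guard = kSeed; }

// Fails loudly (in builds without NDEBUG) if the constructor was dropped
// from the image or never ran.
void require_guard_ready() noexcept { assert(guard == kSeed); }

}  // namespace demo
```

Calling demo::require_guard_ready() early in main is a cheap way to detect the constructor having been discarded, although in my case any such call also creates the external reference that stops the removal in the first place.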

I would need to dig in and see why .extern stack_protector_fail is not behaving as expected.

I can see why GNU constructor functions could be considered implicitly used, as they might have side effects; however, I can understand the perspective that if LTO finds no references to anything defined in a module then it could consider that constructor redundant. It does seem worth raising a GitHub issue for LTO code generation to ask the question, or a separate Discourse post.

As I understand it, assuming StackProtector.S is a standalone assembly language file, this should get assembled into an ELF file and passed to LLD as an ELF file. From that LLD should see a definition of __stack_chk_fail and a reference to stack_protector_fail. I expect LLD to communicate to LTO that stack_protector_fail should be kept.

The .extern shouldn’t be necessary. IIUC it doesn’t do anything in GNU syntax, as all undefined symbols are treated as external (see the .extern entry in Using as).

Although I think the point is that the reference from StackProtector.o should have been sufficient to keep stack_protector_fail.

For example, readelf on the object assembled from StackProtector.S shows:

Symbol table '.symtab' contains 4 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND
     1: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT     2 $x.0
     2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     2 __stack_chk_fail
     3: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND stack_protector_fail

As a linker (not an LTO) person I’m most interested in why the linker has chosen to make a thunk to address 0x0. It should only do this if the destination address is 0x0, and that should only be the case if there is a section placed at address 0x0 or the symbol is absolute with a value of 0x0. In all other cases something has gone wrong, and I should be able to work that part out if I’ve got an example.
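To illustrate why a destination of 0x0 produces a range extension thunk in this image, here is a simplified model of the AArch64 B/BL reach (a signed 26-bit word offset, so roughly ±128 MiB). branch_in_range is a hypothetical helper and not LLD's implementation; the thunk's own address from the disassembly above is used as a stand-in for the caller's address, which is an assumption.

```cpp
#include <cstdint>

// Simplified model: B/BL encode a signed 26-bit offset in units of 4 bytes,
// so the target must lie in [-128 MiB, +128 MiB - 4] of the branch itself.
constexpr bool branch_in_range(std::uint64_t branch_addr,
                               std::uint64_t target_addr) {
    // Unsigned subtraction wraps; reinterpret the result as a signed offset.
    const std::int64_t offset =
        static_cast<std::int64_t>(target_addr - branch_addr);
    constexpr std::int64_t kMin = -(std::int64_t{1} << 27);
    constexpr std::int64_t kMax = (std::int64_t{1} << 27) - 4;
    return offset >= kMin && offset <= kMax;
}

// Code near 0x1007ea2c is about 256 MiB away from address 0x0, so a direct
// branch cannot reach it and the linker has to insert a thunk.
static_assert(!branch_in_range(0x1007ea2cULL, 0x0ULL),
              "address 0x0 is out of direct branch range");

// A target inside the same image (illustrative address) is reachable.
static_assert(branch_in_range(0x1007ea2cULL, 0x10000000ULL),
              "nearby target needs no thunk");
```

This fits the observation that the non-LTO build has no thunks: when every call target resolves to a real address inside the image, all branches are in range.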