[LLD] Writing thunks before the corresponding section


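A MIPS LA25 thunk is used to call a PIC function from non-PIC code. It loads the function's address into $25 ($t9), where PIC code expects to find it on entry, and then transfers control.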
Usually it contains three instructions:

lui $25, %hi(func)
addiu $25, $25, %lo(func)
j func
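The %hi/%lo pair splits the 32-bit address of func between LUI and ADDIU. Because ADDIU sign-extends its 16-bit immediate, %hi has to be rounded up whenever bit 15 of the address is set. Here is a minimal C++ sketch of that arithmetic, assuming 32-bit addresses; mipsHi, mipsLo and materialize are hypothetical helper names, not LLD functions.

```cpp
#include <cstdint>

// Upper 16 bits loaded by LUI. Adding 0x8000 first compensates for ADDIU
// sign-extending the low half when its bit 15 is set.
uint16_t mipsHi(uint32_t Addr) {
  return static_cast<uint16_t>((Addr + 0x8000u) >> 16);
}

// Lower 16 bits used as the ADDIU immediate.
uint16_t mipsLo(uint32_t Addr) {
  return static_cast<uint16_t>(Addr & 0xFFFFu);
}

// Value of $25 after "lui $25, Hi; addiu $25, $25, Lo" (32-bit wraparound).
uint32_t materialize(uint16_t Hi, uint16_t Lo) {
  int32_t SignExtendedLo = static_cast<int16_t>(Lo); // ADDIU sign-extends
  return (static_cast<uint32_t>(Hi) << 16) + static_cast<uint32_t>(SignExtendedLo);
}
```

For example, func at 0x00418000 gives %hi = 0x0042 and %lo = 0x8000: the LUI loads 0x00420000 and the ADDIU subtracts 0x8000.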

We can write such a thunk anywhere in the output file. But if the PIC
function that requires the thunk is the first routine in its section,
we can optimize the code and omit the jump instruction. To do so, we
just write the following thunk immediately before the PIC routine, so
that execution falls through into it:

lui $25, %hi(func)
addiu $25, $25, %lo(func)

In fact, the GNU bfd and gold linkers write all MIPS LA25 thunks
required by section "A" into a separate input section "S" and put "S"
before "A". The last thunk in "S" can then use the optimized
two-instruction form.
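The bfd/gold arrangement can be modelled roughly like this. Collect the thunks needed by section A into S, emit the thunk whose target is at offset 0 of A last, and give it the two-instruction form so it ends exactly where A begins. This is a simplified C++ sketch with several assumptions: each long thunk is 16 bytes (four words, including a delay-slot NOP), the short form is 8 bytes, each target offset appears once, and S sits immediately before A with no padding. ThunkSlot and layoutThunkSection are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct ThunkSlot {
  uint64_t TargetOffset; // offset of the PIC function within section A
  uint64_t Size;         // 8 for the fall-through form, 16 otherwise
  uint64_t OffsetInS;    // where this thunk starts within section S
};

// Orders and sizes the thunks of section S so that the thunk for the function
// at offset 0 of A (if any) comes last and falls through into A.
// Assumes each target offset appears at most once.
std::vector<ThunkSlot> layoutThunkSection(std::vector<uint64_t> TargetOffsets) {
  // Keep the original order, but move the offset-0 thunk to the end.
  std::stable_partition(TargetOffsets.begin(), TargetOffsets.end(),
                        [](uint64_t Off) { return Off != 0; });

  std::vector<ThunkSlot> Slots;
  uint64_t Cursor = 0;
  for (uint64_t Off : TargetOffsets) {
    uint64_t Size = (Off == 0) ? 8 : 16;
    Slots.push_back({Off, Size, Cursor});
    Cursor += Size;
  }
  return Slots;
}
```

With targets at 0x40, 0x0 and 0x100, the thunks for 0x40 and 0x100 take 16 bytes each. The offset-0 thunk follows at offset 32 of S and ends at offset 40, the size of S, which is where A starts.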

I would like to implement this optimization in LLD. My question is
about ARM thunks: is it okay to write them before the corresponding
input section, rather than after it as LLD does now?

Hello Simon,

Yes, it is okay to write ARM thunks before an InputSection. ARM has a
similar "inline state change" thunk that executes BX PC, NOP (from
Thumb state) to change state and fall through. The ARM thunks that are
implemented now just need to be in range of the source branch. I have
previously worked on an ARM linker that puts thunks in separate
sections in the same way that you describe for bfd/gold.

I can't tell whether you are planning to implement thunks as separate
InputSections, or to keep assigning them to existing InputSections as
they are now but write them at the front rather than the end.

If you are considering writing the thunks as data before the
InputSection contents, I think you'll need some extra bookkeeping:
- Padding might be needed between the last thunk and the InputSection
contents if the alignment of the InputSection is higher than the usual
2 or 4.
- If the thunk is conceptually part of the InputSection (starts at
offset 0), then all the relocation offsets and symbol values in that
section will need to be displaced by the size of the thunks plus any
padding.
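Both points come down to a little offset arithmetic. Here is a C++ sketch, assuming the thunks are written at offset 0 of the InputSection and the original contents start at the first offset after them that satisfies the section's alignment. alignTo, PrefixLayout, layoutThunkPrefix and displace are hypothetical names, not LLD's API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round Value up to a multiple of Align (Align must be a power of two).
uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "alignment must be a power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

struct PrefixLayout {
  uint64_t ThunkBytes;   // total size of the thunks written at offset 0
  uint64_t Padding;      // bytes between the last thunk and the contents
  uint64_t ContentStart; // new offset of the original section contents
};

// Computes where the original contents land once thunks are prepended.
PrefixLayout layoutThunkPrefix(const std::vector<uint64_t> &ThunkSizes,
                               uint64_t SectionAlign) {
  uint64_t Bytes = 0;
  for (uint64_t Size : ThunkSizes)
    Bytes += Size;
  uint64_t Start = alignTo(Bytes, SectionAlign);
  return {Bytes, Start - Bytes, Start};
}

// Relocation offsets and symbol values inside the section shift by ContentStart.
uint64_t displace(uint64_t OriginalOffset, const PrefixLayout &Layout) {
  return OriginalOffset + Layout.ContentStart;
}
```

With thunks of 12 and 8 bytes in a 16-byte-aligned section, the contents move to offset 32 after 12 bytes of padding, and a relocation at offset 4 moves to offset 36. If execution is meant to fall through from the last thunk into the contents, that padding must be executable filler such as NOPs.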

It is worth mentioning that disassembly of ARM and Thumb thunks may
look a bit strange if they are moved from after the InputSection to
before it. This is because they lack a mapping symbol ($a or $t) to
tell the disassembler which instruction set to decode, so it keeps
using the state set by the most recent preceding mapping symbol.
Adding mapping symbols for linker-generated InputSections is on my
list of things to do.
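A mapping symbol sets the instruction set for every byte from its address up to the next mapping symbol. That is why a thunk without its own $a or $t is decoded using whatever state comes before it. Here is a small C++ model of the lookup. IsaState, MappingSymbols and stateAt are illustrative assumptions; real tools read these symbols from the ELF symbol table.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

enum class IsaState { Arm, Thumb, Data, Unknown };

// Mapping symbols ($a, $t, $d) keyed by address. Each one applies from its
// address up to the next mapping symbol.
using MappingSymbols = std::map<uint64_t, IsaState>;

// Returns the state a disassembler would use for the byte at Addr.
IsaState stateAt(const MappingSymbols &Syms, uint64_t Addr) {
  auto It = Syms.upper_bound(Addr); // first mapping symbol strictly after Addr
  if (It == Syms.begin())
    return IsaState::Unknown;       // no mapping symbol at or before Addr
  return std::prev(It)->second;
}
```

Suppose an ARM thunk at 0x1200 follows a section whose last mapping symbol is $d at 0x1100. The thunk is shown as data until an $a symbol is added at 0x1200.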

Hope this helps


Maybe it’s a little bit evil, but I’ve found that SUB PC,PC,#3 works just fine to change to Thumb state, with no NOP needed, on every current-generation CPU I’ve tried, in particular Raspberry Pi 2 (Cortex-A7), Pi 3 (Cortex-A53) and Odroid XU4 (Cortex-A15).

Unfortunately, I never thought to try this ten years ago on the ARM7TDMI.

For example (assumes a Linux EABI kernel):


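.equ STDOUT, 1
.equ SYS_WRITE, 4
.equ SYS_EXIT, 1

.globl _start
.syntax unified

.arm
_start:
sub pc,pc,#3        @ PC reads as _start+8; result is _start+5, bit 0 set -> Thumb at _start+4

.thumb
movs r0,#STDOUT
adr r1,hello
movs r2,#11         @ length of "Hello asm!\n"
movs r7,#SYS_WRITE
swi 0               @ write(STDOUT, hello, 11)
movs r7,#SYS_EXIT
swi 0               @ exit; r0 still holds write's return value

.align 2
hello: .asciz "Hello asm!\n"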
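Here is why SUB PC,PC,#3 works. In ARM state an instruction that reads PC sees its own address plus 8, so the result is the instruction's address plus 5. On ARMv7, a data-processing instruction that writes PC in ARM state interworks like BX. Bit 0 of the result therefore selects Thumb state and is cleared from the target, which becomes the instruction's address plus 4, i.e. the very next instruction. ARMv4T cores such as the ARM7TDMI do not interwork on such writes, so the trick would not switch state there. A sketch of the arithmetic follows; BranchTarget and armSubPcPc are hypothetical names.

```cpp
#include <cstdint>

struct BranchTarget {
  uint32_t Addr; // address execution continues at
  bool Thumb;    // true if the new state is Thumb
};

// Models "sub pc, pc, #Imm" executed in ARM state on ARMv7.
BranchTarget armSubPcPc(uint32_t InsnAddr, uint32_t Imm) {
  uint32_t Result = (InsnAddr + 8) - Imm; // PC reads as InsnAddr + 8 in ARM state
  // Interworking write: bit 0 selects Thumb and is cleared from the target.
  // (An even result with bit 1 set would be UNPREDICTABLE; not modelled here.)
  return {Result & ~1u, (Result & 1u) != 0};
}
```

Using the addresses from the disassembly later in this thread, _start at 0x10054 continues in Thumb state at 0x10058.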

This seems to be a reasonable optimization, and I don’t have any particular concern about implementing it.

Forgot to mention: BX PC won't switch state in ARM mode, because the PC
value it reads has bit 0 clear. The standard way is ADD Rn,PC,#1; BX Rn
(typically with Rn = LR).

In Thumb mode BX PC will switch to ARM, but the BX instruction must be
4-byte aligned, and the next 2 bytes are skipped, so it doesn't matter
whether they are a NOP or not.
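These cases follow from two rules. PC reads as the instruction's address plus 8 in ARM state and plus 4 in Thumb state. BX uses bit 0 of its operand to choose Thumb state, and an ARM-state target must be word-aligned. Below is a sketch of the three sequences under those rules. The function names are hypothetical, and the UNPREDICTABLE case is modelled as "no target" (std::nullopt).

```cpp
#include <cstdint>
#include <optional>

struct BranchTarget {
  uint32_t Addr; // address execution continues at
  bool Thumb;    // true if the new state is Thumb
};

// BX semantics: bit 0 set -> Thumb at Value with bit 0 cleared.
// Bit 0 clear -> ARM, which needs a word-aligned target; an address with
// bit 1 set is UNPREDICTABLE, modelled here as std::nullopt.
std::optional<BranchTarget> bx(uint32_t Value) {
  if (Value & 1u)
    return BranchTarget{Value & ~1u, true};
  if (Value & 2u)
    return std::nullopt;
  return BranchTarget{Value, false};
}

// ARM state "bx pc": PC reads as InsnAddr + 8 with bit 0 clear, so it stays ARM.
std::optional<BranchTarget> armBxPc(uint32_t InsnAddr) { return bx(InsnAddr + 8); }

// ARM state "add rN, pc, #1" at AddAddr, then "bx rN" at AddAddr + 4:
// switches to Thumb at AddAddr + 8, right after the BX.
std::optional<BranchTarget> armAddThenBx(uint32_t AddAddr) { return bx(AddAddr + 8 + 1); }

// Thumb state "bx pc": PC reads as InsnAddr + 4.
std::optional<BranchTarget> thumbBxPc(uint32_t InsnAddr) { return bx(InsnAddr + 4); }
```

thumbBxPc(0x1005c) lands in ARM state at 0x10060, skipping the halfword at 0x1005e. thumbBxPc(0x1005e) has no defined target, which corresponds to the unpredictable BX PC at 0x1005e in the disassembly that follows.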

The architecture manual says BX PC from the second Thumb instruction in
a 4-byte word is unpredictable. On some implementations it works,
resuming at the ARM instruction in the very next bytes (the address 4
bytes beyond the word the Thumb instruction was in). But it's hit and
miss. The following code works on Odroid XU4 (A15) and Raspberry Pi 2
(A7) but not on Raspberry Pi 3 (A53), where it causes a bus error:

00010054 <_start>:
   10054: e24ff003 sub pc, pc, #3
   10058: 2001 movs r0, #1
   1005a: a105 add r1, pc, #20 ; (adr r1, 10070 <hello>)
   1005c: 220b movs r2, #11
   1005e: 4778 bx pc
   10060: e3b07004 movs r7, #4
   10064: ef000000 svc 0x00000000
   10068: e3b07001 movs r7, #1
   1006c: ef000000 svc 0x00000000

00010070 <hello>:
   10070: 6c6c6548 .word 0x6c6c6548
   10074: 7361206f .word 0x7361206f
   10078: 000a216d .word 0x000a216d


Sorry for the delay in replying.

It looks like thunks can now be implemented as synthetic sections.
That gives a flexible solution: we will be able to put thunks before
or after the related sections, use different alignment, etc. As far
as I know, the BFD linker uses the same approach, at least for MIPS
thunks. I will try to implement this idea.

Sure. One thing I want to remind you of is that there is a place in Writer.cpp where we assume all synthetic sections were appended to the end of the Sections list. Look for llvm::reverse. If you add synthetic thunk sections right before or after non-synthetic sections, you will also need to change that.
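To illustrate the kind of assumption meant here, the sketch below contrasts two scans. The first walks the section list backwards and stops at the first non-synthetic section. The second checks every section. It is a hypothetical model in plain C++, not the actual Writer.cpp code: Section and both functions are invented for illustration, and reverse iterators stand in for llvm::reverse.

```cpp
#include <string>
#include <vector>

struct Section {
  std::string Name;
  bool Synthetic;
  bool Empty;
};

// Assumes all synthetic sections sit at the end of the list: walks backwards
// and stops at the first non-synthetic section.
std::vector<std::string> emptySyntheticTailOnly(const std::vector<Section> &Secs) {
  std::vector<std::string> Out;
  for (auto It = Secs.rbegin(); It != Secs.rend(); ++It) {
    if (!It->Synthetic)
      break;
    if (It->Empty)
      Out.push_back(It->Name);
  }
  return Out;
}

// Makes no ordering assumption: checks every section in list order.
std::vector<std::string> emptySyntheticAll(const std::vector<Section> &Secs) {
  std::vector<std::string> Out;
  for (const Section &S : Secs)
    if (S.Synthetic && S.Empty)
      Out.push_back(S.Name);
  return Out;
}
```

Take an empty synthetic thunk section placed just before a regular section "A". The tail-only scan never reaches it and reports only the trailing ".got". The full scan finds both.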

Thanks for pointing that out.