[BOLT][AArch64] Questions on the implementation of option `--plt` on AArch64

I'd like to discuss some questions that came up while implementing BOLT's --plt option on AArch64. Here is the test program, written in C.

#include <stdio.h>
int main(){
    printf("Hello World\n");
    return 0;
}

On x86, the printf call is compiled into a call to the puts entry in the .plt section.

 40058f:    e8 fc fe ff ff     callq 400490 <puts@plt>

The first instruction of the puts entry (at address 0x400490) is an indirect jump to the implementation address stored in the GOT entry for puts.

Disassembly of section .plt:

0000000000400480 <.plt>:
  400480:       ff 35 42 0b 20 00       pushq  0x200b42(%rip)        # 600fc8 <_GLOBAL_OFFSET_TABLE_+0x8>
  400486:       ff 25 44 0b 20 00       jmpq   *0x200b44(%rip)        # 600fd0 <_GLOBAL_OFFSET_TABLE_+0x10>
  40048c:       0f 1f 40 00             nopl   0x0(%rax)

0000000000400490 <puts@plt>:
  400490:       ff 25 42 0b 20 00       jmpq   *0x200b42(%rip)        # 600fd8 <puts@GLIBC_2.2.5>
  400496:       68 00 00 00 00          pushq  $0x0
  40049b:       e9 e0 ff ff ff          jmpq   400480 <.plt>

The --plt option uses the function convertCallToIndirectCall to fold the callq at 0x40058f and the jmpq at 0x400490 into a single indirect callq (at 0xa000ef in the rewritten code), which replaces the original callq. This reduces the number of instructions executed per call.

  a000ef:       ff 15 e3 0e c0 ff       callq  *-0x3ff11d(%rip)        # 600fd8 <puts@GLIBC_2.2.5>
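To see why the folded call is equivalent, it may help to check the RIP-relative arithmetic by hand. The sketch below is only an illustration, not BOLT code: the helper names `read_disp32` and `rip_relative_target` are hypothetical, and it assumes the 6-byte `ff /r` encodings shown in the listings above, where the 32-bit displacement starts at byte offset 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian, sign-extended 32-bit
 * displacement from the instruction bytes. */
static int64_t read_disp32(const uint8_t *p) {
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
}

/* Hypothetical helper: a RIP-relative operand is resolved against the
 * address of the NEXT instruction, i.e. insn_addr + insn_len. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int64_t disp) {
    assert(insn_len > 0 && insn_len <= 15); /* x86 instructions are at most 15 bytes */
    return insn_addr + insn_len + (uint64_t)disp;
}
```

Applied to the bytes in the listings, both the `jmpq` at 0x400490 (displacement 0x200b42) and the rewritten `callq` at 0xa000ef (displacement -0x3ff11d) resolve to the same GOT slot, 0x600fd8, which is why one indirect call can stand in for the call plus the jump.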

However, AArch64 has no instruction that calls through an address stored in memory. The PLT entry instead uses the four instructions from 0x400540 to 0x40054c to do similar work.

  400694: 97ffffab     	bl	0x400540 <puts@plt>
0000000000400540 <puts@plt>:
  400540: 90000110     	adrp	x16, 0x420000 <puts@GLIBC_2.17+0x420000>
  400544: f9400e11     	ldr	x17, [x16, #0x18]
  400548: 91006210     	add	x16, x16, #0x18
  40054c: d61f0220     	br	x17
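For readers who want to follow the stub by hand, here is a minimal sketch that decodes the three address-forming instructions from their raw encodings. The function names are hypothetical and this is not how BOLT decodes instructions; it only assumes the standard A64 encodings of `bl`, `adrp`, and the 64-bit `ldr` with an unsigned scaled offset.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v. */
static int64_t sign_extend(uint64_t v, unsigned bits) {
    assert(bits >= 1 && bits < 63);
    v &= (1ULL << bits) - 1;
    return (v >> (bits - 1)) ? (int64_t)v - (int64_t)(1ULL << bits)
                             : (int64_t)v;
}

/* bl: imm26 is a signed word offset relative to the instruction itself. */
static uint64_t bl_target(uint64_t pc, uint32_t insn) {
    return pc + (uint64_t)(sign_extend(insn & 0x3ffffff, 26) * 4);
}

/* adrp: immhi:immlo is a signed 4 KiB page offset from the page of pc. */
static uint64_t adrp_target(uint64_t pc, uint32_t insn) {
    uint64_t immlo = (insn >> 29) & 0x3;
    uint64_t immhi = (insn >> 5) & 0x7ffff;
    int64_t pages = sign_extend((immhi << 2) | immlo, 21);
    return (pc & ~0xfffULL) + (uint64_t)(pages * 4096);
}

/* ldr Xt, [Xn, #imm]: imm12 is an unsigned offset scaled by 8 bytes. */
static uint64_t ldr_x_offset(uint32_t insn) {
    return (uint64_t)((insn >> 10) & 0xfff) * 8;
}
```

With the encodings from the listing, `bl_target(0x400694, 0x97ffffab)` gives 0x400540 (the stub), `adrp_target(0x400540, 0x90000110)` gives 0x420000, and `ldr_x_offset(0xf9400e11)` gives 0x18, so the address loaded into x17 comes from 0x420018, presumably the GOT slot for puts.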

So my question is: should we replace the original bl instruction with these four instructions (an optimization similar to the one on x86), or should we simply not support the --plt option on AArch64?

Hi. x86 is a register-memory architecture, so its instructions can use values stored in memory without first loading them into a register. AArch64 is a load/store (register-register) architecture, so the target address must first be loaded from memory into a register. As a result, this optimisation is not possible there.

Thank you for your reply. I'd also like to clarify: is replacing the "bl" instruction with these four instructions impossible to do with BOLT, or is it possible but without any optimization benefit?

Technically, we could try to do it when the PLT entry is used only once in the whole binary. But the whole point of the PLT scheme is to avoid duplicating those four instructions all over the place. Also, I'm not convinced that such an optimisation would give any benefit in the vast majority of binaries.
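The "used only once" condition can be illustrated with a rough code-size model. This is a back-of-the-envelope sketch, not a measurement: the function names are hypothetical, and it assumes 4-byte A64 instructions, a 4-instruction stub as in the listing above, and an inlined sequence of the same length (ending in an indirect call rather than `br`). It ignores the PLT header, alignment, and any cache or branch-prediction effects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { INSN_BYTES = 4, STUB_INSNS = 4 };

/* One `bl` per call site plus one shared 4-instruction PLT stub. */
static size_t bytes_with_plt(size_t call_sites) {
    assert(call_sites <= (SIZE_MAX - 16) / 16); /* guard against overflow */
    return call_sites * INSN_BYTES + STUB_INSNS * INSN_BYTES;
}

/* The 4-instruction sequence duplicated at every call site, no stub. */
static size_t bytes_inlined(size_t call_sites) {
    assert(call_sites <= (SIZE_MAX - 16) / 16); /* guard against overflow */
    return call_sites * STUB_INSNS * INSN_BYTES;
}
```

Under these assumptions, a single call site costs 20 bytes through the PLT versus 16 bytes inlined, but with two call sites the PLT already wins (24 versus 32 bytes), and the gap grows by 12 bytes per extra call site. The dynamic saving per call is one branch (5 instructions executed through the PLT versus 4 inlined).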

Thanks for your detailed reply. Lastly, I would like to ask whether, in your experience, the --frame-opt optimization can be made to work on AArch64. I am going to work on it and want to avoid doing useless work like this time.