I'd like to discuss a question that came up while implementing the --plt option of BOLT on AArch64. Here is the test program, written in C:
#include <stdio.h>

int main() {
    printf("Hello World\n");
    return 0;
}
On x86-64, the printf call is compiled into a call to the puts entry in the .plt section:
40058f: e8 fc fe ff ff callq 400490 <puts@plt>
The first instruction of the puts entry (at 0x400490) is a jump to the implementation address stored in the GOT entry for puts:
Disassembly of section .plt:
0000000000400480 <.plt>:
400480: ff 35 42 0b 20 00 pushq 0x200b42(%rip) # 600fc8 <_GLOBAL_OFFSET_TABLE_+0x8>
400486: ff 25 44 0b 20 00 jmpq *0x200b44(%rip) # 600fd0 <_GLOBAL_OFFSET_TABLE_+0x10>
40048c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400490 <puts@plt>:
400490: ff 25 42 0b 20 00 jmpq *0x200b42(%rip) # 600fd8 <puts@GLIBC_2.2.5>
400496: 68 00 00 00 00 pushq $0x0
40049b: e9 e0 ff ff ff jmpq 400480 <.plt>
The --plt option uses the function convertCallToIndirectCall to combine the callq (0x40058f) and the jmpq (0x400490) into a single indirect callq (0xa000ef), which replaces the original callq (0x40058f) and so reduces the number of instructions executed:
a000ef: ff 15 e3 0e c0 ff callq *-0x3ff11d(%rip) # 600fd8 <puts@GLIBC_2.2.5>
However, AArch64 has no instruction that calls an address stored in memory. The PLT entry instead uses four instructions, from 0x400540 to 0x40054c, to do the same work:
400694: 97ffffab bl 0x400540 <puts@plt>
0000000000400540 <puts@plt>:
400540: 90000110 adrp x16, 0x420000 <puts@GLIBC_2.17+0x420000>
400544: f9400e11 ldr x17, [x16, #0x18]
400548: 91006210 add x16, x16, #0x18
40054c: d61f0220 br x17
So my question is: should we replace the original bl instruction with these four instructions (the same kind of optimization as on x86), or simply not support the --plt option on AArch64?