question about --emit-relocs with lld

Hi,

While doing Linux kernel builds linked with lld, I've tracked down a
difference that breaks relocation of the kernel image (e.g. under
KASLR[1]). Some relocations are changed to *ABS* (oddly, all of them are
in the .rodata section). Note the difference in the linked output below,
where .L__const._start.instance becomes *ABS* only under lld:

$ cat minimal.c
struct minimal {
        void *pointer;
        int value;
};

void _start(void)
{
        struct minimal instance = {
                .value = 1,
        };
}
$ llvm-build/x86/bin/clang -c minimal.c
$ /usr/bin/ld.bfd --emit-relocs minimal.o -o minimal.bfd
$ llvm-build/x86/bin/ld.lld --emit-relocs minimal.o -o minimal.lld
$ objdump -Sdr minimal.bfd
...
00000000004000b0 <_start>:
  4000b0: 55 push %rbp
  4000b1: 48 89 e5 mov %rsp,%rbp
  4000b4: 48 8b 04 25 d0 00 40 mov 0x4000d0,%rax
  4000bb: 00
                        4000b8: R_X86_64_32S .rodata
  4000bc: 48 89 45 f0 mov %rax,-0x10(%rbp)
  4000c0: 48 8b 04 25 d8 00 40 mov 0x4000d8,%rax
  4000c7: 00
                        4000c4: R_X86_64_32S .L__const._start.instance+0x8
  4000c8: 48 89 45 f8 mov %rax,-0x8(%rbp)
  4000cc: 5d pop %rbp
  4000cd: c3 retq

$ objdump -Sdr minimal.lld
...
0000000000201000 <_start>:
  201000: 55 push %rbp
  201001: 48 89 e5 mov %rsp,%rbp
  201004: 48 8b 04 25 20 01 20 mov 0x200120,%rax
  20100b: 00
                        201008: R_X86_64_32S .rodata
  20100c: 48 89 45 f0 mov %rax,-0x10(%rbp)
  201010: 48 8b 04 25 28 01 20 mov 0x200128,%rax
  201017: 00
                        201014: R_X86_64_32S *ABS*+0x8
  201018: 48 89 45 f8 mov %rax,-0x8(%rbp)
  20101c: 5d pop %rbp
  20101d: c3 retq

I'm not sure where to start looking for solving this...

Thanks!

-Kees

[1] https://github.com/ClangBuiltLinux/linux/issues/404

Can you file a bug for this?

-Tom

Sure! Done: https://bugs.llvm.org/show_bug.cgi?id=41385

Thanks!

This seems like a bug in lld.

Here is the cause of the bug.

Symbols whose names begin with “.L” are local symbols. The assembler usually discards such symbols, because a relocation relative to a local symbol can be replaced by one relative to the beginning of its section. There is one case in which the assembler cannot do that: local symbols in mergeable sections cannot be replaced, because the linker splits mergeable sections and reassembles them, so the offsets of those symbols from the beginning of the section are not computable at assembly time.
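
To illustrate why those offsets are not known until link time, here is a toy model of merging fixed-size entries. It is only a sketch: the names (merged_section, merge_entry, ENTRY_SIZE, MAX_ENTRIES) are made up for this example and this is not how lld or the GNU linkers implement merging. The point is that the output offset of an entry depends on what was merged before it, so the same entry can land at a different offset than it had in its input section.

```c
#include <stddef.h>
#include <string.h>

#define ENTRY_SIZE  8   /* assumed fixed entry size for this toy model */
#define MAX_ENTRIES 64  /* arbitrary capacity for the example */

/* Hypothetical model of a merged output section. */
struct merged_section {
    unsigned char data[MAX_ENTRIES][ENTRY_SIZE];
    size_t count;
};

/*
 * Add one entry to the merged section, reusing an identical entry if one
 * is already present. Returns the byte offset of the entry in the output,
 * or (size_t)-1 if the toy section is full.
 */
static size_t merge_entry(struct merged_section *out,
                          const unsigned char *entry)
{
    for (size_t i = 0; i < out->count; i++) {
        if (memcmp(out->data[i], entry, ENTRY_SIZE) == 0)
            return i * ENTRY_SIZE;      /* duplicate: share the old slot */
    }
    if (out->count == MAX_ENTRIES)
        return (size_t)-1;

    size_t index = out->count;
    memcpy(out->data[index], entry, ENTRY_SIZE);
    out->count = index + 1;
    return index * ENTRY_SIZE;
}
```

In this model, an entry that sat at offset 16 of its input section can end up at offset 0 of the output if an identical entry was merged earlier. A relocation of the form "section + offset" written at assembly time would then point at the wrong place, which is why the assembler keeps the “.L” symbol instead.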

The linker recognizes the remaining “.L” symbols and discards them. lld implements that behavior, but it discards the symbols even when --emit-relocs is given, so the emitted relocations are left without the symbols they referred to. This is the cause of the bug.

We should keep “.L” symbols when --emit-relocs is given. That appears to be what the GNU linkers do as well.
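
As a rough sketch of the proposed behavior (not lld's actual code; the function names and the emit_relocs parameter are hypothetical, and real linkers take more into account, such as the discard options and whether a relocation actually uses the symbol), the decision could look like this:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the name uses the ".L" local-label prefix. */
static bool is_dot_l_symbol(const char *name)
{
    return strncmp(name, ".L", 2) == 0;
}

/*
 * Hypothetical policy for this sketch: ordinary symbols are always kept,
 * and ".L" symbols are kept only when relocations are being emitted,
 * because those relocations may still refer to them.
 */
static bool keep_in_symtab(const char *name, bool emit_relocs)
{
    if (!is_dot_l_symbol(name))
        return true;
    return emit_relocs;
}
```

With this policy, keep_in_symtab(".L__const._start.instance", true) returns true, so the relocation at 201014 in the example above would still have its symbol, while the same call with emit_relocs set to false discards the symbol as before.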