[lld] ELF weak aliases

I just got Lua to link and run on x86-64 Linux with musl and lld. It
required one change to work around incorrect handling of ELF weak
aliases.

In musl __stdio_exit.c
<http://git.musl-libc.org/cgit/musl/tree/src/stdio/__stdio_exit.c> we
have:

static FILE *const dummy_file = 0;
weak_alias(dummy_file, __stdin_used);
weak_alias(dummy_file, __stdout_used);
weak_alias(dummy_file, __stderr_used);

weak_alias(old, new) is defined as:
extern __typeof(old) new __attribute__((weak, alias(#old)))
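To illustrate what the macro produces, here is a minimal sketch, assuming GCC or Clang on an ELF target. The demo_ names are hypothetical stand-ins and are not part of musl. Each alias is a separate symbol that resolves to the same address as the original object.

```c
#include <assert.h>

/* Same shape as the macro quoted above (GCC/Clang extension, ELF targets). */
#define weak_alias(old, new) \
    extern __typeof(old) new __attribute__((weak, alias(#old)))

/* Hypothetical stand-ins for dummy_file and two of the __stdx_used symbols. */
static int *const demo_dummy = 0;
weak_alias(demo_dummy, demo_stdin_used);
weak_alias(demo_dummy, demo_stdout_used);

/* Returns 1 when both aliases resolve to the address of demo_dummy. */
int demo_aliases_share_address(void)
{
    /* volatile keeps the comparison at run time instead of compile time. */
    const void *volatile base = &demo_dummy;
    const void *volatile in   = &demo_stdin_used;
    const void *volatile out  = &demo_stdout_used;
    return in == base && out == base;
}

/* Small self-check a caller could run once at startup. */
void demo_self_check(void)
{
    assert(demo_aliases_share_address());
}
```

Nothing in the symbol table records that these names are aliases; the only link between them is the shared address.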

This generates the following object file:
mspencer@mspencer-vm:~/Projects/test$ objdump -st ../musl/src/stdio/__stdio_exit.o

../musl/src/stdio/__stdio_exit.o: file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l df *ABS* 0000000000000000 src/stdio/__stdio_exit.c
0000000000000044 l F .text 0000000000000049 close_file
0000000000000000 l O .rodata 0000000000000008 dummy_file
0000000000000000 l d .text 0000000000000000 .text
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 l d .bss 0000000000000000 .bss
0000000000000000 l d .rodata 0000000000000000 .rodata
0000000000000000 l d .note.GNU-stack 0000000000000000 .note.GNU-stack
0000000000000000 w O .rodata 0000000000000008 __stderr_used
0000000000000000 w O .rodata 0000000000000008 __stdin_used
0000000000000000 g F .text 0000000000000044 __stdio_exit
0000000000000000 w O .rodata 0000000000000008 __stdout_used
0000000000000000 *UND* 0000000000000000 __libc
0000000000000000 *UND* 0000000000000000 __lock
0000000000000000 *UND* 0000000000000000 __lockfile

Contents of section .text:
0000 53833d00 00000000 740abf00 000000e8 S.=.....t.......
0010 00000000 488b1d00 000000eb 0c4889df ....H........H..
0020 e81f0000 00488b5b 704885db 75ef488b .....H.[pH..u.H.
0030 3d000000 00e80a00 0000488b 3d000000 =.........H.=...
0040 005beb00 534889fb 4885db74 3e83bb8c .[..SH..H..t>...
0050 00000000 78084889 dfe80000 0000488b ....x.H.......H.
0060 4328483b 4338760a 4889df31 f631d2ff C(H;C8v.H..1.1..
0070 5348488b 7308482b 7310730f 488b4350 SHH.s.H+s.s.H.CP
0080 4889dfba 01000000 5bffe05b c3 H.......[..[.
Contents of section .rodata:
0000 00000000 00000000

Note that __stdout_used is the last symbol in the .rodata section.
This means that the reader assigns the data (8 bytes of 0) to
__stdout_used. Because dummy_file and the other __stdx_used symbols
come before it, they end up in the right place in the final file.
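The effect can be sketched as follows, assuming the reader sizes each symbol's content as the distance to the next symbol's address (or to the end of the section for the last symbol). The struct and function names are hypothetical and are not lld's API; the table mirrors the .rodata symbols and the 8-byte section from the dump above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of one symbol-table entry. */
struct demo_sym {
    const char *name;
    unsigned long value;   /* offset within the section */
};

/*
 * Content size given to syms[i]: distance to the next symbol, or to the end
 * of the section for the last one. syms must be sorted by value, with ties
 * kept in symbol-table order.
 */
unsigned long demo_content_size(const struct demo_sym *syms, size_t count,
                                size_t i, unsigned long section_size)
{
    assert(i < count);
    unsigned long end = (i + 1 < count) ? syms[i + 1].value : section_size;
    return end - syms[i].value;
}

/* The .rodata symbols from the dump, in symbol-table order. */
static const struct demo_sym demo_rodata[] = {
    { "dummy_file",    0 },
    { "__stderr_used", 0 },
    { "__stdin_used",  0 },
    { "__stdout_used", 0 },
};
```

Under that assumption only the last entry at the shared address ends up with any bytes; every earlier alias gets a size of zero.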

This works great until another object file provides a definition of
__stdout_used. Its weak definition is then removed entirely, and with
it the content that the other __stdx_used symbols rely on.

I fixed this by adding weak_alias(dummy_file,
__zinurfilestealinurdata); to __stdio_exit.c, which allocated the 8
bytes to __zinurfilestealinurdata.

Another way to fix this is to have the reader assign all the data to
the non-weak symbol (dummy_file in this case) when multiple symbols
share the same location. However, this fails if a weak symbol points
into the middle of a non-weak symbol's data. In that case we actually
need to move the data over to the non-weak symbol (or create an
anonymous local symbol to hold the data). However, this only needs to
happen in specific cases.
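A rough sketch of that alternative, covering only the simple case where all the symbols share exactly the same address. The types and the demo_pick_owner name are hypothetical and not taken from lld; the mid-symbol case mentioned above is not handled here.

```c
#include <assert.h>
#include <stddef.h>

enum demo_binding { DEMO_LOCAL, DEMO_GLOBAL, DEMO_WEAK };

/* Hypothetical, simplified symbol record. */
struct demo_alias {
    const char *name;
    unsigned long value;          /* offset within the section */
    enum demo_binding binding;
};

/*
 * Among the symbols located at `value`, return the index of the one that
 * should own the content: the first non-weak symbol if there is one,
 * otherwise the last symbol at that address (the current behaviour).
 * Returns (size_t)-1 when no symbol has that address.
 */
size_t demo_pick_owner(const struct demo_alias *syms, size_t count,
                       unsigned long value)
{
    assert(syms != NULL || count == 0);
    size_t last = (size_t)-1;
    for (size_t i = 0; i < count; i++) {
        if (syms[i].value != value)
            continue;
        if (syms[i].binding != DEMO_WEAK)
            return i;             /* non-weak symbol keeps the data */
        last = i;
    }
    return last;
}

/* The .rodata symbols from __stdio_exit.o. */
static const struct demo_alias demo_stdio_syms[] = {
    { "dummy_file",    0, DEMO_LOCAL },
    { "__stderr_used", 0, DEMO_WEAK  },
    { "__stdin_used",  0, DEMO_WEAK  },
    { "__stdout_used", 0, DEMO_WEAK  },
};
```

With this choice the data stays attached to dummy_file, so dropping any one weak alias would not remove the bytes the others refer to.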

- Michael Spencer

How are you modeling weak aliases in Atoms?

Mach-O does not support weak aliases. My mental model of a weak alias is:

  If foo is a weak alias for bar, then if nothing else defines bar, use foo in place of bar.

-Nick

ELF doesn't have any specific concept of aliases. The compiler simply
assigns multiple symbols to the same address. The reader creates a
mergeAsWeak atom, with the last symbol in the symbol table at that
address getting the content.

- Michael Spencer

Hi Michael,

Does ELF support aliasing?

How is the relationship captured in the ELF symbol table, that one
symbol is an alias of another symbol?

It is not explicitly captured. It's an implicit relationship due to
the symbols having the same address.

Note that __stdout_used is the last symbol in the .rodata section.
This means that the reader assigns the data (8 bytes of 0) to
__stdout_used. Because dummy_file and the other __stdx_used symbols
come before it, they end up in the right place in the final file.

Did you change the Reader too?

No. I just made another symbol to steal the actual content.

The Reader does not allocate any space for __stdout_used. The size of the
current symbol = (value of next symbol - value of current symbol). In this
case it's zero.

__stdout_used is the last symbol at that address, so it gets the data.
The hack was to make __stdout_used not get the data.

This works great until another object file provides a definition of
__stdout_used. Its weak definition is then removed entirely, and with
it the content that the other __stdx_used symbols rely on.

When the other object provides a definition for __stdout_used, the atom
takes on the properties of the object that defines it, and the ordinal
too, right?

I couldn't follow how the others moved.

I'm not quite sure what you mean here.

This is what I see with binutils/ld:

$cat 1.c
#include "stdio_impl.h"

static FILE *const dummy_file = 0;
weak_alias(dummy_file, __stdin_used);
weak_alias(dummy_file, __stdout_used);
weak_alias(dummy_file, __stderr_used);

$cat 2.c
int __stdout_used = 10;
$readelf -s 1.o | grep -E 'used|dummy_file'
     6: 0000000000000000 8 OBJECT LOCAL DEFAULT 4 dummy_file
     9: 0000000000000000 8 OBJECT WEAK DEFAULT 4 __stdin_used
    10: 0000000000000000 8 OBJECT WEAK DEFAULT 4 __stdout_used
    11: 0000000000000000 8 OBJECT WEAK DEFAULT 4 __stderr_used
$readelf -s 2.o | grep -E 'used|dummy_file'
     7: 0000000000000000 4 OBJECT GLOBAL DEFAULT 2 __stdout_used
$ld 1.o 2.o
ld: warning: cannot find entry symbol _start; defaulting to 00000000004000e8
$readelf -s a.out | grep -E 'used|dummy_file'
     5: 00000000004000e8 8 OBJECT LOCAL DEFAULT 1 dummy_file
     7: 00000000006000f0 4 OBJECT GLOBAL DEFAULT 2 __stdout_used
     8: 00000000004000e8 8 OBJECT WEAK DEFAULT 1 __stdin_used
    13: 00000000004000e8 8 OBJECT WEAK DEFAULT 1 __stderr_used

Thanks

Shankar Easwaran

Yes, which is what we want. Currently we get dummy_file,
__stdin_used, and __stderr_used all with size 0.

- Michael Spencer

Hi Michael,

Does ELF support aliasing?

How is the relationship captured in the ELF symbol table, that one
symbol is an alias of another symbol?

It is not explicitly captured. It's an implicit relationship due to
the symbols having the same address.

Got it.

Note that __stdout_used is the last symbol in the .rodata section.
This means that the reader assigns the data (8 bytes of 0) to
__stdout_used. Because dummy_file and the other __stdx_used symbols
come before it, they end up in the right place in the final file.

Did you change the Reader too?

No. I just made another symbol to steal the actual content.

We could change the Reader so that if the symbol is the last symbol in the
section and the symbol is weak, treat the size of the symbol differently.

This doesn't only occur when the symbol is the last in the section. It
occurs any time a weak symbol shares content with another.

The Reader does not allocate any space for __stdout_used. The size of the
current symbol = (value of next symbol - value of current symbol). In this
case it's zero.

__stdout_used is the last symbol at that address, so it gets the data.
The hack was to make __stdout_used not get the data.

Got it. Thanks for explaining things.

This works great until another object file provides a definition of
__stdout_used. Its weak definition is then removed entirely, and with
it the content that the other __stdx_used symbols rely on.

When the other object provides a definition for __stdout_used, the atom
takes on the properties of the object that defines it, and the ordinal
too, right?

I couldn't follow how the others moved.

I'm not quite sure what you mean here.

Sorry for not being clear. I was not sure how the content of the
other symbols changed when another object file provided a definition
of __stdout_used.

Because __stdout_used in __stdio_exit.o was the only symbol holding the
8 null bytes, when another object file defined a non-weak version of
__stdout_used, __stdio_exit.o::__stdout_used was removed along with
its 8 null bytes. The non-weak definition ends up in a different
location.
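A toy model of that removal, assuming each atom owns some number of content bytes and that a weak atom is dropped wholesale when another file supplies a non-weak definition of the same name. The types and names are hypothetical and not lld's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified atom record. */
struct demo_atom {
    const char *name;
    unsigned long content_size;   /* bytes of section data owned by the atom */
    int weak;                     /* non-zero for a weak definition */
};

/*
 * Total content bytes that survive when a non-weak definition of
 * `overridden` arrives from another file and the weak atom of that name
 * is dropped together with whatever content it owns.
 */
unsigned long demo_surviving_bytes(const struct demo_atom *atoms, size_t count,
                                   const char *overridden)
{
    assert(overridden != NULL);
    unsigned long total = 0;
    for (size_t i = 0; i < count; i++) {
        if (atoms[i].weak && strcmp(atoms[i].name, overridden) == 0)
            continue;             /* dropped in favour of the strong one */
        total += atoms[i].content_size;
    }
    return total;
}

/* Atoms for __stdio_exit.o's .rodata as the reader currently builds them. */
static const struct demo_atom demo_stdio_exit_atoms[] = {
    { "dummy_file",    0, 0 },
    { "__stderr_used", 0, 1 },
    { "__stdin_used",  0, 1 },
    { "__stdout_used", 8, 1 },
};
```

Overriding __stdout_used in this model leaves no bytes behind for the remaining three symbols, whereas overriding any of the others leaves the 8 bytes in place.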

Thanks

Shankar Easwaran

- Michael Spencer