Weak undefined symbols and dynamic libraries

lld, like other linkers, works hard to make weak undefined symbols work across ELF binary boundaries when dynamic linking is involved, but unless everything is compiled with -fPIC, they don’t actually work as people would expect.

I think most people are not aware of this, and even for those who know ELF well, the current half-broken behavior is confusing and not useful. So I’d like to propose that we simplify it.

Let me explain why it is half-broken. Assume that we have foo.c with the following contents:

__attribute__((weak)) void weakfn(void);
int main() { if (weakfn) weakfn(); }

What it’s intended to do is to call weakfn only when the function is defined. If you link foo.o against a shared library providing a definition of weakfn, the symbol is added to the executable’s dynamic symbol table as a weak undefined symbol.

Create a shared library

$ printf '#include <stdio.h>\nvoid weakfn(void) { puts("hello"); }\n' | clang -xc -o bar.so -shared -fPIC -

Link foo.o and bar.so to create an executable

$ clang -c foo.c
$ clang foo.o bar.so
$ LD_LIBRARY_PATH=. ./a.out
hello

Looks good so far. weakfn is in the dynamic symbol table as a weak undefined symbol.

$ objdump --dynamic-syms a.out |grep weak
0000000000400500 w DF UND 0000000000000000 weakfn

But, is it really weak? Not really. If we remove the symbol from bar.so, the main executable starts to crash.

$ clang -xc -o bar.so -shared -fPIC /dev/null

$ LD_LIBRARY_PATH=. ./a.out
Segmentation fault (core dumped)

This is because weakfn is always resolved to the address of its PLT entry in the main executable. Since the PLT entry’s address is not zero, the test in if (weakfn) weakfn() always succeeds, and weakfn is called even if the real weakfn is missing. If weakfn is missing, its PLT entry jumps to address zero, so calling the function causes a crash.

We cannot avoid this when creating a non-PIC binary, because in non-PIC code function addresses need to be known at link-time. For imported functions, we use their PLT addresses as their symbol values. A dynamic weak undefined symbol is not representable in non-PIC code.

If we are linking position-independent code, weak undefined symbols work fine. In this case, function addresses are read from GOT slots, and their values can be zero or non-zero depending on the results of load-time symbol resolution.
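As a minimal sketch of the guarded call described above, assuming GCC or Clang on an ELF target: call_weakfn_if_present is a name introduced here for illustration only. Built as position-independent code (for example with -fPIE -pie), the address test reads the GOT slot, so it should see zero when no library provides weakfn.

```c
/* Declaration only. Giving this a body would turn it into a weak
 * *definition* and the address test below would always succeed. */
__attribute__((weak)) void weakfn(void);

/* Hypothetical helper: returns 1 if weakfn was resolved and called,
 * 0 if the symbol's address is null. */
int call_weakfn_if_present(void)
{
    if (weakfn) {
        weakfn();
        return 1;
    }
    return 0;
}
```

If nothing defines weakfn at link time, the linker resolves the reference to zero and the helper returns 0 without calling anything.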

I think the current behavior is bad. I’d like to propose the following changes:

  1. If a linker is creating a non-PIC ELF binary, and if it finds a DSO symbol foo for an undefined weak symbol foo, then it adds foo as a strong undefined symbol to the dynamic symbol table. This prevents the above crash because the program fails to start if foo is not found at load-time, instead of crashing at run-time.

  2. If a linker is creating a non-PIC ELF binary, and if it cannot find a DSO symbol foo for an undefined weak symbol foo, then it does not add foo to the dynamic symbol table, and it sets foo’s value to zero.

In other words, my suggestion is that the linker should not try too hard for weak undefined symbols in non-PIC output. In non-PIC, if weak undefined symbols cannot be resolved at link-time, their values should be set to zero. Otherwise, they should be handled as if they were regular undefined symbols.
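The two rules can be summarized as a small decision function. This is only a sketch of the proposal, not lld's actual code; the enum, its members, and decide_weak_undef are hypothetical names invented for this example.

```c
#include <stdbool.h>

/* Hypothetical model of what the linker does with one weak undefined
 * symbol under the proposal. */
enum weak_undef_action {
    KEEP_CURRENT_BEHAVIOR, /* PIC output: the proposal changes nothing */
    EXPORT_STRONG_UNDEF,   /* rule 1: non-PIC, found in a DSO */
    RESOLVE_TO_ZERO        /* rule 2: non-PIC, not found in any DSO */
};

enum weak_undef_action decide_weak_undef(bool output_is_pic, bool found_in_dso)
{
    if (output_is_pic)
        return KEEP_CURRENT_BEHAVIOR;
    /* Non-PIC: either a regular (strong) undefined symbol or a
     * link-time constant zero, never a dynamic weak undefined. */
    return found_in_dso ? EXPORT_STRONG_UNDEF : RESOLVE_TO_ZERO;
}
```

For example, decide_weak_undef(false, true) yields EXPORT_STRONG_UNDEF, which corresponds to the program failing to start if the symbol is missing at load-time instead of crashing later.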

I believe it simplifies the semantics and also simplifies the implementation of the linker. What do you think?

> Rui Ueyama <ruiu@google.com> writes:
>
>> I think the current behavior is bad. I'd like to propose the following
>> changes:
>>
>> 1. If a linker is creating a non-PIC ELF binary, and if it finds a DSO
>> symbol foo for an undefined weak symbol foo, then it adds foo as a *strong*
>> undefined symbol to the dynamic symbol table. This prevents the above crash
>> because the program fails to start if foo is not found at load-time,
>> instead of crashing at run-time.
>>
>> 2. If a linker is creating a non-PIC ELF binary, and if it *cannot* find a
>> DSO symbol foo for an undefined weak symbol foo, then it *does not* add foo to
>> the dynamic symbol table, and it sets foo's value to zero.
>
> I would not phrase this as pic/non-pic. From the linker point of view
> there are just relocations. I assume then that the intention is:

We have -shared/-pie options, so my intention was to use these flags. We
could use relocations to decide whether we should export a weak
undefined symbol or not, but I think there are a few issues with that:

1. We cannot make a decision until we visit all relocations, but we need a
decision beforehand in order to create GOT entries or report errors.

2. Sometimes we could get mixed signals -- for example, if one object file
contains a direct reference to a weak symbol, and another object file
contains a GOTPCREL reference to the same symbol, they are somewhat
conflicting.

So, just using -pie/-shared flags is simple, I guess?

Unless my jet-lagged brain has misunderstood, I think the libc crti.o
module [*] that is included in non-pic Linux ELF executables has a
platform specific mechanism for evaluating whether a weak reference to
the library function "__gmon_start__" exists before calling it. In
essence it checks the GOT entry for the function and not the address of
the function, which as you point out could be the PLT entry for the
function, which will be nonzero.

Ideally I think we would want, for each undefined weak reference:
- If we are dynamic linking, create a PLT and GOT entry for each PLTGOT
generating relocation.
- The dynamic symbol for the weak undefined symbol has binding STB_WEAK.
I think the programmer is responsible for writing their weak call so
that it can handle the dynamic loader not being able to find the
symbol, such as the call_weak_fn in crti.S [**].
- If there is no dynamic linking, then set the value of the undefined
weak reference to 0, or handle any special case like Arm or AArch64.

I'm deliberately glossing over implementation problems such as how do
we know there is no dynamic linking at the point we have to make a
PLT/GOT entry? I'll try and think about this a bit more tomorrow.

[*] References, __gmon_start__ is the PREINIT_FUNCTION:
AArch64 https://code.woboq.org/userspace/glibc/sysdeps/aarch64/crti.S.html
Arm https://code.woboq.org/userspace/glibc/sysdeps/arm/crti.S.html
X86_64 https://code.woboq.org/userspace/glibc/sysdeps/x86_64/crti.S.html

[**] The ELF spec says "The behavior of weak symbols in areas not
specified by this document is implementation defined. Weak symbols are
intended primarily for use in system software. Applications using weak
symbols are unreliable since changes in the runtime environment might
cause the execution to fail."

Peter

Rui Ueyama <ruiu@google.com> writes:

>> I would not phrase this as pic/non-pic. From the linker point of view
>> there are just relocations. I assume then that the intention is:
>
> We have -shared/-pie options, so my intention was to use these flags. We
> could use relocations to decide whether we should export a weak
> undefined symbol or not, but I think there are a few issues with that:
>
> 1. We cannot make a decision until we visit all relocations, but we need a
> decision beforehand in order to create GOT entries or report errors.
>
> 2. Sometimes we could get mixed signals -- for example, if one object file
> contains a direct reference to a weak symbol, and another object file
> contains a GOTPCREL reference to the same symbol, they are somewhat
> conflicting.
>
> So, just using -pie/-shared flags is simple, I guess?

The question of whether we are producing an executable or not is
important, but not whether that executable is position independent.

If we are *not* producing an executable (-shared), there is no way to
preempt a symbol and we should just produce an error if a relocation is
not using a GOT entry and is in a read-only section. We already get that
right.

What would change is what is done for executables (PIE or not), and
here I think we should consider each relocation.

If the relocation can be handled without preemption, nothing changes.

If a preemption (dummy plt or copy relocation) is needed, this is where
the change comes into play:

* If the symbol was found in a .so the resulting undefined reference
   will be strong.
* If the symbol was not found in a .so, it is resolved to 0.

So the case where we would get a different result for different
relocations is:

* We never got a definition for the undef weak in a .so.
* We first find some relocations using a .got.
* We then find a relocation that needs preemption. This will resolve to
  0.

I think this is fine, as we already switch our understanding of whether
a symbol is preempted or not depending on a copy relocation being
needed. It can also only happen if pic/non-pic .o files are mixed.
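The per-relocation rule above can be modeled roughly as follows. This is an illustrative sketch under my own assumptions, not lld code: struct weak_sym_state, its fields, and scan_weak_reloc are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical per-symbol state kept while scanning relocations
 * against one undefined weak symbol in an executable. */
struct weak_sym_state {
    bool found_in_so;   /* a .so on the link line defines the symbol */
    bool export_strong; /* the undefined reference becomes strong */
    bool resolved_zero; /* some relocation was resolved to 0 */
};

/* Apply the rule to one relocation. needs_preemption is true when the
 * relocation would require a dummy PLT entry or a copy relocation. */
void scan_weak_reloc(struct weak_sym_state *sym, bool needs_preemption)
{
    if (!needs_preemption)
        return; /* e.g. a GOT-based reference: nothing changes */
    if (sym->found_in_so)
        sym->export_strong = true;
    else
        sym->resolved_zero = true;
}
```

Scanning a GOT-based relocation and then one that needs preemption, with found_in_so false, leaves the first untouched and sets resolved_zero for the second, which is the mixed pic/non-pic case described above.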

Cheers,
Rafael