[LLD] Incorrect comparison of pointers to a function defined in a DSO

Hi,

It looks like I have found a bug in LLD. Suppose a DSO defines a global
variable 'data' and initializes it with the address of the function
'set_data' defined in the same DSO. If an executable linked by LLD
takes the address '&set_data' and compares it with the value stored in
the 'data' variable, the two addresses differ. If the executable is
linked by the BFD or gold linker, they are equal.

Right now I do not have time to investigate this problem further; I
plan to do that later. But maybe the cause of this problem is obvious
to somebody?

The reproduction script:

% cat so.c
void set_data(void *v) {}
void *data = &set_data;

% cat main.c
int printf(const char *, ...);

extern void *data;
void set_data(void *v);

int main(void)
{
  printf("%p = %p\n", &set_data, data);
}

% clang -fPIC -shared so.c -o libdump.so
% clang -c main.c

% clang main.o -Wl,-rpath -Wl,. -L. -ldump
% ./a.out
0x400600 = 0x400600 # The same addresses

% lld -flavor gnu --sysroot=/ --build-id --no-add-needed --eh-frame-hdr \
    -m elf_x86_64 --hash-style=both \
    -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
    /usr/lib/x86_64-linux-gnu/crt1.o /usr/lib/x86_64-linux-gnu/crti.o \
    /usr/lib/gcc/x86_64-linux-gnu/4.7/crtbegin.o \
    -L. -L/usr/lib/gcc/x86_64-linux-gnu/4.7 \
    -L/usr/lib/x86_64-linux-gnu \
    -L/usr/lib -L/lib/x86_64-linux-gnu -L/lib \
    -L/usr/lib/x86_64-linux-gnu -L/usr/lib \
    main.o -rpath . -ldump -lgcc --as-needed -lgcc_s --no-as-needed \
    -lc -lgcc --as-needed -lgcc_s --no-as-needed \
    /usr/lib/gcc/x86_64-linux-gnu/4.7/crtend.o \
    /usr/lib/x86_64-linux-gnu/crtn.o
% ./a.out
0x11250 = 0x7f02915bd6b0 # 'data' holds a different address

Sounds like it is related to this:

http://www.airs.com/blog/archives/42

"""

The fact that C permits taking the address of a function introduces an interesting wrinkle. In C you are permitted to take the address of a function, and you are permitted to compare that address to another function address. The problem is that if you take the address of a function in a shared library, the natural result would be to get the address of the PLT entry. After all, that is the address to which a call to the function will jump. However, each shared library has its own PLT, and thus the address of a particular function would differ in each shared library. That means that comparisons of function pointers generated in different shared libraries may be different when they should be the same. This is not a purely hypothetical problem; when I did a port which got it wrong, before I fixed the bug I saw failures in the Tcl shared library when it compared function pointers.

The fix for this bug on most processors is a special marking for a symbol which has a PLT entry but is not defined. Typically the symbol will be marked as undefined, but with a non-zero value–the value will be set to the address of the PLT entry. When the dynamic linker is searching for the value of a symbol to use for a reloc other than a JMP_SLOT reloc, if it finds such a specially marked symbol, it will use the non-zero value. This will ensure that all references to the symbol which are not function calls will use the same value. To make this work, the compiler and assembler must make sure that any reference to a function which does not involve calling it will not carry a standard PLT reloc. This special handling of function addresses needs to be implemented in both the program linker and the dynamic linker.

"""

Indeed, comparing the llvm-readobj -dyn-symbols output on the executables from gold and lld, I see:

--- a.out.gold.readobj 2016-02-08 14:08:52.678160575 -0800
+++ a.out.lld.readobj 2016-02-08 14:08:52.678160575 -0800
   Symbol {
-    Name: set_data@ (142)
-    Value: 0x400560
-    Size: 0
+    Name: set_data@ (46)
+    Value: 0x0
+    Size: 10
     Binding: Global (0x1)
     Type: Function (0x2)
     Other: 0
     Section: Undefined (0x0)
   }

You can also see this with LD_DEBUG=all when running the executables (to avoid extraneous diffs, both executables are named “./a.out.lld”; use the diff header to tell which output comes from the gold executable and which from the lld executable):

--- ld_debug-a.out.gold 2016-02-08 14:07:27.255734743 -0800
+++ ld_debug-a.out.lld 2016-02-08 14:07:27.255734743 -0800
 relocation processing: ./libdump.so (lazy)
 symbol=set_data; lookup in file=./a.out.lld [0]
-binding file ./libdump.so [0] to ./a.out.lld [0]: normal symbol `set_data'
+symbol=set_data; lookup in file=./libdump.so [0]
+binding file ./libdump.so [0] to ./libdump.so [0]: normal symbol `set_data'

For gold, the symbol is bound to the one in a.out (the PLT entry), while for lld it is bound to the one in libdump.so.

-- Sean Silva

Yes, I had just reduced it too.

I am pretty sure this was implemented at some point. Taking a look.

Cheers,
Rafael

No, it was not implemented; I just added a note:

    // The remaining (unimplemented) problem is making sure pointer equality
    // still works. We need the help of the dynamic linker for that. We
    // let it know that we have a direct reference to a so symbol by creating
    // an undefined symbol with a non zero st_value. Seeing that, the
    // dynamic linker resolves the symbol to the value of the symbol we created.
    // This is true even for got entries, so pointer equality is maintained.
    // To avoid an infinite loop, the only entry that points to the
    // real function is a dedicated got entry used by the plt. That is
    // identified by special relocation types (R_X86_64_JUMP_SLOT,
    // R_386_JMP_SLOT, etc).

Taking a look.

http://reviews.llvm.org/D17007

Cheers,
Rafael

Thanks for the explanation. For some reason I was sure that this
functionality had been implemented in LLD2. I probably mixed up LLD
and LLD2.

Interestingly, as usual MIPS requires some additional work. In
particular, we need to set the STO_MIPS_PLT bit in the st_other field
of the symbol that points to the PLT.

See the "Function addresses" section in
https://sourceware.org/ml/binutils/2008-07/txt00000.txt