[lld] elf linker creates undefined empty symbol

Hi,

When running my own lld generated library/executable I'm getting:

LD_LIBRARY_PATH=. ./ConsoleApplication347
./ConsoleApplication347: symbol lookup error: ./ConsoleApplication347: undefined symbol:

(there's nothing after "undefined symbol:")

How can I figure out what I'm doing wrong?

Full log:
https://gist.github.com/carlokok/1dd510a16e1922271b520f1c00b14656

readelf -s for ConsoleApplication347:
https://gist.github.com/carlokok/0950e4b33e0bf421852b3ac58fc18aea

readelf -s for ClassLibrary22.so:
https://gist.github.com/carlokok/109a03620abb95bdad6479426e3dce11

lld command line used:
lld -flavor gnu -O0 --lto-O0
--eh-frame-hdr --dynamic-linker "/lib64/ld-linux-x86-64.so.2"
"-L."
"-oConsoleApplication347"
"RemObjects.Elements.Cirrus.ConsoleApplication347.o"
"libgc.a" "Island.a" "RemObjects.Elements.Cirrus.importlib-ConsoleApplication347-libc.so" "RemObjects.Elements.Cirrus.importlib-ConsoleApplication347-ClassLibrary22" "RemObjects.Elements.Cirrus.importlib-ConsoleApplication347-libpthread.so" "RemObjects.Elements.Cirrus.importlib-ConsoleApplication347-librt.so" "RemObjects.Elements.Cirrus.importlib-ConsoleApplication347-libgcc_s.so"

Input files:
https://www.dropbox.com/s/8yn3dggx05atn47/binLinux.zip?dl=0

output files:
https://www.dropbox.com/s/vxzl9jfkssmp3tk/testapp.zip?dl=0

Did you try to make a smaller test case? I’d try to remove unrelated code from your project to minimize the code size so that it is easier to find the cause.

> Input files:
> https://www.dropbox.com/s/8yn3dggx05atn47/binLinux.zip?dl=0

If you pass --reproduce foo.tar to lld it will create a foo.tar file
with all that is needed to reproduce the link.

Can you also share how you created the various .o files? If so I might
be able to try reducing the issue.

Cheers,
Rafael

It's created by my own compiler.

https://www.dropbox.com/s/rmkyqks4lnr85rz/foo.tar?dl=0

My biggest problem is that I have no idea where to start narrowing it down: on the .so side or on the executable side. The error is rather strange to begin with.

It’s scary to run the binary on my machine (and I don’t have a sandbox at the moment), so I cannot actually run it, but there are a few things you can do.

First of all, “symbol lookup error” is not LLD’s error message. It’s likely an error message of the dynamic linker. Try running it with LD_VERBOSE=1 LD_TRACE_LOADED_OBJECTS=1 or something. Refer to man ld.so for other dynamic linker debug options.
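Following the ld.so suggestion, a minimal Python sketch for rerunning the program with glibc's LD_DEBUG tracing turned on. It assumes a glibc-based system (other dynamic linkers ignore LD_DEBUG); the category names "symbols" and "bindings" come from man ld.so, and the helper names are made up for illustration.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Sequence, Tuple


def ld_debug_env(base: Mapping[str, str], categories: List[str],
                 library_path: str = ".") -> Dict[str, str]:
    """Copy `base` and add LD_DEBUG / LD_LIBRARY_PATH for a traced run."""
    env = dict(base)
    env["LD_DEBUG"] = ",".join(categories)   # e.g. "symbols,bindings"
    env["LD_LIBRARY_PATH"] = library_path    # same as LD_LIBRARY_PATH=. above
    return env


def run_with_ld_debug(argv: Sequence[str],
                      categories: Sequence[str] = ("symbols", "bindings"),
                      library_path: str = ".") -> Tuple[int, str]:
    """Run argv with ld.so tracing; ld.so writes its trace to stderr."""
    env = ld_debug_env(os.environ, list(categories), library_path)
    proc = subprocess.run(list(argv), env=env, capture_output=True, text=True)
    return proc.returncode, proc.stderr
```

Usage would be something like `code, log = run_with_ld_debug(["./ConsoleApplication347"])`, then searching `log` for the last symbol ld.so tried to look up before printing "symbol lookup error".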

Carlo Kok <ck@remobjects.com> writes:

> My biggest problem is that I have no idea where I can start trying to
> narrow it down, on the so side, or on the executable side, the error is
> rather strange to begin with.

I would suggest setting up a script that links each .so and executable
with either lld or bfd. That way you should be able to find which link
causes the problem.
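The "link each piece with either lld or bfd" idea can be sketched as a small driver that tries all four (.so linker, executable linker) combinations. The `loads` callback is hypothetical: it stands for whatever relinks ClassLibrary22.so and the executable with the chosen linkers, runs the program, and reports whether it starts cleanly.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

LINKERS = ("bfd", "lld")


def locate_bad_link(
    loads: Callable[[str, str], bool],
) -> Tuple[Dict[Tuple[str, str], bool], List[str]]:
    """Try every (so_linker, exe_linker) pair and report which link is implicated.

    `loads(so_linker, exe_linker)` is a hypothetical, user-supplied callback that
    returns True if the program starts without a symbol lookup error.
    """
    results = {(so, exe): loads(so, exe) for so, exe in product(LINKERS, LINKERS)}
    if not results[("bfd", "bfd")]:
        raise RuntimeError("all-bfd build fails too; the linker is probably not the cause")
    culprits: List[str] = []
    if not results[("lld", "bfd")]:
        culprits.append("shared library")   # swapping only the .so link breaks it
    if not results[("bfd", "lld")]:
        culprits.append("executable")       # swapping only the exe link breaks it
    if not culprits and not results[("lld", "lld")]:
        culprits.append("lld+lld combination only")
    return results, culprits
```

Keeping the all-bfd build as a required baseline makes sure a failure is really attributable to the linker swap rather than to the compiler output.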

After that start reducing the problem. If it was c++, you would run
delta on the .ii file checking that the bfd linked program/library works
and that the lld linked one fails to load.
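Since the compiler here is not a C++ frontend, the "run delta on the input" step can be approximated with a simplified chunk-halving reducer in the spirit of delta debugging (this is not the delta tool itself). The `is_interesting` predicate is hypothetical: it should write the candidate lines out, build, link with both linkers, and return True only if the bfd-linked program works and the lld-linked one fails to load.

```python
from typing import Callable, List, Sequence


def reduce_lines(lines: Sequence[str],
                 is_interesting: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of lines while the failure still reproduces."""
    current = list(lines)
    if not is_interesting(current):
        raise ValueError("the unreduced input must reproduce the failure")
    chunk = max(1, len(current) // 2)
    while True:
        changed = False
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if candidate and is_interesting(candidate):
                current = candidate  # keep the removal, retry at the same index
                changed = True
            else:
                i += chunk           # this chunk is needed, move past it
        if chunk > 1:
            chunk //= 2              # refine with smaller chunks
        elif not changed:
            return current           # no single line can be dropped any more
```

The result is small but not guaranteed to be the smallest possible reproducer; it only guarantees that no single remaining line can be removed on its own.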

Cheers,
Rafael

Carlo seems to be passing --lto-O0 so bugpoint might be a viable
alternative as well if the input is bitcode.

-- Sean Silva

In case anyone else runs into this, Sean Silva found that this declaration:

declare extern_weak hidden void @__libc_start_main(i32 (i32, i8**, i8**)*, i32, i16**, i32 (i32, i8**, i8**)*, void ()*)

triggered a rogue relocation against symbol 0. Making it non-hidden fixes this.

Rafael, the weird thing is that GNU ld is perfectly fine with this, so I'm not sure whether this is a bug.

Rafael, here is a repro.tar to look at: https://reviews.llvm.org/F3100177

The attached foo.diff adds a print which shows the issue.


NAME: sleep SYMINDEX: 2
NAME: sched_yield SYMINDEX: 1
NAME: __libc_start_main SYMINDEX: 0

readelf --relocs shows that we create:

000000255110 002900000007 R_X86_64_JUMP_SLO 0000000000254410 __xstat@GLIBC_2.2.5 + 0
000000255118 001e00000007 R_X86_64_JUMP_SLO 0000000000254420 __fxstat@GLIBC_2.2.5 + 0
000000255120 000000000007 R_X86_64_JUMP_SLO 0
000000255128 002c00000007 R_X86_64_JUMP_SLO 0000000000254440 uname@GLIBC_2.2.5 + 0
000000255130 001b00000007 R_X86_64_JUMP_SLO 0000000000254450 getenv@GLIBC_2.2.5 + 0

When __libc_start_main is hidden, it doesn’t end up in the dynamic symbol table and so we use the default DynsymIndex of 0.

– Sean Silva
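The second column of the readelf output above is the raw r_info field. On ELF64 its upper 32 bits are the symbol index and its lower 32 bits the relocation type (what the ELF64_R_SYM / ELF64_R_TYPE macros compute), so 000000000007 decodes to symbol 0 with type 7, R_X86_64_JUMP_SLOT. Entry 0 of a symbol table is the reserved null symbol with an empty name, which is presumably why ld.so printed "undefined symbol:" with nothing after it. A small sketch that flags such relocations; the helper names are made up for illustration.

```python
from typing import Iterable, List, Tuple

R_X86_64_JUMP_SLOT = 7  # relocation type number from the x86-64 psABI


def decode_r_info(info: int) -> Tuple[int, int]:
    """Split an ELF64 r_info into (symbol index, relocation type)."""
    return info >> 32, info & 0xFFFFFFFF


def parse_readelf_reloc(line: str) -> Tuple[int, int, int]:
    """Return (offset, symbol index, type) from one `readelf --relocs` data line."""
    fields = line.split()
    offset = int(fields[0], 16)
    sym, rtype = decode_r_info(int(fields[1], 16))
    return offset, sym, rtype


def null_symbol_jump_slots(lines: Iterable[str]) -> List[int]:
    """Offsets of JUMP_SLOT relocations that refer to symbol index 0."""
    hits = []
    for line in lines:
        offset, sym, rtype = parse_readelf_reloc(line)
        if rtype == R_X86_64_JUMP_SLOT and sym == 0:
            hits.append(offset)
    return hits
```

Fed the five relocation lines quoted above, `null_symbol_jump_slots` reports only offset 0x255120, the slot that was meant for __libc_start_main.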


I think BFD is doing the right thing. The `extern_weak hidden` is resolved
to null since by virtue of being hidden it doesn't refer to the function in
libc.so (and there is no definition in the executable). For some reason,
LLD thinks that it needs to resolve __libc_start_main dynamically and
things go horribly wrong. Rafael, what do you think?

-- Sean Silva

Sean Silva <chisophugis@gmail.com> writes:

> When __libc_start_main is hidden, it doesn't end up in the dynamic symbol
> table and so we use the default DynsymIndex of 0.
>
> I think BFD is doing the right thing. The `extern_weak hidden` is resolved
> to null since by virtue of being hidden it doesn't refer to the function in
> libc.so (and there is no definition in the executable). For some reason,
> LLD thinks that it needs to resolve __libc_start_main dynamically and
> things go horribly wrong. Rafael, what do you think?

If it is hidden it should really not end up in the dynamic symbol
table. I will try to take a look at your reproducer later in the day.

Cheers,
Rafael

FWIW, gold issues an error:

/usr/bin/ld.gold: error: hidden symbol '__libc_start_main' is not defined locally
/usr/bin/ld.gold: error: hidden symbol '__libc_start_main' is not defined locally

In theory `extern_weak hidden` could still fail to have a local definition
and then it would resolve to 0 due to being weak, which is what BFD does.
But `extern_weak hidden` is still pretty questionable. E.g. in the original
case that Carlo ran into, this was not the expected behavior (and this
error may have saved him some time actually).

-- Sean Silva
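The behaviors discussed here (BFD resolving the weak hidden reference to 0, and the eventual fix of not binding a hidden undefined symbol to a definition in a .so) match the System V gABI visibility rule: a non-default-visibility reference must be satisfied inside the module being linked, and if it is not, it must be weak and resolves to zero. Below is a simplified model of that rule, not lld's actual code; the `UndefinedRef` type and `resolve` function are hypothetical. It follows BFD for the weak case, whereas gold reports the error shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UndefinedRef:
    name: str
    weak: bool
    visibility: str  # "default", "protected", "hidden" or "internal"


def resolve(ref: UndefinedRef, defined_locally: bool, defined_in_shared_lib: bool) -> str:
    """Classify how an undefined reference in an executable gets bound (simplified)."""
    if defined_locally:
        return "local"
    if ref.visibility == "default":
        if defined_in_shared_lib:
            return "dynamic"  # needs a .dynsym entry and a dynamic relocation
        if ref.weak:
            return "null"
        raise LookupError(f"undefined symbol: {ref.name}")
    # Non-default visibility: a definition in a .so must NOT satisfy the reference.
    if ref.weak:
        return "null"  # unresolved weak reference binds to address 0, as BFD does
    raise LookupError(f"{ref.visibility} symbol '{ref.name}' is not defined locally")
```

In this model, Carlo's declaration is `UndefinedRef("__libc_start_main", weak=True, visibility="hidden")`, which resolves to "null" even though libc.so defines the symbol; dropping `hidden` moves it to the "dynamic" branch, which is why making it non-hidden worked.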

Rafael, did you ever get a chance to look at this?

– Sean Silva

Sorry, not yet. Crazy busy with internal bots.

Cheers,
Rafael


OK, I finally had time to take a look. The issue was just us not
realizing that a hidden undef and a definition in a .so should not be
treated like the same symbol.

Fixed in r299464.

Cheers,
Rafael

Thanks!