lld symbol choice for symbol present in both a shared and a static library, with and without LTO

I filed https://bugs.llvm.org/show_bug.cgi?id=42273 last night, about an inconsistency between LTO and non-LTO workflows.

The basic scenario is that we have an object file which calls a function “foo”, a static library that provides an implementation of “foo”, and a shared library that also provides an implementation of “foo”. Currently, whether lld chooses the symbol from the static library or the shared library depends on the order the files are specified on the command-line. For “obj.o static.a shared.so”, or “static.a obj.o shared.so”, lld chooses the symbol from the static library. For any other order, it chooses the symbol from the shared library. Is this the expected behavior? (As far as I can tell, this matches binutils ld except for the “static.a obj.o shared.so” case.)

If “obj.o” is built with LTO enabled, and the function is specifically a runtime function, the behavior is different. For example, suppose the IR contains a call to “llvm.memcpy”, and the generated code eventually calls “memcpy”. Or suppose the IR contains a “resume” instruction, and the generated code eventually calls “_Unwind_Resume”. In this case, the choice is different: lld always chooses the “memcpy” or “_Unwind_Resume” from the shared library, ignoring the order the files are specified on the command-line. Is this the expected behavior?

-Eli

The basic scenario is that we have an object file which calls a function “foo”, a static library that provides an implementation of “foo”, and a shared library that also provides an implementation of “foo”. Currently, whether lld chooses the symbol from the static library or the shared library depends on the order the files are specified on the command-line. For “obj.o static.a shared.so”, or “static.a obj.o shared.so”, lld chooses the symbol from the static library. For any other order, it chooses the symbol from the shared library. Is this the expected behavior? (As far as I can tell, this matches binutils ld except for the “static.a obj.o shared.so” case.)

That would match my expectations. The symbol tables are loaded in left
to right order, so if static.a comes before shared.so its symbols will
be matched against first. In GNU ld, as you point out, once a library
has been passed on the command line its symbols are forgotten, whereas
in LLD they are not, hence the difference with static.a obj.o
shared.so.
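
The ordering rules described above can be condensed into a toy model. This is only a sketch, not lld's code: the state names and the resolve function are invented for illustration, and the remember_lazy flag is an assumption standing in for the difference between LLD (which remembers archive symbols) and GNU ld (which forgets them once the archive has been passed).

```python
from itertools import permutations

FILES = ("obj.o", "static.a", "shared.so")

def resolve(order, remember_lazy=True):
    """Return which library ends up defining foo: 'static' or 'shared'.

    Toy model only: one symbol, one archive member, no weak symbols.
    """
    state = None  # None, "undefined", "lazy", "static" or "shared"
    for name in order:
        if name == "obj.o":
            if state is None:
                state = "undefined"   # reference with no definition yet
            elif state == "lazy":
                state = "static"      # the reference pulls in the archive member
        elif name == "static.a":
            if state == "undefined":
                state = "static"      # archive satisfies a pending reference
            elif state is None and remember_lazy:
                state = "lazy"        # remembered for later references
        elif name == "shared.so":
            if state in (None, "undefined", "lazy"):
                state = "shared"      # a shared definition replaces a lazy one
    return state

for order in permutations(FILES):
    print(" ".join(order), "->",
          resolve(order), "(LLD-like),",
          resolve(order, remember_lazy=False), "(GNU-ld-like)")
```

In this model the two modes disagree only for "static.a obj.o shared.so", which is the one case the original message calls out.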

One area where the dynamic library is preferred is when -l or --library is used.
When -lfoo is used and libfoo.a and libfoo.so both exist, both LLD and
ld.bfd will prefer libfoo.so to libfoo.a when searching for the
library, unless -Bstatic is in force at the time.
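
As a rough illustration of the -l preference, here is a small sketch. The find_library function, its parameters, and the idea of passing the set of existing files explicitly are all hypothetical; it also assumes that each search directory is checked for the .so before the .a, which is how I understand both linkers to behave.

```python
def find_library(name, search_dirs, existing_files, static_only=False):
    """Return the path chosen for -l<name>, or None if nothing is found.

    existing_files stands in for the filesystem so the sketch is
    self-contained; static_only models -Bstatic being in force.
    """
    for directory in search_dirs:
        candidates = []
        if not static_only:
            candidates.append(f"{directory}/lib{name}.so")  # preferred by default
        candidates.append(f"{directory}/lib{name}.a")
        for path in candidates:
            if path in existing_files:
                return path
    return None

files = {"/usr/lib/libfoo.so", "/usr/lib/libfoo.a"}
print(find_library("foo", ["/usr/lib"], files))
print(find_library("foo", ["/usr/lib"], files, static_only=True))
```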

If “obj.o” is built with LTO enabled, and the function is specifically a runtime function, the behavior is different. For example, suppose the IR contains a call to “llvm.memcpy”, and the generated code eventually calls “memcpy”. Or suppose the IR contains a “resume” instruction, and the generated code eventually calls “_Unwind_Resume”. In this case, the choice is different: lld always chooses the “memcpy” or “_Unwind_Resume” from the shared library, ignoring the order the files are specified on the command-line. Is this the expected behavior?

As I understand it, there is no more selection of members from static
libraries after the LTO code-generator has run. In the example from
the PR there is no other object with a reference to memcpy so the
member containing the static definition is not loaded, leaving only
the shared library to match against. I would expect if there were
another reference to memcpy from a bitcode file or another ELF file
and the static library was before the shared then it would match
against that.

As to whether this is expected or not, I don't know for certain. One
desirable property of not selecting more objects from static libraries
is that you are guaranteed not to load any more bitcode files from
static libraries, which would either need compiling separately from
the other bitcode files, or have the whole compilation done again with
the new objects, which could cause more bitcode files to be loaded
etc.

There is a comment at
https://github.com/llvm-mirror/lld/blob/master/ELF/Driver.cpp#L1733
which hints at special treatment for functions named in
llvm/IR/RuntimeLibcalls.def; this includes memcpy and _Unwind_Resume. I
don't know enough about LTO to know whether it makes a difference in
this case. It may be worth a look.

Peter

The basic scenario is that we have an object file which calls a function “foo”, a static library that provides an implementation of “foo”, and a shared library that also provides an implementation of “foo”. Currently, whether lld chooses the symbol from the static library or the shared library depends on the order the files are specified on the command-line. For “obj.o static.a shared.so”, or “static.a obj.o shared.so”, lld chooses the symbol from the static library. For any other order, it chooses the symbol from the shared library. Is this the expected behavior? (As far as I can tell, this matches binutils ld except for the “static.a obj.o shared.so” case.)

This is what I expected. When lld visits an object file A and finds an undefined symbol, and there’s a file B that appears before the object file on the command line that defines the symbol, then B gets linked. If more than one file defines the symbol, the leftmost one is chosen.

If “obj.o” is built with LTO enabled, and the function is specifically a runtime function, the behavior is different. For example, suppose the IR contains a call to “llvm.memcpy”, and the generated code eventually calls “memcpy”. Or suppose the IR contains a “resume” instruction, and the generated code eventually calls “_Unwind_Resume”. In this case, the choice is different: lld always chooses the “memcpy” or “_Unwind_Resume” from the shared library, ignoring the order the files are specified on the command-line. Is this the expected behavior?

That’s not expected, but I suspect that that only occurs when you use a builtin function like memcpy. Does this happen when you define some random function like “foo”?

I believe this is going to be specific to builtin functions. The reason is that the LTO link is fed by bitcode files, which at this point have references to the llvm intrinsic, not the library call. So the linker, which invokes the LTO compilation and provides the symbol resolutions, does not see any call to e.g. “memcpy”. Later, in the LTO backends, the intrinsic gets turned into something, depending on the compiler’s heuristics. This something could be an inline expansion of memcpy, or a regular call to memcpy.

For these libcalls, to avoid this behavior build with -fno-builtin-memcpy (or other libcall name), or more generally, -fno-builtin or -ffreestanding to block them all.

Teresa

Yes, this specifically applies to builtin functions, as far as I can tell.

For these libcalls, to avoid this behavior build with -fno-builtin-memcpy (or other libcall name), or more generally, -fno-builtin or -ffreestanding to block them all.

Unfortunately, this doesn’t work for _Unwind_Resume, which is the symbol that’s actually causing the issue in the scenario I’m looking at.
-Eli

For runtime functions defined in bitcode, we avoid the "double-LTO" scenario you describe by including them in the LTO link even if we can't prove they will be used. This is the handleLibcall code you pointed out. (https://github.com/llvm-mirror/lld/blob/master/ELF/Driver.cpp#L1733). As the comment there describes, we don't do this for runtime functions which are not defined in bitcode, to avoid other side-effects; instead we resolve those symbols after LTO.

For the scenario I'm describing, though, it looks like the key decision here is made in SymbolTable::addShared, before handleLibcall and LTO. If a symbol is defined in both a static library and a shared library, and we haven't seen a reference to the static library's symbol at that point, we throw away the record of the symbol defined in the static library.
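
To make the LTO case concrete, here is a toy model of the scan as I understand it from this thread. It is a sketch under assumptions, not lld's implementation: the function name and state names are invented, and it models only the situation where obj.o is bitcode and its eventual memcpy call is not visible to the linker before LTO code generation.

```python
from itertools import permutations

def resolve_with_lto(order):
    """Return what memcpy resolves to once LTO has emitted the call."""
    state = None  # None, "lazy" or "shared"
    for name in order:
        if name == "obj.o":
            # Bitcode: only the intrinsic is present, so the linker sees
            # no reference to memcpy here and nothing changes.
            continue
        if name == "static.a":
            if state is None:
                state = "lazy"    # recorded, but nothing references it yet
        elif name == "shared.so":
            if state in (None, "lazy"):
                state = "shared"  # the lazy record is thrown away
    # LTO code generation now introduces the call to memcpy, which is
    # matched against whatever record survived the scan.
    return state

for order in permutations(("obj.o", "static.a", "shared.so")):
    print(" ".join(order), "->", resolve_with_lto(order))
```

Because no reference is visible during the scan, the archive member is never pulled in, and the shared definition wins for every ordering of the three files.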

Ultimately, I guess the question is what alternatives are possible, without breaking the scenarios handleLibcall is supposed to handle. I see a few possibilities here:

1. Whenever we see any bitcode file, treat it as referencing every possible runtime function, even those defined in non-bitcode static libraries. Then we try to resolve the __sync_val_compare_and_swap_8 issue from https://reviews.llvm.org/D50475 some other way.
2. Change the symbol resolution that runs after LTO to use different symbol resolution rules from normal non-LTO/before-LTO symbol resolution, so it finds the function from the static library instead of the shared library.
3. Change symbol resolution in general to prefer "lazy" symbols from static libraries over symbols from shared libraries, even outside LTO. So "static.a shared.so object.o" picks the symbol from static.a, instead of shared.so like it does now.
4. We WONTFIX https://bugs.llvm.org/show_bug.cgi?id=42273 .

-Eli

From: Peter Smith <peter.smith@linaro.org>
Sent: Monday, June 17, 2019 3:33 AM
To: Eli Friedman <efriedma@qualcomm.com>
Cc: llvm-dev <llvm-dev@lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] lld symbol choice for symbol present in both a
shared and a static library, with and without LTO

If “obj.o” is built with LTO enabled, and the function is specifically a runtime
function, the behavior is different. For example, suppose the IR contains a call
to “llvm.memcpy”, and the generated code eventually calls “memcpy”. Or
suppose the IR contains a “resume” instruction, and the generated code
eventually calls “_Unwind_Resume”. In this case, the choice is different: lld
always chooses the “memcpy” or “_Unwind_Resume” from the shared library,
ignoring the order the files are specified on the command-line. Is this the
expected behavior?

As I understand it, there is no more selection of members from static
libraries after the LTO code-generator has run. In the example from
the PR there is no other object with a reference to memcpy so the
member containing the static definition is not loaded, leaving only
the shared library to match against. I would expect if there were
another reference to memcpy from a bitcode file or another ELF file
and the static library was before the shared then it would match
against that.

As to whether this is expected or not, I don’t know for certain. One
desirable property of not selecting more objects from static libraries
is that you are guaranteed not to load any more bitcode files from
static libraries, which would either need compiling separately from
the other bitcode files, or have the whole compilation done again with
the new objects, which could cause more bitcode files to be loaded
etc.

  1. Whenever we see any bitcode file, treat it as referencing every possible runtime function, even those defined in non-bitcode static libraries. Then we try to resolve the __sync_val_compare_and_swap_8 issue from https://reviews.llvm.org/D50475 some other way.

That seems technically doable, but how do we know the names of all possible runtime functions?

  2. Change the symbol resolution that runs after LTO to use different symbol resolution rules from normal non-LTO/before-LTO symbol resolution, so it finds the function from the static library instead of the shared library.

We don’t actually do any symbol resolution after LTO. We merge the result of LTO with the other object files, but no new symbols are expected to appear after LTO. We can change that assumption of course, but that’s perhaps too much.

  3. Change symbol resolution in general to prefer “lazy” symbols from static libraries over symbols from shared libraries, even outside LTO. So “static.a shared.so object.o” picks the symbol from static.a, instead of shared.so like it does now.

This change seems risky.

  4. We WONTFIX https://bugs.llvm.org/show_bug.cgi?id=42273 .

The other option I can think of is to add a command line option to force loading a file from a static archive. With -u, we can force loading a member file when a specified name remains undefined after name resolution. That doesn’t work for this case because after LTO memcpy is not an undefined symbol but a library symbol. So, maybe we can define a new option -U to insert a given name as an undefined symbol from the beginning, which forces the linker to load a member file immediately after it finds one.

1. Whenever we see any bitcode file, treat it as referencing every possible runtime function, even those defined in non-bitcode static libraries. Then we try to resolve the __sync_val_compare_and_swap_8 issue from https://reviews.llvm.org/D50475 some other way.

Is it out of the question for the bitcode files to add the set of
libcalls they may potentially call to the bitcode symbol table? If
this were possible then handleLibcall wouldn't be necessary, as all the
dependencies would be explicit in the bitcode file symbol table. I can
see this working if LTO only eliminates the libcall, but it would not if
the decision between incompatible libcalls were made at LTO time.

2. Change the symbol resolution that runs after LTO to use different symbol resolution rules from normal non-LTO/before-LTO symbol resolution, so it finds the function from the static library instead of the shared library.

I think this would be tricky with LLD's current implementation, as a
lot of the information about which candidate symbols came from which
library is lost as part of the merging process. I think it would
essentially be another implementation.

3. Change symbol resolution in general to prefer "lazy" symbols from static libraries over symbols from shared libraries, even outside LTO. So "static.a shared.so object.o" picks the symbol from static.a, instead of shared.so like it does now.

While there aren't any requirements or a specification for how a linker
should do symbol resolution, LLD does seem to match ld.bfd: with
--start-group memcpy.a memcpy.so input.o --end-group, the symbol from
memcpy.so is preferred. As Rui points out, this would be risky as it
could open up projects to code-size increases or to more multiply
defined symbol errors.

4. We WONTFIX https://bugs.llvm.org/show_bug.cgi?id=42273 .

I guess this depends on to what extent this is a problem. If it is a
small number of programs affected then it can probably be resolved by
adding an ELF file placed at the start of the command line with
undefined references to the specific ELF libcall symbols. If it is a
serious problem for almost everyone using LTO then it might be worth
an alternative library scan code in LLD to handle it.
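
The workaround of placing an ELF file with explicit undefined references at the start of the command line can be sketched with a toy model. Everything here is hypothetical: "stub.o" is an invented name for such a file, resolve_libcall is not an lld function, and obj.o is assumed to be bitcode whose libcall reference is invisible during the scan.

```python
def resolve_libcall(order, visible_refs=("stub.o",)):
    """Return what the libcall resolves to for a given command-line order.

    Files listed in visible_refs carry an ordinary undefined reference to
    the libcall; any other object (such as a bitcode obj.o) does not.
    """
    state = None  # None, "undefined", "lazy", "static" or "shared"
    for name in order:
        if name in visible_refs:
            if state is None:
                state = "undefined"
            elif state == "lazy":
                state = "static"      # the reference pulls in the archive member
        elif name == "static.a":
            if state == "undefined":
                state = "static"
            elif state is None:
                state = "lazy"
        elif name == "shared.so":
            if state in (None, "undefined", "lazy"):
                state = "shared"
    return state

print(resolve_libcall(("obj.o", "static.a", "shared.so")))
print(resolve_libcall(("stub.o", "obj.o", "static.a", "shared.so")))
print(resolve_libcall(("stub.o", "obj.o", "shared.so", "static.a")))
```

In this model the stub only helps when the static library still precedes the shared one, which matches the earlier expectation about an extra reference from another ELF file.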

Peter

Is it out of the question for the bitcode files to add the set of
libcalls they may potentially call to the bitcode symbol table? If
this were possible then handleLibcall wouldn’t be necessary, as all the
dependencies would be explicit in the bitcode file symbol table. I can
see this working if LTO only eliminates the libcall, but it would not if
the decision between incompatible libcalls were made at LTO time.

I haven’t thought of that before, but this might be a good idea. At least it is not out of the question.