[RFC] Linkage of user-supplied library functions in LTO

+nick and rafael, who seem to know a lot about linkage.

I made the following claim on llvm-commits [1]:

Giving these functions internal linkage allows them to be dead-stripped.

Is that even correct?

This is the assumption I’ve been working under, but I’m not sure where I
got it from. It seems like the linker is free to dead-strip symbols
whether they’re internal or external.

The current state of user-supplied library functions in LTO is that we
internalize library functions, add them to llvm.compiler.used so that
optimizations don’t modify them, and then optimizations incorrectly
modify them. LLVM *really* doesn't expect library functions to have
local linkage.
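
For concreteness, here is a rough hand-written IR sketch of that state,
using a made-up user-supplied memcpy (illustrative only, not actual
pass output):

  ; The user-supplied library function, given local linkage by -internalize:
  define internal i8* @memcpy(i8* %dst, i8* %src, i64 %n) {
    ; (real copy loop elided)
    ret i8* %dst
  }

  ; Added so that later passes are (supposed to be) barred from touching
  ; it, even though nothing in the module references it yet:
  @llvm.compiler.used = appending global [1 x i8*]
      [i8* bitcast (i8* (i8*, i8*, i64)* @memcpy to i8*)],
      section "llvm.metadata"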

And I’m not sure internal linkage is the right model anyway.

I see two paths forward:

1. Add a new linkage type called “linker_internal”, which LLVM treats as
    a type of non-local linkage, but gets emitted as internal. This
    might be worth it if linkers don’t dead strip external symbols.

2. If linkers *do* dead strip external symbols, then we should not
    internalize user-supplied library functions in -internalize.

Do linkers dead strip symbols with external linkage? Any other reason to
prefer one path over the other? Is there another way?

Duncan

[1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140303/207033.html

+nick and rafael, who seem to know a lot about linkage.

I made the following claim on llvm-commits [1]:

Giving these functions internal linkage allows them to be dead-stripped.

Is that even correct?

If by "linker" you mean things like /bin/ld, then linkers can garbage-collect sections that don't contain any referenced symbols, not the symbols themselves. I believe it doesn't matter if the symbols in sections are internal or external---that only matters for symbol resolution.

The current state of user-supplied library functions in LTO is that we
internalize library functions, add them to llvm.compiler.used so that
optimizations don’t modify them, and then optimizations incorrectly
modify them. LLVM *really* doesn't expect library functions to have
local linkage.

I've read the original thread (the 3 emails), and I'm still not sure what the purpose of internalization is in the context of user-provided library functions. If the output of LTO is one giant object file, it could make some sense (since the assembler could potentially do the "symbol resolution"). Otherwise the problem is in telling the "ld" which definition of "printf" it needs to pick up, or asking the user not to link the program with libc (a bit of a questionable request).

How are optimizations "incorrectly modifying" (user-provided) library functions?

-Krzysztof

+nick and rafael, who seem to know a lot about linkage.

I made the following claim on llvm-commits [1]:

Giving these functions internal linkage allows them to be dead-stripped.

Is that even correct?

This is the assumption I’ve been working under, but I’m not sure where I
got it from. It seems like the linker is free to dead-strip symbols
whether they’re internal or external.

The darwin linker does not care about internal vs external when doing
liveness analysis. But, by default when dylibs (DSOs) are created,
all global symbols are marked live at the start of the analysis. This
is not the case for main executables, which just have main() and any
initializers marked live initially.

The current state of user-supplied library functions in LTO is that we
internalize library functions, add them to llvm.compiler.used so that
optimizations don’t modify them, and then optimizations incorrectly
modify them. LLVM *really* doesn't expect library functions to have
local linkage.

Since most code is dynamic, that is probably why LLVM expects library
functions to not be local.

The case that revealed this bug was someone building a static binary.
Most code these days is dynamic, so libLTO will not see the implementation
of libcall functions. Whereas with static binaries, it will always see libcall
function implementations. I don't know if supplying that (static vs dynamic)
to libLTO would help.

And I’m not sure internal linkage is the right model anyway.

I see two paths forward:

1. Add a new linkage type called “linker_internal”, which LLVM treats as
   a type of non-local linkage, but gets emitted as internal. This
   might be worth it if linkers don’t dead strip external symbols.

2. If linkers *do* dead strip external symbols, then we should not
   internalize user-supplied library functions in -internalize.

Do linkers dead strip symbols with external linkage?

This is probably the wrong question. When linking main executables, the
linker can dead strip external functions. But if LTO is used with a main
executable, the linker will not tell libLTO to preserve the global symbols,
so they will quickly be made internal.

My question: will libLTO ever dead strip a non-internal function? If not,
is that one of the reasons the internalize pass tries to make functions
internal?

Any other reason to
prefer one path over the other? Is there another way?
From the darwin linker's perspective, if libLTO does not dead strip
a libcall function implementation, that is OK. The linker runs another
liveness analysis pass after the libLTO result and any other mach-o
files are merged. So any extra functions will still get deleted from the
final output.

-Nick

Giving these functions internal linkage allows them to be dead-stripped.

Is that even correct?

From LLVM's point of view, yes. It can drop linkonce and local (private* +
internal) globals.
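
A small IR illustration, assuming nothing in the module references any
of these:

  ; Unreferenced globals that llvm itself may delete (e.g. in -globaldce):
  define linkonce void @a() { ret void }
  define private void @b() { ret void }
  define internal void @c() { ret void }

  ; Not deletable by llvm: it cannot see every user of an external symbol.
  define void @d() { ret void }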

This is the assumption I’ve been working under, but I’m not sure where I
got it from. It seems like the linker is free to dead-strip symbols
whether they’re internal or external.

The system linker, yes. LLVM knows it is not seeing the full picture
with regards to external ones.

The current state of user-supplied library functions in LTO is that we
internalize library functions, add them to llvm.compiler.used so that
optimizations don’t modify them, and then optimizations incorrectly
modify them. LLVM *really* doesn't expect library functions to have
local linkage.

Why not just fix the optimizations that are not handling
llvm.compiler.used correctly?

And I’m not sure internal linkage is the right model anyway.

I see two paths forward:

1. Add a new linkage type called “linker_internal”, which LLVM treats as
    a type of non-local linkage, but gets emitted as internal. This
    might be worth it if linkers don’t dead strip external symbols.

2. If linkers *do* dead strip external symbols, then we should not
    internalize user-supplied library functions in -internalize.

Do linkers dead strip symbols with external linkage? Any other reason to
prefer one path over the other? Is there another way?

So, linkers have a better view of what is and is not used. They pass
that information down to LLVM during the link. The thing llvm has to be
careful about is symbols it can introduce calls to (like memcpy). For
those, llvm.compiler.used should be fine.

Using llvm.compiler.used and llvm.used is pretty annoying, and should
probably be made into an easier-to-use attribute, but probably not
folded into linkage. It is pretty orthogonal to other linkage
properties. We can have an llvm.used member that is weak_odr, external,
or internal, for example.
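
For example (hand-written sketch, the function names are made up):

  ; llvm.used is orthogonal to the linkage of its members:
  define weak_odr void @f() { ret void }
  define internal void @g() { ret void }

  @llvm.used = appending global [2 x i8*]
      [i8* bitcast (void ()* @f to i8*),
       i8* bitcast (void ()* @g to i8*)],
      section "llvm.metadata"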

Cheers,
Rafael

I believe it doesn't matter if the symbols in sections are internal or external---that only matters for symbol resolution.

Given this...

I've read the original thread (the 3 emails), and I'm still not sure what the purpose of internalization is in the context of user-provided library functions.

...I’m not sure there is a point.

The general idea is: unless the linker has told us to preserve a symbol,
internalize it, exposing it to other optimizations (like -globalopt).
However, for library functions, this breaks down because later passes
insert calls (e.g., -instcombine converts printf => puts, and
-codegenprepare converts llvm.memcpy => memcpy). So, add them to
@llvm.compiler.used to protect them temporarily.
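
As a concrete (hand-written) sketch of the printf => puts case, in the
IR syntax of the day; the details may differ from what -instcombine
actually emits:

  ; Before -instcombine:
  @str = private unnamed_addr constant [7 x i8] c"hello\0A\00"

  declare i32 @printf(i8*, ...)

  define void @f() {
    call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([7 x i8]* @str, i64 0, i64 0))
    ret void
  }

  ; After -instcombine (shown as @f2 so the sketch stays one module): a
  ; call to @puts has been introduced, a symbol that may not have existed
  ; in the module at all before the pass ran:
  @str2 = private unnamed_addr constant [6 x i8] c"hello\00"

  declare i32 @puts(i8*)

  define void @f2() {
    call i32 @puts(i8* getelementptr inbounds ([6 x i8]* @str2, i64 0, i64 0))
    ret void
  }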

If

  - the linker (e.g., /bin/ld) will delete unreferenced symbols (through
    -dead_strip, etc.) only if they have local linkage, or

  - LTO has a pass that will delete unreferenced symbols with local
    linkage *after* @llvm.compiler.used gets dropped (maybe we can add
    this),

then there’s a point.

If the output of LTO is one giant object file, it could make some sense (since the assembler could potentially do the "symbol resolution").

The output of LTO *is* one giant object file, but the linker (e.g.,
/bin/ld) may be linking it with other object files.

Otherwise the problem is in telling the "ld" which definition of "printf" it needs to pick up,

In the LTO API, the linker should call
lto_codegen_add_must_preserve_symbol() on symbols it expects to come out
the other side. Basically, the user of LTO decides which version of
printf to pick up. If there are any calls to printf from outside the
bitcode and the linker is using the one in the bitcode, then the one in
the bitcode won’t be internalized.

or asking the user not to link the program with libc (a bit of a questionable request).

A common case for user-supplied library functions is that users cannot
link against libc, so they supply their own. This shouldn’t be the only
supported case, though.

How are optimizations "incorrectly modifying" (user-provided) library functions?

The current problem is that -instcombine will rename the function through
Module::getOrInsertFunction(). getOrInsertFunction() chooses this path
because the function has local linkage. However, the function is a
member of @llvm.compiler.used, so it shouldn’t really be modified.
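
A sketch of the effect, as two snapshots of the same module (the new
name shown is illustrative; the actual one is whatever the uniquing
produces):

  ; Snapshot 1, before: only the user-supplied, internalized definition.
  define internal i32 @puts(i8* %s) {
    ret i32 0  ; (real body elided)
  }

  ; Snapshot 2, after -instcombine asks getOrInsertFunction for "puts":
  ; the local definition has been renamed out of the way, and a fresh
  ; external declaration now owns the name.
  define internal i32 @puts1(i8* %s) {
    ret i32 0  ; (real body elided)
  }

  declare i32 @puts(i8*)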

I think in the normal case (non-LTO, where -internalize hasn’t run),
Module::getOrInsertFunction() *should* take this path with functions that
have local linkage. And it’s not trivial to check for membership in
@llvm.compiler.used.

The darwin linker does not care about internal vs external when doing
liveness analysis. But, by default when dylibs (DSOs) are created,
all global symbols are marked live at the start of the analysis. This
is not the case for main executables, which just have main() and any
initializers marked live initially.

This is interesting. So the distinction does matter for shared objects,
but not for main executables. But LTO will be told to preserve all the
global symbols anyway for shared objects.

Since most code is dynamic, that is probably why LLVM expects library
functions to not be local.

The case that revealed this bug was someone building a static binary.
Most code these days is dynamic, so libLTO will not see the implementation
of libcall functions. Whereas with static binaries, it will always see libcall
function implementations. I don't know if supplying that (static vs dynamic)
to libLTO would help.

That’s an idea, but I don’t think it’s necessary.

In static binaries, it’s important to optimize out unused user-supplied
library functions for size reasons. But the linker is going to do that
whether we make these functions local or global, so there’s no benefit
to internalizing them.

In dynamic binaries, users will link against a dynamic libc and are
unlikely to provide their own library function implementations. So
there’s no benefit to internalizing them here, either.

Do linkers dead strip symbols with external linkage?

This is probably the wrong question. When linking main executables, the
linker can dead strip external functions. But if LTO is used with a main
executable, the linker will not tell libLTO to preserve the global symbols,
so they will quickly be made internal.

My question: will libLTO ever dead strip a non-internal function? If not,
is that one of the reasons the internalize pass tries to make functions
internal?

Exactly. libLTO will only dead strip functions with local linkage (such
as internal). During normal (non-LTO) optimizations, only functions with
local linkage are safe to remove. LTO runs -internalize before other
optimizations so that the rest of the optimizations don’t need to know
that anything is different.

(This is where I got the (apparently incorrect) idea that the linker
would only dead-strip internal functions.)

From the darwin linker's perspective, if libLTO does not dead strip
a libcall function implementation, that is OK. The linker runs another
liveness analysis pass after the libLTO result and any other mach-o
files are merged. So any extra functions will still get deleted from the
final output.

Okay, great. I think then it’s safe to remove the @llvm.compiler.used
hack that I went with originally, and just leave them external.

From LLVM's point of view, yes. It can drop linkonce and local (private* +
internal) globals.

[...]

The system linker, yes. LLVM knows it is not seeing the full picture
with regards to external ones.

I mistakenly assumed the LLVM perspective applied also to the system
linker (!).

Why not just fix the optimizations that are not handling
llvm.compiler.used correctly?

That’s valuable work. However, for this use case, there doesn’t seem
to be any benefit in relying on @llvm.compiler.used. If I’d properly
understood how linkers work, I wouldn’t have complicated the flow in
the first place (i.e., I think r194514 should have just blocked
-internalize from giving these functions local linkage).

I’m also not sure what Module::getOrInsertFunction() *should* do when
it finds a function with local linkage in @llvm.compiler.used. I
think its current behaviour is correct most of the time (moving
functions with local linkage seems correct in the usual case), and as
you point out below, checking for membership in @llvm.compiler.used
is not cheap.

Why not just fix the optimizations that are not handling
llvm.compiler.used correctly?

That’s valuable work. However, for this use case, there doesn’t seem
to be any benefit in relying on @llvm.compiler.used. If I’d properly
understood how linkers work, I wouldn’t have complicated the flow in
the first place (i.e., I think r194514 should have just blocked
-internalize from giving these functions local linkage).

More or less. There is still a (very small) advantage of having
internal + llvm.compiler.used. In the case llvm does introduce a use
of it, the linker cannot GC it, but will not put it in the symbol
table of the resulting DSO.

I’m also not sure what Module::getOrInsertFunction() *should* do when
it finds a function with local linkage in @llvm.compiler.used. I
think its current behaviour is correct most of the time (moving
functions with local linkage seems correct in the usual case), and as
you point out below, checking for membership in @llvm.compiler.used
is not cheap.

I think it shouldn't be doing any special treatment. I mean, the
function is named getOrInsertFunction. It is surprising that it does
an insert when there is a function already :) That functionality
should probably be moved to another, clearly named function.

Cheers,
Rafael

I see. Thanks for the explanation.

How about this: resolve symbols during LTO and detect which ones are used in the program, and which are not? It would probably require a lot more work in the LTO framework, but it has the benefit that we no longer need any "preserve" list merely to allow it to link, or "internalization". The only exception would be export lists for shared objects, but it would be a lot easier for the user to provide that, than to list all the functions referenced from other objects/libraries.

To help with user-provided library functions we could develop a way for the user to specify "resolution preference", i.e. if "printf" is not explicitly defined, pick it up from /usr/lib/libc.a, otherwise ignore the definition from libc.a.
This would only work for linking non-shared objects though. If the user wants to get the rest of the functions from libc, the replaced ones would need to be internally renamed to avoid conflicts with those in libc (the system linker could otherwise complain).

Has anything like this been considered?

-Krzysztof

I see. Thanks for the explanation.

How about this: resolve symbols during LTO and detect which ones are used in
the program, and which are not? It would probably require a lot more work
in the LTO framework, but it has the benefit that we no longer need any
"preserve" list merely to allow it to link, or "internalization". The only
exception would be export lists for shared objects, but it would be a lot
easier for the user to provide that, than to list all the functions
referenced from other objects/libraries.

To help with user-provided library functions we could develop a way for the
user to specify "resolution preference", i.e. if "printf" is not explicitly
defined, pick it up from /usr/lib/libc.a, otherwise ignore the definition
from libc.a.
This would only work for linking non-shared objects though. If the user
wants to get the rest of the functions from libc, the replaced ones would
need to be internally renamed to avoid conflicts with those in libc (the
system linker could otherwise complain).

At LTO time we don't know exactly which functions will be used. Whether
memcpy is used or not depends on which backend we are using, for
example. The only reliable way would be to iterate codegen, noting at
each step which new undefined references show up, which is overkill.
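
For instance (hand-written sketch, old-style intrinsic signature): a
module that only uses the memcpy intrinsic does not tell us whether the
@memcpy symbol will end up referenced:

  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i32, i1)

  define void @copy(i8* %d, i8* %s) {
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 4096, i32 1, i1 false)
    ret void
  }

  ; Depending on the target and its thresholds, codegen lowers this either
  ; to an inline copy sequence or to a call to the real @memcpy.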

The current strategy of just knowing which symbols llvm *might* use
seems appropriate; we just need to fix llvm to always respect it.

Cheers,
Rafael