LTO, deplibs, and libcalls, oh my

I’ve been investigating Issue 56070, a corner case in the interaction of dependent libraries (aka deplibs, #pragma comment(lib, "name")), Link Time Optimization (LTO), and backend-generated library calls (libcalls). The interaction of these produces an undefined reference in an otherwise valid link in ELF lld. I wanted to see what the COFF lld backend did in this case, so I built an equivalent scenario for COFF. This worked out of the box, but the approach used by COFF broke a different but related scenario, which, of course, works fine on the ELF backend.

Given that this issue isn’t confined to the ELF backend, and since the ideal behavior here doesn’t seem obvious, it seems like a good time to open this question to the broader LLD community.

Background

The deplibs feature is intended to allow object files added to the link to automatically pull in libraries without the need to manually specify them on the command line. It originated in MSVC then made its way to lld COFF, then lld ELF.

LTO allows symbol resolution to drive optimization by deferring code generation until after symbols have been resolved. The resolved symbol table is processed and provided to the code generator as one of its inputs. In particular, external symbols that aren’t actually accessed outside of a translation unit can be made internal and possibly removed.

During the course of code generation, the compiler may emit calls to external library functions. This happens fairly late in the code generation process, so it’s not easy to predict which will be produced. Emitting a libcall creates a new reference to the external symbol. Since LTO depends on having a finalized symbol table before code generation begins (at least with respect to symbols defined in the translation unit), any possible library calls that are satisfied using bitcode are summarily added to the link beforehand, at least in the ELF and COFF LLD backends.

ELF problem

In ELF lld, LTO codegen may cause object files containing libcalls to be strongly referenced. This may cause these files to be parsed. If they contain deplibs, this may cause additional libraries to be pulled into the link. However, since the symbol table was supposed to be finalized already, these libraries are never parsed, so their symbols are never added to the symbol table. This causes Issue 56070.
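The failure mode can be sketched with a small toy model. This is only an illustration under assumptions: the Member type, the resolve_after_lto function, and the symbol and library names are all hypothetical and do not correspond to lld internals. It shows a libcall reference created by codegen loading an archive member whose deplibs are recorded but never parsed.

```python
# Toy model only: every name here is hypothetical and none of this is lld's real code.
from dataclasses import dataclass, field


@dataclass
class Member:
    """One native object file inside an archive."""
    name: str
    defines: set
    refs: set = field(default_factory=set)
    deplibs: list = field(default_factory=list)  # from #pragma comment(lib, "...")


def resolve_after_lto(libcall_refs, archive):
    """Load archive members that satisfy references created by LTO codegen.

    Deplibs on newly loaded members are recorded but never parsed, which is
    the behavior described above for ELF lld.
    """
    defined, undefined = set(), set(libcall_refs)
    loaded, unparsed_deplibs = set(), []
    changed = True
    while changed:
        changed = False
        for member in archive:
            if member.name not in loaded and member.defines & undefined:
                loaded.add(member.name)
                defined |= member.defines
                undefined = (undefined | member.refs) - defined
                unparsed_deplibs.extend(member.deplibs)
                changed = True
    return undefined, unparsed_deplibs


archive = [
    Member("atomics.o", defines={"atomic_libcall"},
           refs={"feature_syscall"}, deplibs=["vdso"]),
]
undefined, skipped = resolve_after_lto({"atomic_libcall"}, archive)
print("still undefined:", sorted(undefined))
print("deplibs never parsed:", skipped)
```

In this model the libcall itself resolves, but the symbol that only the skipped library could provide stays undefined.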

For a concrete example, in the Fuchsia portion of compiler-rt, an object file used to satisfy a libcall (AArch64 atomics) needs to issue a syscall to query whether certain instructions are supported. The implementation uses deplibs to pull in the vDSO containing the syscall interface. Since the above issue causes the vDSO library to never be parsed, the syscall remains an undefined reference in compiler-rt.

COFF problem

An equivalent scenario can be constructed for LLD’s COFF backend. The COFF backend uses a task queue approach. Tasks can recursively enqueue other tasks, and the queue runs to completion at various points in the link. One of these points is after LTO occurs, so when the deplibs object file adds another library to the link, this adds a task to parse the library. Thus, the issue above does not occur, since any newly added deplibs are transitively parsed.

However, this approach pulls in code too eagerly. LTO considers its symbol table complete, so it’s free to internalize and possibly DCE symbols not referenced outside of the LTO unit. One of these symbols may be referenced by code pulled by deplibs after LTO. I was able to create a scenario where the COFF LLD improperly internalized such a variable, which creates an undefined reference when the deplib library references it.
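A rough sketch of this ordering problem, again as a toy model rather than the real COFF lld task queue: the function coff_like_link, the dictionary layout, and all symbol names are assumptions made for illustration. LTO internalizes first, then the queue drains libraries that arrive afterwards.

```python
# Toy model only: hypothetical names, not the real COFF lld task queue.
from collections import deque


def coff_like_link(lto_defs, refs_seen_by_lto, post_lto_libs):
    """Internalize at LTO time, then drain a queue of libraries added afterwards."""
    # LTO treats its symbol table as complete, so any definition without a
    # known outside reference is internalized (and may be removed entirely).
    internalized = lto_defs - refs_seen_by_lto
    visible = set(lto_defs - internalized)

    needed = set()
    queue = deque(post_lto_libs)
    while queue:
        lib = queue.popleft()
        visible |= lib["defines"]
        needed |= lib["refs"]
        queue.extend(lib["deplibs"])  # tasks may enqueue further tasks
    return needed - visible, internalized


helper_lib = {"defines": {"helper"}, "refs": {"config_flag"}, "deplibs": []}
libcall_obj = {"defines": {"some_libcall"}, "refs": {"helper"}, "deplibs": [helper_lib]}

unresolved, internalized = coff_like_link(
    lto_defs={"main", "config_flag"},
    refs_seen_by_lto={"main"},
    post_lto_libs=[libcall_obj],
)
print("internalized by LTO:", sorted(internalized))
print("undefined after deplibs:", sorted(unresolved))
```

Here the deplib library is parsed transitively, so its own symbols resolve, but the variable it needs from the LTO unit was already internalized.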

MSVC?

Given that the feature is originally from MSVC, and since MSVC does have LTO, it’s an open question how they deal with this scenario. Unfortunately, I lack the expertise needed to reproduce either of the above scenarios for MSVC.

Possible solution

One way to establish consistent semantics would be to summarily pull in, before LTO occurs, the deplibs for any function that could possibly be issued as a libcall, similar to libcalls implemented in bitcode. This would allow the Fuchsia case to work in ELF, and would prevent COFF’s LTO instance from internalizing symbols referenced through deplibs, since the effects of deplibs would already be reflected in the symbol table before LTO. This would change the existing semantics of both linkers: if a libcall never ends up being issued, the deplibs of the object files that define it would still be pulled in. Adding these as lazy libraries should mitigate this somewhat.

Although there is what appears to be a best-effort list of possible libcalls, this isn’t presently sufficient for the above approach to work for the Fuchsia case, as the AArch64-specific atomic libcalls aren’t in the list. This would require maintaining exhaustive and target-specific lists of libcalls to be made available to the various linkers. The issue contains a prototype solution along these lines for just ELF/AArch64.
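As a minimal sketch of the idea, assuming a hypothetical deplibs_to_preload helper and made-up libcall names (this is not the prototype from the issue), the pre-LTO step could scan for members that define any listed libcall and collect their deplibs. It also shows why a list that omits target-specific libcalls is not enough.

```python
# Toy model only: the libcall names and archive layout are hypothetical.
def deplibs_to_preload(possible_libcalls, archive):
    """Collect deplibs of every member that could satisfy a listed libcall.

    The result is meant to be added to the link (lazily) before LTO runs, so
    the symbol table already reflects them when internalization happens.
    """
    libs = []
    for member in archive:
        if member["defines"] & possible_libcalls:
            for lib in member["deplibs"]:
                if lib not in libs:
                    libs.append(lib)
    return libs


archive = [
    {"name": "memcpy.o", "defines": {"memcpy"}, "deplibs": []},
    {"name": "atomics.o", "defines": {"target_atomic_libcall"}, "deplibs": ["vdso"]},
]

generic_list = {"memcpy"}
exhaustive_list = generic_list | {"target_atomic_libcall"}

print(deplibs_to_preload(generic_list, archive))     # target-specific libcall missed
print(deplibs_to_preload(exhaustive_list, archive))  # vdso is found before LTO
```

With only the generic list, the library behind the target-specific libcall is never discovered, which mirrors the gap described above.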

Other approaches

While the above seems to at least be internally consistent, there’s a lot not to like about it. I’m at a loss to think of another way to make these three features fit together consistently, though. You could say “you can’t use deplibs in anything that might be used to satisfy a libcall”, but it’s unclear whether or not there’s any way to diagnose a condition like that, and it seems rather strange to ban uses of language features in core libraries due to compiler options specified in the callers of those libraries. If there’s prior art for that though, let me know.

Bless you, for using its and it’s correctly.

Wouldn’t that make the Fuchsia case simply not work? It implements a libcall in a way that depends on another libcall, IIUC your example, and the chaining of dependencies pretty much can’t be satisfied any other way.

Yep, this option is to just codify the status quo, where there’s no safe way to use deplibs as part of the satisfaction of a libcall, lest one of the transitive users of your library enable LTO. We’re hacking our way around this in Fuchsia today by manually adding the vDSO to the command line of each link in our build that breaks because of this issue.

Attn: @teresajohnson @davidxl @mehdi_amini @jyknight @fhahn

I want to say that “you can’t use deplibs in anything that might be used to satisfy a libcall” needs to be true. Fundamentally, with LTO, we are trying to run symbol resolution twice, once with IR and once with native objects, and get approximately the same result both times. LLVM’s ability to create new references to library functions during code generation means the second symbol resolution attempt will inevitably be slightly different from the first attempt. If we allow anything more than pulling in a few leaf object files that reference nothing else (think compiler-rt builtins), those symbol resolution results will diverge drastically.

On the other hand, you mention that we have some support for libcalls implemented in bitcode, so maybe your proposed solution can be viewed as an extension of this knowledge.

It might be feasible to implement some form of checking by writing down the set of .deplibs that were discovered during the LTO pre-link and then comparing with the set of deplibs from the native link, and if they differ, produce an error. That would be a better user experience than the current result of undefined symbols.
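One possible shape for such a check, purely as a sketch: check_deplibs and DeplibsMismatchError are hypothetical names that do not exist in lld, and where the two sets would actually be recorded is left open.

```python
# Hypothetical check: neither the function nor the error type exists in lld.
class DeplibsMismatchError(Exception):
    pass


def check_deplibs(prelink_deplibs, native_deplibs):
    """Raise if the native link saw a different set of deplibs than the LTO pre-link."""
    prelink, native = set(prelink_deplibs), set(native_deplibs)
    if prelink != native:
        added = sorted(native - prelink)
        dropped = sorted(prelink - native)
        raise DeplibsMismatchError(
            f"deplibs changed across LTO: added {added}, dropped {dropped}"
        )


check_deplibs(["c"], ["c"])  # identical sets pass silently

try:
    check_deplibs(["c"], ["c", "vdso"])
except DeplibsMismatchError as err:
    print("error:", err)
```

An error of this kind would name the library that appeared late, rather than leaving the user with an undefined symbol to trace back.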

Some time has passed, and I wanted to circle back around to this issue before it slipped out of my mind completely.

Looking at a slightly broader scope, the interaction between LTO and libcalls seems fraught with hidden peril:

  1. The linker uses a target-independent list to pull in bitcode libcalls. If a target-specific bitcode libcall isn’t on the list, linking fails with a fairly obscure error. Apparently no-one implements target-specific libcalls in bitcode, but it doesn’t seem to be written anywhere that you can’t.

  2. If a libcall emitted by LTO uses deplibs, ELF lld won’t parse the library it names, and this may break the link with a strange undefined reference. COFF LLD pulls it in, but too late, after symbols it may reference have been internalized. This again would break with an obscure undefined reference. It also doesn’t seem to be written down anywhere that you can’t do this.

I think any desirable outcome here would remove the hidden peril; either by removing the peril, or by making it no longer hidden.

To remove the peril, we’d probably need a fair amount of linker work to make sure that all backends eagerly pull in everything they need in order to prevent post-LTO link failures that would not have occurred without LTO.

To make the peril obvious, we’d need at the very least to define and document the semantics of what you are and aren’t allowed to do in implementations of libcalls. Right now, the rules seem to involve reading a list in the source code of LLD, which doesn’t seem great. Ideally, violations of these rules could be caught when building compiler-rt. If that’s not practical, it might be easier to provide more specific link errors in case one of these scenarios ever recurs in practice.

In short, to me, the status quo kinda sucks, and I’d like to fix it. There are still (apparently) quite broken libcall implementations in the Fuchsia compiler-rt configuration because of this, and we’ve had to work around the issue manually in our build system.

Since I’m the one bugged by this, I’m willing to spend some time on any of these alternatives. That being said, I’d definitely like to know which way folks are leaning before putting any serious time into it.