I’ve been investigating Issue 56070, a corner case in the interaction of dependent libraries (aka deplibs, #pragma comment(lib, "name")), Link Time Optimization (LTO), and backend-generated library calls (libcalls). The interaction of these produced an undefined reference in an otherwise valid link in ELF lld. I wanted to see what the COFF lld backend did in this case, so I built an equivalent scenario for COFF. This worked out of the box, but the approach used by COFF broke a different but related scenario, which, of course, works fine on the ELF backend.
Given that this issue has escaped the ELF backend, and since the ideal behavior here doesn’t seem obvious, it seems like a good time to open this question to the broader LLD community.
Background
The deplibs feature is intended to allow object files added to the link to automatically pull in libraries without the need to manually specify them on the command line. It originated in MSVC, then made its way to lld COFF, and then to lld ELF.
LTO allows symbol resolution to drive optimization by deferring code generation until after symbols have been resolved. The resolved symbol table is processed and provided to the code generator as one of its inputs. In particular, external symbols that aren’t actually accessed outside of a translation unit can be made internal and possibly removed.
During the course of code generation, the compiler may emit calls to external library functions. This happens fairly late in the code generation process, so it’s not easy to predict which will be produced. Emitting a libcall creates a new reference to the external symbol. Since LTO depends on having a finalized symbol table before code generation begins (at least with respect to symbols defined in the translation unit), any possible library calls that are satisfied using bitcode are summarily added to the link beforehand, at least in the ELF and COFF LLD backends.
ELF problem
In ELF lld, LTO codegen may cause object files containing libcalls to be strongly referenced. This may cause these files to be parsed. If they contain deplibs, this may cause additional libraries to be pulled into the link. However, since the symbol table was supposed to be finalized already, these libraries are never parsed, so their symbols are never added to the symbol table. This causes Issue 56070.
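The sequence above can be sketched as a toy model. This is not lld's actual code or data layout; the `ObjectFile` class, the `link_elf_like` function, and the symbol names `atomic_helper` and `query_cpu_features` are all hypothetical and exist only to illustrate the ordering problem.

```python
from dataclasses import dataclass, field

# Toy model only: hypothetical names, not lld's real data structures.
@dataclass
class ObjectFile:
    name: str
    defines: set = field(default_factory=set)
    references: set = field(default_factory=set)
    deplibs: list = field(default_factory=list)  # libraries requested via deplibs

def link_elf_like(inputs, lazy_objects, libraries, libcalls_emitted):
    """Return (undefined symbols, deplibs that were requested but never parsed)."""
    defined, referenced, pending_libs = set(), set(), []

    def parse(obj):
        defined.update(obj.defines)
        referenced.update(obj.references)
        pending_libs.extend(obj.deplibs)

    # Phase 1: symbol resolution before LTO. Deplibs requested here are parsed.
    for obj in inputs:
        parse(obj)
    while pending_libs:
        for member in libraries[pending_libs.pop()]:
            parse(member)

    # Phase 2: LTO codegen emits libcalls, which pull in lazy object files.
    for sym in libcalls_emitted:
        referenced.add(sym)
        for obj in lazy_objects:
            if sym in obj.defines:
                parse(obj)

    # Nothing drains pending_libs after LTO, so late deplibs stay unparsed.
    return referenced - defined, pending_libs

main = ObjectFile("main.o", defines={"main"})
helper = ObjectFile("atomic_helper.o", defines={"atomic_helper"},
                    references={"query_cpu_features"}, deplibs=["vdso"])
vdso = ObjectFile("vdso.o", defines={"query_cpu_features"})

undefined, unparsed = link_elf_like([main], [helper], {"vdso": [vdso]},
                                    ["atomic_helper"])
print(undefined, unparsed)
```

In this model the helper's deplib is recorded but never parsed, so the symbol it was supposed to provide remains undefined.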
For a concrete example, in the Fuchsia portion of compiler-rt, an object file used to satisfy a libcall (AArch64 atomics) needs to issue a syscall to query whether certain instructions are supported. The implementation uses deplibs to pull in the vDSO containing the syscall interface. Since the above issue causes the vDSO library to never be parsed, the syscall remains an undefined reference in compiler-rt.
COFF problem
An equivalent scenario can be constructed for LLD’s COFF backend. The COFF backend uses a task queue approach. Tasks can recursively enqueue other tasks, and the queue runs to completion at various points in the link. One of these points is after LTO occurs, so when the deplibs object file adds another library to the link, this adds a task to parse the library. Thus, the issue above does not occur, since any newly added deplibs are transitively parsed.
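A minimal sketch of the task queue idea, assuming a simple FIFO of callables. The function names and file names (`run_to_completion`, `helper.obj`, `support.lib`) are hypothetical and do not correspond to anything in lld's COFF driver.

```python
from collections import deque

# Toy model only: hypothetical names, not lld's COFF driver.
tasks = deque()
parsed = []

def run_to_completion():
    """Drain the queue; a task may append further tasks while it runs."""
    while tasks:
        tasks.popleft()()

def parse_library(name):
    parsed.append(name)

def parse_object(name, deplibs):
    parsed.append(name)
    for lib in deplibs:
        # An object with deplibs enqueues one parse task per requested library.
        tasks.append(lambda lib=lib: parse_library(lib))

# Before LTO: ordinary inputs are parsed and the queue is drained.
tasks.append(lambda: parse_object("main.obj", []))
run_to_completion()

# After LTO: a libcall pulls in an object that carries a deplib.
tasks.append(lambda: parse_object("helper.obj", ["support.lib"]))
run_to_completion()

print(parsed)
```

Because the queue is drained again after LTO, the library requested by the late object is still parsed in this model.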
However, this approach pulls in code too eagerly. LTO considers its symbol table complete, so it’s free to internalize and possibly DCE symbols not referenced outside of the LTO unit. One of these symbols may be referenced by code pulled in by deplibs after LTO. I was able to create a scenario where COFF LLD improperly internalized such a variable, which created an undefined reference when the deplib library referenced it.
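The internalization hazard can be illustrated with sets of symbol names. This is a deliberately simplified sketch: `lto_internalize` and the symbol `config_flag` are hypothetical, and real LTO tracks far more than a set of names.

```python
# Toy model only: hypothetical names, symbols reduced to plain strings.
def lto_internalize(bitcode_defs, external_refs):
    """Keep external only the bitcode symbols referenced from outside the LTO unit."""
    return {sym for sym in bitcode_defs if sym in external_refs}

bitcode_defs = {"main", "config_flag"}

# References visible when LTO runs: only the entry point is used externally.
refs_seen_before_lto = {"main"}
still_external = lto_internalize(bitcode_defs, refs_seen_before_lto)

# A library pulled in by deplibs after LTO turns out to reference config_flag,
# which has already been internalized (and possibly removed).
late_library_refs = {"config_flag"}
undefined = late_library_refs - still_external

print(still_external, undefined)
```

The late reference arrives after the decision to internalize was made, so it has nothing to bind to.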
MSVC?
Given that the feature is originally from MSVC, and since MSVC does have LTO, it’s an open question how they deal with this scenario. Unfortunately, I lack the expertise needed to reproduce either of the above scenarios for MSVC.
Possible solution
One way to establish consistent semantics would be to summarily pull in deplibs for any function that could possibly be issued as a libcall before LTO occurs, similar to libcalls implemented in bitcode. This would allow the Fuchsia case to work in ELF, and would prevent COFF’s LTO instance from internalizing symbols referenced by deplibs libraries, since the effects of deplibs would have occurred in the symbol table before LTO. This would change the existing semantics of both linkers, since the deplibs of the containing object files would still be pulled in even if a libcall never ends up being issued. Adding these as lazy libraries should mitigate this somewhat.
Although there is what appears to be a best-effort list of possible libcalls, this isn’t presently sufficient for the above approach to work for the Fuchsia case, as the AArch64-specific atomic libcalls aren’t in the list. This would require maintaining exhaustive and target-specific lists of libcalls to be made available to the various linkers. The issue contains a prototype solution along these lines for just ELF/AArch64.
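A rough sketch of the proposal, and of why the libcall list has to be exhaustive. This is not the prototype from the issue; `preload_deplibs`, the dictionary layout, and the symbol names are hypothetical, and the two lists are made-up stand-ins for a target-specific libcall list.

```python
# Toy model only: hypothetical names, not the prototype attached to the issue.
def preload_deplibs(lazy_objects, libraries, possible_libcalls):
    """Before LTO: register, as lazy symbols, the definitions found in the
    deplibs of any lazy object that could satisfy a possible libcall."""
    lazy_symbols = {}
    for obj in lazy_objects:
        if obj["defines"] & possible_libcalls:
            for lib in obj["deplibs"]:
                for member in libraries[lib]:
                    for sym in member["defines"]:
                        lazy_symbols.setdefault(sym, member["name"])
    return lazy_symbols

helper = {"name": "atomic_helper.o", "defines": {"atomic_helper"},
          "references": {"query_cpu_features"}, "deplibs": ["vdso"]}
libraries = {"vdso": [{"name": "vdso.o", "defines": {"query_cpu_features"}}]}

complete_list = {"atomic_helper"}   # list that includes the target-specific libcall
incomplete_list = {"memcpy"}        # best-effort list that misses it

with_list = preload_deplibs([helper], libraries, complete_list)
without_list = preload_deplibs([helper], libraries, incomplete_list)

unresolved_with = helper["references"] - set(with_list)
unresolved_without = helper["references"] - set(without_list)
print(unresolved_with, unresolved_without)
```

With the complete list, the helper's reference can be resolved lazily once the libcall is actually emitted; with the incomplete list, the model falls back to the original undefined reference.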
Other approaches
While the above seems to at least be internally consistent, there’s a lot not to like about it. I’m at a loss to think of another way to make these three features fit together consistently, though. You could say “you can’t use deplibs in anything that might be used to satisfy a libcall”, but it’s unclear whether or not there’s any way to diagnose a condition like that, and it seems rather strange to ban uses of language features in core libraries due to compiler options specified in the callers of those libraries. If there’s prior art for that though, let me know.