Hello,
I’ve put some comments on the proposal inline. Having had to debug
library selection problems where all the libraries are visible on the
linker command line, I would prefer if people didn’t embed
difficult-to-find directives in object files, but I’m guessing in some languages
this is the natural way of adding libraries.
At Sony we offer autolinking as a feature in our ELF toolchain. We would like to see full support for this feature upstream as there is anecdotal evidence that it would find use beyond Sony.
I’ve not got any use of the existing code. Personally I’ve not come
across anyone wanting this type of feature, but that is also anecdotal
on my part.
For ELF we need limited autolinking support. Specifically, we only need support for “comment lib” pragmas (https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in C/C++, e.g. #pragma comment(lib, "foo"). My suggestion is that we keep the implementation as lean as possible.
Principles to guide the implementation:
- Developers should be able to easily understand autolinking behavior.
- Developers should be able to override autolinking from the linker command line.
- Inputs specified via pragmas should be handled in a general way to allow the same source code to work in different environments.
I would like to propose that we focus on autolinking exclusively and that we divorce the implementation from the idea of “linker options” which, by nature, would tie source code to the vagaries of particular linkers. I don’t see much value in supporting other linker operations, so I suggest that the binary representation be a mergeable string section (SHF_MERGE, SHF_STRINGS), called .autolink, with custom type SHT_LLVM_AUTOLINK (0x6fff4c04), and SHF_EXCLUDE set (to avoid the contents appearing in the output). The compiler can form this section by concatenating the arguments of the “comment lib” pragmas in the order they are encountered. Partial (-r, -Ur) links can be handled by concatenating .autolink sections with the normal mergeable string section rules. The current .linker-options can remain (or be removed), but “comment lib” pragmas for ELF should be lowered to .autolink, not to .linker-options. This makes sense as there is no linker option that “comment lib” pragmas map directly to. As an example, #pragma comment(lib, "foo") would result in:
.section ".autolink","eMS",@llvm_autolink,1
.asciz "foo"
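To make the encoding concrete, here is a minimal sketch of how a consumer might split the raw bytes of such a section back into library names. It assumes only what is described above (a sequence of NUL-terminated strings in pragma order); the function name parseAutolinkSection is made up for illustration and is not an LLD API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split the raw contents of a .autolink section into
// its NUL-terminated entries, preserving the order in which they appear.
std::vector<std::string> parseAutolinkSection(const std::string &data) {
  std::vector<std::string> names;
  std::size_t pos = 0;
  while (pos < data.size()) {
    std::size_t end = data.find('\0', pos);
    if (end == std::string::npos)
      break; // Unterminated tail: ignored in this sketch rather than guessed at.
    if (end > pos) // Skip empty entries.
      names.push_back(data.substr(pos, end - pos));
    pos = end + 1;
  }
  return names;
}
```

For the example above, the section holds the four bytes "foo\0", which this sketch turns back into the single name foo.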
For LTO, information equivalent to the contents of the .autolink section will be written to the IRSymtab so that it is available to the linker for symbol resolution.
I’m not sure I understand the bit about “for symbol resolution”. I
think that what you mean is that you will encode the autolink section
using symbols instead of as a section, and the linker is expected to
extract this when it reads the symbol table?
Whoops… might have used a bit of a colloquialism there; sorry. All I mean is that there will be a method on the IRSymtab that LLD can use to retrieve the same set of strings that would be written into the .autolink section of the relocatable object files by the backend.
The linker will process the .autolink strings in the following way:
- Inputs from the .autolink sections of a relocatable object file are added when the linker decides to include that file (which could itself be in a library) in the link. Autolinked inputs behave as if they were appended to the command line as a group after all other options. As a consequence, the set of autolinked libraries is searched last to resolve symbols.
- It is an error if a file cannot be found for a given string.
- Any command line options in effect at the end of the command line parsing apply to autolinked inputs, e.g. --whole-archive.
I’ve not got any experience of autolinking as a user, so I’m
struggling a bit with this one. I’m guessing that autolinking is
useful because someone can do the equivalent of #include <library.h>
and #pragma comment(lib, "library.so") in the same place without having
to fight the build system.
Right. Consider that many codebases have multiple build configurations and the linker needs to be given the correct version of a library to use for the particular build configuration. This is often easier to do using the preprocessor than in the build system. Also, if a program is dependent on an external library, autolinking allows the library writer to reorganize how that library is structured transparently to the users of the library. There are notes about utility in https://stackoverflow.com/questions/1685206/pragma-commentlib-xxx-lib-equivalent-under-linux and https://stackoverflow.com/questions/3851956/whats-pragma-comment-lib-lib-glut32-lib?noredirect=1&lq=1.
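As a rough illustration of the per-configuration case, a library header could pick the name to autolink with the preprocessor. Everything named here is hypothetical: the library "mylib", its debug variant "mylib_d", and the MYLIB_ENABLE_AUTOLINK switch, which is only there so the pragma stays inert on toolchains that do not support it.

```cpp
// Hypothetical header fragment for a library shipped in two flavours.
#ifdef NDEBUG
#define MYLIB_AUTOLINK_NAME "mylib"   // release build of the library
#else
#define MYLIB_AUTOLINK_NAME "mylib_d" // debug build of the library
#endif

// The build opts in by defining MYLIB_ENABLE_AUTOLINK (an assumed switch).
#ifdef MYLIB_ENABLE_AUTOLINK
#pragma comment(lib, MYLIB_AUTOLINK_NAME)
#endif

// Exposes the chosen name so the selection can be inspected at run time.
const char *mylibAutolinkName() { return MYLIB_AUTOLINK_NAME; }
```

The build system then only has to define NDEBUG (or not); it never needs to know which library file goes with which configuration.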
I’m less convinced about --whole-archive as
I think this tends to be a way of structuring the build and would be
best made explicit in the build system. Moreover, what if someone
does not want --whole-archive for their autolink, but one is already
in effect?
Then they can specify --no-whole-archive on the end of the command line, no?
This could be quite difficult to check with a large project.
Personally I’d have the user be explicit in the .autolink whether they
were intending it to be whole-archive or not.
I was hoping to avoid this, as I want to avoid getting into how to specify linker-specific options in the frontend. If we dislike the idea that the state of the command line parser at the end of the linker command line affects the autolinked libraries, then I would rather go for a scheme in which the default state of the command line parser applies when linking the autolinked libraries; however, that seems harder to implement in LLD and gives the user less control over autolinking.
I think that handling .autolink’ed files in the default state is simpler, and it doesn’t seem too hard to implement.
Right… definitely possible to implement. So the trade-off is that it is possibly confusing if options like --whole-archive start applying to the “invisible” autolinked inputs. OTOH why not allow command line options to affect the autolinked inputs? It gives developers some more control at no cost (apart from the possible confusion).
The other option is to handle autolinked libraries as soon as we find them, so that if foo.o autolinks libbar, the linker would act as if foo.o in the command line is followed by -lbar. I’d think that’s not too bad or arguably more straightforward semantics than autolinking everything all at once at the end.
So I played around with this idea a bit. Some background info:
MSVC searches libraries added via “comment lib” pragmas last, after searching all of the libraries specified on the command line; however, symbols that are unresolved when bringing in an object file from a library are searched for in that library first (https://docs.microsoft.com/en-us/cpp/build/reference/link-input-files?view=vs-2017).
In the upstream discussion for autolinking, Cary Coutant offered the following as a good compromise for traditional ELF linkers (http://lists.llvm.org/pipermail/llvm-dev/2018-January/120382.html):
“I think what would work is to insert each requested object or shared
library into the link order immediately after the object that requests
it, but only if the object hasn’t already been inserted and isn’t
already listed on the command line (i.e., we won’t try to load the
same file twice); and to search each requested archive library
immediately after each object that requests it (of course, because of
how library searching works, we would load a given archive member once
at most). With this method, libm would be searched after both a.o and
b.o, so we’d load any members needed by a.o before b.o, and any
remaining members needed by b.o before c.o.”
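For anyone who finds the ordering easier to read as code, here is a small sketch of the scheme quoted above. It is only a model: RequestingObject and interleavedOrder are invented names, a ".a" suffix is assumed to identify an archive, and no real symbol resolution happens.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a command-line object plus the inputs it requests.
struct RequestingObject {
  std::string name;
  std::vector<std::string> requests;
};

// Assumption for this sketch only: a ".a" suffix marks an archive library.
static bool isArchive(const std::string &name) {
  return name.size() >= 2 && name.compare(name.size() - 2, 2, ".a") == 0;
}

std::vector<std::string>
interleavedOrder(const std::vector<RequestingObject> &commandLine) {
  // Everything listed on the command line counts as already present.
  std::set<std::string> known;
  for (const RequestingObject &obj : commandLine)
    known.insert(obj.name);

  std::vector<std::string> order;
  for (const RequestingObject &obj : commandLine) {
    order.push_back(obj.name);
    for (const std::string &req : obj.requests) {
      if (isArchive(req))
        order.push_back(req); // Archives are searched after every requester.
      else if (known.insert(req).second)
        order.push_back(req); // Objects and shared libraries load only once.
    }
  }
  return order;
}
```

With a.o and b.o both requesting libm.a and a command line of a.o b.o c.o, the sketch yields a.o libm.a b.o libm.a c.o, which matches the libm example in the quote.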
The problem with what you’re suggesting is that with the GNU linkers it is always possible to define “where” in the command line parsing you are. However, for MSVC or LLD it is not always possible… think of an object file in a library that autolinks foo.a and gets pulled into the link (by an undefined symbol) much later on in the link order. My RFC is careful to try to set out a scheme that all linkers can implement (as much as is possible).
- Duplicate autolinked inputs are ignored.
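Putting the bullets of the RFC together, the proposed behaviour can be sketched roughly as follows. This is a toy model under stated assumptions: IncludedObject and appendAutolinkedInputs are invented names, the "findable" set stands in for a real search of the library paths, and duplicates are only removed among the autolinked inputs themselves.

```cpp
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of an object file the linker decided to include.
struct IncludedObject {
  std::string name;
  std::vector<std::string> autolink; // strings from its .autolink section
};

// Returns the command-line inputs followed by the autolinked group.
std::vector<std::string>
appendAutolinkedInputs(const std::vector<std::string> &commandLine,
                       const std::vector<IncludedObject> &included,
                       const std::set<std::string> &findable) {
  std::vector<std::string> order = commandLine;
  std::set<std::string> seen; // autolinked inputs already appended
  for (const IncludedObject &obj : included) {
    for (const std::string &lib : obj.autolink) {
      // It is an error if a file cannot be found for a given string.
      if (findable.count(lib) == 0)
        throw std::runtime_error("autolink: cannot find " + lib);
      // Duplicate autolinked inputs are ignored.
      if (seen.insert(lib).second)
        order.push_back(lib);
    }
  }
  return order;
}
```

Because the autolinked group always lands after everything named on the command line, anything given explicitly is searched first, which is what lets the command line override autolinking.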
If we take the issue of --whole-archive off the table does it matter
that there are duplicate libraries? Unresolved symbols will match
against the first library.
It doesn’t matter for libraries in LLD, but it is important for object files. I think that this mechanism should be usable for object files and libraries. This is common in ELF linkers; for example, the --library command line option can be used to link object files.
Do you actually often link .o files using -l? It seems a bit of a weird use of the option. To me, it seems better to limit autolinking to linking against .so or .a files.
I don’t personally but it does seem useful to be able to find .o files on the library search paths.
Rui - I’m sure you know everything about MSVC linking already! For others’ benefit though, MSVC only allows loading of libraries via “comment lib” pragmas. It rejects .obj files.
C:\temp\library_semantics>more msvc_foo.c
int foo() {return 10;}
C:\temp\library_semantics>cl msvc_foo.c /c
msvc_foo.c
C:\temp\library_semantics>more msvc.c
#pragma comment(lib, "msvc_foo.obj")
int foo ();
int main () {return foo();}
C:\temp\library_semantics>cl msvc.c
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24213.1 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
msvc.c
Microsoft (R) Incremental Linker Version 14.00.24213.1
Copyright (C) Microsoft Corporation. All rights reserved.
/out:msvc.exe
msvc.obj
msvc_foo.obj : warning LNK4003: invalid library format; library ignored
msvc.obj : error LNK2019: unresolved external symbol foo referenced in function main
msvc.exe : fatal error LNK1120: 1 unresolved externals
C:\temp\library_semantics>lib /out:msvc_foo.lib msvc_foo.obj
Microsoft (R) Library Manager Version 14.00.24213.1
Copyright (C) Microsoft Corporation. All rights reserved.
C:\temp\library_semantics>more msvc.c
#pragma comment(lib, "msvc_foo.lib")
int foo ();
int main (){return foo();}
C:\temp\library_semantics>cl msvc.c /link /verbose | grep msvc
msvc.c
/out:msvc.exe
msvc.obj
Processed /DEFAULTLIB:msvc_foo.lib
Searching msvc_foo.lib:
Referenced in msvc.obj
Loaded msvc_foo.lib(msvc_foo.obj)
Processed /DISALLOWLIB:msvcrt.lib
Processed /DISALLOWLIB:msvcrtd.lib
Searching msvc_foo.lib:
Searching msvc_foo.lib:
Searching msvc_foo.lib:
msvc.obj
msvc_foo.lib(msvc_foo.obj)
Other interesting MSVC behaviour:
MSVC forms the library name to search for based on the file extension. An interesting difference is that on Windows import libraries and static archives both have the same naming convention of .lib, whereas on Unix dynamic libraries conventionally use the .so extension and static archives use .a.
#pragma comment(lib, "winmm") → Searches for "winmm.lib" (doesn’t search for "winmm")
#pragma comment(lib, "winmm.lib") → Searches for "winmm.lib" (doesn’t search for "winmm.lib.lib")
#pragma comment(lib, "winmm.lix") → Searches for "winmm.lix" (doesn’t search for "winmm.lix.lib")
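The three cases above boil down to "append .lib only when the name has no extension". A minimal sketch of that rule follows; the function name is invented, and treating a dot inside a directory component as "no extension" is my assumption, not something I checked against LINK.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the observed naming rule: a name without a
// file extension gets ".lib" appended; any existing extension is kept as-is.
std::string msvcStyleLibraryName(const std::string &name) {
  std::size_t slash = name.find_last_of("/\\");
  std::size_t dot = name.find_last_of('.');
  // Assumption: a dot only counts if it is in the final path component.
  bool hasExtension = dot != std::string::npos &&
                      (slash == std::string::npos || dot > slash);
  return hasExtension ? name : name + ".lib";
}
```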
MSVC allows specifying libraries on the command line as just file names or by using the /DEFAULTLIB option. In both cases the rules for locating the library are the same. If a path is specified with the library name, LINK searches for the library in that directory. If no path is specified, LINK looks first in the directory that LINK is running from, and then in any directories specified in the LIB environment variable, see: https://docs.microsoft.com/en-us/cpp/build/reference/dot-lib-files-as-linker-input?view=vs-2017. Additionally, LINK will search any /LIBPATH paths before those specified in the LIB environment variable, see: https://docs.microsoft.com/en-us/cpp/build/reference/libpath-additional-libpath?view=vs-2017. LINK handles libraries specified via “comment lib” pragmas just as if you had named them on the command line, see: https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017.
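As a sketch of the lookup order just described, the function below lists the locations that would be tried for a library name, in order. The names are invented for illustration, the directory lists are passed in rather than read from the environment, and the backslash join is a simplification.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: list candidate locations for a library in the order
// described above. Nothing is opened; the caller would probe each in turn.
std::vector<std::string>
librarySearchCandidates(const std::string &lib, const std::string &linkDir,
                        const std::vector<std::string> &libPathDirs,
                        const std::vector<std::string> &libEnvDirs) {
  // A name that carries a path is looked up in that directory only.
  if (lib.find_first_of("/\\") != std::string::npos)
    return {lib};

  std::vector<std::string> candidates;
  candidates.push_back(linkDir + "\\" + lib); // where LINK is running from
  for (const std::string &dir : libPathDirs)  // /LIBPATH directories
    candidates.push_back(dir + "\\" + lib);
  for (const std::string &dir : libEnvDirs)   // LIB environment variable
    candidates.push_back(dir + "\\" + lib);
  return candidates;
}
```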
MSVC rules for resolving symbols from libraries: A library specified with /DEFAULTLIB is searched after libraries specified explicitly on the command line and before default libraries named in .obj files (see https://docs.microsoft.com/en-us/cpp/build/reference/nodefaultlib-ignore-libraries?view=vs-2017).
MSVC allows passing not only libraries to the linker via pragmas but also a subset of the linker’s command line options (https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp). In addition to the documented options, MSVC also accepts some undocumented options. One of these is /DISALLOWLIB, which allows an object file to state that it is incompatible with a given library, see: https://stackoverflow.com/questions/761394/what-does-the-disallowlib-message-mean-in-vc-linker-output and https://stackoverflow.com/questions/3007312/resolving-lnk4098-defaultlib-msvcrt-conflicts-with.
One of the options supported is /DEFAULTLIB. This means you can specify libraries via pragmas with either #pragma comment(lib, ) or #pragma comment(linker, “/DEFAULTLIB:”).
MSVC has the “/NODEFAULTLIB” option which ignores any /DEFAULTLIB options from object files or the command-line. You can also ignore specific libraries, with “/NODEFAULTLIB:name.lib”.
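To tie the ordering and the /NODEFAULTLIB behaviour together, here is a rough model of how the final library search list could be derived. The struct and function names are invented, and matching library names by exact string comparison is an assumption of the sketch.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of the library-related inputs to one link.
struct LibraryInputs {
  std::vector<std::string> explicitLibs;       // named directly on the command line
  std::vector<std::string> cmdlineDefaultLibs; // /DEFAULTLIB on the command line
  std::vector<std::string> objDefaultLibs;     // default libraries named in .obj files
  bool ignoreAllDefaultLibs = false;           // bare /NODEFAULTLIB
  std::set<std::string> ignoredDefaultLibs;    // /NODEFAULTLIB:name.lib
};

std::vector<std::string> librarySearchOrder(const LibraryInputs &in) {
  // Explicit libraries always come first and are never filtered.
  std::vector<std::string> order = in.explicitLibs;
  if (in.ignoreAllDefaultLibs)
    return order;
  // Command-line /DEFAULTLIB entries come before those from .obj files.
  for (const std::string &lib : in.cmdlineDefaultLibs)
    if (in.ignoredDefaultLibs.count(lib) == 0)
      order.push_back(lib);
  for (const std::string &lib : in.objDefaultLibs)
    if (in.ignoredDefaultLibs.count(lib) == 0)
      order.push_back(lib);
  return order;
}
```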
Both Gold and GNU ld allow loading of non-library files via the -l/--library options, but MSVC only allows adding libraries via its equivalent of the -l option:
C:\temp\library_semantics>more msvc_foo.c
int foo() {return 10;}
C:\temp\library_semantics>cl msvc_foo.c /c
C:\temp\library_semantics>lib /out:msvc_foo.lib msvc_foo.obj
C:\temp\library_semantics>type msvc_main.c
void main(){}
C:\temp\library_semantics>cl msvc_main.c /link /DEFAULTLIB:foo.obj
/out:msvc_main.exe
/DEFAULTLIB:foo.obj
msvc_main.obj
foo.obj : warning LNK4003: invalid library format; library ignored
MSVC also ignores duplicate .objs on the command line:
c:\temp\library_semantics>cl msvc.obj
/out:msvc.exe
msvc.obj
c:\temp\library_semantics>cl msvc.obj msvc.obj
/out:msvc.exe
msvc.obj
msvc.obj
msvc.obj : warning LNK4042: object specified more than once; extras ignored