Hi,
At EuroLLVM, I had some discussions with @mstorsjo, @hansw2000, @wanders, and @petrhosek about how to improve the out-of-the-box behavior for lld to find compiler-rt builtins, sanitizers, and libc++. The discussion later continued in an email chain that also included @rnk.
I will summarize the current situation and suggest steps to move forward.
Background
Some background for people who might not be that familiar with Windows: when linking to compiler-rt/libc++ on Linux, you usually invoke clang as the linker, and the clang driver constructs the linker line, adding both the path to compiler-rt/libc++ and the library itself.
On Windows, the norm is to invoke the compiler and linker separately. The compiler can add dependent libraries into the .obj files with the help of pragmas, but the linker needs to find the path to compiler-rt/libc++ itself. MSVC solves this with a “developer terminal”, which populates the shell with environment variables that point the linker to the correct directories.
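For readers who have not used the pragma mechanism, here is a minimal sketch of it. The library name mylib.lib and the function answer are hypothetical and only serve as illustration. With clang-cl or MSVC, the pragma is recorded in the .obj file as a /DEFAULTLIB directive; it carries only a name, so the linker still has to find the directory on its own.

```cpp
// Assumption: "mylib.lib" is a made-up library name used only for illustration.
#if defined(_MSC_VER)
// MSVC and clang-cl store this in the object file's .drectve section as
// /DEFAULTLIB:mylib.lib. No directory is recorded, only the name.
#pragma comment(lib, "mylib.lib")
#endif

// Hypothetical function standing in for code that needs the library.
int answer() { return 42; }
```

Other compilers do not define _MSC_VER, so the guard keeps the file portable.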
The problem
We need to use the compiler-rt builtins for code to work in several cases. This includes int128 support. Currently, the following is a problem:
int main(int argc, char **argv) {
  __int128 a = 123;
  __int128 b = 1;
  // 128-bit division is lowered to a call to __divti3 from the compiler-rt builtins.
  return a / b;
}
C:\src\temp>clang-cl /c a.cc
C:\src\temp>lld-link a.obj
lld-link: error: undefined symbol: __divti3
>>> referenced by a.obj:(main)
The same can be said for sanitizers and libc++. We can’t use the clang driver to construct the linker line, so things like clang-cl -fsanitize=address /c a.cc && lld-link a.obj do not work out of the box.
Each runtime has different problems. Here are some of the issues we are trying to solve:
- Find the path (libpath) to the libraries
- Find and insert the library name into the obj files.
- Sanitizers (specifically ASAN) have a complex matrix of names and libraries that must be linked in different configurations. MSVC tries to solve this with the new /inferasanlibs option (see “/INFERASANLIBS (Use inferred sanitizer libs)” on Microsoft Learn).
The “per-target” problem
Finding and constructing the path to compiler-rt is much easier when LLVM_ENABLE_PER_TARGET_RUNTIME_DIR is set to false than when it is set to true.
When this option is set to false, the path to the compiler-rt libraries is lib/clang/16/lib/windows, and the architecture is encoded in the filename (clang_rt.builtins-x86_64.lib). Constructing this is easy since all that information is available within the linker context.
But when this option is set to true, the path looks like this: lib/clang/16/lib/x86_64-pc-windows-msvc. This path contains the triple; lld doesn’t know about the triple, and constructing it from scratch might be tricky since we might not have all the relevant information.
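The difference between the two layouts can be sketched as follows. This is a hypothetical helper, not code from lld; the name builtinsPath and its parameters are made up, and the per-target filename clang_rt.builtins.lib is an assumption, since only the directory is given above.

```cpp
#include <string>

// Hypothetical helper, not part of lld: builds the builtins library path
// relative to the toolchain root for both directory layouts.
std::string builtinsPath(const std::string &clangVersion,
                         const std::string &arch,
                         const std::string &triple,
                         bool perTargetRuntimeDir) {
  const std::string libDir = "lib/clang/" + clangVersion + "/lib/";
  if (perTargetRuntimeDir) {
    // Per-target layout: the full triple names the directory.
    // Assumption: the file name carries no architecture suffix here.
    return libDir + triple + "/clang_rt.builtins.lib";
  }
  // Legacy layout: fixed OS directory, architecture in the file name.
  return libDir + "windows/clang_rt.builtins-" + arch + ".lib";
}
```

The legacy branch only needs the architecture, which the linker already knows from the object files. The per-target branch needs the whole triple string, and that is the part lld cannot easily reconstruct.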
Solutions
Since there is a lot of information that is not available to lld, we can either:
- Teach LLD more stuff, like how to create the triple. I made an attempt in D151188, “[LLD][COFF] Add LLVM toolchain library paths by default”, in the llvm::Triple LinkerDriver::constructTriple() method, but it ends up as a lot of assumptions and messy code.
- Make decisions in the clang driver and transfer them via --dependent-lib to the linker.
I think solution two is probably best, and it would look something like this:
- The clang driver will insert the builtin library’s relative path into the obj file, prefixed with the target directory, i.e. clang --dependent-lib x86_64-pc-windows-msvc/clang_rt.builtins.lib or windows/clang_rt.builtins-x86_64.lib, depending on the configuration of LLVM_ENABLE_PER_TARGET_RUNTIME_DIR.
- lld will add the directory’s root to its default search paths (lib/clang/16/lib in this case).
This will work fine for builtins and ubsan.
ASAN is more complicated since the support matrix depends on whether we are building a DLL or an executable and which CRT we are linking to. To retain compatibility with MSVC, I think the best option would be to implement /inferasanlibs in LLD, but to avoid the target resolution in LLD as discussed above, we probably need clang to insert some placeholder as the dependent-lib, for example, x86_64-pc-windows-msvc/<asanlib>. Not quite sure about this yet.
Finally, libc++. Currently, libc++ inserts pragmas/dependent-libs to have the linker link to it. This will work with LLVM_ENABLE_PER_TARGET_RUNTIME_DIR=OFF but not when it’s ON. In that case, things get a bit more complicated since we would have to “rewrite” the dependent-lib line and add the target triple into it… to move triple awareness to lld.
TL;DR
- Windows build systems invoke the linker directly instead of clang to link.
- We need to improve how LLD finds libraries on Windows platforms to support builtins/sanitizers/libc++ better out of the box.
- This is complex because of the LLVM_ENABLE_PER_TARGET_RUNTIME_DIR setting, since those directories contain the triple, and LLD doesn’t know about the triple.
- It would be nice to make these decisions in the clang driver and transfer that information to the linker by encoding it in the .obj files.
- There are caveats and edge cases.
I hope to get community feedback on this topic and suggestions on how to move forward, since every way forward seems to have some drawbacks.