[RFC] LLVM_LINK_LLVM_DYLIB should default to ON on Posix platforms

TL;DR: We should make LLVM_LINK_LLVM_DYLIB (and potentially later CLANG_LINK_CLANG_DYLIB) default to ON in CMake for non-Windows platforms. There are pros and cons to this tradeoff, but I think it’s clear at this point that the benefits outweigh the drawbacks. The DSO build makes building LLVM more accessible to beginners, and I believe that’s important to the long-term health of the project, so we should make it the default.
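
For reference, this is a minimal sketch of the configuration the proposal would effectively make the default on non-Windows hosts; today you opt in explicitly (the generator and directory names here are just illustrative):

cmake -G Ninja -S llvm -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_LINK_LLVM_DYLIB=ON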

These are the benefits of using the libLLVM DSO:

  • The LLVM DSO build uses much less disk space (3.1GiB vs 18GiB for the static build, Linux Rel+asserts). These sizes grow by ~5x if you enable debug info.
  • Linking the LLVM tools with the DSO is faster and uses less memory. Linking the tools is a major bottleneck for the check-llvm/clang/mlir testing workflow. Running multiple high-memory link steps inhibits development on entry-level hardware, which beginners tend to have.
  • Using the LLVM DSO by default aligns our default build config with Debian, Fedora, and other Linux distro build configs, which all use libLLVM*.so. If you are a vendor who cares about package size, this enables creating a smaller, more modular collection of toolchain packages to distribute.

The downsides of the libLLVM DSO are:

  • Minor hit to startup time on ELF platforms. Recent data says this is marginal, but I’ll expand on this below, since the received wisdom is that this is a major blocker. Non-ELF platforms generally have faster loaders, or collude with the kernel to share relocated pages between processes.
  • The DSO increases the number of compile actions required to rebuild a single tool binary, slowing down iteration times when working with small test inputs. For example, if I make a change to IR/ headers, and I can test my change with llvm-as/dis or opt, the static build would allow me to test my changes without building the entire LLVM DSO, which includes llvm/lib/Target. Static linking, or fine-grained shared linking (BUILD_SHARED_LIBS=ON), can offer faster iteration times in cases like this (see the sketch after this list).
  • The DSO build doesn’t validate internal library dependencies. In the static build, static links often fail if you forget a library dependency. This helps us maintain boundaries between layers in LLVM’s internal architecture. Alternative, unsupported build systems such as Bazel require accurate dependency information. This gap can also be covered with continuous BUILD_SHARED_LIBS testing, which I believe we already have on buildbot, because I broke it at some point. :slight_smile:
  • Tool binaries are less hermetic. They already search for paths relative to the executable path, but if you add shared library dependencies, that’s one more path to manage if you want to copy a tool binary around. IIRC we do this in a regression test, which IMO is questionable.
  • The DSO limits whole-program visibility in LTO-enabled builds. This is a major consideration for a vendor attempting to build a highly optimized toolchain binary, but is probably not a major consideration for our default build configuration.
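
To make the iteration-time downside concrete, here is a rough sketch of the difference; the build directory names are hypothetical:

# Static build: after touching an IR/ header, this recompiles the affected objects
# and relinks only the opt binary.
ninja -C build-static opt

# DSO build: the same change also relinks libLLVM.so (which pulls in llvm/lib/Target
# and everything else) before opt can be relinked against it.
ninja -C build-dylib opt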

Regarding the startup time overhead, this came up on LKML in 2021, and it resulted in a widely circulated post from Linus Torvalds: “shared libraries are not a good thing in general”. Regardless, all the major Linux distros (Fedora, Debian (see llvm-toolchain package rules)) still use this configuration, presumably because it allows them to split up the toolchain into smaller, more modular packages (clang, llvm, lld, the rest), which use far less disk space.

However, recent benchmark data from @mstorsjo shows that this startup time hit is low. For short-lived executions like clang --version, the mean wall time goes from 9.5ms to 12.5ms, a 3ms overhead or +31%. But if you add 3ms to a 1sec compilation action, we’re talking about a 0.3% perf hit. The overhead is lost in the noise on large compile actions, such as the ~20s sqlite3.c aggregate compilation tested in that benchmark.

I think part of why it is received wisdom that dynamically linked LLVM builds are slow comes from experience running the LLVM test suite, where our test processes are short-lived, process startup dominates, and the dynamic linker has often been the bottleneck. I did a quick A/B comparison of check-llvm, and my numbers show that the DSO build is faster, but I don’t trust my configuration and this deserves external validation.

I think 3ms of startup overhead is something worth optimizing, but this seems like a tradeoff that a vendor should consider, not something people doing their first LLVM build should have to think about. I believe there is also additional room to explore optimization flags such as -Bsymbolic-non-weak-functions and -fvisibility-inlines-hidden, as well as finishing up the LLVM_ABI annotations so we can build with -fvisibility=hidden. We can also continue optimizing dynamic relocations, as @chandlerc did in a series of PRs improving Clang’s builtin string tables a few months back. These are all good next steps, but I think the pros already outweigh the cons, so they don’t need to be prerequisites.
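
As a hedged sketch of how one might experiment with those linker-side knobs on top of the DSO build (flag support depends on the linker; -Bsymbolic-non-weak-functions is an lld option, and the directory names are illustrative):

cmake -G Ninja -S llvm -B build \
  -DLLVM_LINK_LLVM_DYLIB=ON \
  -DLLVM_USE_LINKER=lld \
  -DCMAKE_SHARED_LINKER_FLAGS="-Wl,-Bsymbolic-non-weak-functions"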

Regarding platform support, LLVM_LINK_LLVM_DYLIB is not supported on Windows. Once we have accurate LLVM_ABI annotations (I hope this happens), then we can re-evaluate turning this on for Windows.

Simply put, static linking is a bad default for LLVM today. The build directories are too big, and the many redundant static links use a lot of memory. Our getting started guide already explicitly mentions linker flags to work around these large static link steps. New developers regularly complain about build directory size (1), and most recently, I personally had a 111GB build directory that caused me to explore merging clang unit tests.
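
For reference, this is the kind of workaround the guide points people at today (a sketch; the exact recommendation varies):

# Use lld (or gold) instead of ld.bfd to cut link time and memory:
cmake -G Ninja -S llvm -B build -DLLVM_USE_LINKER=lld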

In order to make the project more accessible, we really should adjust our defaults so that building LLVM doesn’t require so many resources. Using large shared libraries by default would be a good step in that direction.

cc relevant folks @mstorsjo @arsenm @nikic @tstellar @MaskRay @compnerd

9 Likes

I think this is a reasonable change, for the reasons you have outlined.

Another advantage is that if you are building LLVM with LTO, using the dylib build means you only have to perform one big LTO link, instead of repeating it a few dozen times for each tool.
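
A hedged sketch of that combination (LLVM_ENABLE_LTO accepts Thin or Full; the directory name is illustrative):

cmake -G Ninja -S llvm -B build-lto \
  -DLLVM_ENABLE_LTO=Thin \
  -DLLVM_LINK_LLVM_DYLIB=ON
# The expensive (Thin)LTO link happens once, for libLLVM.so, rather than once per tool.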

2 Likes

I don’t want to block anyone, or participate in such discussion at all, really, so you can consider this anonymous user feedback. But I have debugged a lot of crashes caused by programs loading multiple incompatible versions of libllvm pulled from different package managers (e.g. llvmlite rebuild and publishing to conda-forge channel (to include SVML patch changes) · Issue #72 · conda-forge/llvmlite-feedstock · GitHub and the issues mentioned there), and I would prefer that dynamically linked llvm never existed at all.

This would also save us from another frequently asked question “why am I getting a segfault?” :slight_smile: (Answer: the linker ate all your memory.)

You quickly mentioned BUILD_SHARED_LIBS

Doesn’t that make it a better default? Or is this topic about changing the default for external users and not for developers?

I’d say the advantage to using the dylib build instead of the shared-libs build is that the dylib build is a pretty good option for both a production and a development build, while shared-libs is not suitable for production.
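
For clarity, these are the two configurations being compared; both are existing CMake options, and the directory names are just examples:

# Fine-grained shared libraries (one .so per LLVM component; development only):
cmake -G Ninja -S llvm -B build-shared -DBUILD_SHARED_LIBS=ON

# Single libLLVM.so that the tools link against (usable for development and distribution):
cmake -G Ninja -S llvm -B build-dylib -DLLVM_LINK_LLVM_DYLIB=ON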

1 Like

Can this (at least partially) be circumvented by e.g. putting namespace llvm into an inline versioned namespace, similar to what is done in libc++, with at least the LLVM version used as a namespace tag? This way one would at least avoid symbol clashes across multiple LLVM versions. Certainly there is no “LLVM ABI”, but it would at least prevent this?

Fortunately, all the similar cases I had to debug usually ended with assertion failures due to an option being registered multiple times, so no real crashes :slight_smile:

A versioned namespace would probably fix a lot of such issues, and other projects (e.g. TBB) have also used this approach for ABI-breaking changes. Also, having a way to configure the top-level namespace would be useful by itself, e.g. for custom LLVM forks. So I’m +1 for this.

libc also does this. Presumably it would require replacing all namespace llvm with namespace LLVM_NAMESPACE and having that macro be defined by the LLVM cmake configuration. It would be really invasive however; I can imagine how many uses of llvm::SmallVector we’d need to change.

What’s the difference between using a versioned namespace versus adding symbol versions with a linker script (which is what LLVM does now)?

Searching for paths relative to the executable path for a shared library dependency is somewhat different from doing so for other reasons: you need system loader support. Some POSIX platforms (e.g., AIX) do not support $ORIGIN in the runtime search path.

I would say I weakly disagree with this.

I would like to understand the problem this seeks to solve.

If the problem is developer iteration, when I change a single source file in clang and run check-clang it takes less than 5 seconds on my machine to rebuild and re-link clang, and a little over 2 minutes to run the lit tests… I feel like speeding up the link to slow down the lit tests is probably the wrong tradeoff (note: this is on Darwin using ld_prime).

If the problem is debug info size and linking memory consumption, could we instead more minimally make LLVM_USE_SPLIT_DWARF default to ON?
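
For what it’s worth, a minimal sketch of that more targeted change for debug builds (assuming a toolchain that supports split DWARF; the directory name is illustrative):

# Most of the debug info goes into .dwo files instead of being copied into the final links:
cmake -G Ninja -S llvm -B build-debug -DCMAKE_BUILD_TYPE=Debug -DLLVM_USE_SPLIT_DWARF=ON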

If the problem is that our default build configuration doesn’t align with distributions… I actually don’t think that is a goal we should have. Distributions have wildly varying constraints that they optimize for, which aren’t necessarily aligned with rapid iteration of rebuild/test cycles. I think our default build configuration should be aligned more with developer iteration than with distribution.

I think one key problem here is that POSIX is a pretty wide brush to paint with. As @reinterpretcast mentioned AIX is quite different from Linux or macOS which are themselves different from each other.

To @s-barannikov’s point, our build configuration system today has some known sharp edges, like allowing users to launch too many parallel link jobs when using ld.bfd. That’s plagued us for a long time, and we should probably just spend some time making our CMake error out on configurations that are almost certainly going to fail (like using ld.bfd and generating debug info on a laptop with 2GB of RAM).
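
One existing, if manual, mitigation along those lines (assuming a Ninja-based build, since the option is implemented with Ninja job pools; the job count is illustrative):

# Cap concurrent link steps so ld.bfd doesn’t exhaust RAM:
cmake -G Ninja -S llvm -B build -DLLVM_PARALLEL_LINK_JOBS=2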

5 Likes

I think that this is a good thing to do in general.

There is ongoing work to fix the DLL storage for Windows builds (which should be applicable to other platforms as well), so I would think that this is something that we should be able to default to on all targets (hopefully) soon.

One minor point - I think that the startup impact is not a permanent fixed cost. In fact, I think that it might actually go down as the work for the DLL storage starts making its way into the repository, as the correct annotations would reduce the number of PLT entries.

Also, note the WHOPR aspects are dependent on the number of modules we create. If we create a single LLVM DSO, the overall impact should not be too bad compared to a DSO per module in LLVM.

I’m all for a versioned/configurable “global” llvm namespace. I was bitten by “some plugin loaded by the application that links to our shared llvm-based lib is picking up the wrong symbols (and it’s great if it only ends in cl::opt asserts and not a segfault)” one too many times. Even symbol versioning doesn’t really help.
@tstellar

Symbol resolution rules when multiple versions are present are arcane and not really documented (or I’m really bad at searching). Anyway, if I recall my woes correctly: 1) if a third-party library/plugin/app links to unversioned llvm, it’ll likely pick up your versioned symbols instead of the other llvm ones. 2) if you versioned your symbols as fun@FOO_1.2.3 and the third party is linking against fun@BAR_1.2.3, it’s unclear which one will win the resolution (and I definitely encountered cases where the wrong version was picked). So yeah, symbol versioning makes the situation a little bit better but doesn’t solve it completely. I guess a custom namespace may also help with defining the llvm API surface you actually use (by just grepping it in nm output), so you could hide everything else for extra benefit (useful while the LLVM_ABI work is in limbo).
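
If it helps, one way to inspect that surface is to list the DSO’s dynamic symbols together with their version tags (the path is a placeholder; --with-symbol-versions is a GNU nm option):

nm -D --with-symbol-versions /path/to/libLLVM.so | head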

Anyway, this seems orthogonal to the RFC at hand and could use its own thread.

1 Like

I would add to the downsides that it is not currently possible to build Clang (and possibly other components - I haven’t checked) against a standalone LLVM build when LLVM is built with LLVM_BUILD_LLVM_DYLIB - because of the dependency on the static LLVMSupport library.

Obviously this isn’t a fundamental limitation, just the way the build is currently structured and I suspect it’s not a typical configuration. It’s just something I came across when developing the conan package for LLVM.

3 Likes

libLLVMSupport seems like an issue to me: we have quite a few libraries that depend only on libSupport and would not want all of LLVM.
Would it be possible to have libLLVMSupport.so not included in libLLVM.so and have libLLVM.so depend on libLLVMSupport.so?

I’m not too concerned about changing the defaults but I am interested in testing times. From a quick experiment running check-llvm in a Release+Asserts build:

  • Default: 159 seconds
  • CMAKE_EXE_LINKER_FLAGS=-no-pie: 97 seconds
  • LLVM_LINK_LLVM_DYLIB=ON: 171 seconds

So turning on LLVM_LINK_LLVM_DYLIB is only slightly slower than the current default, but it precludes the large speed-up you can currently get by turning on -no-pie (see Impact of default PIE on llvm testing times).

This is on Ubuntu 24.04 with an AMD Ryzen 9 9950X 16-Core processor. All timings seem consistent within about +/- 1 second.

Incidentally, LLVM_LINK_LLVM_DYLIB=ON seems incompatible with LLVM_USE_LINKER=lld. If I try to use both I get lots of build errors like:

FAILED: bin/llvm-cxxdump 
: && /usr/lib/ccache/clang++ -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -fuse-ld=lld -Wl,--color-diagnostics    -Wl,--gc-sections tools/llvm-cxxdump/CMakeFiles/llvm-cxxdump.dir/llvm-cxxdump.cpp.o tools/llvm-cxxdump/CMakeFiles/llvm-cxxdump.dir/Error.cpp.o -o bin/llvm-cxxdump  -Wl,-rpath,"\$ORIGIN/../lib:"  -lLLVM && :
ld.lld: error: unable to find library -lLLVM
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

Integrating the ability to statically link a few key programs (e.g. clang) would remove a lot of the downsides of the DYLIB build. There isn’t really any benefit to keeping the rest of the tooling static.

There are certain workloads where this can be quite impactful. Note that any numbers I quote are from about 3 years ago so the impact may be reduced with current versions of clang/LLVM.

As a distribution, a large % of your packages (in number at least) are quite small. When building a smaller package, the time it takes to run configure (single threaded) is often longer than building the package (5-8s configure, 2-4s make). Cutting the time to load clang can save quite a bit of time on the configure step; one of the worst offenders I saw called clang about 1500 times over 35 seconds (up to 75 seconds depending on how optimized your clang was).

What I ended up doing was building the stack with LLVM_LINK_LLVM_DYLIB=ON and then, in the same tree, rerunning cmake with it off and having it relink just clang and lld statically (a slight hack). By doing this I could achieve 95% of the size savings, but retain the performance benefits of fast startup and the extra performance you gain from LTO.
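
Roughly, the hack looks like this (a schematic reconstruction, not a supported workflow; target and directory names are illustrative):

cmake -G Ninja -S llvm -B build -DLLVM_ENABLE_PROJECTS="clang;lld" -DLLVM_LINK_LLVM_DYLIB=ON
ninja -C build                              # full build, tools linked against libLLVM.so
cmake build -DLLVM_LINK_LLVM_DYLIB=OFF      # flip the option in the same tree
ninja -C build clang lld                    # relink just clang and lld statically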

The numbers I had at the time for a full build (configure, make, make install) on Fedora:

clang: 33.8 seconds
clang: 26.6 seconds (after the -Bsymbolic-functions and -fno-semantic-interposition patches landed, which highlights the massive impact these flags have on the DYLIB build!)
gcc: 18.4 seconds ← as a reference point, which is PGO and static

And my build of clang with static clang and lld only:
sunnyflunk clang: 14.9 seconds (I also used PGO which wasn’t in Fedora’s clang)

  • I never finished testing whether static LLD was worthwhile; the thought was it could help for LTO links (a distribution default)

Even if you use LLVM_LINK_LLVM_DYLIB, the static libraries are still built. They just get linked into libLLVM.so instead of each individual tool. Whether you actually install/ship the static libraries is up to you, but the LLVM_LINK_LLVM_DYLIB option doesn’t interfere with it.

While we have since switched to monolithic builds, we have been building various subprojects against standalone LLVM compiled with LLVM_BUILD/LINK_LLVM_DYLIB for a long time, and there have been no problems with it.

This seems like a major problem to me. We recommend that people link with lld to save RAM and link time. We need to fix this first IMHO.

On further investigation I only see those errors after I do cmake . -DLLVM_LINK_LLVM_DYLIB=ON in a build tree that was originally configured without that option. Maybe I am not supposed to do this?
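
For what it’s worth, a fresh build directory is probably the fairer way to test the combination, since flipping LLVM_LINK_LLVM_DYLIB in an already-configured tree can leave stale per-tool link rules behind (a guess on my part; paths and options here are illustrative):

cmake -G Ninja -S llvm -B build-dylib -DLLVM_LINK_LLVM_DYLIB=ON -DLLVM_USE_LINKER=lld
ninja -C build-dylib check-llvm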