Why do we statically link all LLVM libraries into every executable?

Hi

I found that basically all LLVM libraries are statically linked into each executable and into LLVMgold.so. This makes the clang/llvm packages larger and larger, with a lot of duplicated code; if I build a debug version, the disk space required is even larger. Is there any particular reason to keep doing it this way? If we split out a few shared libraries, something like libclang.so and libllvm.so, and linked all the executables and LLVMgold.so against those, a lot of space could be saved and loading performance could be improved.
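
One way to see this on a default Linux build is with ldd (the build path below is just an example):

    $ ldd build/bin/clang
    # only system libraries appear; no libLLVM*.so or libclang*.so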

Yin

There is a build option to do exactly that. It comes at a significant
price in startup time; e.g. clang will take 10x as long to build a small
example.

Joerg

Namely, BUILD_SHARED_LIBS=ON.

I find it very useful for dev builds!
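
For reference, a configure line for such a build might look like this (the source path is a placeholder):

    $ cmake -G Ninja -DBUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Debug ../llvm
    $ ninja clang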

Is this to process runtime relocations or run constructors? I wonder if Prelink or ElfHack could help.

Runtime relocations, I would imagine (global ctors would have to run in
either mode - so shouldn't represent a difference, I would think?)
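
On Linux/glibc, one way to check would be to ask the dynamic loader for its own statistics, e.g. something like this (the exact output format varies between glibc versions):

    $ LD_DEBUG=statistics ./bin/clang --version 2>&1 | grep -i reloc
    # counts the relocations processed at startup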

Hi,

Thank you for explaining and providing the option. I will give it a try.

10x slower… on Linux? Would limiting the number of global symbols exposed help the situation?

Yin

Properly separating global / default symbols from internal / hidden symbols would take a substantial amount of effort. To put the amount of effort in perspective, there are more than 700 LLVM headers that provide the interface between the LLVM static libraries. There are more than 300 Clang headers to provide the interface between the Clang static libraries.

I think the change could be done, and it would be valuable, but it isn’t a quick change. You would also get to deal with all the non-portable platform peculiarities.
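
For a sense of the mechanics involved: the usual approach is to compile everything with -fvisibility=hidden and then annotate only the intended public interface. A minimal sketch (MYLIB_API and the function names are illustrative, not actual LLVM names):

    /* compile with -fvisibility=hidden */
    #define MYLIB_API __attribute__((visibility("default")))

    MYLIB_API int mylib_public_entry(int x); /* exported from the .so */

    int mylib_internal_helper(int x); /* hidden: kept out of the dynamic
                                         symbol table, so intra-library
                                         calls need no PLT indirection */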

A pity Prelink does not support this use-case (prelinking just a subset of loaded libraries).

10x slower seems like an exaggeration. I tried doing a shared and non-shared build of LLVM and running the LLVM+Clang test suites. The shared library version used 40% more CPU time[1]. I did not compare to a PIC-but-not-shared-library build yet, which would give another interesting data point (i.e. how much do we lose from PC-relative addresses rather than absolute, vs how much do we lose from dynamic relocations vs static).

David

[1] Note: I only did it once, take these results with a grain of salt, though given that they used several hours of CPU time each, there’s probably not a huge variation expected over multiple runs.
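
If someone wants to try that configuration: if I remember the CMake options correctly, the PIC-but-not-shared build should be reachable with something like

    $ cmake -G Ninja -DLLVM_ENABLE_PIC=ON -DBUILD_SHARED_LIBS=OFF ../llvm

but please double-check the option names.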

Test program is just "int main(void) { return 0; }":

Monolithic clang as used by NetBSD's cross build system: 0.008s
Shared clang using normal cmake build: 0.139s

Both are optimised builds with asserts enabled.
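
To reproduce, something along these lines should do (the binary names here are placeholders for the two builds):

    $ echo 'int main(void) { return 0; }' > t.c
    $ time monolithic-clang -c t.c
    $ time shared-clang -c t.c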

Joerg

Just curious, was this HDD or SSD?

SSD, but pretty much irrelevant due to a hot cache.

Joerg