Practical --enable-shared LLVM builds.

It seems that --enable-shared builds are not used by LLVM developers
because the executables start slowly. I guess this is related to the
number of symbols the dynamic linker has to resolve. If we could reduce
the symbols exported to those which are required, maybe the startup time
would become bearable.

One way of doing this is to add annotations to each public class and
function (such as __declspec(dllexport)). The hard work here is to
determine which classes are public, mostly because of inter-library
dependencies. Then we have the issue of cyclic dependencies. Today
cyclic dependencies are solved with the creation of partially linked
objects, but this defeats the advantages of having shared libraries.
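
As a rough illustration of the annotation approach, here is a minimal
sketch. The SKETCH_* macro names and the classes are made up for this
example and are not LLVM macros. With GCC, the unannotated helper is
only hidden if the library is also compiled with -fvisibility=hidden.

```cpp
// Normally set by the build system while compiling the library itself;
// defined here only so the sketch compiles on its own.
#define SKETCH_BUILDING_DLL 1

// Hypothetical export macro (illustrative names, not part of LLVM).
#if defined(_WIN32)
#  if defined(SKETCH_BUILDING_DLL)
#    define SKETCH_API __declspec(dllexport)
#  else
#    define SKETCH_API __declspec(dllimport)
#  endif
#elif defined(__GNUC__)
#  define SKETCH_API __attribute__((visibility("default")))
#else
#  define SKETCH_API
#endif

// Public: annotated, so it stays visible to users of the shared library.
class SKETCH_API PublicThing {
public:
  int answer() const;
};

// Internal helper: not annotated, so it is a candidate for hiding.
int internalHelper(int x) { return x * 2; }

int PublicThing::answer() const { return internalHelper(21); }
```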

OTOH, I'd like to distribute LLVM as a separate set of files and not
force my users to update them every time my compiler changes.

One solution is to add annotations only to those classes that should be
visible to the LLVM user and create one big dll comprising all required
components. This big dll would be smaller and load faster than the sum
of individual LLVM libraries. The downside, from the POV of the LLVM
developer, is that you get static constructors that you don't want to
use.

Some questions arise:

   What's the estimated ratio of public/private symbols for an LLVM
   library, taking into account inter-library dependencies?

   Same question but without inter-library dependencies. Just those
   needed by executables.

   As an LLVM developer, do you see any advantage in using shared
   libraries? (supposing that we overcome the startup slowdown).

   Thinking as an LLVM user, are shared libraries interesting to you? If
   the answer is yes, is the one-big-dll mentioned above useful to you?

Hi

I'm not sure that it meets your needs, but in making C# bindings[1], I
had to build the one-big-dll with only public stuff exported. I did
this by processing the llvm-c headers into a linker script (so the
llvm-c interface is the determiner of what's visible, rather than
requiring decoration in the source code).
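
The message does not say which script format or tooling was used, so
the following is only a sketch of the idea under stated assumptions: it
scans header text for identifiers that start with "LLVM" and are
directly followed by "(", and emits a GNU ld style version script that
exports only those names. The function names collectCApiNames and
makeVersionScript are invented for this example, and the regex is
deliberately naive (it would also pick up function-like macros).

```cpp
#include <regex>
#include <set>
#include <string>

// Collect identifiers that begin with "LLVM" and are immediately followed
// by an opening parenthesis, i.e. things that look like C API functions.
std::set<std::string> collectCApiNames(const std::string &headerText) {
  static const std::regex decl(R"(\b(LLVM[A-Za-z0-9_]*)\s*\()");
  std::set<std::string> names;
  for (std::sregex_iterator it(headerText.begin(), headerText.end(), decl),
       end;
       it != end; ++it)
    names.insert((*it)[1].str());
  return names;
}

// Emit an anonymous version script: listed names are exported, everything
// else is made local to the shared library.
std::string makeVersionScript(const std::set<std::string> &names) {
  std::string out = "{\n  global:\n";
  for (const std::string &n : names)
    out += "    " + n + ";\n";
  out += "  local:\n    *;\n};\n";
  return out;
}
```

With GNU ld, a script like this would typically be passed at link time
via -Wl,--version-script=<file>.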

The llvm-c interface mostly only covers front end IR building, not
code-gen or opt passes, but it might give you an idea of the ratio of
public/internal anyway.

So, in the "LLVM user" category, yes, I'm interested in the
one-big-dll, though I'd probably only use the C-interface, not the C++
parts. I'm not sure how practical it is to export C++ features across
DLL boundaries while upgrading LLVM and your compiler separately, as
even small changes would affect name mangling. You'd be requiring
quite a large scale "freeze" on a lot of LLVM changes for that to
work, I think.
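
To make the name mangling point concrete, here is a small sketch of the
opaque-handle style that a C interface allows. All names (sketch::Module,
SketchModuleRef, SketchModuleCreate and so on) are hypothetical and do
not come from llvm-c. The extern "C" functions keep unmangled, stable
symbol names even if the C++ class behind them changes.

```cpp
#include <string>
#include <utility>

// Hypothetical C++ class standing in for a type from a C++ API. Its
// mangled member names would change if signatures or namespaces change.
namespace sketch {
class Module {
public:
  explicit Module(std::string name) : name_(std::move(name)) {}
  const std::string &name() const { return name_; }

private:
  std::string name_;
};
} // namespace sketch

extern "C" {
// Opaque handle: clients never see the C++ layout.
typedef struct SketchOpaqueModule *SketchModuleRef;

SketchModuleRef SketchModuleCreate(const char *name) {
  return reinterpret_cast<SketchModuleRef>(new sketch::Module(name));
}

const char *SketchModuleGetName(SketchModuleRef m) {
  return reinterpret_cast<sketch::Module *>(m)->name().c_str();
}

void SketchModuleDispose(SketchModuleRef m) {
  delete reinterpret_cast<sketch::Module *>(m);
}
} // extern "C"
```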

I'd also be interested in working on getting at the equivalent
functionality of `llc' and `opt' via the llvm-c interface too, if you
think that might be the way to go.

scott

[1] To be contributed once I'm finally upgraded to somewhere near
HEAD, if anyone's interested in them

"Scott Graham" <scott.llvm@h4ck3r.net> writes:

I'm not sure that it meets your needs, but in making C# bindings[1], I
had to build the one-big-dll with only public stuff exported. I did
this by processing the llvm-c headers into a linker script (so the
llvm-c interface is the determiner of what's visible, rather than
requiring decoration in the source code).

The llvm-c interface mostly only covers front end IR building, not
code-gen or opt passes, but it might give you an idea of the ratio of
public/internal anyway.

The llvm-c interface is quite limited, although very reasonable for some
applications. It is of little use to the LLVM developers, though.

So, in the "LLVM user" category, yes, I'm interested in the
one-big-dll, though I'd probably only use the C-interface, not the C++
parts.

This seems easy to do. I'll investigate how much work it takes to add a
target to the cmake build that puts together all the libs the user
requires into one dll, exporting the C API.

[snip]

I'd also be interested in working on getting at the equivalent
functionality of `llc' and `opt' via the llvm-c interface too, if you
think that might be the way to go.

The problem here is that the C++ API changes quite often. If your C
wrappings are exhaustive, this can involve quite a bit of maintenance
work and be a good candidate for code rot.

[snip]

Óscar Fuentes wrote:

It seems that --enable-shared builds are not used by LLVM developers
because the executables start slowly. I guess this is related to the
number of symbols the dynamic linker has to resolve. If we could reduce
the symbols exported to those which are required, maybe the startup time
would become bearable.

I think that premise needs to be retested. I don't doubt that we have a very large number of symbols, but I wouldn't be so sure that it still results in a large load time on modern hardware.

One way of doing this is to add annotations to each public class and
function (such as __declspec(dllexport)). The hard work here is to
determine which classes are public, mostly because of inter-library
dependencies. Then we have the issue of cyclic dependencies. Today
cyclic dependencies are solved with the creation of partially linked
objects, but this defeats the advantages of having shared libraries.

There's very very little that we don't want exposed. LLVM is by design a framework where everything is reusable.

But there are two things you can look at. The first is that g++ doesn't (didn't?) properly support anonymous namespaces. Things in anonymous namespaces don't need to have symbols created for them. Second, LLVM classes are littered with VISIBILITY_HIDDEN.

Everything else is public.
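
For readers unfamiliar with the two mechanisms, a minimal sketch follows.
SKETCH_HIDDEN is a stand-in written for this example; it assumes
VISIBILITY_HIDDEN expands to GCC's hidden-visibility attribute, and the
class and function names are invented.

```cpp
// Stand-in for a hidden-visibility macro (assumption: GCC-style attribute).
#if defined(__GNUC__)
#  define SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#  define SKETCH_HIDDEN
#endif

namespace {
// Anonymous namespace: internal linkage, so no exported symbol is needed.
int localHelper(int x) { return x + 1; }
} // namespace

// Usable anywhere inside the shared library, but not exported from it.
class SKETCH_HIDDEN HiddenPass {
public:
  int run(int x) const { return localHelper(x) * 2; }
};

// Neither anonymous nor hidden: this is what ends up public.
int publicEntryPoint(int x) { return HiddenPass().run(x); }
```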

OTOH, I'd like to distribute LLVM as a separate set of files and not
force my users to update them every time my compiler changes.

One solution is to add annotations only to those classes that should be
visible to the LLVM user and create one big dll comprising all required
components. This big dll would be smaller and load faster than the sum
of individual LLVM libraries. The downside, from the POV of the LLVM
developer, is that you get static constructors that you don't want to
use.

Some questions arise:

   What's the estimated ratio of public/private symbols for an LLVM
   library, taking into account inter-library dependencies?

   Same question but without inter-library dependencies. Just those
   needed by executables.

   As an LLVM developer, do you see any advantage in using shared
   libraries? (supposing that we overcome the startup slowdown).

   Thinking as an LLVM user, are shared libraries interesting to you? If
   the answer is yes, is the one-big-dll mentioned above useful to you?

I don't think it matters much to the existing developer base. The people who are likely to get very excited about this are the distributors trying to package LLVM for their Linux distro.

Nick

Note that many parts of LLVM today do not have stable ABIs. From a
quick survey, there are even some ABIs which change when NDEBUG is
defined. Anyone interested in pursuing shared libraries of LLVM
should come with a plan for dealing with this.
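
A sketch of how an NDEBUG-dependent ABI can arise. The Node type is
hypothetical and not taken from LLVM: a member that exists only in
assertion-enabled builds changes the object layout, so a library and a
client compiled with different NDEBUG settings would disagree about it.

```cpp
#include <cstddef>

// Hypothetical type: it carries an extra member only in builds with
// assertions enabled (NDEBUG not defined).
struct Node {
  int value;
#ifndef NDEBUG
  const char *debugName; // present only when NDEBUG is not defined
#endif
};

// True when this translation unit was compiled without NDEBUG.
constexpr bool builtWithAssertions() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// The layout this translation unit bakes in at compile time. Code built
// with the opposite NDEBUG setting would compute a different value.
constexpr std::size_t nodeSize() { return sizeof(Node); }
```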

Dan

Óscar Fuentes wrote:

It seems that --enable-shared builds are not used by LLVM developers
because the executables start slowly. I guess this is related to the
number of symbols the dynamic linker has to resolve. If we could reduce
the symbols exported to those which are required, maybe the startup time
would become bearable.

I think that premise needs to be retested. I don't doubt that we have a
very large number of symbols, but I wouldn't be so sure that it still
results in a large load time on modern hardware.

A very quick test on RHEL 5.1 using LLVM GCC had ‘make check’ take 0:48 (min:sec) with a static build and 2:28 with a shared build, roughly three times as long. I wouldn't consider my setup representative of anything, though, so I suggest people try it out themselves :)

Attached below is a very rough patch which touches pretty much the entire build system. It also includes an attempt to allow LLVM itself to be compiled with LTO, and a modification to AutoRegen.sh to allow forcing a regeneration, but they aren't strictly related to it.

The configuration I used was:

./configure --enable-optimized --enable-shared --disable-static --disable-bindings --disable-doxygen --enable-targets=x86,cbe,cpp CC=llvm-gcc CXX=llvm-g++

I seriously doubt the patch is ready for inclusion, but it's better than nothing ;)

llvm-shared.diff (35.2 KB)

I see one advantage: currently, if two libraries use llvm and are linked
together in one program, it asserts at startup, since both libraries
initialize some of the basic llvm types at start up (or at least I had
this issue a couple of months ago). In my case it was easily solved by
making sure that the llvm stuff was only linked into a common library.

As for loading time, I guess it's a big problem for llvm's tools (such as
llvm-gcc), since they are usually invoked many times during a build.

For me, the main motivation for building LLVM as shared libraries was disk footprint. LLVM built statically is 84 megabytes, whereas a shared build is 48 megabytes, both installed.

It might be worthwhile to reduce the number of distinct libraries. For what it's worth, I had to merge the JIT and Interpreter libraries into ExecutionEngine to avoid circular references. A similar approach (but done in a cleaner way) could be used for other components.