Why are LLVM libraries enormous?

I am considering using LLVM in a project for Windows CE, where space is at a premium. My jaw dropped when I checked the size of HowToUseJIT.exe (VC++ Win32 debug build): 15.4 MB! The release build of HowToUseJIT is “only” 3.39 MB, but this is still 85% larger than the binary to which I was thinking of adding LLVM.

The top ten LLVM libraries (Win32 *.lib, sizes in bytes) are pretty huge:

Release build  Debug build  Name
   24,510,490   71,038,240  LLVMCodeGen.lib
   21,084,666   56,724,338  LLVMCore.lib
   14,624,218   37,070,488  LLVMAnalysis.lib
   11,987,202   30,711,450  LLVMScalarOpts.lib
    8,600,668   23,837,478  LLVMSelectionDAG.lib
    8,634,324   23,802,952  LLVMTransformUtils.lib
    8,347,134   20,840,744  LLVMipo.lib
    5,061,702   11,028,744  LLVMX86CodeGen.lib
    3,857,612    9,270,012  LLVMInstCombine.lib
    3,330,608    7,820,760  LLVMSupport.lib

The binaries are vastly larger than the source code; for example, everything in lib/CodeGen is 3.63 MB and everything in lib/VMCore is 907 KB. This is quite different from my typical experience; my own C++ source code is larger than the release DLL it compiles into.

Does anyone know why this stuff is so big, and whether there is a way to get a bare subset of LLVM that fits in under 1 MB?

Not sure about Win32, but here are some numbers on OS X for comparison:

5,282,356 libLLVMCodeGen.a
3,087,436 libLLVMAnalysis.a
1,682,476 libLLVMInstCombine.a

I believe these are all release builds.

Trevor

Trevor Harmon <Trevor.W.Harmon@nasa.gov> writes:

The top ten LLVM libraries (Win32 *.lib) are pretty huge:

Release Bld Debug Bld Name
24,510,490 71,038,240 LLVMCodeGen.lib

[snip]

Not sure about Win32, but here are some numbers on OS X for comparison:

5,282,356 libLLVMCodeGen.a

Comparing the size of the static libraries makes little sense, and even
less when they are compiled by different tools. What really matters is
the size of the executables.

I agree that LLVM can be considered a heavyweight dependency in this
respect.

[snip]

Why is the size of static libraries a "nonsensical" topic of discussion? Anyway, in the same example I mentioned that the size of HowToUseJIT (an executable) is very large (15.4 MB debug, 3.4 MB release); as it is a small example, I'd expect any real-world executable to be larger. I think it's fair to wonder what makes it so large, why MacOS seems to get different results, and whether it is possible to construct an example less than 1 MB.

David Piepgrass <dpiepgrass@mentoreng.com> writes:

Comparing the size of the static libraries makes little sense, and even
less when they are compiled by different tools. What really matters is
the size of the executables.

I agree that LLVM can be considered a heavyweight dependency in this
respect.

Why is the size of static libraries a "nonsensical" topic of
discussion?

Why do you care about the size of library files?

Anyway, in the same example I mentioned that the size of HowToUseJIT
(an executable) is very large (15.4 MB debug, 3.4 MB release); as it is a
small example, I'd expect any real-world executable to be larger.

Of course a real-world project would be larger, but not in "linear"
proportion to HowToUseJIT. That example application pulls a big
chunk from the LLVM libraries. That is what makes it large, not the code
in howtousejit.cpp. My compiler, for instance, is anything but a toy
application and is 5.7 MB.

I think it's fair to wonder what makes it so large, why MacOS seems
to get different results,

If you want to compare sizes, you must limit your comparisons to
executable files. Why would it be relevant that Xcode produces smaller
library files than Visual Studio? It's comparing apples to oranges.

and whether it is possible to construct an example less than 1 MB.

An LLVM JIT compiler for x86 under 1 MB? I doubt it is possible without a
major rewriting of LLVM.

I think it’s fair to wonder what makes it so large, why MacOS seems to get different results, and whether it is possible to construct an example less than 1 MB.

My experience with trying to make the smallest JIT possible resulted in something that was around 1.3 MB on Windows, using VC++ to compile, optimizing for size, and linking with /OPT:REF,ICF. This JIT did no optimization at all; pulling in something like instcombine tripled the size, if I recall correctly. It’s possible I missed opportunities for making it smaller, but when I asked on the list I got no further suggestions for improving on this.
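The effect of /OPT:REF (drop functions nothing references) and /OPT:ICF (fold functions with identical contents) can be pictured with a toy model. Everything in it is hypothetical: the `Func` record, the function names, and the sizes are invented, and real linkers work on COMDAT sections and relocations, with ICF repeating its folding until nothing changes rather than making the single pass shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    """Hypothetical stand-in for one function's COMDAT section."""
    size: int
    code: bytes                               # contents compared when folding
    refs: list = field(default_factory=list)  # names of functions it calls


def opt_ref(funcs, roots):
    """Keep only functions reachable from the roots (the idea behind /OPT:REF)."""
    live, stack = set(), list(roots)
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(funcs[name].refs)
    return {name: f for name, f in funcs.items() if name in live}


def opt_icf(funcs):
    """Keep one function per (contents, references) pair (the idea behind /OPT:ICF).

    Single pass only; a real linker iterates until no more folds happen.
    """
    kept, seen = {}, set()
    for name, f in sorted(funcs.items()):
        key = (f.code, tuple(f.refs))
        if key not in seen:
            seen.add(key)
            kept[name] = f
    return kept


def image_size(funcs):
    return sum(f.size for f in funcs.values())


# Hypothetical program: main calls the JIT, which uses two template
# instantiations that compile to identical code. instcombine is present in
# the libraries but never called.
program = {
    "main": Func(100, b"main", ["jit"]),
    "jit": Func(500, b"jit", ["push_int", "push_ptr"]),
    "push_int": Func(40, b"push", []),
    "push_ptr": Func(40, b"push", []),
    "instcombine": Func(3000, b"ic", ["push_int"]),
}

live = opt_ref(program, ["main"])
folded = opt_icf(live)
print(image_size(program), image_size(live), image_size(folded))  # 3680 680 640
```

In this model, an unused pass costs nothing, but as soon as the JIT references it, the pass and everything it references become live, which is one way to read the "instcombine tripled the size" observation.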

I don’t know how feasible it would be to refactor the existing code to get something significantly smaller that actually performed some optimization as well.

Mark

Why do you care about the size of library files?

I assumed dynamic libraries and static libraries were similar in size, but I just checked some of my own static libraries and they are indeed much larger than the executables they compile to. Sorry, it just never occurred to me that they would be much different.

> Anyway, in the same example I mentioned that the size of HowToUseJIT
> (an executable) is very large (15.4 MB debug, 3.4 MB release); as it is
> a small example, I'd expect any real-world executable to be larger.

Of course a real-world project would be larger, but not in "linear"
proportion to HowToUseJIT. That example application pulls a big chunk
from the LLVM libraries. That is what makes it large, not the code in
howtousejit.cpp. My compiler, for instance, is anything but a toy
application and is 5.7 MB.

> and whether it is possible to construct an example less than 1 MB.

An LLVM JIT compiler for x86 under 1 MB? I doubt it is possible without
a major rewriting of LLVM.

Even with no optimizations? Drat. That means I can't use it.

It's too bad nobody's written a utility to profile the sizes of C++ classes/functions... that would sure help an investigation like this. A question at StackOverflow didn't turn up any such utility:

http://stackoverflow.com/questions/1051597/is-there-a-function-size-profiler-out-there
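On ELF platforms, a rough version of such a profiler can be improvised from GNU binutils: `nm -C --print-size --size-sort` lists each sized symbol as address, size, type, and demangled name, and summing sizes per enclosing scope approximates per-class code size. This is a sketch, not an existing utility; the sample `nm` output is invented, and the name splitting is deliberately naive (`operator<<`, `operator()`, and anonymous namespaces will confuse it). For VC++ builds, the linker's .map file is the corresponding source of addresses.

```python
from collections import Counter


def scope_of(name):
    """Enclosing class or namespace of a demangled function name (naive).

    Drops the argument list, then cuts at the last "::" that is not inside
    template angle brackets.
    """
    name = name.split("(", 1)[0]
    depth, cut = 0, -1
    for i, ch in enumerate(name):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif depth == 0 and name.startswith("::", i):
            cut = i
    return name[:cut] if cut >= 0 else "<global>"


def sizes_by_scope(nm_output):
    """Sum code-symbol sizes per scope from `nm -C --print-size --size-sort` text."""
    totals = Counter()
    for line in nm_output.splitlines():
        parts = line.split(maxsplit=3)
        if len(parts) != 4:
            continue  # not an "address size type name" line
        _address, size_hex, sym_type, name = parts
        if sym_type not in {"t", "T", "w", "W"}:
            continue  # keep code and weak symbols (template instances are often W)
        try:
            size = int(size_hex, 16)
        except ValueError:
            continue
        totals[scope_of(name)] += size
    return totals


# Invented nm output for illustration (sorted by size, as --size-sort does).
SAMPLE_NM = """\
0000000000602000 0000000000000008 D some_global
0000000000401340 0000000000000010 T main
0000000000401300 0000000000000040 W std::vector<llvm::Value*, std::allocator<llvm::Value*> >::push_back(llvm::Value* const&)
0000000000401200 0000000000000100 T llvm::InstCombiner::visitMul(llvm::BinaryOperator&)
0000000000401000 0000000000000200 T llvm::InstCombiner::visitAdd(llvm::BinaryOperator&)
"""

for scope, size in sizes_by_scope(SAMPLE_NM).most_common():
    print(f"{size:8d}  {scope}")
```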

Why? I'd never checked, but I always assumed the LLVM JIT was much
larger than 3.4 MB.

For comparison:
[rnk@tamalpais google3]$ du -h /usr/lib/gcc/x86_64-linux-gnu/4.4/cc1plus
10M /usr/lib/gcc/x86_64-linux-gnu/4.4/cc1plus
[rnk@tamalpais google3]$ du -h `which python2.6`
2.5M /usr/bin/python2.6

It seems reasonable that a JIT compiler with optimizers would weigh in
somewhere between an interpreter and a full C++ compiler.

Reid

The size of static libraries is relevant because it places an upper bound on the size of the executable. Otherwise we can only speak anecdotally about "typical" executables that use "some" of the LLVM features.

As for the apples-to-oranges comparison between GCC output and Visual Studio output, having additional data points from other environments may be helpful in understanding whether a size issue affects all platforms or is specific to Visual Studio.

Trevor

Reid Kleckner wrote:

An LLVM JIT compiler for x86 under 1 MB? I doubt it is possible without
a major rewriting of LLVM.

Even with no optimizations? Drat. That means I can't use it.

Why? I'd never checked, but I always assumed the LLVM JIT was much
larger than 3.4 MB.

For comparison:
[rnk@tamalpais google3]$ du -h /usr/lib/gcc/x86_64-linux-gnu/4.4/cc1plus
10M /usr/lib/gcc/x86_64-linux-gnu/4.4/cc1plus
[rnk@tamalpais google3]$ du -h `which python2.6`
2.5M /usr/bin/python2.6

It seems reasonable that a JIT compiler with optimizers would weigh in
somewhere between an interpreter and a full C++ compiler.

You are forgetting that python2.6 includes a lot of libraries.
The object file for the interpreter is about 130k.

Reid
_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Mark.

Trevor Harmon <Trevor.W.Harmon@nasa.gov> writes:

Why would it be relevant that Xcode produces smaller library
files than Visual Studio? It's comparing apples to oranges.

The size of static libraries is relevant because it places an upper
bound on the size of the executable.

This is not strictly true, as some compilers can generate code from the
library contents (think of a library containing LLVM bytecode, or a C++
library that relies on the linker for template instantiation). But for
our purposes, let's accept that upper bound.

Otherwise we can only speak anecdotally about "typical" executables
that use "some" of the LLVM features.

That upper bound is useful only when the combined size of *all* the
libraries looks small enough to you. OTOH, labeling a library as fatware
just by looking at the size of its static libraries is wrong.

I'd say that the right method for estimating the size of a project that
uses LLVM is to determine the features you need (JIT? static code
generation? optimizations? backend(s)? etc.) and to create an executable
that links them in. That is far more accurate than adding up the file
sizes of the static libraries.
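To put a number on how loose the static-library bound is, the release sizes from the first message can simply be added up. This is plain arithmetic on figures quoted in this thread; the only assumption is that "3.39 MB" for HowToUseJIT.exe means 3.39e6 bytes.

```python
# Release-build sizes (bytes) of the ten largest LLVM .lib files (VC++ Win32),
# as listed at the top of this thread.
release_libs = {
    "LLVMCodeGen.lib": 24_510_490,
    "LLVMCore.lib": 21_084_666,
    "LLVMAnalysis.lib": 14_624_218,
    "LLVMScalarOpts.lib": 11_987_202,
    "LLVMSelectionDAG.lib": 8_600_668,
    "LLVMTransformUtils.lib": 8_634_324,
    "LLVMipo.lib": 8_347_134,
    "LLVMX86CodeGen.lib": 5_061_702,
    "LLVMInstCombine.lib": 3_857_612,
    "LLVMSupport.lib": 3_330_608,
}

# Assumption: "3.39 MB" means 3.39e6 bytes.
howtousejit_release = 3.39e6

upper_bound = sum(release_libs.values())
print(f"sum of the ten .lib files: {upper_bound:,} bytes")  # 110,038,624 bytes
print(f"ratio to HowToUseJIT.exe: {upper_bound / howtousejit_release:.0f}x")  # 32x
```

Even counting only these ten libraries, the bound is about 32 times the release executable, so measuring a small test executable that links in just the needed features says far more than the archive sizes do.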

As for the apples-to-oranges comparison between GCC output and Visual
Studio output, having additional data points from other environments
may be helpful in understanding whether a size issue affects all
platforms or is specific to Visual Studio.

As mentioned before, this only makes sense for executable files. And, of
course, with debug info stripped, optimizations enabled, and the C/C++
runtime libraries dynamically linked.

>> An LLVM JIT compiler for x86 under 1 MB? I doubt it is possible
>> without a major rewriting of LLVM.
>
> Even with no optimizations? Drat. That means I can't use it.

Why? I'd never checked, but I always assumed the LLVM JIT was much
larger than 3.4 MB.

It is ~4.8M here.
Here are some size comparisons from ClamAV on Linux:
without JIT, -m32, -Os, stripped: 835K
with JIT, -m32, -Os, stripped: 5.6M
with JIT, -m64, -O2, stripped: 8.8M
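Read as a difference, the ClamAV numbers appear consistent with the ~4.8M figure above (this reading is an assumption; if K and M are binary units, 835K is about 0.82M, and with decimal units it is 0.835M, which rounds to the same result):

```latex
\begin{aligned}
\text{JIT cost} &\approx 5.6\,\mathrm{M} - 835\,\mathrm{K} \approx 5.6\,\mathrm{M} - 0.82\,\mathrm{M} \approx 4.8\,\mathrm{M} \\
\frac{8.8\,\mathrm{M}}{5.6\,\mathrm{M}} &\approx 1.57
\end{aligned}
```

On the same numbers, moving from -m32 -Os to -m64 -O2 makes the JIT-enabled build roughly 57% larger.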

If LLVM is compiled with debug info, and not stripped then it can be as
big as 70MB.

The JIT of course includes the code to generate x86 code from LLVM IR, and
to do some minimal optimizations on it (mem2reg, dce).

Why is the size of static libraries a "nonsensical" topic of
discussion?

Because they include copies of the same code multiple times.
When you link an executable you only get 1 copy.
Think of templates being instantiated in different files with the same
type.
They also include symbol (and perhaps debug) information on Linux.
I think VS keeps symbols separate.
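A toy model of the duplicate-copies point: a template instantiated with the same type in several source files (the hypothetical `vector<Value*>::push_back` below) is emitted into each of those object files, so the archive pays for it several times, while the linker keeps a single copy. Object names and sizes are invented; the model assumes every function ends up linked, and it does not account for the symbol tables and debug info that also inflate archives.

```python
# Hypothetical object files and the functions (sizes in bytes) each contains.
objects = {
    "Instructions.obj": {"Instruction::clone": 300, "vector<Value*>::push_back": 80},
    "Verifier.obj": {"Verifier::visit": 500, "vector<Value*>::push_back": 80},
    "Function.obj": {"Function::dump": 200, "vector<Value*>::push_back": 80},
}


def archive_size(objs):
    """An archive stores every object file whole, duplicates included."""
    return sum(size for funcs in objs.values() for size in funcs.values())


def linked_size(objs):
    """The linker keeps one copy of each same-named function (COMDAT-style)."""
    unique = {}
    for funcs in objs.values():
        for name, size in funcs.items():
            unique.setdefault(name, size)
    return sum(unique.values())


print(archive_size(objects), linked_size(objects))  # 1240 1080
```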

A more useful upper bound for size would be to create a shared library
from all of LLVM.

Hi David,

I've had some success analyzing the binary size by making Visual C++
generate a .map file. This basically tells you at what binary offset each
function is located in the exe or dll. By subtracting a function's offset
from the offset of the next function you get the actual binary size of
each function (including alignment padding). Then you can aggregate these
by class or by object file and sort by size to get a real idea of where
the big code is.
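A minimal sketch of that approach, assuming the usual layout of the "Publics by Value" lines in a map produced with the linker's /MAP option (`section:offset`, symbol, `Rva+Base`, optional `f`/`i` flags, `Lib:Object`). The regular expression and the sample lines are assumptions to check against a real map file, and the symbol names are made up. Aggregating by class would additionally require demangling the names.

```python
import re
from collections import defaultdict

# Assumed layout of a "Publics by Value" line in a VC++ .map file.
MAP_LINE = re.compile(
    r"^\s*([0-9A-Fa-f]{4}):([0-9A-Fa-f]{8})\s+(\S+)\s+[0-9A-Fa-f]{8,16}"
    r"\s+(?:f\s+)?(?:i\s+)?(\S+)\s*$"
)


def parse_map(text):
    """Yield (section, offset, symbol, object) for each symbol line."""
    for line in text.splitlines():
        m = MAP_LINE.match(line)
        if m:
            yield int(m.group(1), 16), int(m.group(2), 16), m.group(3), m.group(4)


def symbol_sizes(entries):
    """Size of a symbol = offset of the next symbol in its section minus its own.

    Includes alignment padding. The last symbol in each section has no
    successor, so its size is unknown and it is skipped.
    """
    by_section = defaultdict(list)
    for section, offset, symbol, obj in entries:
        by_section[section].append((offset, symbol, obj))
    sizes = []
    for items in by_section.values():
        items.sort()
        for (offset, symbol, obj), (next_offset, _, _) in zip(items, items[1:]):
            sizes.append((symbol, obj, next_offset - offset))
    return sizes


def totals_by_object(sizes):
    """Sum symbol sizes per Lib:Object, largest first."""
    totals = defaultdict(int)
    for _symbol, obj, size in sizes:
        totals[obj] += size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Invented excerpt for illustration.
SAMPLE = """\
  Address         Publics by Value              Rva+Base       Lib:Object

 0001:00000000       _main                      00401000 f   main.obj
 0001:00000040       ?visitAdd@InstCombiner@@QAEXXZ 00401040 f   LLVMInstCombine:InstCombineAddSub.obj
 0001:00000240       ?getType@Value@@QBEXXZ     00401240 f i LLVMCore:Value.obj
 0001:00000250       ?foo@@YAXXZ                00401250 f   main.obj
"""

for obj, size in totals_by_object(symbol_sizes(parse_map(SAMPLE))):
    print(f"{size:8d}  {obj}")
```

Symbols folded by /OPT:ICF share an address, so they show up here with size 0, which is the correct attribution for aliases.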

If I recall correctly, I got the JIT down to about 2 MB using information
collected from the .map file. This still included some optimization passes.

Unfortunately a lot of features are fairly tightly interwoven. If you don't
need support for debugging, exceptions, garbage collection, intrinsics,
arbitrary precision integers and/or vectors, I bet you could get it way
smaller. But it would take considerable effort to pry loose. Also, some
passes can do a lot of things which you might not be interested in. For
example instcomb is huge but you probably only need a handful of the
possible combinations to make your JIT code a lot faster. Most of the
optimizations can be performed statically in the high-level language anyway
(e.g. replacing a division by 2 by a shift right).
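To illustrate the kind of rewrite meant here, the sketch below implements one instcombine-style rule on a hypothetical tuple representation (not LLVM's IR classes): unsigned division by a power of two becomes a logical right shift. It is restricted to unsigned division on purpose: C-style signed division truncates toward zero while an arithmetic shift rounds toward negative infinity, so -7 / 2 is -3 but -7 >> 1 is -4, and signed division needs an extra correction for negative operands.

```python
def is_power_of_two(c):
    return c > 0 and (c & (c - 1)) == 0


def combine(inst):
    """Rewrite ("udiv", x, 2**k) as ("lshr", x, k); return anything else unchanged."""
    op, lhs, rhs = inst
    if op == "udiv" and isinstance(rhs, int) and is_power_of_two(rhs):
        return ("lshr", lhs, rhs.bit_length() - 1)
    return inst


def c_sdiv(a, b):
    """Signed division with C semantics (truncates toward zero)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


print(combine(("udiv", "x", 8)))  # ('lshr', 'x', 3)
print(combine(("sdiv", "x", 2)))  # ('sdiv', 'x', 2), left unchanged
print(c_sdiv(-7, 2), -7 >> 1)     # -3 -4 (Python's >> on ints is arithmetic)
```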

So I'm quite convinced that LLVM can be made smaller than 1 MB, but it will
take some custom work. With a good test suite you can systematically cut
things out and ensure everything keeps working. Unfortunately it will become
infeasible to merge patches from main development into your local tree, so
you won't benefit from any advances or bug fixes there. But if you're happy
with LLVM 2.7's functionality, now and later, then this is a feasible option
if you have a couple of months to do the custom work.

Cheers,

Nicolas