RFC: Upcoming Build System Changes

Hi all,

As you might have inferred, I'm in the process of working on some changes to the
way LLVM builds. I have outlined a rough proposal below; unless there are any
major objections, I will probably start committing stuff next week.

This message may be verbose; if you want the executive summary, skip to
'What This Means For Jane "LLVM Developer" Doe' at the bottom.

Motivation

I have a very high level comment, and you may be able to directly shed light on it before I dig into a lot more detail.

Why not simply standardize on CMake? It’s not my favorite tool, but it seems to work well, we have established usage of it, and several people involved in the project who understand how it works. It doesn’t seem like a significantly more burdensome dependency than Python when developing, and it remains possible to build installable packages for the casual hacker.

I can see some objections to CMake, but it’s not clear to me that they should carry the day. I’m also probably missing some.

The one I see most clearly is that the CMake build, as it stands, involves Too Much Magic. I don’t at all disagree. That said, I strongly believe this could be completely addressed.

  • If we moved to CMake as the standard build system, numerous kludgy aspects of the current build would go away. They are often in existence purely to support interoperation with the old system.

  • It would be very straightforward to centralize all of the library dependencies and descriptions in the single top-level CMakeLists.txt file, making it easily consumable by your average developer. It would have a format no harder to edit or understand than the one you propose, and they would both (at worst) be unfamiliar to existing developers.

  • It would likely improve the quality of our CMake builds by ensuring it was well tested and always in a consistent state.

  • It already has a relatively optimized makefile-generation system, so we wouldn’t need to re-invent this wheel again.

To my eyes, the biggest downside to making CMake the standard build system is the dependence on CMake itself. However, among the cross-platform users of LLVM, I think CMake is often the preferred build system. I know of folks using it under Xcode, Visual Studio, MinGW, Cygwin, and all flavors of Linux.

Anyways, I’m sure there are more considerations than just these, I just think it would be beneficial to seriously consider using an existing meta-build system rather than rolling our own.
-Chandler

Something else I wanted to mention, although I don’t know how relevant it really is to most LLVM and/or Clang developers, is that we have several Clang developers who are actually contributing to CMake, specifically around integration with Clang and related tools. I expect these contributions to increase over the next year…

Daniel Dunbar <daniel@zuster.org> writes:

[snip]

So, since having two build systems is problematic, you intend to improve
the situation by adding a third component that introduces a new
dependency and by inventing a new specification that requires extensive
changes to the existing systems... just to factor out the most obvious
duplication.

Sorry to be blunt, but this is insane.

If having two build systems is a problem, just standardize on cmake. It
has a few disadvantages (mostly related to it being a build generator,
not a build tool) but lots of advantages. The LLVM cmake system was
heavily influenced by the strict requirement of replicating the concepts
and several key features of the Makefile-based system. I don't know how
much the cmake build has changed since I stepped back as the maintainer
(apart from the transition to explicit library dependencies), but there
is lots of room for improvement.

IMO the right thing here is to ask what is missing from the cmake build
that prevents its acceptance as the replacement for the makefile-based
system. My guess is that whatever the cmake system lacks can be fixed
with a fraction of the work your plan requires.

I have a very high level comment, and you may be able to directly shed light on it before I dig into a lot more detail.

Why not simply standardize on CMake? It’s not my favorite tool, but it seems to work well, we have established usage of it, and several people involved in the project who understand how it works. It doesn’t seem like a significantly more burdensome dependency than Python when developing, and it remains possible to build installable packages for the casual hacker.

I can see some objections to CMake, but it’s not clear to me that they should carry the day. I’m also probably missing some.

There are several major problems with CMake IMO:

  1. It generates really slow build systems.

  2. The build system generated by cmake is absolute garbage, at least for Xcode. The build times of it are really bad, and having to work with it in the IDE is even more terrible.

  3. I’d really like us to get to explicit and principled library dependencies, where the build horks if you accidentally add a library dependency. Without this, I don’t see how LLVM will ever scale up.

  4. I’d really like us to move in a direction where LLVM is less monolithic. Right now “llvm” contains all the MC stuff, all the targets, all the mid-level optimizer, etc. Adding a new LLVM MC-based linker will cause it to naturally get dropped into the llvm project, which is great but not helping the monolithicness. :)

To me at least, I see this as a first step to getting LLVM to be a more scalable project. Things like llvm-config are essential, but not extensible to subprojects like Clang, etc.

Anyways, I’m sure there are more considerations than just these, I just think it would be beneficial to seriously consider using an existing meta-build system rather than rolling our own.

It seems that all sufficiently large open source projects evolve their own meta build systems!

-Chris

Chris Lattner <clattner@apple.com> writes:

There are several major problems with CMake IMO:

1. It generates really slow build systems.

On my Linux box, the last time I checked (a long time ago) the cmake build
was a bit faster than the Makefiles. But this is tricky terrain, because
they are not identical, neither in the features supported (and some have an
impact on build time) nor even in the options passed to the compiler.

2. The build system generated by cmake is absolute garbage, at least
for Xcode. The build times of it are really bad, and having to work
with it in the IDE is even more terrible.

AFAIK there is an Xcode project file in the LLVM source tree. Are the
LLVM makefiles used by the Xcode project? If the Xcode project files
generated by cmake are not satisfactory, can't they use the Makefiles
generated by cmake instead?

Months ago an Apple developer contacted the cmake team to report
those problems. I don't know if the effort had any positive result.

3. I'd really like us to get to explicit and principled library
dependencies, where the build horks if you accidentally add a library
dependency. Without this, I don't see how LLVM will ever scale up.

Explicit library dependencies are not a requirement if you want to
inform the developer(s) about a change in the dependency graph. Figuring
out the change is a boring task that can be automated. Approving the
change is the part where a human brain is required. The proponents of
explicit dependencies seem to mix up those two things.

Anyways, that's not a problem with cmake. In fact, since the cmake build
switched to explicit dependencies, it holds an "advantage" there.
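
To make the "figuring out the change can be automated" point concrete, here is a minimal Python sketch that compares a declared dependency table against an observed one and reports the differences for a human to approve. The function name, the data layout, and the sample dependency lists are all assumptions made for illustration; nothing here is taken from the actual LLVM build.

```python
def diff_dependencies(declared, actual):
    """Report libraries whose observed dependencies differ from the declared ones.

    Both arguments map a library name to an iterable of library names.
    Returns {library: (undeclared, unused)} for libraries that changed only.
    """
    report = {}
    for lib in sorted(set(declared) | set(actual)):
        want = set(declared.get(lib, ()))
        have = set(actual.get(lib, ()))
        undeclared = sorted(have - want)  # used but never declared
        unused = sorted(want - have)      # declared but no longer used
        if undeclared or unused:
            report[lib] = (undeclared, unused)
    return report


# Illustrative data only; these are not the real LLVM dependency lists.
declared = {
    "LLVMCodeGen": ["LLVMCore", "LLVMSupport"],
    "LLVMCore": ["LLVMSupport"],
}
actual = {
    "LLVMCodeGen": ["LLVMCore", "LLVMMC", "LLVMSupport"],
    "LLVMCore": ["LLVMSupport"],
}

for lib, (undeclared, unused) in diff_dependencies(declared, actual).items():
    print(f"{lib}: undeclared={undeclared} unused={unused}")
```

Whether a non-empty report should merely warn or should fail the build is exactly the policy question being argued in this thread; the sketch only covers the mechanical part.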

4. I'd really like us to move in a direction where LLVM is less
monolithic. Right now "llvm" contains all the MC stuff, all the
targets, all the mid-level optimizer, etc. Adding a new LLVM MC-based
linker will cause it to naturally get dropped into the llvm project,
which is great but not helping the monolithicness. :)

CMake is far more flexible here. Just consider the changes required for
building clang outside the LLVM source tree, a feature that was added
a long time after the cmake build was stable. Implementing that took less
than an hour.

[snip]

Hi Daniel,

Hi all,

As you might have inferred, I'm in the process of working on some changes to the
way LLVM builds. I have outlined a rough proposal below; unless there are any
major objections, I will probably start committing stuff next week.

I'm not an LLVM dev, but I am the maintainer for all of the LLVM-related packages for my distro. I have a few thoughts on changes to the build system that could make my life easier, but they're largely orthogonal to the work you described. Are you interested in hearing about them now, or would you rather just focus on the (rather large) task you've already set out? I've been planning on doing a writeup to send to the list anyway, but if there's a chance you'll integrate the changes in work starting next week I'll be sure to collect all my thoughts and send them sooner.

Regards,
Shea Levy

Chris Lattner <clattner@apple.com> writes:

There are several major problems with CMake IMO:

1. It generates really slow build systems.

On my Linux box, the last time I checked (a long time ago) the cmake build
was a bit faster than the Makefiles. But this is tricky terrain, because
they are not identical, neither in the features supported (and some have an
impact on build time) nor even in the options passed to the compiler.

The makefiles are known to be really slow, among other problems being based on recursive make. One goal of this is to get a non-recursive makefile generated. We've prototyped this in the past and found it to be substantially faster than the recursive makefile.

2. The build system generated by cmake is absolute garbage, at least
for Xcode. The build times of it are really bad, and having to work
with it in the IDE is even more terrible.

AFAIK there is an Xcode project file in the LLVM source tree.

Nope, there once was, but it was removed a long time ago.

Are the
LLVM makefiles used by the Xcode project?

No, it is generated by CMake.

If the Xcode project files
generated by cmake are not satisfactory, can't they use the Makefiles
generated by cmake instead?

Xcode can drive a makefile, but it doesn't provide any of the IDE integration features, e.g. clang code completion.

-Chris

Chris Lattner <clattner@apple.com> writes:

1. It generates really slow build systems.

On my Linux box, the last time I checked (a long time ago) the cmake build
was a bit faster than the Makefiles. But this is tricky terrain, because
they are not identical, neither in the features supported (and some have an
impact on build time) nor even in the options passed to the compiler.

The makefiles are known to be really slow, among other problems being
based on recursive make. One goal of this is to get a non-recursive
makefile generated. We've prototyped this in the past and found it to
be substantially faster than the recursive makefile.

A good measure of how fast a set of Makefiles is is to run the build
with all targets up-to-date. Both builds take a few seconds (3 or so)
on my Linux quad core box. Whatever improvement can be achieved on this
seems pretty insignificant.

Furthermore, recursive make is necessary for automatic generation of
header dependencies, among other things. The makefiles generated by
cmake are "partially" recursive for that reason:

http://www.cmake.org/Wiki/CMake_FAQ#Why_does_CMake_generate_recursive_Makefiles.3F
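
For anyone who wants to repeat the "all targets up-to-date" measurement on their own machine, a small Python helper along these lines would do. It is only a sketch: the helper name is made up, and the `make -C /path/to/build` invocation in the comment is a placeholder for whatever command drives your build.

```python
import subprocess
import sys
import time


def time_command(argv, cwd=None):
    """Run argv once and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, result.returncode


# Stand-in command so the sketch runs anywhere. For a real null-build
# measurement, first do a full build, then time something like
# ["make", "-C", "/path/to/build"] (placeholder path) a second time.
elapsed, code = time_command([sys.executable, "-c", "pass"])
print(f"exit code {code}, {elapsed:.3f}s")
```

Running the same command several times and comparing the timings helps separate the build system's own overhead from disk-cache effects.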

2. The build system generated by cmake is absolute garbage, at least
for Xcode. The build times of it are really bad, and having to work
with it in the IDE is even more terrible.

AFAIK there is an Xcode project file in the LLVM source tree.

Nope, there once was, but it was removed a long time ago.

Are the
LLVM makefiles used by the Xcode project?

No, it is generated by CMake.

So, the cmake-generated Xcode file was considered good enough or... ?

[snip]

Anyways, if you wish to avoid duplicating info in both the Makefile and
CMakeLists.txt, there is a simple solution: read and parse the Makefile
from the corresponding CMakeLists.txt. For instance, if you put the
library dependencies in the Makefile like this:

LLVMLIBDEPS := foo zoo bar

obtaining that info from the CMakeLists.txt and generating the cmake
library dependencies is very simple. You do not even have to put anything
new in all those CMakeLists.txt files; just modify one of the macros that
are (indirectly) called from each CMakeLists.txt.
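
The extraction step itself is small. The suggestion above is to do it from within CMake; the sketch below uses Python instead, purely to show the parsing logic, and the function name is invented for this example. It handles only a single-line `:=` or `=` assignment, not `+=` or backslash continuations.

```python
import re

# Matches "LLVMLIBDEPS := a b c" or "LLVMLIBDEPS = a b c" on one line.
_ASSIGN = re.compile(r"^\s*LLVMLIBDEPS\s*:?=\s*(.*)$")


def read_libdeps(makefile_text):
    """Return the library names assigned to LLVMLIBDEPS, or [] if absent."""
    for line in makefile_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _ASSIGN.match(line)
        if match:
            return match.group(1).split()
    return []


example = """\
LEVEL = ../..
LLVMLIBDEPS := foo zoo bar
"""
print(read_libdeps(example))
```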

+1: Extract the full, exact (meta) data in pure form in an easy-to-handle format and generate the bits for a concrete build system on demand - simply great!

As a side effect, the LLVMBuild.txt data would probably also help distribution maintainers with their package specifications.

Hi Oskar,

If having two build systems is a problem, just standardize on cmake.

Does cmake support cross-compilation? Can it cross-compile LLVM?

Hi Anton,

CMake definitely supports cross-compilation. This is achieved by defining so-called 'toolchain' files. Doing this is not well documented in the web documentation, as far as I know, but the printed CMake book describes it extensively. Several reasonably sized projects here have used it successfully.

Regards,

I disagree there. Perl is pretty much guaranteed to be installed on any UNIXish system. Even FreeBSD, which has removed it from the base system, tends to install the Perl package by default. In contrast, a lot of the machines I use don't have Python installed. I need to install it if I'm doing LLVM development because it's needed for the tests, but needing it just to build seems like massive overkill.

That said, if the information required for the build is going to be made explicit, maybe this isn't such a problem, as other tools can be written to parse it and run the build.

David

That would establish a hard dependency on CMake. Not every system has
CMake whereas most systems do have Python by default (on the machines
I use daily, Python has a 5-1 lead).
See also David Chisnall's mail about Perl > Python.

Csaba

While eliminating duplication is one of the goals I see in this build system change, I think the more important ones are a) simplifying the build files and b) making the build faster.

Adding CMake code (I agree it’s a terrible scripting language) to parse Makefiles will make the build slower and more complicated.

I wouldn’t say that. I know of quite a few systems around here that avoid Python where possible. CMake as a build system, however, is welcomed by all of us (I work as a sysop in a Unix environment).

I’d also (as a non-LLVM dev but an LLVM user-dev) vote for NOT reinventing the wheel and instead using the tool that fits you best; personally, that’s CMake too. It has a good list of backing companies and projects and is still improving; e.g., Qt planned (I do not know how up to date this info is) to improve it to make it more suitable for IDEs. From the sysop point of view, it’s much more of a pleasure to work with CMake than with autotools, and introducing (yet) another new build system would just be a headache :)

Best regards,
Christian Parpart.


If I understand the proposal correctly, from a Jane Doe (llvm-user) point of view, you will just continue to use cmake. The only difference will be that cmake will call a Python script to generate a bunch of files used by cmake or make.

That said, wouldn’t it be possible to not require users to run this script, and simply add the resulting files to the repository (just as many projects do not require you to run autoconf, and just distribute the generated configure script)?

That way, the Python dependency would be required only if you change the module description files, and not for casual developers and users who just plan to recompile llvm.
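
To picture what "a Python script that generates files used by cmake or make" might look like, here is a rough sketch. Everything in it is an assumption made for illustration: the INI-style description, the `required_libraries` key, the `*_DEPS` variable names, and the function names are hypothetical and are not the format or output of the actual proposal.

```python
import configparser

# Hypothetical module description; the real proposal's format may differ.
EXAMPLE = """
[foo]
required_libraries = bar zoo

[bar]
required_libraries =

[zoo]
required_libraries = bar
"""


def load_components(text):
    """Parse the description into {component: [required libraries]}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        name: parser[name].get("required_libraries", "").split()
        for name in parser.sections()
    }


def to_cmake(components):
    """Emit one CMake set() line per component (hypothetical variable names)."""
    lines = []
    for name in sorted(components):
        tokens = [f"{name}_DEPS"] + components[name]
        lines.append("set(" + " ".join(tokens) + ")")
    return "\n".join(lines) + "\n"


def to_make(components):
    """Emit one Make variable assignment per component."""
    lines = []
    for name in sorted(components):
        lines.append(f"{name}_DEPS := {' '.join(components[name])}".rstrip())
    return "\n".join(lines) + "\n"


components = load_components(EXAMPLE)
print(to_cmake(components))
print(to_make(components))
```

Because both outputs are plain text derived deterministically from the description, they could be committed alongside it, so that only people who edit the description need Python installed.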

Best regards,
Christian Parpart.

I have a very high level comment, and you may be able to directly shed light
on it before I dig into a lot more detail.
Why not simply standardize on CMake?

That would establish a hard dependency on CMake. Not every system has
CMake whereas most systems do have Python by default (on the machines
I use daily, Python has a 5-1 lead).
See also David Chisnall’s mail about Perl > Python.

Csaba

GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ w++$ tv+ b++ DI D++ 5++
The Tao of math: The numbers you can count are not the real numbers.
Life is complex, with real and imaginary parts.
"Ok, it boots. Which means it must be bug-free and perfect. " – Linus Torvalds
“People disagree with me. I just ignore them.” – Linus Torvalds


LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



– Jean-Daniel

+1

We build our OpenCL SDK (for Windows and Linux) using CMake. We’ve integrated LLVM’s CMake hierarchy into our own (customizing LLVM external parameters like build and install directories, added passes, etc.).

Migrating LLVM’s build system from CMake to something else would require us to change the way we currently do things.

Hi David,

I disagree there. Perl is pretty much guaranteed to be installed on any UNIXish system. Even FreeBSD, which has removed it from the base system, tends to install the Perl package by default. In contrast, a lot of the machines I use don't have Python installed. I need to install it if I'm doing LLVM development because it's needed for the tests, but needing it just to build seems like massive overkill.

It is possible that you are right about the overall index of "presence". Just to mention that in Fedora derivatives (RedHat, CentOS, maybe 35-50% of Linux dev stations - I don't know) Python is guaranteed because of yum (the package manager).

That said, if the information required for the build is going to be made explicit, maybe this isn't such a problem, as other tools can be written to parse it and run the build.

Absolutely - once the generators are prototyped and tested in Python, if current Perl (presence) > Python's, they can be easily ported to Perl.

Kind Regards,
Alek

Hi,

my 2 cents:

1) If you're targeting python 2.6 or less, I prefer a python
dependency over cmake 'cause it's already installed on all machines I
build clang on. If you're targeting 2.7+, I don't care either way.

2) On the chromium project, we're using a custom meta build system
written in python, and it's been a lot more useful than we initially
expected. Based on this experience, your proposal sounds good to me.

Nico