Building clang outside of LLVM (with CMake)

Some time ago Doug mentioned that it would be useful to build clang
outside of LLVM: basically LLVM acts as an external library which is
used by clang.

Just to be sure that I'm working on the right track, what's the
motivation for this requirement?

IIRC Doug mentioned something about build times, and indeed clang takes
about 4 minutes to build alone and almost 10 minutes to build within
LLVM, so if the goal is to quickly build clang multiple times with the
same LLVM base then it is a must-have feature.

Some time ago Doug mentioned that it would be useful to build clang
outside of LLVM: basically LLVM acts as an external library which is
used by clang.

I was just about to propose this myself.

IIRC Doug mentioned something about build times, and indeed clang takes
about 4 minutes to build alone and almost 10 minutes to build within
LLVM, so if the goal is to quickly build clang multiple times with the
same LLVM base then it is a must-have feature.

I’m not sure, but if someone wishes to recompile only clang, can’t they just do
cd cmake_build_dir/tools/clang && make?

Some time ago Doug mentioned that it would be useful to build clang
outside of LLVM: basically LLVM acts as an external library which is
used by clang.

I was just about to propose this myself.

This would be useful. In particular, projects that are not using clang as a compiler want to build just the libraries required for libclang, which only depend on a few LLVM libraries (Support and System, and possibly the asm parser things if you want inline asm).

IIRC Doug mentioned something about build times, and indeed clang takes
about 4 minutes to build alone and almost 10 minutes to build within
LLVM, so if the goal is to quickly build clang multiple times with the
same LLVM base then it is a must-have feature.

I'm not sure, but if someone wishes to recompile only clang, can't they just do
cd cmake_build_dir/tools/clang && make?

You can also just do cd llvm/tools/clang && gmake - this works without the CMake stuff. If you're hacking on clang, then this is a lot faster than doing a complete LLVM build each time. There's no requirement to do an LLVM build for every clang build, unless the LLVM ABI has changed since your last build (and this rarely happens more than once in any given hour...).

In fact, it would be better if clang used the installed LLVM headers, rather than the ones from the LLVM tree, so that you'd always be building clang with the headers that matched the library.
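The header/library mismatch concern can be made concrete. One common way C libraries catch such a mismatch at run time (zlib's `ZLIB_VERSION` macro and `zlibVersion()` function are a well-known example) is to compare a version string from the header with one compiled into the library. The sketch below uses made-up `MYLIB_*`/`mylib_*` names; it is not an LLVM or Clang API, only an illustration of why building against the installed headers matters.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical installed header: records which version of the headers
   the client was compiled against. */
#define MYLIB_HEADER_VERSION "2.7"

/* Hypothetical library function: in a real library this string is compiled
   into the library itself, so it reflects how the library was built, not
   which header the client happened to include. */
static const char *mylib_library_version(void) {
    return "2.7";
}

/* Client-side check, run before using the library: returns 1 if the headers
   the client was compiled with match the library it is linked against. */
static int headers_match_library(const char *compiled_against) {
    return strcmp(compiled_against, mylib_library_version()) == 0;
}
```

With matching headers and library, `headers_match_library(MYLIB_HEADER_VERSION)` returns 1; a client built against stale headers (say `"2.6"`) would get 0 and could refuse to continue.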

David

Don't you also need tblgen, given that it is used to generate code/classes for clang?
If so, the problem here is that tblgen is not a library: it lives under utils rather than lib.

Garrison

Yes, sorry, I was thinking in terms of run-time dependencies, not build-time dependencies.

David

Well, if someone has already installed llvm, they very likely have tblgen too.

arrowdodger <6yearold@gmail.com> writes:

Don't you also need tblgen, given that it is used to generate code/classes
for clang?
If so, the problem here is that tblgen is not a library: it lives under
utils rather than lib.

Well, if someone has already installed llvm, they very likely have tblgen too.

The feature is working here(*), so you can stop speculation about what's
needed and what's not.

I'm interested in *why* people want the feature, not in *how* to
implement it.

BTW, building libclang and only libclang with its dependencies should
Just Work (TM). If it doesn't, then the dependencies need some
tightening.
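To illustrate what building "only libclang with its dependencies" means, here is a minimal sketch that walks a dependency table and marks everything reachable from one target. The table is hypothetical and heavily simplified (the library names only loosely echo the ones mentioned in this thread and are not the real build graph); the point is that anything unreachable, like the codegen library here, never needs to be built.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified dependency table; NOT the real build
   graph. Each library lists its direct dependencies, NULL-terminated. */
struct lib_deps {
    const char *name;
    const char *deps[4];
};

static const struct lib_deps table[] = {
    { "libclang",      { "clangFrontend", NULL } },
    { "clangFrontend", { "clangParse", "LLVMSupport", NULL } },
    { "clangParse",    { "clangBasic", NULL } },
    { "clangBasic",    { "LLVMSupport", "LLVMSystem", NULL } },
    { "LLVMSupport",   { "LLVMSystem", NULL } },
    { "LLVMSystem",    { NULL } },
    { "LLVMCodeGen",   { "LLVMSupport", NULL } }, /* unreachable from libclang */
};

#define NLIBS (sizeof table / sizeof table[0])

/* Returns the index of `name` in the table, or -1 if it is unknown. */
static int find_lib(const char *name) {
    for (size_t i = 0; i < NLIBS; i++)
        if (strcmp(table[i].name, name) == 0)
            return (int)i;
    return -1;
}

/* Depth-first walk: marks `name` and everything it transitively needs. */
static void mark_needed(const char *name, int needed[]) {
    int i = find_lib(name);
    if (i < 0 || needed[i])
        return;
    needed[i] = 1;
    for (int d = 0; table[i].deps[d] != NULL; d++)
        mark_needed(table[i].deps[d], needed);
}

/* Returns 1 if building `root` requires building `lib`, 0 otherwise. */
static int requires_lib(const char *root, const char *lib) {
    int needed[NLIBS] = {0};
    int i = find_lib(lib);
    mark_needed(root, needed);
    return i >= 0 && needed[i];
}
```

Here `requires_lib("libclang", "LLVMSystem")` returns 1, while `requires_lib("libclang", "LLVMCodeGen")` returns 0. If a query like this on the real graph pulled in unexpected libraries, that would be the kind of dependency that needs tightening.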

* Testing support may be a serious hurdle, depending on how lit works.

Yeah, I’m just thrown off by the original email to this thread where the phrase “llvm as
an external library” was used in the context of building clang. Also, although its volatility is
small, there is a direct dependency of tblgen on clang, given the clang-specific tblgen
backends such as ClangASTNodesEmitter. Although these backends obviously do not
use clang libs, and their semantics are really concerned with expression-tree-like
concepts (in my view), this dependency could be fixed and then exploited if tblgen
could load backend plugins. There would still be build order issues for clang, but
the code would be self-contained except for LLVM libs. However, I’m off topic, so I’ll
shut up now. :-)

Garrison

For example, my long term primary motivation in joining the clang
project is to bring the static analyzer to work inside Visual Studio
via a plugin. For such a plugin, the codegen/llvm libraries are neither
useful nor needed.

Is that what you mean?

We'd like to be able to consider LLVM as a stable library that Clang depends on. This would allow us to build against an installed LLVM (rather than embedding our own full copy of LLVM in Clang), test Clang and LLVM more independently, avoid embedding all of LLVM into the Clang executable (assuming that LLVM becomes a shared library at some point), and make our Xcode/MSVC projects significantly smaller (since they'll only have Clang).

I think this will also become more important in the future, since, for example, LLDB layers on top of LLVM and Clang.

  - Doug

Francois Pichet <pichet2000@gmail.com> writes:

For example, my long term primary motivation in joining the clang
project is to bring the static analyzer to work inside Visual Studio
via a plugin. For such a plugin, the codegen/llvm libraries are neither
useful nor needed.

Is that what you mean?

Not really.

For this you need at least tblgen, Support and System, i.e. you need the
LLVM sources. You can already build a piece of clang without building
everything else. I'm not sure how `make install' works in this case,
though, but that is not a serious issue.

Douglas Gregor <dgregor@apple.com> writes:

We'd like to be able to consider LLVM as a stable library that Clang
depends on.

Okay, that's the idea I had.

This would allow us to build against an installed LLVM
(rather than embedding our own full copy of LLVM in Clang), test Clang
and LLVM more independently,

No problem with the first part; it is mostly done. Testing Clang
outside of LLVM seems to be the hard part. AFAIK it is not possible to test
LLVM once it is installed, and Clang tests are embedded in LLVM's
testing framework.

I understand that building Clang standalone is of little benefit if you
can't test it. Comments from someone who knows the testing part well
would be helpful. Dan?

avoid embedding all of LLVM into the Clang executable (assuming that
LLVM becomes a shared library at some point),

AFAIK it is possible to build Clang as a Huge Single Shared Library with
the autoconf build and I don't think it would be too difficult to do the
same with cmake. Right now we can build LLVM as a set of shared
libraries, which possibly is even better than a huge library. But maybe
Clang uses LLVM so extensively that it is not interesting to deal with
multiple libraries instead of just one.

and make our Xcode/MSVC projects significantly smaller (since
they'll only have Clang).

I think this will also become more important in the future, since, for
example, LLDB layers on top of LLVM and Clang.

Ok.

Douglas Gregor <dgregor@apple.com> writes:

We'd like to be able to consider LLVM as a stable library that Clang
depends on.

Okay, that's the idea I had.

This would allow us to build against an installed LLVM
(rather than embedding our own full copy of LLVM in Clang), test Clang
and LLVM more independently,

No problem with the first part and it is mostly done.

Very cool.

Testing Clang
outside of LLVM seems to be the hard part. AFAIK it is not possible to test
LLVM once it is installed, and Clang tests are embedded in LLVM's
testing framework.

I understand that building Clang standalone is of little benefit if you
can't test it. Comments from someone who knows the testing part well
would be helpful. Dan?

It's mainly the lit-based Clang testing (in clang's "test" subdirectory) that's of interest here, and I think we can just make sure to pick up the lit.py that is installed with LLVM. There may be a little lit configuration work to do.

avoid embedding all of LLVM into the Clang executable (assuming that
LLVM becomes a shared library at some point),

AFAIK it is possible to build Clang as a Huge Single Shared Library with
the autoconf build and I don't think it would be too difficult to do the
same with cmake. Right now we can build LLVM as a set of shared
libraries, which possibly is even better than a huge library. But maybe
Clang uses LLVM so extensively that it is not interesting to deal with
multiple libraries instead of just one.

I think the actual packaging as one large shared library vs. several smaller libraries is less important at this point. Just the ability to build against an installed system (and linking against the installed shared libraries, if they're shared) would be a big leap forward.

  - Doug

Douglas Gregor <dgregor@apple.com> writes:

We'd like to be able to consider LLVM as a stable library that Clang
depends on. This would allow us to build against an installed LLVM
(rather than embedding our own full copy of LLVM in Clang), test Clang
and LLVM more independently,

[snip]

There are some configuration options (C_INCLUDE_DIRS,
CLANG_RESOURCE_DIR) which are stored in LLVM's config.h file.

The Right Thing is to create a config.h file for Clang, which is a task
a bit more obtrusive than I am used to. It would consist of passing
-DHAVE_CLANG_CONFIG_H on the command line and adding

#ifdef HAVE_CLANG_CONFIG_H
# include "clang/Config/config.h"
#endif

where necessary. This allows the autoconf build to remain unaffected.
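A minimal sketch of that pattern, assuming the CMake build passes `-DHAVE_CLANG_CONFIG_H` and generates `clang/Config/config.h` as described above. The fallback `#define`s are placeholders added only so the sketch compiles on its own; in the actual tree the autoconf build would keep getting `C_INCLUDE_DIRS` and `CLANG_RESOURCE_DIR` from LLVM's config.h. The accessor function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumption: the CMake build passes -DHAVE_CLANG_CONFIG_H and generates
   clang/Config/config.h; the autoconf build passes neither. */
#ifdef HAVE_CLANG_CONFIG_H
# include "clang/Config/config.h"
#endif

/* Placeholder fallbacks so this sketch compiles standalone. In the real tree
   the autoconf build would get these macros from LLVM's config.h instead;
   the empty strings are NOT Clang's actual defaults. */
#ifndef CLANG_RESOURCE_DIR
# define CLANG_RESOURCE_DIR ""
#endif
#ifndef C_INCLUDE_DIRS
# define C_INCLUDE_DIRS ""
#endif

/* Hypothetical accessors: code reads the settings only through the macros,
   so it doesn't care which header defined them. */
static const char *resource_dir(void) { return CLANG_RESOURCE_DIR; }
static const char *c_include_dirs(void) { return C_INCLUDE_DIRS; }
```

Compiled without `-DHAVE_CLANG_CONFIG_H`, both accessors return the empty placeholder strings; code that reads the settings only through the macros does not need to know which header supplied them.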

The other possibility is to pass those options on the compiler's command
line, which is tricky because they can contain all sorts of nasty
characters.
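To see why the command-line route is tricky, consider a path such as `C:\Program Files\clang`. Before it can become a `-D` definition it has to be turned into a valid C string literal, escaping backslashes and double quotes, and the result then needs a second layer of shell quoting because of the space. The hypothetical helper below handles only the first layer (and does not handle control characters such as newlines), which already shows how easily this goes wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes `value` into `out` as a C string literal
   (with surrounding quotes), escaping backslashes and double quotes.
   Returns 0 on success, -1 if `out` is too small. Only the C-literal layer
   is handled; shell quoting would still be needed for a -D flag. */
static int to_c_string_literal(const char *value, char *out, size_t outsz) {
    size_t n = 0;

    if (outsz < 3)              /* need at least "" plus the terminator */
        return -1;
    out[n++] = '"';
    for (const char *p = value; *p != '\0'; p++) {
        int needs_escape = (*p == '\\' || *p == '"');
        /* Room for this char (+1 if escaped), the closing quote, and NUL. */
        if (n + (needs_escape ? 2 : 1) + 2 > outsz)
            return -1;
        if (needs_escape)
            out[n++] = '\\';
        out[n++] = *p;
    }
    out[n++] = '"';
    out[n] = '\0';
    return 0;
}
```

For `C:\Program Files\clang` the helper produces `"C:\\Program Files\\clang"`, which would still have to be quoted again for the shell or for CMake before it reached the compiler.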

This sounds like the right approach.

  - Doug

Why not just have clang’s config.h be entirely separate from llvm? While
you may be duplicating a few things in a new configure.ac for clang it wouldn’t
be a whole lot and then you could just (effectively) ./configure clang as
well and it’ll just look for llvm on the system, otherwise just do a recursive
configure?

-eric

You know, now that you mention it… yeah, that would probably be better. Thanks, Eric!

  - Doug

Douglas Gregor <dgregor@apple.com> writes:

Why not just have clang's config.h be entirely separate from llvm? While
you may be duplicating a few things in a new configure.ac for clang it wouldn't
be a whole lot and then you could just (effectively) ./configure clang as
well and it'll just look for llvm on the system, otherwise just do a recursive
configure?

You know, now that you mention it.... yeah, that would probably be
better. Thanks, Eric!

IMO it is perfectly fine to use the config.h header generated for
LLVM. Otherwise we are duplicating a big chunk of code. Sharing as much
infrastructure with LLVM as possible reduces maintenance and ensures, to
some extent, that Clang operates the same regardless of how you build it.

Apart from that, Eric, please note that we are talking about a
CMake-only feature. Unless you plan to change the LLVM build system (the
traditional one) for implementing it, there is no gain at all for the
users in duplicating the platform tests inside the Clang tree.

Douglas Gregor <dgregor@apple.com> writes:

Why not just have clang’s config.h be entirely separate from llvm? While
you may be duplicating a few things in a new configure.ac for clang it wouldn’t
be a whole lot and then you could just (effectively) ./configure clang as
well and it’ll just look for llvm on the system, otherwise just do a recursive
configure?

You know, now that you mention it… yeah, that would probably be
better. Thanks, Eric!

IMO it is perfectly fine to use the config.h header generated for
LLVM. Otherwise we are duplicating a big chunk of code. Sharing as much
infrastructure with LLVM as possible reduces maintenance and ensures, to
some extent, that Clang operates the same regardless of how you build it.

True, however…

Apart from that, Eric, please note that we are talking about a
CMake-only feature. Unless you plan to change the LLVM build system (the
traditional one) for implementing it, there is no gain at all for the
users in duplicating the platform tests inside the Clang tree.

Conditionally including files into llvm based on build system is a non-starter
for me. It’s just ridiculous.

That said, it looks like the config files are already installed by default when
you install llvm, which should solve the problem of needing to configure llvm when
building clang as a top level. There should still be a way for it to configure
itself, but that’s less of an issue for the cmake side of things.

In addition, I’m sure there are a number of things that are configured at the top
level for llvm that aren’t needed for clang and for clang that aren’t needed
for llvm.

-eric

Eric Christopher <echristo@apple.com> writes:

Apart from that, Eric, please note that we are talking about a
CMake-only feature. Unless you plan to change the LLVM build system (the
traditional one) for implementing it, there is no gain at all for the
users in duplicating the platform tests inside the Clang tree.

Conditionally including files into llvm based on build system is a non-starter
for me. It's just ridiculous.

As a generic rule I agree with you, but this case is about a file
containing two macros, and whether it is included depends not on the
build system but on whether the file exists.

However, if someone generates that header from the autoconfigured build,
the conditionality goes away ;-) Seriously, though, I don't think the
divergence is serious enough to bother.

That said, it looks like the config files are already installed by
default when you install llvm, which should solve the problem of needing
to configure llvm when building clang as a top level. There
should still be a way for it to configure itself, but that's less of
an issue for the cmake side of things.

IMO, as long as Clang is indivisible from LLVM, there is no reason
not to exploit what LLVM has to offer.

In addition, I'm sure there are a number of things that are configured
at the top level for llvm that aren't needed for clang and for clang
that aren't needed for llvm.

Exactly, this is what prompted this subthread. Those two macros
shouldn't be in LLVM's config.h, nor should there be any mention of Clang
in LLVM's source tree.

It is a historical accident that Clang and other projects are built from
within LLVM. It was convenient to use the existing LLVM build system, but
putting Clang on the same level as llvm-as is simply wrong, not only
from the conceptual point of view but as a practical matter too. The
CMake-based build will allow us to treat Clang as a proper project at the
cost of conditionally including one header. That's a bargain, if you ask
me.