Compile Server?

Hi,

So, over the Christmas break I spent some time trying to get my head
around the clang code base. While looking around for a little project I
could do to get me started, I happened upon this thread [1] from last
summer which piqued my interest.

[1] http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-June/009473.html

The thread describes the idea of a compile-server and some ideas of how
such a thing might be used to improve compile speeds. However I had a
poke around the source and cfe-commits and as far as I can tell nothing
much has happened with the idea.

Is someone still actively looking at this, or planning on doing so? If
not, I wouldn't mind having a go at it myself.

Mind you, I had a go this weekend at hacking in some (really ugly)
socket code just to get a feel for it, and TBH I'm starting to wonder if
an actual server is the best approach - it seems a tad over-engineered.

As I understand it, the goal is to present a single instance of clang
with a list of compile jobs it needs to perform, allowing it to cache
headers and intermediate results in memory. IMHO the most obvious way
of doing this is to simply read a list of job descriptions from a file.

The main (only?) reason for using a server process is because 'make' and
other build tools do not call the compiler in this way, but call it
repeatedly for each source file. However I can't help but think that if
some sort of "batch mode" was available and it allowed for significantly
improved compile times, then people would find ways to make the various
build tools work with it.
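To make the "batch mode" idea concrete, here is a rough sketch of what such a job-description file and the loop reading it might look like. The file format and all the names here are invented purely for illustration; nothing like this exists in clang today.

```python
# Hypothetical job-description file for a "batch mode" compiler driver.
# One job per line: "output: source flags...". This format is invented
# purely for illustration.

def parse_job_list(text):
    """Parse job lines of the form 'main.o: main.cpp -O2 -Iinclude'."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        output, _, rest = line.partition(":")
        source, *flags = rest.split()
        jobs.append({"output": output.strip(), "source": source, "flags": flags})
    return jobs

example = """
# compile jobs generated by the build system
main.o: main.cpp -O2 -Iinclude
util.o: util.cpp -O2 -Iinclude
"""

jobs = parse_job_list(example)
for job in jobs:
    # A single long-lived compiler instance would process each job here,
    # keeping parsed headers cached in memory between iterations.
    print(job["output"], "<-", job["source"])
```

An external build tool would regenerate such a file from its dependency graph and hand it to the compiler in one go, instead of spawning one process per source file.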

What do you guys think?

Cheers,

  David.

David writes:

> Hi,
>
> So, over the Christmas break I spent some time trying to get my head
> around the clang code base. While looking around for a little project I
> could do to get me started, I happened upon this thread [1] from last
> summer which piqued my interest.
>
> [1] http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-June/009473.html
>
> The thread describes the idea of a compile-server and some ideas of how
> such a thing might be used to improve compile speeds. However I had a
> poke around the source and cfe-commits and as far as I can tell nothing
> much has happened with the idea.
>
> Is someone still actively looking at this, or planning on doing so? If
> not, I wouldn't mind having a go at it myself.

Not to my knowledge. At the time I was an intern at Google, and a few
weeks after I sent that message, we decided that without changing C++
semantics, the approach wouldn't work, as suggested by Doug. That
didn't really fit the scope of an intern project, so I moved on.

That said, last I heard there were some meetings trying to sketch out
what those changes would look like.

> Mind you, I had a go this weekend at hacking in some (really ugly)
> socket code just to get a feel for it, and TBH I'm starting to wonder if
> an actual server is the best approach - it seems a tad over-engineered.
>
> As I understand it, the goal is to present a single instance of clang
> with a list of compile jobs it needs to perform, allowing it to cache
> headers and intermediate results in memory. IMHO the most obvious way
> of doing this is to simply read a list of job descriptions from a file.

One use case we had in mind for the server was to reuse intermediate
results from other builds. For example, two unrelated people run two
builds including the same system header to separate source files using
sufficiently similar build flags that we can (somehow magically) tell
the results will be the same. Then we could reuse the header's ASTs.

> The main (only?) reason for using a server process is because 'make' and
> other build tools do not call the compiler in this way, but call it
> repeatedly for each source file. However I can't help but think that if
> some sort of "batch mode" was available and it allowed for significantly
> improved compile times, then people would find ways to make the various
> build tools work with it.
>
> What do you guys think?

Sounds like integrating a minimalist build system into the compiler.

I think it will also depend on what you're trying to optimize:
incremental builds, or clean builds.

There's been some work, which I don't know much about, to speed up the
former in Xcode for diagnostics.

For the latter, I don't think cutting out the subprocess spawning is
worth very much. To really win, you want to cut out time spent
reparsing the same old headers. If your project uses #include
"world.h", PCH should work for you. If you don't, then C++ language
semantics make it difficult to reuse results.

I think efforts on speeding up that kind of build should be built on
some slightly modified variant of C++, so that you can parse headers
in isolation and stitch together the ASTs.

Reid

David writes:

> Hi,
>
> So, over the Christmas break I spent some time trying to get my head
> around the clang code base. While looking around for a little project I
> could do to get me started, I happened upon this thread [1] from last
> summer which piqued my interest.
>
> [1] http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-June/009473.html
>
> The thread describes the idea of a compile-server and some ideas of how
> such a thing might be used to improve compile speeds. However I had a
> poke around the source and cfe-commits and as far as I can tell nothing
> much has happened with the idea.
>
> Is someone still actively looking at this, or planning on doing so? If
> not, I wouldn't mind having a go at it myself.

I don't know of anyone planning to work on this.

> Mind you, I had a go this weekend at hacking in some (really ugly)
> socket code just to get a feel for it, and TBH I'm starting to wonder if
> an actual server is the best approach - it seems a tad over-engineered.
>
> As I understand it, the goal is to present a single instance of clang
> with a list of compile jobs it needs to perform, allowing it to cache
> headers and intermediate results in memory. IMHO the most obvious way
> of doing this is to simply read a list of job descriptions from a file.

As Reid notes, C/C++ make caching of intermediate results really tricky. However, you can perform some optimizations, e.g., detecting common sequences of prefix headers and automatically generating precompiled headers for them.
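A minimal sketch of that prefix-detection step, assuming we already have the ordered list of #includes for each translation unit (everything here is illustrative, not clang code):

```python
# Find the longest run of headers that every translation unit includes
# first, in the same order -- a candidate for an auto-generated PCH.
# Illustrative sketch only.

def common_include_prefix(include_lists):
    """Longest common prefix of several ordered #include lists."""
    if not include_lists:
        return []
    prefix = []
    for headers in zip(*include_lists):
        if len(set(headers)) != 1:
            break  # the TUs diverge at this position
        prefix.append(headers[0])
    return prefix

tus = [
    ["<vector>", "<string>", '"config.h"', '"a.h"'],
    ["<vector>", "<string>", '"config.h"', '"b.h"'],
    ["<vector>", "<string>", '"other.h"'],
]
print(common_include_prefix(tus))  # ['<vector>', '<string>']
```

A real implementation would additionally have to verify that the preprocessor state (flags, predefined macros) is identical across the TUs before sharing a PCH built from that prefix.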

> The main (only?) reason for using a server process is because 'make' and
> other build tools do not call the compiler in this way, but call it
> repeatedly for each source file. However I can't help but think that if
> some sort of "batch mode" was available and it allowed for significantly
> improved compile times, then people would find ways to make the various
> build tools work with it.

I think one of the other goals of a compiler server is to support interactive clients, such as an IDE. So, it's not that we have a list of compile jobs provided in advance: we have requests coming in, typically for some smallish set of files that are constantly changing, and we need to return results quickly.

libclang does some optimization of these cases internally; a compiler server could generalize those optimizations.

  - Doug

Douglas Gregor writes:

> As Reid notes, C/C++ make caching of intermediate results really
> tricky. However, you can perform some optimizations, e.g., detecting
> common sequences of prefix headers and automatically generating
> precompiled headers for them.

Yes, this is the sort of thing I had in mind, start with automatically
creating a PCH file for all the headers in the first source file and
check whether it can be used for the next. There is, I think, a tiny
bit of flexibility in that if the second source file only includes the
first few headers (in the same order) then it should be OK to use the
first part of the PCH. I think that should cover quite a lot of common
cases.
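That prefix check might look something like the toy sketch below; real PCH reuse would also have to compare compile flags, predefined macros, and file timestamps, which this ignores.

```python
# Toy version of the check above: how many leading headers of an
# existing PCH can a new translation unit reuse? Real reuse would also
# need identical flags, macros, and unchanged files on disk.

def reusable_pch_depth(pch_headers, tu_headers):
    """Count leading headers shared, in order, between PCH and TU."""
    depth = 0
    for pch_h, tu_h in zip(pch_headers, tu_headers):
        if pch_h != tu_h:
            break
        depth += 1
    return depth

pch = ["<vector>", "<string>", '"config.h"']
tu = ["<vector>", "<string>", '"widget.h"']
print(reusable_pch_depth(pch, tu))  # 2
```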

Once that is in place, it might then be possible to try to go further.
I've read the gcc-compile-server paper referenced in the original thread
and I can see that navigating the dependencies problem is fraught with
difficulties - however it might be possible to identify some limited
cases where it might be safe to proceed and sort of nibble at the
problem from the edges? If nothing else I will probably learn a lot
from trying, which is almost never a bad thing ;-)

> I think one of the other goals of a compiler server is to support
> interactive clients, such as an IDE. So, it's not that we have a list
> of compile jobs provided in advance: we have requests coming in,
> typically for some smallish set of files that are constantly
> changing, and we need to return results quickly.

Hmm, interesting. If the IDE were written in C++ it would probably be
more efficient for it to call the libraries directly - but I can see the
advantages for something like Eclipse.

Cheers,

  David.

David writes:

>> As Reid notes, C/C++ make caching of intermediate results really
>> tricky. However, you can perform some optimizations, e.g., detecting
>> common sequences of prefix headers and automatically generating
>> precompiled headers for them.

> Yes, this is the sort of thing I had in mind, start with automatically
> creating a PCH file for all the headers in the first source file and
> check whether it can be used for the next. There is, I think, a tiny
> bit of flexibility in that if the second source file only includes the
> first few headers (in the same order) then it should be OK to use the
> first part of the PCH. I think that should cover quite a lot of common
> cases.

Right. If the compiler server knows what files it will be building, it can compute common prefixes and build PCH files for them. It could also cache stat() calls, coalesce file-opening operations, etc.
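The stat() caching is easy to picture: a long-lived server can memoize filesystem lookups that a one-shot compiler repeats on every invocation. A toy sketch (clang's FileManager plays a similar role internally; this is not its API):

```python
import os

# Toy stat() cache: a long-lived compile server can remember the result
# of each stat() instead of re-querying the filesystem for every job.

class StatCache:
    def __init__(self):
        self._cache = {}
        self.misses = 0  # how many times we actually hit the filesystem

    def stat(self, path):
        if path not in self._cache:
            self.misses += 1
            try:
                self._cache[path] = os.stat(path)
            except OSError:
                self._cache[path] = None  # remember negative results too
        return self._cache[path]

cache = StatCache()
for _ in range(1000):
    cache.stat(".")  # the same header stat'd over and over
print(cache.misses)  # 1 -- only the first lookup touched the filesystem
```

The usual caveat applies: a cache like this is only safe while the server can assume the files have not changed underneath it, which is exactly the invalidation problem a real server would need to solve.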

> Once that is in place, it might then be possible to try to go further.
> I've read the gcc-compile-server paper referenced in the original thread
> and I can see that navigating the dependencies problem is fraught with
> difficulties - however it might be possible to identify some limited
> cases where it might be safe to proceed and sort of nibble at the
> problem from the edges? If nothing else I will probably learn a lot
> from trying, which is almost never a bad thing ;-)

I don't think that the dependencies problem can ever be nibbled away. The header-inclusion model is so fundamentally broken that it can only really be fixed by introducing some kind of real module system. That's not to say that the compile server is a bad idea - it isn't, and it could be very powerful - but I don't think it's worth going for a partial/limited solution to the dependencies problem.

>> I think one of the other goals of a compiler server is to support
>> interactive clients, such as an IDE. So, it's not that we have a list
>> of compile jobs provided in advance: we have requests coming in,
>> typically for some smallish set of files that are constantly
>> changing, and we need to return results quickly.

> Hmm, interesting. If the IDE were written in C++ it would probably be
> more efficient for it to call the libraries directly - but I can see the
> advantages for something like Eclipse.

There are other reasons to want a compile server. Perhaps the IDE is just vim, where people start and kill sessions constantly and could benefit from persistent state in the compile server. Or the build step that's invoked from the IDE is actually calling out to "make", so your compile server could optimize the in-editor queries alongside the build itself. Or you want to isolate your perfect, bug-free IDE from that crash-ridden Clang compiler :-)

  - Doug

Douglas Gregor <dgregor-2kanFRK1NckAvxtiuMwx3w@public.gmane.org> writes:

[...]

> There are other reasons to want a compile server. Perhaps the IDE is
> just vim, where people start and kill sessions constantly and could
> benefit from persistent state in the compile server.

Or it's Emacs with flymake-mode, compiling the buffer frequently to show
warnings and errors inline. Fast compiles of very slightly varying
contents is an obvious win there (just as it would be in other IDEs).

Reid writes:

> Sounds like integrating a minimalist build system into the compiler.

I was only thinking of a simple "compile these input files into these
output files using these compile flags" operation. Incremental builds
would be done by an external build system dynamically generating this
file prior to calling clang.

> I think it will also depend on what you're trying to optimize:
> incremental builds, or clean builds.
>
> For the latter, I don't think cutting out the subprocess spawning is
> worth very much. To really win, you want to cut out time spent
> reparsing the same old headers. If your project uses #include
> "world.h", PCH should work for you. If you don't, then C++ language
> semantics make it difficult to reuse results.

A single clang instance can cache the PCH structures in memory, saving
the need to read the PCH file repeatedly. I don't know if that saves
much though.

I am interested in the idea of auto-generating a PCH file where one
doesn't currently exist. That might be worth exploring.

> I think efforts on speeding up that kind of build should be built on
> some slightly modified variant of C++, so that you can parse headers
> in isolation and stitch together the ASTs.

I definitely agree that something along these lines would probably need
to be done to get truly impressive improvements in compile speeds - but
I don't yet know nearly enough about compiling C++ to even sensibly join
the discussion on that subject. I'm keen to learn though ;-)

Cheers,

  David.