final distributed clang patch

Hi!
Here is the final patch for clang to support network distributed compilation. (clang.patch file)
There is also the server part attached. (the tar.gz file)

There are 3 new files added to the Driver directory. PrintPreprocessedOutputBuffer.cpp is a modification of PrintPreprocessedOutput.cpp that supports printing the text to a std::ostream.
The other new files are NetSession.h and NetSession.cpp, which contain all of the networking code (a thin, portable networking layer).
Some existing files are changed, mostly to support saving their output to a std::ostream. I use that to pass the clang ASTConsumers' data to another computer over the network.

There are 3 new options added to clang. The basic one is -distribute, which enables distributed compilation. The other two are -dist-preprocesslocally and -dist-serializelocally.
If the first one is enabled, clang sends a preprocessed file to clangserver (a process on another machine) to compile. In the second case the lexing and parsing are done locally, and the built and serialized AST is sent to clangserver.

You can play with this using -dist-preprocesslocally, because that path is working.

P.S.
There are some features left to implement, like getting and sending full diagnostics (currently only the diagID is sent by clangserver) and handling remote file requests (when clangserver, the compile server, requires an include file).
But what is currently implemented is stable and final; there will probably not be big changes in this code later.

What is your opinion of this?

Cheers,
Peter Neumark

clang.patch (61.6 KB)

clangserver.tar.gz (6.47 KB)

Hi!
Here is the final patch for clang to support network distributed
compilation. (clang.patch file)
There is also the server part attached. (the tar.gz file)

Okay... I'm not going to try and review the new code, just the
modifications to existing code.

There are 3 new files added to the Driver directory.
PrintPreprocessedOutputBuffer.cpp is a modification of
PrintPreprocessedOutput.cpp that supports printing the text to a std::ostream.

No good; that's a large chunk of code which you're duplicating. Would
a callback-based API be a significant performance hit? Otherwise, can
you refactor the code to avoid duplication?
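
Just to illustrate what I mean by a callback-based API (the names here are made up on the spot, not a proposed interface), the printer could emit its text through a small sink instead of writing to a concrete file or stream:

// Illustrative sketch only: a callback-style sink the -E printer could
// write through. One implementation writes a file; another could hand
// the text to the networking code.
class OutputSink {
public:
  virtual ~OutputSink() {}
  virtual void Write(const char *Data, unsigned Size) = 0;
};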

The other new files are NetSession.h and NetSession.cpp, which contain
all of the networking code (a thin, portable networking layer).

Have you considered separating out the new code into a new directory?
I don't think it's a good idea to be mixing this stuff together; it'll
make reading the code in the Driver/ directory even more confusing
than it is now.

And actually, it might be a good idea to shift around some of the
existing consumers so that they can be used by outside projects more
easily. A lot of the code currently in Driver/, like
PrintPreprocessedOutput, the rewriters, and the diagnostic printer, is
non-trivial. Anyone have any thoughts on making a "lib/Consumers" or
a "lib/ConsumerUtils" directory to make this easier? I know this is
something we've been putting off for a while, but it's gotten to the
point where it really needs to be done.

Some existing files are changed, mostly to support saving their output to a
std::ostream. I use that to pass the clang ASTConsumers' data to another
computer over the network.

I think we want to try and make the APIs of the included ASTConsumers
as general as possible. To that end, it might be better to use some
sort of callback-based system, rather than having to force everything
to use iostream, or having to duplicate code to allow multiple IO
systems.

(I'm sorry about dragging your work into this architectural
discussion, Peter; I don't mean this as a comment on the quality of
your work. That said, making the changes to existing code as clean as
possible is important, so this needs to be discussed.)

-Eli

Hi Peter,

I haven't looked at your patch in detail, but some thoughts in no particular order:

1) It would be very useful to separate and submit the mechanical changes independently of any other changes in your patch. As Eli mentioned, duplicating the -E printer is not acceptable. We should refactor that code into a form acceptable for your work independently of the rest of the patch.

2) It is generally bad form to have headers like NetSession.h #include system-specific stuff (like sys/socket.h or netinet/in.h); these #includes should move to a .cpp file. I like that you put the system-specific code in a platform abstraction layer, it just needs to be a bit tighter.

3) I think we should have a high level discussion about how the new dist-cc implementation integrates with clang. I had envisioned a different *driver* on the client side that shared code but was independent of the clang driver. It doesn't make any sense to distribute many of the things the clang driver does (e.g. -Eonly, -ast-dump, etc). If we can come to a design, I think a number of the changes you made would be unneeded. This would basically amount to your "-dist-preprocesslocally" option, but would be simpler. Once that is working well, more aggressive models can be attempted.

4) It would be good to have some HTML documentation for this, including end-user documentation on how to set it up and use it. This should go in clang/docs.

5) Have you done any timings of this?

Thanks for working on this. It is very exciting to see progress on this project! I know a number of people who are very interested in this work,

-Chris

Hi Peter,

I agree with the comments that Eli and Chris made; the code duplication is something we want to avoid. Eli brought up an excellent point that key pieces of the driver should be factored off to a separate library, and I too have felt this way for some time. I think that even resolving all the various preprocessor and compiler options (e.g., -I, -D, etc.) that is needed to instantiate a Preprocessor should also be factored out of clang.cpp into a separate library.

I also agree with Chris's comments that separating the "distcc" driver from a regular clang driver is a good idea. That keeps the distcc implementation simpler, and potentially allows it to be used with multiple compilers (not just clang). I myself was fine with integrating the distcc support directly into the clang driver for a first pass, but because the distcc driver will not use all of the same functionality as the regular clang driver (and obviously do a few things that the regular clang driver does not), the better long term approach is to factor key components of the clang driver into libraries, make clang and distcc-clang separate executables, and simplify the logic for both.

One thing that hasn't emerged in this discussion is whether or not the clang distcc should interoperate with the traditional distcc implementation, and (a different but related issue) whether we should require that the compiler itself be clang. One advantage of a clang-based distcc, independent of using clang to perform compilation, is that clang-distcc can do the source preprocessing itself without forking off a separate process (which is what the traditional distcc implementation does). This seems like a good step one: build a distcc client that just takes care of preprocessing in-process, and see what kind of speedups you get over forking and preprocessing. Ultimately we're interested in speed and scalability, and small steps like these help guide the design.

Interoperability with other compilers doesn't mean we should limit the design of clang-distcc. We can certainly implement special functionality when multiple compiler "workers" are based on clang (e.g., serializing ASTs, special caching, etc.).

I like the concept of the NetSession class, although the issue of interoperability with existing distcc implementations is something that is worth discussing. Chris is right that the system-specific APIs, such as the use of sockets, should not be in header files. A PIMPL approach, like what we use for FileManager, would probably work well (where the system-specific stuff only appears in the .cpp file).
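
As a rough illustration of the PIMPL idea (the names below are invented for the example, not taken from the patch), the header would expose only an opaque implementation pointer:

// NetSession.h -- hypothetical sketch; no system headers needed here.
class NetSessionImpl;            // defined only in NetSession.cpp
class NetSession {
  NetSessionImpl *Impl;          // all socket state hides behind this pointer
public:
  NetSession();
  ~NetSession();
  bool Connect(const char *Host, unsigned Port);
  bool Send(const void *Data, unsigned Size);
};
// NetSession.cpp would then be the only file that includes <sys/socket.h>,
// <netinet/in.h>, etc., and that defines NetSessionImpl.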

As for the clang server, both pthreads and sockets are system-specific APIs. We'll want a design that keeps the threading model separate from the code that processes a unit of work. This will allow us to tailor the implementation to use the best parallel computing primitives that are available on a specific architecture.

I'm also a little confused by the overall design. It looks like a client (a 'clang' process) connects to a server, sends the preprocessed source to the server, waits for the server to chew on the file, gets the processed output from the server, and then writes the output to disk. It appears that the client attempts to connect to different servers in a serial fashion, and then picks the first available server. Is this how traditional distcc works? (I actually don't know.) It's a simple design, but it doesn't lend itself well to good load balancing or to reducing the latencies in firing off compilation jobs (a bunch of connection attempts in serial fashion seems potentially disastrous for performance). This particular point isn't a criticism of your patch; what's there is fine to get things started. I'm not a distributed computing expert, but something akin to the Google MapReduce system (which has workers and controllers) seems more flexible for fault tolerance, load balancing, and so forth. This is certainly something worth discussing in a higher-level discussion of the overall design of the system.

A few comments inline.

Here is the final patch for clang to support network distributed compilation. (clang.patch file)
There is also the server part attached. (the tar.gz file)

Like the client, the server shouldn't have so much code copied from the Driver, and it certainly doesn't need to use all of the ASTConsumers in the regular Clang driver. General work (by anyone who is interested) on modularizing the driver will help make this much easier.

There are 3 new files added to the Driver directory. PrintPreprocessedOutputBuffer.cpp is a modification of PrintPreprocessedOutput.cpp that supports printing the text to a std::ostream.

I'm not certain why a separate version of PrintPreprocessedOutput was necessary. iostreams are slow, and writing to sockets using the FILE* abstraction is perfectly acceptable (via fdopen()).
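
Just to illustrate the fdopen() route (this is not code from the patch; the function and parameter names are made up):

#include <cstdio>   // fdopen() is a POSIX addition declared alongside <stdio.h>

// Hypothetical helper: send text over an already-connected socket through
// the stdio FILE* abstraction.
void sendText(int SockFD, const char *Text) {
  FILE *Out = fdopen(SockFD, "w");   // wrap the socket descriptor
  if (!Out) return;                  // real code would report the error
  fputs(Text, Out);
  fclose(Out);                       // flushes and closes the socket fd
}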

The other new files are NetSession.h and NetSession.cpp, which contain all of the networking code (a thin, portable networking layer).
Some existing files are changed, mostly to support saving their output to a std::ostream. I use that to pass the clang ASTConsumers' data to another computer over the network.

There are 3 new options added to clang. The basic one is -distribute, which enables distributed compilation. The other two are -dist-preprocesslocally and -dist-serializelocally.
If the first one is enabled, clang sends a preprocessed file to clangserver (a process on another machine) to compile. In the second case the lexing and parsing are done locally, and the built and serialized AST is sent to clangserver.

You can play with this using -dist-preprocesslocally, because that path is working.

Overall, I think this is a good start! I think the next logical steps would be to look at both the overall design as well as issues of code structure (addressing the comments on modularity, isolating various implementation details, etc.). Getting a few interesting performance timings would also be extremely useful to help shape some of those design decisions.

Incidentally, how well does the code work when the two processes (client and server) are actually on two different machines? Right now, the client always connects to "localhost". Getting performance timings when the client and server are on the same machine versus on different machines is also interesting, to see how much things like network latency, etc., are a factor in the design. There may also be some correctness issues that are masked by having the client and server on the same machine.

Ted

Hi!
Here are the split-out patch files of independent changes:
TranslationUnit.patch - adds support for (de)serialization from/to an (in-memory) buffer.
Preprocessor.patch - adds a new method (getPredefines) needed for passing the predefines list to clangserver.
ASTConsumers.patch - adds functions that support capturing the ASTConsumers' output.
I haven't attached the clang.cpp, clang.h, NetSession.cpp, and NetSession.h patches because their place in the source structure is unknown.
We have to refactor the Driver directory content first.

distclang and clangserver requirements (need redesign or cleanup):

  • ASTConsumer library
  • a system-headers-and-defines setup and target setup library (all target-dependent stuff)
  • capturing the output of PrintPreprocessedOutput.cpp. Ted: fdopen() is not acceptable, because we have to know the output size before we send it (required by the net packet header); see the sketch after this list.
  • platform-independent thread and network support. Should these be added to the llvm/Support or System library?
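
To make the packet-header constraint above concrete, here is a rough sketch with an invented header layout (the real clangserver format may differ); it shows why the full output has to be buffered before anything is sent:

#include <string>

// Hypothetical wire format: a fixed header carrying the payload size,
// followed by the payload bytes.
struct PacketHeader {
  unsigned Type;   // e.g. "preprocessed source"
  unsigned Size;   // number of payload bytes that follow
};

void sendPacket(unsigned Type, const std::string &Payload) {
  PacketHeader H;
  H.Type = Type;
  H.Size = Payload.size();   // must be final before the header is written
  // send(&H, sizeof(H)) followed by send(Payload.data(), Payload.size())
  // on whatever transport is in use.
}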

We also have to decide if we want to be distcc compatible or not. And what’s the benefit of distcc compatibility?

I've also attached a skeleton of the distclang documentation page (no info yet; it will be added after the initial commit).

Cheers,
Csaba Hruska

TranslationUnit.patch (5.08 KB)

Preprocessor.patch (571 Bytes)

ASTConsumers.patch (6.51 KB)

distclang_status.html (921 Bytes)

Hi!
Here are the split-out patch files of independent changes:

Great!

TranslationUnit.patch - adds support for (de)serialization from/to an (in-memory) buffer.

Ted, can you look at this?

Preprocessor.patch - adds a new method (getPredefines) needed for passing the predefines list to clangserver.

Looks good to me, applied!

ASTConsumers.patch - adds functions that support capturing the ASTConsumers' output.

-ASTConsumer *clang::CreateASTDumper() { return new ASTDumper(); }
+ASTConsumer *clang::CreateASTDumper(std::ostream* OS) { return new ASTDumper(OS); }

Please stay in 80 cols.

ASTConsumer* CreateHTMLPrinter(const std::string &OutFile, Diagnostic &D,
                               Preprocessor *PP, PreprocessorFactory *PPF);
+ASTConsumer* CreateHTMLPrinter(std::ostream &OutStream, Diagnostic &D,
+                               Preprocessor *PP, PreprocessorFactory *PPF);

It’s unclear to me that duplicating each of these is really the right approach. Would it work to make the only entry points take an ostream, and then just have the caller create the ofstream etc and pass that in as the OutStream?
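
For instance (just sketching, reusing the HTMLPrinter signature from above):

// Keep a single entry point that takes a stream:
ASTConsumer* CreateHTMLPrinter(std::ostream &OS, Diagnostic &D,
                               Preprocessor *PP, PreprocessorFactory *PPF);

// The normal driver then owns the file stream itself:
//   std::ofstream OutFile(OutPath.c_str());
//   Consumer = CreateHTMLPrinter(OutFile, Diags, PP, PPF);
// while anything else (e.g. the distcc client) can pass in whatever
// std::ostream it likes, such as a std::ostringstream whose contents are
// later shipped over the wire.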

I've also attached a skeleton of the distclang documentation page (no info yet; it will be added after the initial commit).

Applied.

I haven't attached the clang.cpp, clang.h, NetSession.cpp, and NetSession.h patches because their place in the source structure is unknown.

We have to refactor the Driver directory content first.

Sounds good. Thank you for taking this in steps, it makes it far easier to review.

  • platform-independent thread and network support. Should these be added to the llvm/Support or System library?

The general rule is that really low level things that are highly system specific should go in libsystem. libsupport can contain target independent abstractions that may optionally be built on libsystem components.

We also have to decide if we want to be distcc compatible or not. And what’s the benefit of distcc compatibility?

I don't really have an opinion or an answer here. :) Are there benefits? Anyone have an opinion?

  • capturing the output of PrintPreprocessedOutput.cpp. Ted: fdopen() is not acceptable, because we have to know the output size before we send it (required by the net packet header).

Let's talk about this one specifically. The “Simple buffered I/O” code in PrintPreprocessedOutput.cpp was written with extensive profiling, tuning and other tweaking, and it is quite effective, though very ugly, code. The basic algorithm is that it buffers up chunks of 64K, then writes them out to disk with open/write/close. The API for it is very simple, and is basically built around not copying strings unnecessarily.

This is pretty simple stuff and would be very useful elsewhere in LLVM (e.g. the .s printer, the bc writer, ...). It has been on my todo list for a long time to refactor this out into a more general interface, and it seems required for the distcc project. I think something like this would be a great interface:

class outstream {
  char *OutBufStart, *OutBufEnd, *OutBufCur;
  outstream(const outstream&);      // can't be copied.
  void operator=(const outstream&); // or assigned.
protected:
  outstream() {}                    // may only be subclassed
  virtual void FlushBuffer() = 0;
public:
  virtual ~outstream();

  void OutputChar(char c) { ... }
  void OutputString(const char *Ptr, unsigned Size) { ... }
  void Close();

  static outstream* CreateFile(...);
  static outstream* CreateOStream(std::ostream &O);
};

The basic idea here is that all the buffering of simple things (e.g. strings and chars) is inline and trivial, and the flushing happens via a virtual method that does the write or whatever else is needed. This is a very simple and useful API that is mostly target-independent (and should thus live in libsupport) but does rely on a few system specific things (which could live in libsystem).
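
For example, a socket-backed implementation could be a trivial subclass. This is only a sketch; it assumes the buffer pointers above end up visible to subclasses (e.g. protected), and the error handling is elided:

#include <unistd.h>   // write()

class socket_outstream : public outstream {
  int FD;   // an already-connected socket descriptor
public:
  explicit socket_outstream(int fd) : FD(fd) {}
  virtual void FlushBuffer() {
    const char *Ptr = OutBufStart;
    size_t Left = OutBufCur - OutBufStart;
    while (Left) {
      ssize_t Written = ::write(FD, Ptr, Left);
      if (Written <= 0) break;       // a real implementation would report this
      Ptr += Written;
      Left -= (size_t)Written;
    }
    OutBufCur = OutBufStart;         // the buffer is empty again
  }
};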

Is this something you could take on? It should be pretty straight-forward. Once it exists, switching the -E printer over to it should be easy, and this will make it easy to get it to output to an std::ostream or whatever else is desired (you could even make an outstream for a socket or whatever, to avoid the extra std::ostream overhead).

-Chris

2008/7/10 Chris Lattner <clattner@apple.com>:

Is this something you could take on? It should be pretty straight-forward. Once it exists, switching the -E printer over to it should be easy, and this will make it easy to get it to output to an std::ostream or whatever else is desired (you could even make an outstream for a socket or whatever, to avoid the extra std::ostream overhead).

This can solve my problem. And if we make the ASTConsumer library, the consumers should use the outstream class to write out their output (this will remove the duplications from the ASTConsumer.h functions).
There is one more requirement if we follow this. Most ASTConsumers have one output stream or file, but the static analysis consumer produces a couple of HTML files, and because distributed static analysis is a big speedup, we must support this ASTConsumer, so we have to capture its output somehow. Maybe with some outstream manager interface class: one implementation for normal clang to support files and stdout, and one implementation for clangserver for network streams. The outstream class interface is OK for this, except that we cannot overload CreateFile and CreateOStream.
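
Something like this is what I mean by an outstream manager (the name and interface here are only a sketch, not proposed code):

#include <string>

// Illustrative sketch of an "outstream manager": the static analysis
// consumer asks it for one stream per output (e.g. per HTML report).
class OutStreamManager {
public:
  virtual ~OutStreamManager() {}
  virtual outstream *CreateOutput(const std::string &Name) = 0;
};
// The normal clang driver would hand back file-backed outstreams;
// clangserver would hand back network-backed ones and remember which
// named outputs it produced, so they can be sent to the client.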


Csaba

Is this something you could take on? It should be pretty straight-forward. Once it exists, switching the -E printer over to it should be easy, and this will make it easy to get it to output to an std::ostream or whatever else is desired (you could even make an outstream for a socket or whatever, to avoid the extra std::ostream overhead).

This can solve my problem. And if we make the ASTConsumer library, the consumers should use the outstream class to write out their output (this will remove the duplications from the ASTConsumer.h functions).

That makes sense to me!

There is one more requirement if we follow this. Most ASTConsumers have one output stream or file, but the static analysis consumer produces a couple of HTML files, and because distributed static analysis is a big speedup, we must support this ASTConsumer, so we have to capture its output somehow.

I would worry about this one later. I think it would be good to focus on distributing the other ASTConsumers before the static analysis one.

Thanks for tackling this!

-Chris

I think the solution for the AnalysisConsumer is two steps:

1) Factor the creation of the HTMLDiagnostics object used for HTML rendering out of AnalysisConsumer. Instead, the ctor for AnalysisConsumer takes a PathDiagnosticClient* (which in the regular driver is an HTMLDiagnostics object). I planned on doing this anyway.

2) In the distcc client, instead of creating an HTMLDiagnostics object, create a different PathDiagnosticClient object that just batches the diagnostics (this doesn't exist yet, but is easy to implement). The distcc client can then send the diagnostics back to the original client, and not have to worry about HTML rendering.
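
Roughly something like this (a sketch only; the exact PathDiagnosticClient interface may differ, and the HandlePathDiagnostic hook name is an assumption here):

#include <vector>

// Sketch: a PathDiagnosticClient that just collects the diagnostics
// instead of rendering HTML.
class BatchingDiagnosticClient : public PathDiagnosticClient {
  std::vector<const PathDiagnostic*> Batch;
public:
  virtual void HandlePathDiagnostic(const PathDiagnostic *D) {
    Batch.push_back(D);   // record it; serialization happens later
  }
  const std::vector<const PathDiagnostic*> &getBatch() const { return Batch; }
};
// The distcc server would serialize Batch and send it back to the client,
// which then decides how (and whether) to render HTML.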

There will also probably be some ugly details, but ultimately the distcc client shouldn't care how the final analysis results are consumed by the end-user.

2008/7/10 Ted Kremenek <kremenek@apple.com>:

Applied!

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080707/006452.html

Suppose (for whatever reason) that a user wants to use icc/gcc to compile their sources instead of Clang. They can still benefit from clang-distcc because of the performance advantage of Clang's preprocessor over the preprocessor typically used by those compilers. Right now, distcc has to fork off a process just to preprocess a file, which also involves writing preprocessed files out to disk. A clang-distcc can have an integrated preprocessor that can be much faster, as it can be linked directly into the same executable (no fork and exec). Much of the scalability of distcc comes from how quickly you can send source code to the slave machines, so even if the end compiler is not Clang there is still a benefit to using clang-distcc over vanilla distcc.

Another advantage of allowing other compilers to work with clang-distcc is that you get a *staged* implementation. Instead of requiring Clang to preprocess, parse, and compile in order to get clang-distcc working, you only need the preprocessing to work to get the core distcc functionality in place. This also allows clang-distcc to be used to compile code that Clang cannot: e.g., the full richness of C++. It also allows one to play with the distributed computing core of clang-distcc without waiting for end-to-end compilation functionality in Clang to be completed.

Incorporating more than just preprocessing into clang-distcc simply allows more optimizations, but the end-to-end functionality is already all there. For example, if you also parse in clang-distcc, we can build ASTs and serialize those over the wire (much more compact than preprocessed text). Those serialized ASTs could be compiled using Clang on the remote machine, or pretty-printed and sent through the other compilers.

2008/7/11 Ted Kremenek <kremenek@apple.com>:

About supporting other compilers in distributed clang: Ted, please describe your ideas, because I have no idea what the point of this is.

Suppose (for whatever reason) that a user wants to use icc/gcc to compile their sources instead of Clang. They can still benefit from clang-distcc because of the performance advantage of Clang's preprocessor over the preprocessor typically used by those compilers. Right now, distcc has to fork off a process just to preprocess a file, which also involves writing preprocessed files out to disk. A clang-distcc can have an integrated preprocessor that can be much faster, as it can be linked directly into the same executable (no fork and exec). Much of the scalability of distcc comes from how quickly you can send source code to the slave machines, so even if the end compiler is not Clang there is still a benefit to using clang-distcc over vanilla distcc.

To me this looks like a completely different project, called "integrating the clang preprocessor into the (original) distcc", because everything except the preprocessor stays the same (command line options, network protocol and the whole functionality). This is not bad, but I don't think it is the primary task of this project. In my opinion the main goal is to make as fast, scalable and portable (supporting cross-compilation) a distributed compiler as we can using clang.

Another advantage of allowing other compilers to work with clang-distcc is that you get a staged implementation. Instead of requiring Clang to preprocess, parse, and compile in order to get clang-distcc working, you only need the preprocessing to work to get the core distcc functionality in place. This also allows clang-distcc to be used to compile code that Clang cannot: e.g., the full richness of C++. It also allows one to play with the distributed computing core of clang-distcc without waiting for end-to-end compilation functionality in Clang to be completed.

Incorporating more than just preprocessing into clang-distcc simply allows more optimizations, but the end-to-end functionality is already all there. For example, if you also parse in clang-distcc, we can build ASTs and serialize those over the wire (much more compact than preprocessed text). Those serialized ASTs could be compiled using Clang on the remote machine, or pretty-printed and sent through the other compilers.

This feature is supported by Peter Neumark's initial version: see the -dist-serializelocally command line option. We can improve it with some real-time compression later.

Of course our interest is making a great distcc for clang. My main meta-point was that by supporting other compilers for subtasks of the compilation work you can stage more of the implementation without relying on everything in Clang working perfectly. And despite Argiris' great work on C++ parsing, full support for C++ compilation is a long way off, so supporting other compilers for this task might be a huge win for adoption of a new distcc.

Also, it's better to get measurements early on of where the performance issues are, because that affects fundamental aspects of the design. A rapid evolution of the distcc design with testing and real numbers will not only make the new distcc faster and more scalable (per your goals) but also potentially more stable and usable at an earlier stage.

2008/7/11 Ted Kremenek <kremenek@apple.com>:

My main problem is that I can't imagine how we can support external compilers without messing up the current design and without reimplementing the original distcc.

Having the option to use other compilers would obviously be only for a subset of the ASTConsumers. For example, for the ASTConsumers that implement compilation, -fsyntax-only, etc., a fork and exec instead of running the equivalent Clang consumer seems like a reasonable solution.

The distributed clang server's goal is to support compilation natively, without executing external programs.

Don't conflate implementation design with project goals. Performance and stability are the goals. How this is accomplished can very well involve executing external programs. Obviously we believe that much of the performance win will come from *not* doing this, but again supporting other compilers makes the system far more testable and encourages more adoption.

Or, if you mean supporting the client side (a new clang distcc client with the original distcc server), then we have to reimplement distcc's protocol, including SSL support.

I see no reason to be compatible with the original distcc server or its protocol. I'm not fluent in the distributed computing design used by distcc, but I imagine there are more advanced techniques available. Good support for load balancing, efficient use of the network, etc., are all things that we want to be able to tackle.

My goal is not to re-implement the current distcc, it's to build a better one that takes advantage of the features of Clang (be it the integrated preprocessor, its compilation features, serialization of ASTs, or whatever). I'm a big believer in Clang's future, and having unrivaled distributed compilation performance is my hope for a clang-distcc. I believe that having limited support to do compilation with other compilers is complementary to this goal.

In that case it is far easier to patch the original distcc client to include and use clang's preprocessor natively.
I'd like to focus on clang-specific stuff.

Absolutely.

For example, I can imagine an AST merge function that can support precompiled headers, or that can be used to store already parsed and sema-checked headers in a database and reuse them later.

I see these as advanced features. Don't get me wrong; I'm excited about them too.

What I'm talking about is getting a good basic distcc working first. What has been implemented right now is a good start towards this goal, but the overall design needs to be more modular (e.g., using a library for ASTConsumers, etc.), and the distributed computing core can go a lot further without worrying about things like PCH. Using the clang preprocessor even when the employed compiler is not Clang is a good incremental step that requires much of the boilerplate stuff to be working, and it allows experimentation with different performance knobs. I don't think what I'm talking about will take years of work; I just think it's the first logical step.