I’m planning to (attempt to) add native distcc support to clang (per Chris Lattner’s suggestion). The feature would work like this:
The user on the “master” node would do something like CC="clang --distributed dist.conf" make -j 100 (where 100 is the number of slave nodes).
Each clang process spawned by ‘make’ would attempt to connect to a local UNIX socket whose path is derived from the configuration file’s path - I was thinking maybe /tmp/clang/hash_of_path_of_dist.conf. The other end of the socket would be a listening clang server, which would have a central FileManager (and thus a central cache). This would save big on syscalls and disk accesses - Chris Lattner showed around a 3.3x preprocessing speedup from this in his LLVM 2.0 tech talk at Google. The server would be started up on demand: if the fork()ed server couldn’t set up the socket (because one already exists), it would instead connect to that socket, and if the connection succeeds, exit, since a server has already been set up.
If the socket connection succeeds, clang would send the user’s command over to the server process, which would preprocess the source and send it out to a “slave” node along with all compiler options (is it worth performing basic compression on the preprocessed source beforehand?). The slave node would compile the source down to object code and send it back to the “master” node, along with diagnostics. After this, the “central” process running on the host would send a “DONE” command back to the calling clang process via the UNIX socket, along with the diagnostics generated during the build.
If the socket connection didn’t succeed, the server would be started up on demand: clang would fork() off a server which would run in a loop that looks like this:
accept();  // non-blocking; alternate between accepting local requests
           // and network requests from slaves
send_preprocessed_source_and_command_line_args_to_slave();  // round robin?
timeout(); // exit if a certain time period (30s?) has elapsed since the
           // last request, AND no requests were received, AND no files
           // are left to process from slaves; otherwise, loop back.
After the fork(), the parent clang process would attempt to reconnect to the server.
On the slave nodes, there would be multiple listening ‘clangd’ processes (presumably one per core, but this could be configurable), all listening starting at some base port (maybe 63000 for core 1, 63001 for core 2, etc.). The slave nodes would do everything from preprocessing to codegen - which means LTO wouldn’t be supported in the first version.
Each slave node would have a server thread that accepts requests and puts them on a (lockless?) queue, and a “consumption” thread that keeps pulling requests off the queue and processing them.
In terms of actual implementation, I’m very new to the clang codebase and have little familiarity with it. After poking around a bit, it seems like a good place for my new code to be called from would be “Compilation *Driver::BuildCompilation(int argc, const char **argv)” in Driver.cpp. Does this seem like a good place to “inject” the distcc code?
Before I go off and (attempt to) implement this, I’d like to ask for feedback on the design I’ve proposed. Am I doing something the “wrong way”? Would you do anything differently? Please let me know any concerns you have - I’d hate to implement this, find out I did it the “wrong way”, and have to re-implement it all over again!