Separate preprocess and compile: hack or feature?

Hi,

In the build system I am working on we are looking at always performing
the preprocessing and then C/C++ compilation as two separate clang/clang++
invocations. The main reason is support for distributed compilation but
see here[1] for other reasons.
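
For concreteness, something along these lines (the file names and exact
flag sets are just illustrations):

  // hello.cxx -- any ordinary translation unit
  #include <iostream>

  int main ()
  {
    std::cout << "hello" << std::endl;
  }

  // Step 1: preprocess only (-D, -I, and other preprocessor-affecting
  // options are passed here):
  //
  //   clang++ -E -o hello.ii hello.cxx
  //
  // Step 2: compile the already-preprocessed output, potentially on
  // another host:
  //
  //   clang++ -c -o hello.o hello.ii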

I realize that tools like ccache/distcc have been relying on this for
a while (though see the 'direct' mode in ccache and 'pump' in distcc).
However, some compilers apparently do not support this (for example,
VC; see the above link for details).

So I wonder, in the context of Clang, if this is just a hack that
happens to work "for now" or if this is a feature that is expected
to continue to work?

Also, has anyone seen/heard of any real-world issues with compiling
preprocessed source code?

[1] https://www.reddit.com/r/cpp/comments/6abi99/rfc_issues_with_separate_preprocess_and_compile/

Thanks,
Boris

It is strongly recommended to *not* separate them. A lot of warnings are
sensitive to macros, i.e., they will not trigger for patterns created by
macro use, etc. A very basic example is

  if (FOO(x))

will not warn, but if FOO(x) expands to (x), as recommended, you get

  if ((x))

which will get a warning for the double parentheses without an
assignment. There is the option of using the rewrite mode
(-E -frewrite-includes), which is somewhat of a compromise.
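
Roughly, a self-contained version of the above (whether and which
diagnostic fires depends on the clang version and warning flags in use):

  // macro-warn.cxx -- illustrative only
  #define FOO(x) (x)   // parenthesized, as macro-hygiene guidelines suggest

  int f (int x)
  {
    if (FOO(x))        // compiled directly, clang sees (x) comes from a macro
      return 1;
    return 0;
  }

  // After 'clang++ -E macro-warn.cxx' the condition is just 'if ((x))',
  // with no trace of the macro left, so diagnostics that are normally
  // suppressed inside macro expansions can fire on the preprocessed
  // source.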

Joerg

Hi Joerg,

Joerg Sonnenberger writes:

There is the option of using the rewrite mode (-E -frewrite-includes),
which is somewhat of a compromise.

Thanks, this looks similar to GCC's -fdirectives-only, except that the
latter also handles #ifdef, etc.
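
For example, with something like this (illustrative only):

  // cond.cxx -- illustrative only
  #include <cstddef>   // -E -frewrite-includes inlines this include...

  #ifndef NDEBUG       // ...but leaves this conditional and any macro uses
  int debug_only ();   // untouched, whereas plain -E (and GCC's
  #endif               // -E -fdirectives-only) resolves it right away.

  int main () {return 0;}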

Do you know if there is a way to achieve something similar in Clang?
That is, to also remove the fragments that will be preprocessed out.

Thanks,
Boris

Most distributed build systems I know about end up writing their own custom preprocessor to very quickly discover which .h files are included by a cc file (you don’t need a full preprocessor for getting just that, and so you can be faster than clang -E), and then send .h and .cc files to the server based on content hashes, so that you don’t need to send the full preprocessed text, but can send source files before preprocessing. https://github.com/facebookarchive/warp was a somewhat recent example of this (but also, as you say, pump mode, and proprietary systems).

(Your thread mentions that you do this for -M / /showIncludes, but you can just do this as part of regular compilation – not sure why you need this in a separate process?)
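
To sketch just the content-hash part (this isn't warp's or any real
tool's protocol, and std::hash merely stands in for a proper content
hash):

  // hash-send.cxx -- illustrative only
  #include <string>
  #include <fstream>
  #include <sstream>
  #include <iostream>
  #include <functional>
  #include <unordered_set>

  // Hash a file's contents (a real implementation would use a strong,
  // stable hash rather than std::hash).
  static std::size_t content_hash (const std::string& path)
  {
    std::ifstream f (path, std::ios::binary);
    std::ostringstream buf;
    buf << f.rdbuf ();
    return std::hash<std::string> () (buf.str ());
  }

  int main (int argc, char* argv[])
  {
    // Hashes the remote builder is already known to have (empty here;
    // in a real system this would be negotiated over the wire).
    std::unordered_set<std::size_t> remote_has;

    // argv: the .cc file plus the headers the dependency scan found.
    for (int i = 1; i < argc; ++i)
    {
      std::size_t h = content_hash (argv[i]);
      if (remote_has.insert (h).second)
        std::cout << "upload " << argv[i] << '\n'; // content not seen yet
      else
        std::cout << "skip   " << argv[i] << '\n'; // same content already sent
    }
  }

The point being that headers shared by many translation units get
uploaded (and cached) once, instead of being re-sent inside every
preprocessed file.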

So while this doesn’t answer your question, I’d expect that you won’t need it, eventually :)

Hi Nico,

Nico Weber <thakis@chromium.org> writes:

Most distributed build systems I know about end up writing their own custom
preprocessor to very quickly discover which .h files are included by a cc
file (you don't need a full preprocessor for getting just that, and so you
can be faster than clang -E), and then send .h and .cc files to the server
based on content hashes, so that you don't need to send the full
preprocessed text, but can send source files before preprocessing.
https://github.com/facebookarchive/warp was a somewhat recent example of
this (but also, as you say, pump mode, and proprietary systems).

One property that these build systems rely on is a very controlled
environment (e.g., a single compiler, all hosts having exactly the same
headers, etc.). I would much rather trade some speed for using standard
and robust tooling.

Also, I saw it mentioned (I think in pump's documentation) that
local preprocessing is a lot less of an issue on modern hardware. I
bet SSDs made quite a difference.

Your thread mentions that you do this for -M / /showIncludes, but
you can just do this as part of regular compilation – not sure why
you need this in a separate process?

We do it this way to handle auto-generated headers.

Thanks for the feedback,
Boris

It's still an issue, because you will end up sending the pre-processed file
over the network. Time has shown that the transitive include closure of a
C++ file scales linearly with the size of the codebase, so the bigger the
project, the more time you spend sending 10MB .ii files over the wire.

As you say, pre-processing is more robust than trying to send each header
individually, set them up on the remote builder, and cache them, but it
does leave performance on the table.

As the maintainer of icecream, I have no interest in maintaining a separate preprocessor. It seems like a nightmare to maintain all the special cases. I also see that modules are coming in a future C++ standard, which I would have to understand (and then modules 2 after that, which might or might not be compatible enough for me).

Now if clang is willing to maintain a fast preprocessor that runs quickly and spits out a list of the files that I need to package up for my distributed build – I’m interested. (I figure you already have to maintain a working preprocessor, which means a lot of potential bugs are already fixed – but I realize this would mean a number of special cases, and I don’t know if you want to maintain that.)

I think the short version of my answer is: There are pitfalls, but it may work well enough for your purposes. You may want to give your users the option to combine the preprocess and compile into a single step.

In theory, having separate preprocess and compile steps should work. A preprocessed C file is just like a non-preprocessed C file that happens not to use any preprocessor features. The C preprocessor is also used for other purposes than preprocessing C code. For example, on Unix-like systems, it is not uncommon to run assembly programs through the C preprocessor. So there is reason to believe that the C preprocessor will continue to be available to run separate from the C compiler, and that the C compiler will continue to grok files that come out of the C preprocessor. Similarly for C++.

Others have already pointed out some cases where things aren’t quite that clean. I would like to add that, in my short experience working on Warp, I found that there is a lot of interdependency between the preprocessor, the compiler, and the flags that are being passed to the compiler. For example, compilers like to define version macros and sometimes feature-test macros. Other macros end up being defined based on flags passed to the compiler: for example, passing -mavx to gcc causes __AVX__ to be defined. So if you want to separate the preprocess step from the compile step, you have to make sure that everything that affects the preprocessor output matches between the preprocessor invocation and the compiler invocation.
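
To make the -mavx case concrete (illustrative only):

  // simd-path.cxx -- illustrative only
  #ifdef __AVX__                   // predefined by gcc/clang under -mavx
  const char* simd_path = "AVX";
  #else
  const char* simd_path = "scalar";
  #endif

  int main () {return 0;}

  // If the preprocess step runs without -mavx, the scalar branch is baked
  // into the .ii file; adding -mavx only to the later compile step cannot
  // bring the AVX branch back, because the conditional has already been
  // resolved. The same goes for any other option that changes predefined
  // macros.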

Bob

Hi Reid,

Reid Kleckner <rnk@google.com> writes:

It's still an issue, because you will end up sending the pre-processed file
over the network. Time has shown that the transitive include closure of a
C++ file scales linearly with the size of the codebase, so the bigger the
project, the more time you spend sending 10MB .ii files over the wire.

True. You can probably get a 5x reduction by compressing it with
something cheap like LZO, so with a 10MB .ii shrinking to ~2MB, a 1Gbps
(~125MB/s) link gives ~50 .ii files/sec.

Also, isn't there the same problem with getting the object files shipped
back? Here are some quick numbers I got from one of the "heavier" TUs in
build2:

target.i      5MB  (-E -frewrite-includes)
target.i.lzo  1MB

target.o      3MB
target.o.lzo  1MB

Thanks,
Boris

Hi Bob,

Bob Haarman <llvm@inglorion.net> writes:

You may want to give your users the option to combine the preprocess
and compile into a single step.

Yes, that's the current plan. The question is whether it should be on
or off by default. I think we will start with off and see what happens.

So if you want to separate the preprocess step from the compile step,
you have to make sure that everything that affects the preprocessor
output matches between the preprocessor invocation and the compiler
invocation.

Right, that's one of the main reasons we really don't want to go the
custom preprocessor route.

Boris

There are a couple of other interesting points in the design space. For builds without debug info, you can ship the preprocessed output and re-run the compilation locally only if you get compiler warnings. This has the nice effect that code that compiles without warnings will compile faster, which is a nice incentive.

The other is to run the preprocessor twice. A quick test with a trivial Objective-C file that includes a huge number of headers[1] took around 0.35 seconds to compile at -O2 and around 0.18 seconds to preprocess with -E -MMD -MD (which spits out the full dependency list). For any nontrivial source file, the difference between these two is likely to be much larger: if the cost of preprocessing is sufficiently small, then running it twice may cost less than the speedup you get from distribution.
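
As a sketch of that first, dependency-only pass (the exact flags and the
shape of the output are illustrative):

  // dep-demo.cxx -- illustrative only
  #include <vector>

  int main () {return 0;}

  // Something along the lines of
  //
  //   clang++ -E -MD -MF dep-demo.d dep-demo.cxx > /dev/null
  //
  // preprocesses once, discards the preprocessed text, and leaves behind a
  // make-style dependency file roughly of the form
  //
  //   dep-demo.o: dep-demo.cxx /usr/include/c++/... <every header pulled in>
  //
  // which tells the build system what to ship; the second preprocessor run
  // then happens as part of the normal (possibly remote) compilation.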

You might also look at the bmake meta mode work from Juniper, which uses a kernel module to track filesystem accesses to give a complete list of everything that a particular file depends on (including shared libraries linked into bits of the toolchain). A few people in the FreeBSD packaging team have been exploring using Capsicum to explicitly limit the files that the compiler can access and lazily pull them to the target system on demand (and to avoid accidental dependencies). If you have enough compile processes per node that some are CPU bound while the others are waiting for the network then this may be a better solution.

David

[1] Cocoa.h is huge:
#include <Cocoa/Cocoa.h>

int main(void)
{
  return 0;
}

Hi David,

David Chisnall <David.Chisnall@cl.cam.ac.uk> writes:

There are a couple of other interesting points in the design space. For
builds without debug info, you can ship the preprocessed output and
re-run the compilation locally only if you get compiler warnings. This
has the nice effect that code that compiles without warnings will
compile faster, which is a nice incentive.

Interesting idea, though I believe one of the issues is that you no
longer get warnings if you compile the preprocessed output.

The other is to run the preprocessor twice.

Not sure how this helps. Are you talking about discovering the
included header set and shipping it along with the source (and somehow
recreating the filesystem hierarchy on the remote so that everything
gets included properly)?

You might also look at the bmake meta mode work from Juniper, which
uses a kernel module to track filesystem accesses to give a complete
list of everything that a particular file depends on (including shared
libraries linked into bits of the toolchain). A few people in the
FreeBSD packaging team have been exploring using Capsicum to explicitly
limit the files that the compiler can access and lazily pull them to
the target system on demand (and to avoid accidental dependencies).

While an interesting idea, all this will be very platform/compiler-specific.
I am trying hard to avoid that.

Thanks,
Boris