JumboSupport: making unity builds easier in Clang

Hi,

I am a member of a small group of Chromium developers who are working on adding a unity build[1] setup to Chromium[2], in order to reduce the project’s long and ever-increasing compile times. We’re calling these “jumbo” builds, because this term is not as overloaded as “unity”.

We’re slowly making progress, but find that a lot of our time is spent renaming things in anonymous namespaces. It would be much simpler if it were possible to automatically treat these as if they were file-local. Jens Widell has put together a proof-of-concept which appears to work reasonably well. It consists of a clang plugin and a small clang patch:

https://github.com/jensl/llvm-project-20170507/tree/wip/jumbo-support/v1

https://github.com/jensl/llvm-project-20170507/commit/a00d5ce3f20bf1c7a41145be8b7a3a478df9935f

After building clang and the plugin, you generate jumbo source files that look like:

jumbo_source_1.cc:

#pragma jumbo

#include "real_source_file_1.cc"

#include "real_source_file_2.cc"

#include "real_source_file_3.cc"

Then, you compile something like this:

clang++ -c jumbo_source_1.cc -Xclang -load -Xclang lib/JumboSupport.so -Xclang -add-plugin -Xclang jumbo-support

The plugin gives unique names[3] to the anonymous namespaces without otherwise changing their semantics, and also #undef’s macros defined in each top-level source file before processing the next top-level source file. That way header files can still define macros that are used in multiple source files in the jumbo translation unit. Collisions between macros defined in header files and names used in other headers and other source files are still possible, but less likely.
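As a rough illustration of the two problems described above, here is a hand-written stand-in for a jumbo translation unit in which each file's anonymous namespace has been given its own name and each file's macros are dropped before the next file starts. This is only a sketch of the effect, not the plugin's actual output: the names `jumbo_file_1`, `jumbo_file_2`, `kMaxRetries`, `SCALE`, `FirstFileRetries` and `SecondFileRetries` are invented for the example.

```cpp
// Hypothetical stand-in for a jumbo translation unit. Without the per-file
// namespace names and the #undef lines, the second kMaxRetries and the second
// SCALE would clash with the first ones.

// ---- would come from real_source_file_1.cc ----
namespace net {
namespace jumbo_file_1 {  // invented per-file name
namespace {               // nested anonymous namespace keeps internal linkage
constexpr int kMaxRetries = 3;
}  // namespace
}  // namespace jumbo_file_1

#define SCALE(x) ((x) * 2)
constexpr int FirstFileRetries() { return SCALE(jumbo_file_1::kMaxRetries); }
}  // namespace net
#undef SCALE  // macro defined by file 1 is dropped before file 2 is processed

// ---- would come from real_source_file_2.cc ----
namespace net {
namespace jumbo_file_2 {  // invented per-file name
namespace {
constexpr int kMaxRetries = 5;  // same identifier as in file 1, no clash
}  // namespace
}  // namespace jumbo_file_2

#define SCALE(x) ((x) + 10)  // same macro name, different meaning
constexpr int SecondFileRetries() { return SCALE(jumbo_file_2::kMaxRetries); }
}  // namespace net
#undef SCALE
```

In a hand-maintained jumbo build, the renaming shown here is what has to be done by editing the real source files; the point of the plugin is that the sources can stay as they are.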

To show how much these two changes help, here’s a patch to make Chromium’s network code build in jumbo mode:

https://chromium-review.googlesource.com/c/chromium/src/+/966523 (+352/-377 lines)

And here’s the corresponding patch using the proof-of-concept JumboSupport plugin:

https://chromium-review.googlesource.com/c/chromium/src/+/962062 (+53/-52 lines)

It seems clear that the version using the JumboSupport plugin would require less effort to create, review and merge into the codebase. We have a few other feature ideas, but these two changes seem to do most of the work for us.

So now we’re trying to figure out the best way forward: would a feature like this be welcome in the Clang project? And if so, how would you recommend that we go about it? We would prefer to do this in a way that does not require a locally patched Clang and could live with building a custom plugin, although implementing this entirely in Clang would be even better.

Thanks,

-Mostyn.

[1] If you’re not familiar with unity builds, the idea is to compile multiple source files per compiler invocation, reducing the overhead of processing header files (which can be surprisingly high). We do this by taking a list of the source files in a target and generating “jumbo” source files that #include multiple “real” source files, and then we feed these jumbo files to the compiler one at a time. This way, we don’t prevent the usage of valuable build tools like ccache and icecc that only support a single source file on the command line.

[2] Daniel Bratell has a summary of our progress jumbo-ifying the Chromium codebase here:

https://docs.google.com/document/d/19jGsZxh7DX8jkAKbL1nYBa5rcByUL2EeidnYsoXfsYQ/edit#

[3] The JumboSupport plugin assigns names to the anonymous namespaces in a given file: foo::(anonymous namespace)::bar is replaced with a symbol name of the form foo::_anonymous::bar that carries an identifier unique to the file within the jumbo translation unit. Due to the internal linkage of these symbols, the identifier does not need to be unique across multiple object files/jumbo source files.

I haven’t looked at the patches in detail - but generally a jumbo build feels like a bit of a workaround & maybe there are better long-term solutions that might fit into the compiler. A few sort of background questions:

  • Have you tried Clang header modules ( https://clang.llvm.org/docs/Modules.html ), both explicit and implicit? (Granted, explicit might only be practical at the moment using Google’s internal version of Bazel, but you /might/ get some comparison numbers from a Google Chrome developer.)
  • The doc talks about maybe disabling jumbo builds for a single target for developer efficiency, with the risk that a header edit would maybe be worse for the developer than the jumbo build - this is where modules would help as well, since it doesn’t have this tradeoff property of two different dimensions of “more work” you have to choose from.
  • I was going to ask about the lack of parallelism in a jumbo build - but reading the doc I see it’s not a ‘full’ jumbo build, but chunkifying the build - so there’s still some/enough parallelism. Cool :)

I haven't looked at the patches in detail - but generally a jumbo build
feels like a bit of a workaround & maybe there are better long-term
solutions that might fit into the compiler.

I feel the same way. However, modules need significant investment to get
going, and people do jumbo builds whether we want it or not. WebKit is doing
the same thing, for example:
https://blogs.gnome.org/mcatanzaro/2018/02/17/on-compiling-webkit-now-twice-as-fast/

People will use this whether we want them to or not (I have some influence in
chrome land and wasn't able to talk them out of it, since it does provide
huge benefits), and the workarounds needed without compiler support are
gnarly.

So I think we might want to revisit our "you don't really want this" stance
on this topic we've had historically and instead try to make this work well.

(-a.thomason, emails bounce)

What sort of significant investment are you thinking of regarding modules - the build system support, I would imagine, wouldn’t be any less than the support being proposed here for jumbo builds, no?

But making header files modules-clean is some work, for sure. I’d imagine doing this the same way we’re kind of motivated to do it inside Google - provide the feature, then migrate the most impactful libraries. Teams/projects then have an incentive to cleanup/modularize code in whatever areas are the most important.

But, yeah, I see where you’re coming from - that maybe a tidy jumbo build support might not be too bad.

What sort of significant investment are you thinking of regarding modules
- the build system support, I would imagine, wouldn't be any less than the
support being proposed here for jumbo builds, no?

But making header files modules-clean is some work, for sure. I'd imagine
doing this the same way we're kind of motivated to do it inside Google -
provide the feature, then migrate the most impactful libraries.
Teams/projects then have an
incentive to cleanup/modularize code in whatever areas are the most
important.

Mostly this work. And it's best to start at the bottom of the dependency
stack, and we don't control our SDK headers everywhere. And even if we get
modules working, they're not pure win since they serialize the build graph
more, and touching a header in a module now requires rebuilds of all
targets depending on the module instead of just all translation units
including that specific header (I think this can be fixed, but that too is
work.)

But, yeah, I see where you're coming from - that maybe a tidy jumbo build
support might not be too bad.

Yup.

In addition to these downsides, I suspect that modules would cause trouble
for tools like icecc which distribute preprocessed sources and a toolchain
caching system, though I have not explored this in detail.

-Mostyn.

Google uses modules with the internal version of Bazel - but yes, it does require build system support (though I’d imagine this jumbo build support would require it too)

  * The doc talks about maybe disabling jumbo builds for a single target
for developer efficiency, with the risk that a header edit would maybe be
worse for the developer than the jumbo build - this is where modules would
help as well, since it doesn't have this tradeoff property of two different
dimensions of "more work" you have to choose from.

There are ways to minimise this: an earlier proprietary jumbo build system
used at Opera would detect when you're modifying and rebuilding files, and
compile these in "normal" mode. This gave fast full/clean build times but
also short modify+rebuild times. We have not attempted to implement this
in the Chromium Jumbo build configuration.

* I was going to ask about the lack of parallelism in a jumbo build - but
reading the doc I see it's not a 'full' jumbo build, but chunkifying the
build - so there's still some/enough parallelism. Cool :)

I have heard rumours of some codebases in the games industry using a single
jumbo source file for the entire build, but this is generally considered to
be taking things too far and not our intended use case.

The size of Chromium's jumbo compilation units is tunable: you can simply
#include fewer real source files per jumbo source file. The bigger your
build farm is, the smaller you want this number to be. The optimal setup
depends on things like the shape of the dependency graph and the relative
costs of the original source files. IIRC we currently only have a build-wide
"jumbo_file_merge_limit" setting, though that might have changed since I
last looked (V8 would benefit from a finer-grained limit, since its source
files compile more slowly than most Chromium source files).
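To make the tunable chunk size concrete, the following is a minimal sketch of splitting a target's source list into jumbo files under a merge limit. It is not Chromium's actual generator: the function name `MakeJumboFiles` is made up, and the only things taken from this thread are the `#pragma jumbo` line, the `#include` layout, and the idea of a limit such as "jumbo_file_merge_limit".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the text of each generated jumbo source file. Every file starts
// with "#pragma jumbo" and then #includes at most merge_limit real sources.
// (Hypothetical helper, for illustration only.)
std::vector<std::string> MakeJumboFiles(const std::vector<std::string>& sources,
                                        std::size_t merge_limit) {
  assert(merge_limit > 0);  // a limit of zero would never place any source
  std::vector<std::string> jumbo_files;
  for (std::size_t i = 0; i < sources.size(); ++i) {
    if (i % merge_limit == 0) {
      // Start a new jumbo file every merge_limit sources.
      jumbo_files.push_back("#pragma jumbo\n");
    }
    jumbo_files.back() += "#include \"" + sources[i] + "\"\n";
  }
  return jumbo_files;
}
```

Calling `MakeJumboFiles({"a.cc", "b.cc", "c.cc"}, 2)` yields two jumbo files, the first including `a.cc` and `b.cc` and the second including only `c.cc`. A smaller limit gives more, smaller translation units and therefore more parallelism; a larger limit shares more header processing per compiler invocation.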

-Mostyn.

Building that kind of infrastructure seems like a pretty big hammer compared to modularizing the codebase. It may still be less work, but it is a lot of work to work around things, and it produces some rather quirky behavior (the build looks at exactly how the source files have changed and changes the build action graph depending on that), enough that I’d be inclined to reconsider going in the modular direction again.

Ah, my understanding was that jumbo builds were often/mainly used for optimized builds to get cross-module optimizations (LTO-esque) & so it’d be likely to be the whole program.

Building that kind of infrastructure seems like a pretty big hammer
compared to modularizing the codebase...

Modularizing the codebase doesn't give you the same build time impact,
linearizes your build more, and slows down incremental builds. Even if it
wasn't a lot more work to get modules going, it's not completely clear to
me that that would address the use case that the people working on the
jumbo build have.

Not sure I follow - it partially linearizes (as you say, due to the module dependency rather than header dependency issue), as does the jumbo build.

Compared to a traditional build? I wouldn’t think so (I mean, yes, reading/writing modules has some overhead - but also some gains) on average. I’d expect slower builds if you modify a header at the very base of the dependency (the STL), but beyond that I would’ve thought the reading/writing modules overhead would be saved by reusing modules for infrequently modified files (like the STL).

(wonder what the combination would be like - modularizing headers, and also jumbo-ifying .cpp files together… - whether there’s much to be saved in the reading modules part of the work, reading them in fewer times - that gets into some of the ideas of compiler as a service I guess)

Modularizing the codebase doesn't give you the same build time impact,
linearizes your build more,

Not sure I follow - it partially linearizes (as you say, due to the module
dependency rather than header dependency issue), as does the jumbo build.

The jumbo build just needs to append a bunch of files, that's fast.
Compiling a module isn't.

and slows down incremental builds.

Compared to a traditional build? I wouldn't think so (I mean, yes,
reading/writing modules has some overhead - but also some gains) on
average. I'd expect slower builds if you modify a header at the very base
of the dependency (the STL), but beyond that I would've thought the
reading/writing modules overhead would be saved by reusing modules for
infrequently modified files (like the STL).

Say you touch some header foo.h. Previously, you needed to rebuild all cc
files including it. Now you need to instead rebuild the module, and since
the module has changed you now need to rebuild all cc files using any
header in the module, not just the users of foo.h. That's potentially way
more cc files.
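The difference in rebuild sets described here can be sketched as follows. This is a toy model under stated assumptions, not how any real build system tracks dependencies: the include map is supplied by hand, a "module" is modelled as nothing more than a set of header names, and `UsersOfHeader` and `UsersOfModule` are hypothetical helper names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Maps each cc file to the headers it includes (toy data, supplied by hand).
using IncludeMap = std::map<std::string, std::set<std::string>>;

// Traditional build: touching one header rebuilds the cc files including it.
std::set<std::string> UsersOfHeader(const IncludeMap& includes,
                                    const std::string& header) {
  std::set<std::string> rebuild;
  for (const auto& entry : includes) {
    if (entry.second.count(header) != 0) rebuild.insert(entry.first);
  }
  return rebuild;
}

// Module build (as modelled here): touching any header in the module rebuilds
// every cc file that uses any header belonging to that module.
std::set<std::string> UsersOfModule(const IncludeMap& includes,
                                    const std::set<std::string>& module_headers) {
  assert(!module_headers.empty());
  std::set<std::string> rebuild;
  for (const auto& entry : includes) {
    for (const auto& header : module_headers) {
      if (entry.second.count(header) != 0) {
        rebuild.insert(entry.first);
        break;
      }
    }
  }
  return rebuild;
}
```

With `a.cc` including only `foo.h` and `b.cc` including only `bar.h`, touching `foo.h` rebuilds just `a.cc` in the first model, but both files in the second model if `foo.h` and `bar.h` live in the same module.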

I’m also revisiting my position on this. We’ve discussed unity build support in the past (I think Ubisoft proposed it), and at the time I felt that it was very backwards-facing. It’s not a long term solution to reducing the overall cost of C++ compilation, and it can lead to creeping transitive dependencies between C++ files.

However, more than a year later, we have not produced a solution that is as easy to deploy and as compelling as unity builds are today. I think we need to seriously weigh the cost of adding features to support unity/jumbo builds. The initial patches necessary to get things off the ground look small and relatively low-maintenance. They may be just the tip of the iceberg, so we need to gather more input, but I think it’s worth a try.

FYI, my availability this week is low, so I don’t expect to be able to participate more in this thread.

I think it was Churchill who said that “Jumbo builds are the worst form of build optimization, except for all the others.”

The simple reality is that Chromium has a lot of translation units and there is a minimum (often a large minimum) cost to compiling each of these translation units. In order to make Chromium build times closer to reasonable we either need to dramatically reduce the number of translation units (jumbo) or dramatically reduce the cost of compiling each of these translation units.

Jumbo builds have some practical and philosophical problems but ultimately they work, and they work now, and they will get even better. Speculative efforts to reduce the per-translation-unit cost are, well, speculative. If and when modules improve our build times we can assess whether they should replace or coexist with jumbo builds, but until they are proven we need to proceed with what works today.

Making jumbo builds more maintainable through compiler changes appears to have an excellent cost/benefit ratio, and does nothing to stop us from pursuing other options in parallel. We will always continue to do non-jumbo builds, so our code will not be compromised by this effort.

Jumbo builds have some practical and philosophical problems but ultimately they work, and they work now, and they will get even better. Speculative efforts to reduce the per-translation-unit cost are, well, speculative.

Not quite sure what you mean here - Clang header modules have been deployed across a variety of users (Apple initially shipped them to reduce compile time for Apple developers, especially around Cocoa.h, as I understand it - then Google’s used them internally for protobufs & the like for a few years now). So it’s not exactly speculative that these features exist, are implemented fairly robustly, and do offer improvements.

It sounds like I was exaggerating how speculative modules are. On the other hand, until they are in use and demonstrating compile-time reductions in Chromium they are still speculative - we don’t know how big a reduction they will give, how much work will be required to use them, how they will interact with goma builds, etc.

The jumbo build just needs to append a bunch of files, that's fast.
Compiling a module isn't.

Well, compiling a module is just appending a bunch of headers and compiling
them. It's just at a different layer of the graph.

and slows down incremental builds.

Compared to a traditional build? I wouldn't think so (I mean, yes,
reading/writing modules has some overhead - but also some gains) on
average. I'd expect slower builds if you modify a header at the very base
of the dependency (the STL), but beyond that I would've thought the
reading/writing modules overhead would be saved by reusing modules for
infrequently modified files (like the STL).

Say you touch some header foo.h. Previously, you needed to rebuild all cc
files including it. Now you need to instead rebuild the module, and since
the module has changed you now need to rebuild all cc files using any
header in the module, not just the users of foo.h. That's potentially way
more cc files.

But say you touch some source file foo.cc. Previously, and with modules,
you just need to rebuild that cc file. With a unity build, you now instead
need to rebuild the concatenation of that .cc file and a bunch of others.
That's also potentially way more cc files. :)

But measurements beat speculation here.

I've been thinking about ways to get the benefits of unity builds without
the semantic changes. With the functionality we introduced for
-fmodules-local-submodule-visibility, we have the ability to parse one
file, then make it "invisible" and parse another file, skipping all the
repeated parts from the two parses, which would give us some (maybe most)
of the performance benefit of unity builds without the semantic changes.
(This is not quite as good as a unity build: you'd still repeatedly lex and
preprocess the files #included into both source files. We could implicitly
treat header files with include guards as being "modular" to get the
performance back, but then you also get back some of the semantic changes.)

you’d still repeatedly lex and preprocess the files #included into both source files

That is where the high cost of translation units comes from, so I don’t think the ‘ability to parse one file, then make it “invisible”’ will help build performance. To be clear, the per-translation-unit cost is not from firing up the compiler; it’s from parsing/lexing/preprocessing millions of lines of header files, and associated code generation.

With a unity build, you now instead need to rebuild the concatenation of that .cc file and a bunch of others.

True. But a pragmatic unity/jumbo build system understands and manages this risk, by keeping the number of source files that are #included down to a reasonable level. Even when jumbo concatenates 50 source files together the compilation cost for that blob is far less than 50 times the cost of compiling one file. It’s an issue, to be sure, but not a fatal flaw.

As a data point: Inside Chromium the time to process headers is typically 80-95% of the total time processing a cc file. Maybe not surprising when the headers are around 240k lines, and the cc files themselves 50-500 lines. Most of the compile time remained even with precompiled headers on Windows.

I’ve heard (hearsay, I admit) from profiling that it seems the single largest time consumer in clang is template instantiation, something I assume can’t easily be prepared in advance.

One example is Chromium’s chrome/browser/browser target, which is 732 files that normally need 6220 CPU seconds to compile, an average of 8.5 seconds per file. All combined together gives a single translation unit that takes 400 seconds to compile, a mere 0.54 seconds on average per file. That indicates that about 8 seconds per compiled file is related to the processing of headers.
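The arithmetic behind these figures can be written out as below. The three input numbers are the ones quoted above; the constant and function names are made up for the example, and attributing the whole per-file difference to headers follows the reasoning in the message rather than any separate measurement.

```cpp
// Figures quoted above for the chrome/browser/browser target.
constexpr double kFileCount = 732.0;
constexpr double kNormalCpuSeconds = 6220.0;  // one translation unit per file
constexpr double kJumboCpuSeconds = 400.0;    // all files in one translation unit

// 6220 / 732, roughly 8.5 CPU seconds per file in a normal build.
constexpr double NormalSecondsPerFile() { return kNormalCpuSeconds / kFileCount; }

// 400 / 732, roughly 0.55 CPU seconds per file in the single jumbo unit.
constexpr double JumboSecondsPerFile() { return kJumboCpuSeconds / kFileCount; }

// The difference, roughly 8 seconds per file, is what gets attributed to
// repeated header processing.
constexpr double HeaderSecondsPerFile() {
  return NormalSecondsPerFile() - JumboSecondsPerFile();
}

// 6220 / 400: the normal build uses about 15.5 times the CPU time.
constexpr double CpuTimeRatio() { return kNormalCpuSeconds / kJumboCpuSeconds; }
```

Note that this compares total CPU time, not wall-clock time: a single 400-second translation unit cannot be spread across cores, which is why the default configuration groups files into smaller chunks.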

Our default jumbo configuration makes groups of 8 (when having access to Google’s internal distributed compilation system) or 50 (for single computer compilation) files, which loses half or more of the potential speedup in exchange for a much faster single-file turnaround and better use of parallel hardware.

To comment on some earlier things mentioned: The value of jumbo is in the results, the massive compile time speedup. It can also be used for “cheap” full program/module optimization (I measured a 1-2% speedup on Speedometer with a jumbo build, along with a 2% increase of the binary size, all compared to a normal non-PGO/LTO/FPO build) and it reduces disk usage and makes linking faster, but the main point for us is that it makes compilations so much faster.

The main downside is that you have to slightly adjust the source, where “slightly” in a code base of 10-20 million lines can be noticeable. That is where this proposed clang feature enters. It would both reduce the initial amount of changes needed and remove the distraction of a developer having to consider code in other files when writing new code.

/Daniel