Reducing build times for single compilation unit

Hi,

Long-time C++ user, first-time compiler-investigator. I’m particularly interested in improving single-compilation-unit latency, so that the development loop of modifying a single .cc file and then testing it is much faster.

I’ve experimented a bit with -ftime-trace, which has already been quite useful for identifying a few issues (primarily header-only libraries, unsurprisingly, since I do not currently use PCH). My hope is that C++ modules will improve some of these issues (waiting for Bazel support).

But beyond those improvements, I’m trying to understand what else can be done.

A few notes/thoughts - with widely varying difficulty:

  • My code makes quite extensive use of coroutines and lambdas, which in the backend results in CoroConditionalWrapper taking ~15% of the compilation time (~6 seconds on my machine). I’m wondering if there are tips to reduce this. (A minimal sketch of the shape of code I mean follows this list.)
  • Some of my compilation units are relatively large, and I realize I could split them to improve single-compilation-unit build latency - but philosophically I’m curious to explore not needing to. This could come in a couple of flavours:
    • Parallelizing (more of?) the single-compilation-unit build (I saw the large discussion on this here) - so that parallelism does not need to rely on compilation-unit-level parallelism achieved by splitting units apart
      • Within-compilation-unit incremental compilation. To take an extreme (albeit simple) example, if a compilation unit has 1000 functions that do not call each other and I modify one of them, I’d like to avoid most of the cost of recompiling the remaining 999 functions (handwave AST diffing).
      • The extreme would be an entire project in a single compilation unit with reasonable incremental build times.
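
To make the coroutine point concrete, here is a minimal sketch of the shape of code I mean - Task below is a made-up stand-in for my actual coroutine type, and the function is obviously trivial; it is just meant to show coroutines whose bodies are built out of lambdas:

```cpp
// Compile with something like: clang++ -std=c++20 -ftime-trace -c example.cc
// (the trace JSON is written next to the .o).
#include <coroutine>

// Minimal fire-and-forget coroutine type, purely illustrative.
struct Task {
  struct promise_type {
    Task get_return_object() { return {}; }
    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };
};

// Lots of my functions look roughly like this: a coroutine whose body is
// composed of small lambdas, each of which the frontend and the coroutine
// lowering passes have to process.
Task Example() {
  auto step1 = [](int x) { return x + 1; };
  auto step2 = [](int x) { return x * 2; };
  int v = step2(step1(20));
  (void)v;
  co_return;
}
```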

And in case general stats are useful (to identify an outlier), I’m seeing an -ftime-trace breakdown of:

  • Source - ~15% - modules/PCH/avoiding header-only libraries should help
  • PerformPendingInstantiations - ~25%
  • CodeGen Function - ~10%
  • CoroConditionalWrapper - ~15%
  • CodeGenPasses/OptModule (without explicitly enabling optimizations) - ~25%

For my larger files, my compilation times are ~35 seconds on my hardware.

Thoughts? I’m curious both about opinions on shorter-term improvements and about reducing the need to split files in the long term. Also, if there are simple starter improvements for compilation latency, I’d be interested to see what’s involved - I did see Improve build times with Clang listed as a project area, but did not find specific proposals for it.

Thanks!

Kyle

One of the possible directions would be to sprinkle more instrumentation around Clang, so that -ftime-trace output becomes more detailed.

I also recently did some timings of Clang for build-performance reasons (splitting TUs is fine, but some expensive TUs have gnarly templating that makes it hard to factor the template instantiations out into their own TU…). One thing that stood out to me was that OptFunction doesn’t seem to be run in a threaded way. I don’t know how much cross-function optimization happens that couldn’t be expressed as a build-like pass over an internal dependency graph, but it looked like there could be some easy wins from inserting some parallelism into the compiler itself.
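
For reference, by “factor the template instantiations to their own TU” I mean roughly the explicit-instantiation pattern sketched below (the names are made up) - it works nicely for concrete, known instantiations, but is hard to apply when the expensive instantiations depend on types that differ per TU:

```cpp
// heavy.h - "Heavy" is a stand-in for an expensive-to-instantiate template.
#pragma once

template <typename T>
struct Heavy {
  T Run(T x) { /* imagine a large, slow-to-instantiate body here */ return x; }
};

// Explicit instantiation declaration: TUs that include this header and use
// Heavy<int> will no longer instantiate it themselves.
extern template struct Heavy<int>;
```

```cpp
// heavy_int.cc - the single TU that actually pays for the instantiation.
#include "heavy.h"

template struct Heavy<int>;  // explicit instantiation definition
```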

Of course, threading inside the tools has interesting interactions with higher-level parallelism like make or ninja that also needs to be considered, as it is easy to end up with N² tasks being run instead of N when each tool thinks it can use nproc as a hint to its own parallelism level.

When you explore PCH, make sure to try -fpch-instantiate-templates and -fpch-codegen/-fpch-debuginfo.
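
In case it helps anyone trying this by hand, here is roughly where those flags fit in the manual PCH workflow from the Clang user’s manual (pch.h and foo.cc are just made-up names):

```cpp
// pch.h - made-up "umbrella" header collecting expensive, rarely-changing includes.
#pragma once
#include <map>
#include <string>
#include <vector>

// Build the PCH (this is where -fpch-instantiate-templates applies):
//   clang++ -std=c++20 -x c++-header pch.h -o pch.h.pch -fpch-instantiate-templates
// Use it when compiling each TU:
//   clang++ -std=c++20 -include-pch pch.h.pch -c foo.cc -o foo.o
// -fpch-codegen/-fpch-debuginfo go further: code/debug info for entities in the
// PCH is emitted once, but you then also need to build and link an object file
// for the PCH itself (see the Clang user's manual for the exact steps).
```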