Who wants faster LLVM/Clang builds?

Hi,

Recently I’ve done some experiments on the LLVM/Clang code and discovered that many of our source files include unnecessary header files. I wrote a simple tool that eliminates redundant includes and estimates the benefit of doing so, and the results were quite nice: for some files we were able to save 90% of compile time! I think we want to apply some of the cleanups I found, but I’m not sure how best to do it: the total patches are 8k lines of code for LLVM and 3k lines of code for Clang (I’ll attach them for reference). My suggestion would be that people take a look at the list of changed files and pick the changes for the piece of code they are working on, if the changes look sane (they do need some checking before committing). Does that sound like a good idea? I’d appreciate any feedback on what we can do here.

The list of files for which removing redundant headers improved compile time (the numbers are compile time in seconds for a Debug build):

LLVM top 10

Filename Old New Delta
lib/CodeGen/GlobalISel/GlobalISel.cpp 0.26 0.02 -91.9%
lib/MC/MCLabel.cpp 0.19 0.02 -88.2%
tools/llvm-readobj/ObjDumper.cpp 0.43 0.10 -76.5%
lib/MC/MCWinEH.cpp 0.51 0.13 -74.3%
lib/Transforms/Vectorize/Vectorize.cpp 0.72 0.29 -59.7%
tools/llvm-diff/DiffLog.cpp 0.58 0.26 -54.6%
lib/Target/ARM/MCTargetDesc/ARMMachORelocationInfo.cpp 0.46 0.26 -44.1%
lib/DebugInfo/DWARF/DWARFExpression.cpp 0.68 0.38 -43.3%
lib/LTO/LTOModule.cpp 2.25 1.33 -41.1%
lib/Target/TargetMachine.cpp 1.76 1.10 -37.8%

Full list:

llvm.txt (25 KB)

clang.txt (10.3 KB)

llvm_redundant_headers.patch (287 KB)

clang_redundant_headers.patch (114 KB)

I, for one, want faster builds.

Beyond that though, this seems like obvious goodness to reduce coupling in the codebase. I’ve only skimmed the patch, but this seems like a clearly amazingly great idea. Did you use the IWYU tool or something else?

-Chris

To find which header files could be removed, the tool scanned each file for “#include” lines and tried to remove them one by one, checking whether the file still compiled after each removal. When there were no more include lines to remove, we verified the change with ninja + ninja check.

It looks like this makes us rely heavily on transitive header includes – is that right?

Specifically what I mean is, suppose foo.cc uses class1 and class2. class2.h #includes “class1.h”. If foo.cc has #includes for class1.h and class2.h, this tool will remove the include of class1.h. Is that right?

If so, this seems more aggressive than we want to be, because it makes our codebase fragile: if someone removes class1.h from class2.h, it may break foo.cc.

I’m sure we can find many situations like this today, but this approach seems like it would make that situation much more common.
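To make the example concrete (same names as above; a minimal sketch):

// class1.h
#ifndef CLASS1_H
#define CLASS1_H
struct class1 { int X; };
#endif

// class2.h
#ifndef CLASS2_H
#define CLASS2_H
#include "class1.h"           // class2 needs class1 today...
struct class2 { class1 C1; };
#endif

// foo.cc, after the tool drops its direct include of class1.h:
// #include "class1.h"        // removed: foo.cc still compiles, but only
#include "class2.h"           // because class2.h happens to pull it in.
class1 firstOf(const class2 &C) { return C.C1; }
// ...and foo.cc silently breaks the day class2.h stops needing class1.h.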

IWYU would avoid this pitfall, although I know IWYU has its own problems. Alternatively maybe if the tool only removed headers whose removal significantly decreased the preprocessed size of the file, we’d know that we actually removed “something interesting”.
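A minimal sketch of that heuristic (assuming a POSIX system with clang++ on PATH; preprocessedLines is an invented helper, not part of any existing tool):

#include <cstdio> // popen/pclose are POSIX
#include <string>

// Count lines of preprocessed output; a removal that barely changes
// this number probably only dropped a transitively redundant include,
// not "something interesting".
static long preprocessedLines(const std::string &Path) {
  std::string Cmd = "clang++ -E " + Path + " 2>/dev/null";
  FILE *P = popen(Cmd.c_str(), "r");
  if (!P)
    return -1;
  long Lines = 0;
  for (int C; (C = fgetc(P)) != EOF;)
    if (C == '\n')
      ++Lines;
  pclose(P);
  return Lines;
}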

-Justin

I, for one, want faster builds.

Good, we have at least two people on board then :)

Beyond that though, this seems like obvious goodness to reduce coupling in the codebase. I’ve only skimmed the patch, but this seems like a clearly amazingly great idea. Did you use the IWYU tool or something else?

I tried using it, but while it gave me some interesting hints, I stopped using it when the build broke after the proposed changes. I probably could’ve figured out what went wrong and made it work, but I also noticed that the changes proposed by IWYU are much more intrusive - i.e. it tries to forward-declare symbols, analyze include chains and leave only the last include, etc. (and that only works if one applies the changes to all affected files at once). While these are all good ideas, I’d expect some objections against mechanical application of such clean-ups. Plus, the patch would be much less obvious.

So, instead, I implemented a light-weight version of it that just tries to remove #include lines, keeping the footprint of this cleanup local. As a result, I get a patch consisting of many independent changes (it can be applied per file and everything should work fine). More details on how that was done are in the “Methodology” section at the end of the original e-mail.
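Roughly, the core loop can be sketched like this (a simplified reconstruction, not the actual script; it assumes the file can be syntax-checked standalone with the right flags, and the real runs were additionally verified with ninja + ninja check):

#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Cheap proxy for "still compiles"; real LLVM sources also need -I flags.
static bool compiles(const std::string &Path) {
  std::string Cmd = "clang++ -fsyntax-only " + Path + " >/dev/null 2>&1";
  return std::system(Cmd.c_str()) == 0;
}

int main(int argc, char **argv) {
  if (argc != 2)
    return 1;
  std::string Path = argv[1];

  std::vector<std::string> Lines;
  {
    std::ifstream In(Path);
    for (std::string L; std::getline(In, L);)
      Lines.push_back(L);
  }
  auto Rewrite = [&] {
    std::ofstream Out(Path);
    for (const std::string &L : Lines)
      Out << L << '\n';
  };

  // Greedily blank out each #include; keep the removal only if the
  // file still compiles without it.
  for (std::string &L : Lines) {
    if (L.rfind("#include", 0) != 0)
      continue;
    std::string Saved = L;
    L.clear();
    Rewrite();
    if (!compiles(Path)) {
      L = Saved; // this include was load-bearing; restore it
      Rewrite();
    }
  }
  return 0;
}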

Thanks,
Michael

To find which header files could be removed, the tool scanned each file for “#include” lines and tried to remove them one by one, checking whether the file still compiled after each removal. When there were no more include lines to remove, we verified the change with ninja + ninja check.

It looks like this makes us rely heavily on transitive header includes – is that right?

Correct.

Specifically what I mean is, suppose foo.cc uses class1 and class2. class2.h #includes “class1.h”. If foo.cc has #includes for class1.h and class2.h, this tool will remove the include of class1.h. Is that right?

Correct.

If so, this seems more aggressive than we want to be, because it makes our codebase fragile: if someone removes class1.h from class2.h, it may break foo.cc.

I completely agree; that’s why I intended this patch to be a collection of hints for real people rather than something we want to commit as-is. I think I can make the tool check the patch again and try to restore #include lines that don’t contribute to the speedup.

I’m sure we can find many situations like this today, but this approach seems like it would make that situation much more common.

IWYU would avoid this pitfall, although I know IWYU has its own problems. Alternatively maybe if the tool only removed headers whose removal significantly decreased the preprocessed size of the file, we’d know that we actually removed “something interesting”.

That’s a good idea. Thanks for the feedback!

Michael

Got it. I’ve never used IWYU, but those ideas sound like they could lead to further improvement. It sounds like the right thing is to get your basic patch set in, then explore where IWYU can do anything useful above and beyond that.

If the gains are immeasurable and a PITA to work with, then it can easily be dropped. If it leads to a few significant wins, it could be worth playing with.

In any case, thank you for working on this!

-Chris

Hey all,

IWYU maintainer here. I wanted to make a small observation.

Surprisingly, IWYU will most often add includes to a reasonably well-factored codebase, and this ties into Chris’ comment:

Beyond that though, this seems like obvious
goodness to reduce coupling in the codebase.

Just blindly removing includes will probably increase coupling, not reduce it, because it optimizes for transitive dependencies.

So if a refactor makes it possible to remove an include in a faraway upstream header, that will potentially break lots of code.

One of IWYU’s benefits, imho, is that it works against this. But it’s a much harder problem, which is why it’s less immediately effective :)

Good to see some numbers below!

- Kim

A few comments:

First, your old and new numbers are giving time to a hundredth of a second, yet I generally see a variation of 0.1-0.2 seconds between runs. How many times did you run each one and what is the Welch’s t-test delta?

Second, most of the changes are a few tenths of a second. From the whole of your top 10, it looks as if you’re shaving around 3 seconds off the total build. Presumably this is CPU time, so on a -j8 parallel build you’re saving under a second for something that takes a few minutes. That doesn’t mean it’s not worth doing, but it does mean that if it has any impact on code maintainability then it probably isn’t.

Third, it’s not clear from your measurements whether you were doing a modules build or not. The cost of redundant includes should be a lot lower if we’re using modules.

Finally, redundant includes can have a counterintuitive positive benefit on overall build times by warming the buffer cache (pulling in an include early when the dependency graph for the build is wide lets another compile job proceed while you’re waiting for disk, whereas pulling it in later when you’ve exhausted the parallelism in the build process causes a synchronous stall). Did you measure the deltas in overall build time for a cold and warm disk cache after making these changes?

David

It’s also likely that a lot of ‘#include “foo.h”’ can be replaced with ‘class foo;’

Especially in the transitive inclusion case, instead of removing the #include entirely.
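For example (bar.h/bar.cpp are hypothetical):

// bar.h
#ifndef BAR_H
#define BAR_H
class foo; // forward declaration is enough for pointers and references
class bar {
  foo *F;
public:
  void process(foo &X);
};
#endif

// bar.cpp
#include "bar.h"
#include "foo.h" // the full definition is only needed here
void bar::process(foo &X) { /* ... uses X's members ... */ }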

The list of files for which removing redundant headers improved compile time (the numbers are compile time in seconds for a Debug build):

A few comments:

First, your old and new numbers are giving time to a hundredth of a second, yet I generally see a variation of 0.1-0.2 seconds between runs. How many times did you run each one and what is the Welch’s t-test delta?

Second, most of the changes are a few tenths of a second. From the whole of your top 10, it looks as if you’re shaving around 3 seconds off the total build. Presumably this is CPU time, so on a -j8 parallel build you’re saving under a second for something that takes a few minutes. That doesn’t mean it’s not worth doing, but it does mean that if it has any impact on code maintainability then it probably isn’t.

I’m not arguing against any of your points, but also keep in mind that reducing #includes can reduce the number of files built in an incremental build when you change a header.

-Chris

In my experience, a lot of time is spent on optimizing the code (assuming it’s not a “-O0” build). Also redundant includes are largely fixed by header guards, and I believe Clang [and gcc, as well as the MS compilers, and probably most others too] has an include-guard cache that determines that “we’ve already included foo.h, and it has include guards around the whole actual content of the file, so we can just skip it”.
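That is, headers shaped like the usual pattern only cost anything the first time they are included:

// foo.h: the entire body sits inside one guard, so the compiler can
// remember "foo.h is fully guarded by FOO_H" and skip re-reading the
// file on any later #include "foo.h".
#ifndef FOO_H
#define FOO_H
struct foo { int X; };
#endif // FOO_H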

So I’m slightly dubious as to this being an efficient way of significantly reducing the total compilation time for the overall project - even if there are SOME cases where there is a significant improvement in a single file. The total time for a clean build [in wall-clock time, not CPU time] should be measured, making sure that there is enough memory. Do a run of, say, five complete builds of the same thing [with a suitable “clean” in between to redo the whole build], take away the worst and the best, and perhaps also modify one of the more common header files (llvm/IR/Type.h, for example) and build again.

As Chris says, a benefit of “don’t rebuild so much when editing a header file” is clearly a good benefit.

The list of files for which removing redundant headers improved compile time (the numbers are compile time in seconds for a Debug build):

Thanks for the comments! I agree with all the points you brought up, and actually I thought about most of them before. Please find my comments below:

A few comments:

First, your old and new numbers are giving time to a hundredth of a second, yet I generally see a variation of 0.1-0.2 seconds between runs. How many times did you run each one and what is the Welch’s t-test delta?

I measured the time 5 times for both the ‘old’ and ‘new’ versions and took the minimum. I don’t have the variance and other statistics at hand now, but I’m pretty confident in the ballpark of the numbers. That is, if the number goes from 1.7s to 1.1s, I’m pretty sure there is a win (even if the ‘true’ numbers were 1.6 and 1.2 respectively).
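(For reference, Welch’s t statistic for two samples with means $\bar{x}_1, \bar{x}_2$, variances $s_1^2, s_2^2$ and sizes $n_1, n_2$ is

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} $$

so with 5 runs per configuration, even this rough check would indicate whether a delta like 1.7s → 1.1s clears the run-to-run noise.)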

Second, most of the changes are a few tenths of a second. From the whole of your top 10, it looks as if you’re shaving around 3 seconds off the total build. Presumably this is CPU time, so on a -j8 parallel build you’re saving under a second for something that takes a few minutes. That doesn’t mean it’s not worth doing, but it does mean that if it has any impact on code maintainability then it probably isn’t.

Correct. That’s exactly why I aimed for small local changes that would be a strict improvement over what we have now. Also, while the effect on the total build time isn’t big, incremental builds can see much more significant gains from it.

Third, it’s not clear from your measurements whether you were doing a modules build or not. The cost of redundant includes should be a lot lower if we’re using modules.

No, I didn’t do modules, and yes, you are correct (and this is another argument for probably not spending more effort here).

Finally, redundant includes can have a counterintuitive positive benefit on overall build times by warming the buffer cache (pulling in an include early when the dependency graph for the build is wide lets another compile job proceed while you’re waiting for disk, whereas pulling it in later when you’ve exhausted the parallelism in the build process causes a synchronous stall). Did you measure the deltas in overall build time for a cold and warm disk cache after making these changes?

No, I didn’t measure it with cold and warm caches. For the entire build (-j8) I usually saw a win of ~1%, but that number is even noisier, so I didn’t report it. However, I’m a bit sceptical about keeping redundant includes just to prefetch headers into the cache. There is no guarantee that a file with the redundant include will be compiled first, and if it’s compiled later, we’re just wasting time. There are also other files that actually need this header and will move it into the cache before our use, so all in all I don’t think the positive effect of having redundant includes outweighs its cost.

Thanks,
Michael

It’s also likely that a lot of ‘#include “foo.h”’ can be replaced with ‘class foo;’

Especially in the transitive inclusion case, instead of removing the #include entirely.

But if we don’t use “class foo”, why do we want to forward declare it?

Speaking of forward declarations, IWYU intelligently computes the proper places to insert them (it also rewrites header files for this), but I’m currently aiming at simpler cases.

Michael

In my experience, a lot of time is spent on optimizing the code (assuming it’s not a “-O0” build).

The numbers were actually for the Debug build (-O0 -g), so for a Release build they would be different (presumably lower).

Also redundant includes are largely fixed by header guards, and I believe Clang [and gcc, as well as the MS compilers, and probably most others too] has an include-guard cache that determines that “we’ve already included foo.h, and it has include guards around the whole actual content of the file, so we can just skip it”.

By redundant here I meant that we included a file but didn’t use any of its content (rather than that we included the same file twice).

So I’m slightly dubious as to this being an efficient way of significantly reducing the total compilation time for the overall project - even if there are SOME cases where there is a significant improvement in a single file. The total time for a clean build [in wall-clock time, not CPU time] should be measured, making sure that there is enough memory. Do a run of, say, five complete builds of the same thing [with a suitable “clean” in between to redo the whole build], take away the worst and the best, and perhaps also modify one of the more common header files (llvm/IR/Type.h, for example) and build again.

On full builds the benefit is not big (around 1%, and the noise is high), but: 1) if we only take gains of more than, say, 5%, we’ll probably never see any; 2) I aim at changes that make the code strictly better (modulo David’s point about the disk cache). If any change is questionable from a maintenance or any other point of view, I’m all for dropping it.

Thanks,
Michael

my 2¢

+1 for point 2). Even leaving aside the speed gain, removing unused includes just looks like good coding practice to me.

  • We do indeed have a lot of unnecessary includes in LLVM (or pretty much any other C++ project, for that matter).
  • I want faster builds.
  • The only way to reliably fight this is indeed automatic tools.
  • Having the right amount of includes also has documentation value and ideally lets you understand the structure of your project.
  • However, relying on transitive includes works contrary to the last “understanding/documentation” point.
  • And, as stated earlier, to have things really clean we want class XXX; instead of #include "XXX.h" wherever possible. If we are serious about that, we also often have to reduce the amount of code in the headers so we can move the #include "XXX.h" from the header to the implementation.

For me personally, the documentation/understandability we lose when relying on transitive includes weighs more heavily than my desire for a faster build…

- Matthias

I agree with you regarding the transitive includes. However, what I’m trying to do is different: I would like to remove headers that are included directly but not used. Ideally, I’d like not to remove headers that are only included transitively. The current patch contains such removals because my tool was intended to just point us to files that can be cleaned up and to suggest a way to do it. The original idea was that we would then take a look at interesting files and remove unnecessary headers after manual inspection.

I see, though, how having such removals in the patch steers the conversation toward transitive headers/forward declarations of symbols/etc. (which I don’t plan to touch at the moment), so I’ll try to improve my tools so that they produce a cleaner patch. Stay tuned!

Thanks,
Michael

Hi,

Nice to see IWYU developers here :) I wonder how hard it would be to run IWYU on LLVM/Clang (or, if it’s supposed to just work, I wonder what I did wrong). If we can also tweak it a bit to make it choose more human-like (~more conservative) decisions, we would be able to just apply what it suggests!

Michael

  • We do indeed have a lot of unnecessary includes in LLVM (or pretty much any other C++ project, for that matter).
  • I want faster builds.
  • The only way to reliably fight this is indeed automatic tools.
  • Having the right amount of includes also has documentation value and ideally lets you understand the structure of your project.
  • However, relying on transitive includes works contrary to the last “understanding/documentation” point.
  • And, as stated earlier, to have things really clean we want class XXX; instead of #include "XXX.h" wherever possible. If we are serious about that, we also often have to reduce the amount of code in the headers so we can move the #include "XXX.h" from the header to the implementation.

For me personally, the documentation/understandability we lose when relying on transitive includes weighs more heavily than my desire for a faster build…

+1

Q.

Hi,

I tweaked my scripts to avoid removing includes when doing so doesn’t give any significant benefit, which made the patches significantly smaller. This time the patches should not try to remove includes of header files that are transitively included from other included headers. The gains mostly remained the same (plus or minus noise); the tables are at the end of the email. I also included the sizes of the preprocessed files (measured in thousands of lines of code).
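Conceptually, the extra check can lean on the compiler’s dependency output (a sketch, assuming POSIX popen and clang++ on PATH; stillPulledInTransitively is an invented name, not the actual script):

#include <cstdio>
#include <string>

// After dropping a direct #include of Header from Src, ask clang++ for
// the dependency list (-MM). If Header still shows up, the removal only
// "worked" via transitive inclusion, so keep the direct include.
static bool stillPulledInTransitively(const std::string &Src,
                                      const std::string &Header) {
  std::string Cmd = "clang++ -MM " + Src + " 2>/dev/null";
  FILE *P = popen(Cmd.c_str(), "r");
  if (!P)
    return false;
  std::string Deps;
  for (int C; (C = fgetc(P)) != EOF;)
    Deps.push_back(static_cast<char>(C));
  pclose(P);
  return Deps.find(Header) != std::string::npos;
}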

I suggest that from here we proceed as follows: maintainers/interested people take a look at the files related to their components and pick the parts of the patches that they consider correct. I’ll also start with some files next week if there are no objections. Does that sound reasonable?

The most impacted files (times are in seconds, sizes in thousands of lines, for a Debug build):

LLVM top 10

Filename TimeOld TimeNew Delta SizeOld SizeNew SizeDelta
lib/CodeGen/GlobalISel/GlobalISel.cpp 0.26 0.02 -91.6% 35.0 0.3 -99.0%
lib/MC/MCLabel.cpp 0.20 0.02 -88.0% 25.5 0.0 -99.9%
tools/llvm-readobj/ObjDumper.cpp 0.44 0.10 -76.8% 41.0 11.8 -71.1%
lib/MC/MCWinEH.cpp 0.49 0.15 -70.4% 43.9 21.4 -51.2%
lib/Transforms/Vectorize/Vectorize.cpp 0.73 0.29 -60.7% 52.7 35.5 -32.6%
tools/llvm-diff/DiffLog.cpp 0.59 0.27 -53.8% 50.7 33.7 -33.7%
lib/Target/ARM/MCTargetDesc/ARMMachORelocationInfo.cpp 0.47 0.25 -46.7% 46.7 37.9 -18.9%
lib/DebugInfo/DWARF/DWARFExpression.cpp 0.67 0.38 -43.5% 47.4 34.8 -26.7%
lib/Transforms/Utils/ASanStackFrameLayout.cpp 0.52 0.32 -38.8% 41.7 33.7 -19.2%
tools/llvm-dwp/llvm-dwp.cpp 2.48 1.53 -38.3% 92.5 55.2 -40.3%

Full list:

llvm.txt (24 KB)

clang.txt (7.76 KB)

llvm_redundant_headers.patch (149 KB)

clang_redundant_headers.patch (46.5 KB)