Hello LLVM/Clang developers,
We recently switched to using the same clang version on all our platforms. This included switching from the apple-clang that ships with Xcode to a pre-built binary we downloaded from llvm.org. We noticed that this came with a pretty big regression in compile times.
If we do the simplest test program like this:
std::cout << "Hello world" << std::endl;
and compile that with Xcode Clang (Xcode 10.1 apple-clang clang-1000.11.45.5):
clang++ test.cpp -o test 0.31s user 0.06s system 97% cpu 0.380 total
with the clang 7.0.0 binaries from llvm.org:
~/Downloads/clang+llvm-7.0.0-x86_64-apple-darwin/bin/clang++ -o test test.cpp 0.53s user 0.11s system 62% cpu 1.032 total
If we now run that on our whole project:
with xcode clang:
368.17s user 32.00s system 663% cpu 1:00.30 total
with clang 7:
423.31s user 31.65s system 662% cpu 1:08.69 total
That’s a pretty hefty difference. Any ideas what can account for this discrepancy? Does apple-clang contain any special patches or build flags that differ a lot from the binaries on llvm.org?
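To put numbers on the difference, here is a minimal sketch that computes the relative slowdown from the timings quoted above. The helper name `slowdown_pct` is made up for illustration; the inputs are the user CPU and wall-clock times from the runs above.

```shell
#!/bin/sh
# slowdown_pct BASELINE CURRENT
# Prints how much slower CURRENT is than BASELINE, in percent (one decimal).
# Hypothetical helper, not part of any tool mentioned in this thread.
slowdown_pct() {
    awk -v base="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", (cur / base - 1) * 100 }'
}

slowdown_pct 368.17 423.31   # whole project, user CPU time   -> 15.0
slowdown_pct 60.30 68.69     # whole project, wall clock      -> 13.9
slowdown_pct 0.31 0.53       # hello world, user CPU time     -> 71.0
```

So the whole-project build is roughly 14-15% slower, while the single-file hello world case shows a much larger relative gap.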
I know about PGO - and I guess the best we could do is to collect profile data from compiling our whole tree and use that when building clang - but this process does not seem very well documented, and I am unsure whether it would even help.
Thankful for any ideas or feedback.
I don’t know what sort of secret sauce Apple may have (& they won’t necessarily be able to talk about it) - though I believe they mostly have functional rather than performance patches.
Profile guided optimisation is likely important - Google uses it internally for compiler releases too.
The obvious question is whether the llvm.org builds are configured with -DLLVM_ENABLE_ASSERTIONS:BOOL=OFF -DCMAKE_BUILD_TYPE:STRING=Release -DLLVM_LINK_LLVM_DYLIB:BOOL=ON. Linking all of the LLVM libraries into a single dylib would improve the load time of the compiler, and turning assertions off would eliminate the slowdown they cause in the built compiler.
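As a rough sketch, a configure step using the three options mentioned above might look like this. The source and build paths, the Ninja generator, and the `configure_llvm` function name are assumptions for illustration only. By default the script just prints the command; set RUN=1 to actually run it.

```shell
#!/bin/sh
# Hypothetical locations: point these at your own LLVM checkout and build dir.
LLVM_SRC="${LLVM_SRC:-$HOME/src/llvm}"
BUILD_DIR="${BUILD_DIR:-$HOME/build/llvm-release}"

configure_llvm() {
    # Release build, assertions off, LLVM libraries linked into a single dylib.
    set -- cmake -G Ninja \
        -DCMAKE_BUILD_TYPE:STRING=Release \
        -DLLVM_ENABLE_ASSERTIONS:BOOL=OFF \
        -DLLVM_LINK_LLVM_DYLIB:BOOL=ON \
        "$LLVM_SRC"
    if [ "${RUN:-0}" = "1" ]; then
        # Configure inside the build directory.
        mkdir -p "$BUILD_DIR" && (cd "$BUILD_DIR" && "$@")
    else
        # Dry run: only show what would be executed.
        echo "$@"
    fi
}

configure_llvm
```

Comparing compile times of a compiler built this way against the llvm.org binary would show how much of the gap these options explain.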
If I recall correctly, the release builds are built with FDO and ThinLTO/LTO. The cmake scripts should be in tree somewhere (I’m on my phone, which makes looking hard). Those could be worth the 20% in build time that you’re seeing.
I don’t think Apple disables assertions in its release builds. I remember clang and llvm crashing regularly because of assertion failures at some point in the past.
Nowadays it is far more unusual to get a clang crash, so I can’t tell, but I doubt they changed the configuration.
The latest available release posted on their open source web site suggests that they are disabling assertions in clang.
Thank you for the correction. So they actually did change that at some point.
(Sorry for the late response, I was out of town for the holiday)
There is a little bit of internal magic for how AppleClang is built, but much of the essential configuration is in the public tree and documented here: https://llvm.org/docs/AdvancedBuilds.html#apple-clang-builds-a-more-complex-bootstrap
Apple also uses PGO and order files for the AppleClang builds, and there is PGO infrastructure publicly documented here:
That PGO infrastructure can also generate linker order files on Darwin. Unfortunately Apple’s PGO training data is internal, so you’ll need to come up with your own test cases for that. I would love to see more comprehensive training data in the open source tree; right now the only test we have is under clang/utils/perf-training, and it is a simple C++ hello world.
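For anyone who wants to try it, here is a minimal sketch of the multi-stage PGO flow as I understand it from the public docs. It assumes a checkout with clang/ and llvm/ side by side under a hypothetical $LLVM_SRC, the Ninja generator, and that it is started from an empty build directory; the `run` and `pgo_build` helper names are made up. It prints the steps by default and only executes them with RUN=1.

```shell
#!/bin/sh
# Hypothetical checkout location containing clang/ and llvm/ side by side.
LLVM_SRC="${LLVM_SRC:-$HOME/src/llvm-project}"

# Echo the command in dry-run mode (default), execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

pgo_build() {
    # 1. Configure using the PGO CMake cache that ships in the clang tree.
    run cmake -G Ninja -C "$LLVM_SRC/clang/cmake/caches/PGO.cmake" "$LLVM_SRC/llvm" &&
    # 2. Build an instrumented compiler, run the training inputs from
    #    clang/utils/perf-training, and merge the resulting profiles.
    run ninja stage2-instrumented-generate-profdata &&
    # 3. Build the final compiler optimized with the generated profile data.
    run ninja stage2
}

pgo_build
```

To train on your own code rather than the hello world, you would add your own inputs to the training step; how best to do that depends on your tree.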