LLVM compile speed significantly slower than GCC (w/ test case)

I've been doing some profiling of LLVM on our codebase, to see how it
stacks up to the existing GCC build that we do. The primary thing I'm
focusing on at the moment is build speed, and in this regard LLVM
seems to be pretty all over the map. On some files it seems to go
quite a bit faster than GCC, and on others it's slower, leading to an
aggregate build time for our repository that's roughly the same as
GCC.

Some IRC discussions suggested that you guys might be interested in
seeing an example of a file that goes appreciably slower, so I managed
to isolate one that's completely self-contained. It's a relatively
stock implementation of the SHA1 algorithm, so it should be a pretty
straightforward file to follow, as well as being a relatively
data-intensive piece of code.

I compiled the file with both compilers for the arm-none-eabi triple.
The numbers I get are as follows:

GCC (4.5.2, Windows build from CodeSourcery): -O0: 110ms, -O2: 215ms
Clang/LLVM (Release mode, LLVM git hash 7f5714f4..., clang git hash 9d9cf5...): -O0: 110ms, -O2: 640ms

The compilers are essentially identical for the -O0 case, but when
compiling with -O2, LLVM takes almost three times as long as GCC.

I'm not sure whether this file is unusual in some way, such that
fixing whatever makes this slow wouldn't have much of an effect on
other files, or if this is evidence of some problem that's broad
enough to improve the compile speed of a wide variety of files. If
anybody is interested in investigating the discrepancy, though, I'd
love to hear about it.

Thanks,
Matt

sha1test.c (8.96 KB)

I’m looking into why.

For me, -O0 is about 4x faster with Clang, but -O2 is 2x slower. I'll update after a preliminary analysis. Note that this is comparing against GCC 4.6.2.

Thanks, Matt. This is great information. Sounds like Chandler is looking into the details of what's going on.

-Jim

GCC (4.5.2, Windows build from CodeSourcery): -O0: 110ms, -O2: 215ms
Clang/LLVM (Release mode, LLVM git hash 7f5714f4..., clang git hash 9d9cf5...): -O0: 110ms, -O2: 640ms

Hi Matt,

I only see 2x slowdown on my machine (consistently, O2 and O3), but
that's still bad.

If you compile to IR and then run "opt -time-passes" on it, you can
get a good idea of who the culprit is:

$ clang -O0 -S -emit-llvm sha1test.c
$ opt -time-passes -O2 -disable-output sha1test.ll
(...)
   ---User Time---    --System Time--   --User+System--    ---Wall Time---   --- Name ---
   0.2720 ( 54.0%)    0.0000 (  0.0%)   0.2720 ( 53.5%)    0.2821 ( 52.8%)   Combine redundant instructions
   0.1160 ( 23.0%)    0.0000 (  0.0%)   0.1160 ( 22.8%)    0.1162 ( 21.8%)   Combine redundant instructions
   0.0600 ( 11.9%)    0.0000 (  0.0%)   0.0600 ( 11.8%)    0.0610 ( 11.4%)   Combine redundant instructions
   0.0000 (  0.0%)    0.0000 (  0.0%)   0.0000 (  0.0%)    0.0205 (  3.8%)   Early CSE
   0.0240 (  4.8%)    0.0000 (  0.0%)   0.0240 (  4.7%)    0.0204 (  3.8%)   Combine redundant instructions
   0.0200 (  4.0%)    0.0000 (  0.0%)   0.0200 (  3.9%)    0.0203 (  3.8%)   Combine redundant instructions
(...)

That's roughly 99% user / 97% wall clock.

Looking at your source, you have a few simple macros repeated over
and over, which is sure to stress the combiner.

This is a great micro-benchmark (and a very common pattern), thanks
for the report!

Renato Golin <rengolin@systemcall.org> writes:

Seeing from your source, you have a few simple macros repeated over
and over, which will stress the combiner, for sure.


Incidentally, we just ran into a compile time issue that also pointed to
instcombine. Just another datapoint to confirm that some work here
would be helpful.

                              -Dave

FWIW, the result of my investigation led to a new aspect of an existing bug: http://llvm.org/PR13392

This is responsible for a very sizable chunk of the compile time because it slows down extremely fundamental analysis operations in LLVM: computing the properties of specific bits in integer values. As these values grow larger (especially larger than a native integer type), LLVM's optimizations and analyses become quite slow.

I’ve dug fairly deeply into this particular issue and mailed out a patch to address it. I’m hopeful we’ll get it addressed quickly. Once fixed, I’m seeing between 20% and 50% compile time reductions on the test case you provided, so I think it will address your concerns. Feel free to add yourself to the bug’s CC list if you want to know when it is addressed.

If you'd like to understand the inner details of what went wrong, I have a write-up attached to the patch I posted, but it may not be at all obvious how all these things are related. This one was pretty nasty to untangle.

Thanks again for the report!
-Chandler