clang++ vs. g++ compilation speed for ACE/TAO servants

Hi All,

We are evaluating a switch from gcc to clang for our C++ application. Our main focus is compilation speed, but we are also looking at AddressSanitizer/MemorySanitizer, C++11 support and C++ modules. I've tried compiling our project with clang++, but the resulting compilation time is longer than with gcc. Can someone give me a hint about where the problem might be?

Our project is a bunch of ACE/TAO CORBA servants. Overall, clang++ takes about twice as long as gcc. To reproduce the problem I preprocessed one file and then compiled it, with the same result: compilation takes twice as long. I don't know where to look further.
[root@ivagulin-pc ~]# time clang++ -c RemoveClusterObserverClang.cpp
real 0m1.283s
user 0m1.254s
sys 0m0.024s
[root@ivagulin-pc ~]# time g++ -c RemoveClusterObserverGcc.cpp
real 0m0.576s
user 0m0.524s
sys 0m0.048s

I use llvm-3.3 and cfe-3.3, built with the following options. Sources of RemoveClusterObserver*cpp are attached.

+ cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Release '-DCMAKE_CXX_FLAGS_RELEASE=-O3 -g -mtune=amdfam10 -march=i686' '-DCMAKE_C_FLAGS_RELEASE=-O3 -g -mtune=amdfam10 -march=i686' '-DCMAKE_EXE_LINKER_FLAGS_RELEASE=-Wl,--as-needed -Wl,--strip-all' '-DCMAKE_MODULE_LINKER_FLAGS_RELEASE=-Wl,--as-needed -Wl,--strip-all' '-DCMAKE_SHARED_LINKER_FLAGS_RELEASE=-Wl,--as-needed -Wl,--strip-all -shared' -DCMAKE_SKIP_RPATH=YES -DBUILD_SHARED_LIBS=YES -DLLVM_ENABLE_TIMESTAMPS=NO ..

Igor Vagulin

sources.tar.bz2 (590 KB)

Are you able to compile ACE+TAO itself? I tried once (using clang 3.3) and noticed that clang complains about an operator<< overloaded for some CORBA magic (I don't remember the details now). The overall quality of the TAO implementation is questionable.

Regards,
Tomek

If you're lucky, someone might take a look for you if you attach (or link
to, if it's too big to attach) an example of the problem.

Otherwise, you might want to get started with a profiler and see where the
hot parts of Clang/LLVM are in your example.

(Also, consider trying Clang top of tree (straight from svn/git); the
project moves fairly quickly.)

I suspect the problem is overload resolution for the several hundred overloads of each of ‘operator<<’, ‘operator<<=’, ‘operator>>’, and ‘operator>>=’ that are present here. Many of these have the same LHS parameter type; we could probably improve performance here by caching the computation of an implicit conversion sequence for a given (argument, parameter type) pair.

We may also be able to pick out an ‘obvious winner’ (perhaps looking for one that only requires standard conversions) before trying to build conversion sequences for all candidates.

Hi Richard,

I've spent a few days tweaking every flag I could on clang to make it
perform close to gcc. With the best combination I found, clang was still
about 30% slower than gcc (4.4 from RHEL 6). Maybe you can give me
another hint? I really want to switch :).

I've created 100 copies of RemoveClusterObserver.cpp; below are the
results of "time g++|clang++ -c RemoveClusterObserver-*cpp":
g++: 0m58.510s
clang++ default opts: 2m4.065s
+ march=core2: 1m43.729s
+ BUILD_SHARED_LIBS=NO: 1m27.216s
+ by-clang-build-by-clang++: 1m18.816s

I've also tried measuring the compile time of the clang sources, thinking
this case should favor clang. No luck: clang is still worse in every
scenario. BTW, here I also measured the Intel C compiler (icc).
- gcc-4.4:
real 7m7.759s
user 45m53.622s
sys 2m3.657s
- icc 2013-sp1:
real 8m4.175s
user 54m39.116s
sys 2m35.159s
- clang compiled by gcc:
real 8m42.278s
user 60m19.175s
sys 0m51.341s
- clang compiled by icc (who is paying for this compiler? :-/):
real 8m2.399s
user 57m31.185s
sys 0m57.272s

Then I thought maybe the problem is the x86 architecture and its lack of
registers, and switched to x86_64. Looks like that's the case, and people
claiming "clang compiles faster than gcc" mean "on x86_64".
- gcc:
real 8m8.230s
user 50m1.458s
sys 3m44.313s
- clang compiled by gcc:
real 7m57.747s
user 55m16.080s
sys 1m27.786s
- clang compiled by clang:
real 6m41.412s
user 44m53.298s
sys 1m27.715s
Igor Vagulin

Nice work, and thanks for chasing this down... Since gcc was C code (until recently) and clang is C++, I'm not surprised those extra registers made the difference :)

Hi,

Thanks for putting together these numbers! For the 32-bit (x86) build, I
would guess clang-compiled-by-clang would clock in at a little under 8 minutes.

> Looks like your clang is built with shared libraries enabled. That will be
> hurting your performance somewhat, but I don't know how much. Try without
> -DBUILD_SHARED_LIBS=YES.
>
> Other than that, the only abnormally high cpu usage is within
> Sema::BuildBinOp, which will probably be due to the large number of
> overloads of operator<< etc that I observed earlier.

I still think it'd be worth us working on the performance here; there's
some obvious redundancy.