Performance Tracking

Hello Everyone,

I've been looking at benchmarks of LLVM recently, and overall they look pretty good. Aside from things that use OpenMP or benefit from autovectorisation, Clang/LLVM and GCC seem to come fairly close, with no overall winner.

But: there do seem to have been a number of performance regressions between 2.9 and 3.0:

http://openbenchmarking.org/result/1110178-AR-1110173AR66

Identifying where these were introduced is quite difficult. I wonder if some of the buildbots could be persuaded, after building clang, to build and run some benchmark projects (e.g. perlbench, povray, flac / lame, whatever) and email people if they introduce a performance regression, just as the current buildbots email people who break the build.
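
For concreteness, a bot step along these lines might look something like the sketch below. Everything in it is invented for illustration (the baseline.json file, the perlbench command line, the 3% threshold); it is not an existing bot script:

#!/usr/bin/env python
# Hypothetical post-build step: time a benchmark a few times and
# compare the best run against a stored baseline. All names here
# (baseline.json, ./perlbench, the 3% threshold) are illustrative.
import json
import subprocess
import sys
import time

RUNS = 5
THRESHOLD = 1.03  # flag anything more than 3% slower than baseline

def time_once(cmd):
    start = time.time()
    subprocess.check_call(cmd)
    return time.time() - start

times = [time_once(["./perlbench", "input.pl"]) for _ in range(RUNS)]
best = min(times)  # best-of-N damps scheduler and cache noise a bit

baseline = json.load(open("baseline.json"))["perlbench"]
if best > baseline * THRESHOLD:
    # A real bot would email the blame list here, the way the
    # build-failure notifications already do.
    sys.exit("perlbench regressed: %.2fs vs baseline %.2fs" % (best, baseline))
print("perlbench ok: %.2fs (baseline %.2fs)" % (best, baseline))

Best-of-N is only a crude noise filter, and picking the threshold is the hard part; but even something this simple would catch the larger regressions.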

David

> Hello Everyone,
>
> I've been looking at benchmarks of LLVM recently, and overall they look pretty good. Aside from things that use OpenMP or benefit from autovectorisation, Clang/LLVM and GCC seem to come fairly close, with no overall winner.

Nice. Thanks.

> But: there do seem to have been a number of performance regressions between 2.9 and 3.0:
>
> http://openbenchmarking.org/result/1110178-AR-1110173AR66
>
> Identifying where these were introduced is quite difficult. I wonder if some of the buildbots could be persuaded, after building clang, to build and run some benchmark projects (e.g. perlbench, povray, flac / lame, whatever) and email people if they introduce a performance regression, just as the current buildbots email people who break the build.

Step one would be to bring these tests into the LLVM test harness. That will force organizations that care about performance to start paying attention to these benchmarks.

Evan

Many thanks, David. It had been a while (six months, I guess) since the last benchmark I saw, and I was wondering how the new Clang/LLVM compared to GCC!

One comment, though: the graphs are great, but the alternation between “less is better” and “more is better” makes for a difficult read. It’s not obvious at a glance which compiler is performing better, and it’s hard to get a quick overview by surveying the graphs.

Perhaps introducing a relative performance measure would help: normalize each result against the winner and indicate the factor or percentage next to the “losers”?
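
Something along these lines, with made-up numbers:

# Illustrative only: given raw timings (lower is better), normalize
# each compiler's result against the best one so every graph reads
# the same way. The numbers are invented.
times = {"gcc-4.6": 41.2, "clang-2.9": 43.0, "clang-3.0": 44.9}
best = min(times.values())
for name, t in sorted(times.items(), key=lambda kv: kv[1]):
    print("%-10s %.1fs  (%+.1f%% vs winner)" % (name, t, 100.0 * (t / best - 1)))

That way every graph reads in the same direction, whether the raw metric is time or throughput.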

Thanks for taking the time to do this anyway; I’m looking forward to the final article.

– Matthieu

To clarify: I didn't create these benchmarks, and I'm not affiliated in any way with the site that did. Someone sent me the link and asked whether I knew what accounted for the differences between the three compilers tested.

After looking at them, I see that there are some improvements and some regressions between 2.9 and 3.0. I am interested in setting up something that ensures 3.1 contains only improvements and no regressions.

Running benchmarks like these on (at least some of) the buildbots, and emailing people whose commits result in a slowdown, would be a good start. I believe most other compiler projects do something along these lines...

David

I completely agree, and I think Evan has described the right approach. Look at the way these benchmarks run, and port them to the LLVM test suite. There are bots that run nightly and dashboards that track regressions.

However, I can only spot two regressions:

FLAC encoding regresses by maybe 2% – is that actually within the noise? (A rough way to check is sketched below.)

John The Ripper: Blowfish regresses by over 5%; that one actually looks interesting.

The rest seem to have improved, or to have some error in how they were run…
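
To make the noise question concrete, here is a rough back-of-the-envelope check with invented timings. It is not a proper statistical test, just a sanity filter: treat a delta as real only if it clearly exceeds the run-to-run scatter.

# Rough noise check, not a proper statistical test: call a slowdown
# real only if it exceeds a couple of standard deviations of the
# run-to-run scatter. The sample timings below are invented.
def mean(xs):
    return sum(xs) / float(len(xs))

def stddev(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

old = [30.1, 30.4, 30.2, 30.3, 30.2]  # e.g. FLAC encode, clang 2.9
new = [30.7, 30.9, 30.8, 31.0, 30.8]  # same benchmark, clang 3.0

delta = mean(new) - mean(old)
noise = max(stddev(old), stddev(new))
verdict = "looks real" if delta > 2 * noise else "within the noise"
print("delta %.2fs, scatter ~%.2fs: %s" % (delta, noise, verdict))

With scatter that tight, even a 2% delta would stand out; with noisier runs it would not, which is why the bots would need several runs per revision before flagging anything.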