C++ Expression Template Benchmarks for GCC/Clang/Intel/PGI/MSVC

Hello Everyone,

I thought you might be interested in some C++ expression template
benchmarks I have done.

  http://www.wlandry.net/Projects/FTensor#Benchmarks

Clang's performance was mixed. It optimized the expression template
code just as well as the code that unrolled the expressions by hand,
but that may be because it only did a mediocre job of optimizing the
unrolled versions. GCC had similar performance issues until I used
-Ofast. I could not find a similar option for Clang, partly because I
could not find a complete list of Clang compiler options. You can see
a list of all of the compiler options that I used at

  http://www.wlandry.net/Projects/FTensor/compilers_2012.html

I used clang 3.0. I also tried the 3.1 binary. The difference in
performance was, on the whole, not significant.
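
For readers who have not seen the technique, here is a minimal sketch of what "expression template code" versus "unrolled by hand" means. It is only an illustration: Vec3, Sum, and add3_unrolled are hypothetical names made up for this example, not FTensor's classes, and FTensor's real interface (index notation on tensors) is considerably richer.

```cpp
#include <cstddef>

// Hypothetical 3-component vector, for illustration only (not an FTensor type).
struct Vec3 {
  double data[3];

  double operator[](std::size_t i) const { return data[i]; }

  // Assigning from an expression node evaluates it element by element
  // in a single loop, without creating temporary vectors.
  template <class E>
  Vec3& operator=(const E& expr) {
    for (std::size_t i = 0; i < 3; ++i) data[i] = expr[i];
    return *this;
  }
};

// Node that records "lhs + rhs" and only computes an element on demand.
template <class L, class R>
struct Sum {
  const L& lhs;
  const R& rhs;
  double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Vec3 + Vec3 builds a node instead of computing a result.
inline Sum<Vec3, Vec3> operator+(const Vec3& a, const Vec3& b) {
  return {a, b};
}

// (node) + Vec3 builds a nested node, so a + b + c is a small tree.
template <class L, class R>
Sum<Sum<L, R>, Vec3> operator+(const Sum<L, R>& a, const Vec3& b) {
  return {a, b};
}

// The same computation written out by hand, as a baseline.
inline void add3_unrolled(Vec3& r, const Vec3& a, const Vec3& b,
                          const Vec3& c) {
  r.data[0] = a.data[0] + b.data[0] + c.data[0];
  r.data[1] = a.data[1] + b.data[1] + c.data[1];
  r.data[2] = a.data[2] + b.data[2] + c.data[2];
}
```

Writing r = a + b + c; goes through the template machinery, while add3_unrolled(r, a, b, c); is the hand-written equivalent. The benchmarks ask how close a compiler gets the first form to the second once everything is inlined.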

Cheers,
Walter Landry
wlandry@caltech.edu

Thanks, this looks pretty neat! Would you happen to have compile times as well?

It is not such a good benchmark for that. Each benchmark file pulls in
a lot of code that it never uses, so the compilation time does not
depend strongly on the complexity of the expressions.

But since you asked, it takes clang 3.0 between 4.3 and 4.6 seconds to
compile any one of the FTensor (not C-tran) benchmark files, gcc 4.7
takes between 3.9 and 4.7 seconds, and Intel between 6.4 and 7.4
seconds.

Cheers,
Walter Landry

> Hello Everyone,
>
> I thought you might be interested in some C++ expression template
> benchmarks I have done.
>
> http://www.wlandry.net/Projects/FTensor#Benchmarks
>
> Clang's performance was mixed. It optimized the expression template
> code just as well as the code that unrolled the expressions by hand,
> but that may be because it only did a mediocre job of optimizing the
> unrolled versions. GCC had similar performance issues until I used
> -Ofast. I could not find a similar option for Clang, partly because I
> could not find a complete list of Clang compiler options. You can see
> a list of all of the compiler options that I used at

-Ofast enables unsafe optimizations that can change the results produced by floating-point operations, so it doesn't make sense to compare the code generated by one compiler using -Ofast (which gets to break the rules of floating-point math) against the code generated by another compiler that hasn't been allowed to break those rules. It's very possible that -Ofast doesn't even make sense for your library, unless you don't care about the accuracy of your results.

IIRC, Clang doesn't actually do anything with -ffast-math, either. So, an apples-to-apples comparison would not use -Ofast or -ffast-math for either. Of course, it's completely fair criticism to say that, for people who don't require exact FP math, -Ofast gives a very nice performance boost in GCC that Clang can't match.
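
To make the "changes the results" point concrete, here is a small sketch. Floating-point addition is not associative, so an optimizer that is allowed to regroup a sum can produce a different answer. The function names sum_left and sum_right are made up for this example, and it has to be compiled without -Ofast or -ffast-math for the two groupings to be kept as written.

```cpp
// Two groupings of the same three-term sum. Under strict IEEE rules the
// compiler must keep the parentheses as written; with relaxed
// floating-point flags it is allowed to treat them as interchangeable.
inline double sum_left(double a, double b, double c) {
  return (a + b) + c;
}

inline double sum_right(double a, double b, double c) {
  return a + (b + c);
}
```

With a = 1e20, b = -1e20, c = 1.0, sum_left gives 1.0, but sum_right gives 0.0 because -1e20 + 1.0 rounds back to -1e20. Whether a difference like that matters depends on the library and its users.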

> http://www.wlandry.net/Projects/FTensor/compilers_2012.html
>
> I used clang 3.0. I also tried the 3.1 binary. The difference in
> performance was, on the whole, not significant.

CC'ing llvm-dev, because code generation and optimization is handled by the LLVM core.

  - Doug

The code generator does understand the -ffast-math flag, and there are a small number of peephole optimizations that make use of it. However, it's not very comprehensive.

--Owen

IIRC, fast-math does get propagated to LLVM, which does honor it.

John.

You're right; I see it now.

  - Doug