C++ ray tracer performance: gcc 4.3.2 vs llvm-gcc 4.2.1

On the off chance anyone here is interested in more performance results, I
compiled and ran the fastest of the implementations of the ray tracer in C++
with gcc 4.3.2 and llvm-gcc 4.2.1.


This is a small program with a relatively large hot path. Specifically, around
30% of the time is spent in the ray-sphere intersection, and another 30% is
spent in the intersect function that recursively traverses a hierarchical
scene.
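
For readers who have not seen the program, the two hot spots can be sketched
roughly as follows. This is a minimal illustration and not the benchmarked
ray.cpp: the names Vec, Ray, Node, ray_sphere and intersect, and the choice of
a tree of bounding spheres, are assumptions made for the example.

```cpp
#include <cmath>
#include <limits>
#include <memory>
#include <vector>

// Hypothetical minimal vector type for the sketch.
struct Vec { double x, y, z; };
inline Vec operator-(Vec a, Vec b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(Vec a, Vec b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec orig, dir; };  // dir is assumed to have unit length

const double kInfinity = std::numeric_limits<double>::infinity();

// Distance along the ray to the nearest intersection with a sphere in
// front of the origin, or infinity if the ray misses the sphere.
double ray_sphere(const Ray& ray, Vec center, double radius) {
    Vec v = center - ray.orig;
    double b = dot(v, ray.dir);
    double disc = b * b - dot(v, v) + radius * radius;
    if (disc < 0.0) return kInfinity;        // no real roots: miss
    double d = std::sqrt(disc);
    double t_far = b + d;
    if (t_far < 0.0) return kInfinity;       // sphere is entirely behind the ray
    double t_near = b - d;
    return t_near > 0.0 ? t_near : t_far;    // origin may be inside the sphere
}

// A leaf is a visible sphere; an inner node uses its sphere only as a
// bound around its children.
struct Node {
    Vec center;
    double radius;
    std::vector<std::unique_ptr<Node>> children;  // empty for a leaf
};

// Recursive traversal: returns the nearest hit distance found so far,
// where 'best' is the nearest distance known before visiting this node.
double intersect(const Ray& ray, const Node& node, double best) {
    double t = ray_sphere(ray, node.center, node.radius);
    if (node.children.empty()) return t < best ? t : best;  // leaf sphere
    if (t == kInfinity) return best;   // ray misses the bound: skip subtree
    for (const auto& child : node.children)
        best = intersect(ray, *child, best);
    return best;
}
```

Calling intersect(ray, root, kInfinity) returns the distance to the nearest
leaf sphere, or infinity if nothing is hit. The square root and the pruning
branch are the kind of code where differences in code generation between
compilers would show up.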

This is on an eight-core 2.1GHz Opteron 2352. Below are the compile time and
then the run time for each compiler, first in 32-bit mode and then in 64-bit
mode:

$ time g++ -O3 -msse3 -ffast-math ray.cpp -o ray
real 0m0.770s
$ time ./ray 9 512 >image.pgm
real 0m3.772s

$ time llvm-g++ -O3 -msse3 -ffast-math ray.cpp -o ray
real 0m0.746s
$ time ./ray 9 512 >image.pgm
real 0m3.278s


$ time g++ -O3 -ffast-math ray.cpp -o ray
real 0m0.774s
$ time ./ray 9 512 >image.pgm
real 0m3.068s

$ time llvm-g++ -O3 -ffast-math ray.cpp -o ray
real 0m0.741s
$ time ./ray 9 512 >image.pgm
real 0m3.009s

Note that llvm-gcc is generating faster code than GCC in both cases and, in
particular, is about 15% faster on 32-bit x86 (3.278s vs 3.772s)! In fact, I
have tried many different command line options to GCC (including
-march=barcelona) and none can beat llvm-gcc.

I find these recent results so compelling that I intend to benchmark larger
programs such as FFTW in the future.