Why is my code 9 times slower with Clang than with gcc?

I'm developing a hash function based on one by Bob Jenkins. From the
start, I compiled it with gcc 4.9. I have now tried Clang 3.4 and was
shocked to see how much worse the results are. What can I do to get
comparable performance from Clang?
I also tried Clang 3.1 and gcc 4.2.1: the former was very slow, the
latter was fine.
Detailed results:
pcbsd-8973% sudo nice -n -10 ./SMHasher Spooky128

Here's something to try: wrap template class SpookyHash in an anonymous
namespace. What impact does this have on performance?
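
To make the suggestion concrete, here is a minimal sketch of what "wrap it in an anonymous namespace" looks like. ToyHash and toy_hash are hypothetical stand-ins, not the real SpookyHash code. The idea, as I understand it, is that everything inside the unnamed namespace gets internal linkage, so the compiler knows no other translation unit can call it and may be more willing to inline it.

```cpp
#include <cstddef>
#include <cstdint>

namespace {  // names declared in here have internal linkage

// Hypothetical stand-in for the real hash template; NOT SpookyHash itself.
template <int Rounds>
class ToyHash {
public:
    static std::uint64_t Hash(const void* data, std::size_t len,
                              std::uint64_t seed) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        std::uint64_t h = seed;
        for (std::size_t i = 0; i < len; ++i) {
            h ^= p[i];
            // Rotate-and-multiply mixing, chosen only for illustration.
            for (int r = 0; r < Rounds; ++r)
                h = Rot64(h, 13) * 0x9E3779B97F4A7C15ULL;
        }
        return h;
    }

private:
    // k must be in the range 1..63, otherwise the shifts are undefined.
    static std::uint64_t Rot64(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }
};

}  // namespace

// The only externally visible symbol; it forwards to the internal template.
std::uint64_t toy_hash(const void* data, std::size_t len, std::uint64_t seed) {
    return ToyHash<2>::Hash(data, len, seed);
}
```

Callers elsewhere would only ever see toy_hash; the template itself stays private to this source file.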

You didn't include a main() function, so I can't run it and see concrete
numbers. My guess is that the code is manually unrolled in parts
(h0 through h11?!), which makes the functions so big that LLVM refuses
to inline them.
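
If inlining really is the culprit (that is an assumption until someone measures it), one way to test the theory is to force the small helpers inline and see whether the numbers change. A rough sketch, using the always_inline attribute that both gcc and Clang accept; TOY_FORCE_INLINE, toy_rot64, toy_mix and toy_hash_words are made-up names for illustration, not anything from the original code:

```cpp
#include <cstddef>
#include <cstdint>

// gcc and Clang both support this attribute; fall back to plain inline elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define TOY_FORCE_INLINE inline __attribute__((always_inline))
#else
#define TOY_FORCE_INLINE inline
#endif

// k must be in the range 1..63, otherwise the shifts are undefined.
TOY_FORCE_INLINE std::uint64_t toy_rot64(std::uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

// A tiny hand-unrolled mixing step over three state words.
TOY_FORCE_INLINE void toy_mix(std::uint64_t& h0, std::uint64_t& h1,
                              std::uint64_t& h2) {
    h0 += h1;  h2 ^= h0;  h1 = toy_rot64(h1, 11);
    h1 += h2;  h0 ^= h1;  h2 = toy_rot64(h2, 32);
    h2 += h0;  h1 ^= h2;  h0 = toy_rot64(h0, 43);
}

// Hot loop: with the helpers forced inline, no call overhead remains here.
std::uint64_t toy_hash_words(const std::uint64_t* words, std::size_t n) {
    std::uint64_t h0 = 0, h1 = 0x9E3779B97F4A7C15ULL, h2 = 0;
    for (std::size_t i = 0; i < n; ++i) {
        h0 ^= words[i];
        toy_mix(h0, h1, h2);
    }
    return h0 ^ h1 ^ h2;
}
```

Comparing timings with and without the attribute should show whether the inliner's decision is what separates the two compilers.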


Wow...that's it:
pcbsd-8973% ./SMHasher Spooky128