initial clang-omp/openmp benchmarking

I’ve done some initial benchmarking of the openmp performance using the clang compiler from our fink llvm34-3.4.1-0e packaging, which has the current openmp trunk svn built against llvm/compiler-rt/clang 3.4.1 with a back port of current clang-omp applied. The heated_plate_openmp.c demo code, compiled and run with the shell script, gave some interesting results. The demo code is run with one, two and four OMP threads. Expressing each timing as a speedup relative to the one-thread timing shows the following on a 16-core MacPro on darwin13…

1:1.90:3.31 for FSF gcc 4.8.3

1:1.90:3.30 for FSF gcc 4.9.0

1:1.99:3.71 for clang 3.4.1 with openmp and merged clang-omp

This compares to the results on a 24-core Fedora 15 linux box…

1:1.99:3.92 for FSF gcc 4.6.3

1:1.99:3.93 for FSF gcc 4.8 branch svn

I’ve filed a report on the reduced performance of gomp on darwin compared to iomp5 on darwin and gomp on linux. The response was that gomp on darwin uses pthread_mutex calls rather than futex, and that this is the cause.
While the results for iomp5 are far better on darwin than those for gomp on darwin, we are still lagging behind the performance of gomp using futex on linux. Does anyone have clang-omp/openmp on linux? I would be curious to know how the timing ratios for the heated_plate_openmp.c demo code on linux compare to what we get on darwin. FYI, heated_plate_openmp.c and the shell script are attached to PR 61333.