More on atlas and clang

Hi there,

I have recently undertaken another experimental build of Atlas (http://math-atlas.sourceforge.net; briefly, Atlas provides a highly complete BLAS/LAPACK implementation optimized for the native architecture of the computer it runs on) on an AVX machine (a 2011 Mac Mini), using a snapshot of clang 3.3 (r173279) provided by MacPorts (http://macports.org) with the -O3, -fPIC, -fvectorize and -fslp-vectorize flags.

I am pleased to say that:

1. The generated AVX code seems fine: a full test session run under an Atlas-based SciPy didn’t raise any errors (a sketch of that test session follows this list);
2. The performance now seems on par with, or even (sometimes surprisingly) better than, the ‘reference GCC’, whatever that exactly means (I was unable to get in touch with the Atlas developer at the time), as the benchmark table further down shows.
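For reference, here is roughly what the test session from point 1 looked like. This is only a minimal sketch: it assumes NumPy and SciPy were built against the clang-compiled Atlas, and the exact invocation may differ with your SciPy version.

    # Rough sketch of the test session from point 1.
    # Assumes NumPy/SciPy are linked against the clang-built Atlas.
    import numpy
    import scipy

    numpy.show_config()   # confirm BLAS/LAPACK resolve to the new Atlas libraries
    scipy.test('full')    # full test suite; it completed without errors here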

Reference clock rate=3292MHz, new rate=2300MHz
  Refrenc : % of clock rate achieved by reference install
  Present : % of clock rate achieved by present ATLAS install

                  single precision                  double precision
          ********************************  ********************************
               real            complex           real            complex
          ---------------  ---------------  ---------------  ---------------
Benchmark Refrenc Present  Refrenc Present  Refrenc Present  Refrenc Present
========= ======= =======  ======= =======  ======= =======  ======= =======
   kSelMM  1289.9  1407.4   1188.7  1229.8    686.7   826.8    647.4   682.1
   kGenMM   198.2   239.7    198.5   237.8    193.9   231.8    196.0   233.8
   kMM_NT   193.7   266.4    195.2   192.9    184.2   187.4    188.5   197.5
   kMM_TN   198.5   211.1    197.9   226.2    189.8   227.6    189.5   223.2
   BIG_MM  1213.8  1346.7   1241.3  1366.5    652.0   789.5    661.4   795.8
    kMV_N   224.3   308.1    438.8   617.3    115.9   152.1    205.8   283.5
    kMV_T   224.6   313.5    460.3   642.9    123.2   159.6    211.3   288.2
     kGER   148.3   192.4    290.2   381.2     73.3    95.6    144.3   184.3
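In case anyone wants the numbers above in absolute terms: as far as I understand it, ATLAS’s “percentage of clock rate” is simply MFLOPS divided by the clock in MHz, times 100, so converting back to GFLOP/s is a one-liner (small sketch below; that reading of the metric is my own assumption).

    # Convert ATLAS's "% of clock rate achieved" back to GFLOP/s.
    # Assumption: percent = MFLOPS / clock_MHz * 100 (my reading of the metric).
    def gflops(percent_of_clock, clock_mhz):
        return percent_of_clock / 100.0 * clock_mhz / 1000.0

    print(gflops(1346.7, 2300))   # BIG_MM, single real, present install: ~31.0 GFLOP/s
    print(gflops(1213.8, 3292))   # same benchmark, reference install: ~40.0 GFLOP/s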

This is in stark contrast with the previous test, where clang was lagging about 20% behind the GCC-based ‘reference implementation’ on rows 2, 3 and 4 of the table (kGenMM, kMM_NT and kMM_TN), where compiler performance matters most.

So, to summarize in two words: kudos, folks!

I will build another version on a Core2Duo machine tonight and see if the results are consistent.

Cheers!
Vincent