Clang build of ATLAS (and speed comparison)

Hi there,

Together with Clint Whaley, the author and maintainer of the ATLAS suite, I am currently evaluating clang/LLVM performance, with the aim of supporting clang alongside gcc as a compiler for building ATLAS. On my side, the goal is to add this option to the MacPorts port.

As a preliminary report, here is the output of the ATLAS built-in benchmark (‘make time’) after compiling with the -Oz flag. The reference is an installation presumably built with GCC 4.x on Linux (Clint, could you elaborate on this?). The machine is a three-year-old MacBook (late 2008, Core 2 Duo), and the compiler is the version of clang shipped with the latest Xcode 4.2:

Apple clang version 3.0 (tags/Apple/clang-211.9) (based on LLVM 3.0svn)

Dragonegg with gcc 4.5 is used for Fortran compilation, but I don’t think it is relevant here (again, Clint, could you confirm this?). In any case, all of the generated code comes from LLVM.

Vincent,
Hi there,

As a preliminary report, here is the output of the ATLAS built-in benchmark
(‘make time’) after compiling with the -Oz flag. The reference is an
installation presumably built with GCC 4.x on Linux (Clint, could you
elaborate on this?). The machine is a three-year-old MacBook (late 2008,
Core 2 Duo), and the compiler is the version of clang shipped with the latest
Xcode 4.2:

I would give them the table you built before, where you contrast Clang and
GCC 4.5 *on the same machine*. The default numbers compare against timings
run on my own Core2 system, which may differ substantially from yours
(different cache, memory, OS, and compiler).

Dragonegg with gcc 4.5 is used for Fortran compilation, but I don’t think it
is relevant here (again, Clint, could you confirm this?). In any case, all of
the generated code comes from LLVM.

Fortran is only used to compile interface files, and has no effect on
performance.

What we see is that, while clang seems to outperform GCC on level 2 BLAS ops
(matrix • vector), it is consistently 20% slower on level 3 ops (lines 2, 3
and 4).

Most of the Level 2 BLAS routines use intrinsics or assembly, so the compiler
is not as important for these lines. The lines that rely on the compiler for
performance are kGenMM, kMM_NT, and kMM_TN.
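
To illustrate why the compiler matters for those lines, here is a minimal
sketch of a plain-C matrix multiply with a timing wrapper. It is not ATLAS
code and is far simpler than kGenMM; the names naive_gemm and gemm_mflops are
made up for this example. Built once with clang and once with gcc, using the
same flags on the same machine, it gives a rough comparison of the code each
compiler generates for a kernel with no intrinsics or assembly.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a compiler-dependent level 3 kernel (not ATLAS
 * code): C = C + A * B for n x n row-major matrices. */
void naive_gemm(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            const double a = A[i * n + k];
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}

/* Time one n x n multiply with clock() and return MFLOPS (2*n^3 flops).
 * Returns -1.0 if allocation fails and 0.0 if the run was too short to time.
 * If checksum_out is non-NULL it receives C[0], so the result is observed. */
double gemm_mflops(int n, double *checksum_out)
{
    assert(n > 0);
    size_t count = (size_t)n * (size_t)n;
    double *A = malloc(count * sizeof *A);
    double *B = malloc(count * sizeof *B);
    double *C = calloc(count, sizeof *C);
    if (!A || !B || !C) {
        free(A); free(B); free(C);
        return -1.0;
    }
    for (size_t i = 0; i < count; i++) {
        A[i] = 1.0;
        B[i] = 2.0;
    }

    clock_t start = clock();
    naive_gemm(n, A, B, C);
    clock_t stop = clock();

    if (checksum_out)
        *checksum_out = C[0];
    free(A); free(B); free(C);

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return 2.0 * (double)n * (double)n * (double)n / seconds / 1e6;
}
```

Calling gemm_mflops(512, &sum) from main and printing the result under each
compiler gives two numbers to compare; the size 512 is an arbitrary choice,
and clock() measures CPU time, so the figures are only indicative.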

Please note, and this is also important, that neither at -O0, nor at -O3 does

That is odd indeed: I've never heard of -O0 failing while higher optimization
levels work . . .

Cheers,
Clint

Vincent Habchi <vince@macports.org> writes:

As a preliminary report, here is what spits out the ATLAS built-in
benchmark (‘make time’) after a compilation with the -Oz
flag. Reference stands for an installation presumably with GCC4.x on
Linux (Clint, could you elaborate on this?).

"gcc 4.x" is absurdly vague though -- gcc has changed hugely between 4.0
and 4.7. You also have no idea how the reference build was done.

It seems much more reasonable to just build the benchmarks with a
suitably recent version of gcc (4.6 probably) yourself.

-miles

[ relative ] comparisons of builds on the same machine?

Can you point us to the source of ATLAS? Does it have a web page?

Evan

Hi Evan,

http://math-atlas.sourceforge.net

But ATLAS is very tricky to figure out, because it relies heavily on dynamic makefile generation. Clint’s help is essential.

Thanks a lot for your interest!

Vincent

PS: I just received a brand new Mac mini with a Sandy Bridge Core i5, so I’ll be able to carry out further tests.

Please let us know which version you used and how to get it to build with Clang. We're also interested in how you conducted the experiments, so that we can reproduce the issues you ran into.

Thanks,

Evan

Evan,

I wrote a dedicated Portfile for the MacPorts project on Mac OS X. Is this acceptable to you?

Vincent

What's that? :) Sorry, I don't use MacPorts.

Evan

Gentoo Prefix also has ATLAS, at least for 32-bit Intel :)