[OT?] real-world interest of the polly optimiser

Hi,

Apologies if this isn't the best place. I've been looking for some information (understandable by the average user) about the real-world benefits of the polly optimiser, but have found only either very broad and vague claims or specialist research papers.

What I'd like to get an idea of is what benefits Polly brings, under what conditions, for what cost and how (= any special compiler options needed?).

Also, given I'm installing clang via MacPorts: does clang pick up Polly's presence automatically after I add the libpolly binary (i.e. port:llvm with the +polly install variant) or do I need to rebuild clang too?

Thanks,
René

Hi René,

> Hi,
>
> Apologies if this isn't the best place.

Polly has its own mailing list here:
https://groups.google.com/forum/#!forum/polly-dev
polly-dev@googlegroups.com

> I've been looking for some information (understandable by the average user) about the real-world benefits of the polly optimiser, but have found only either very broad and vague claims or specialist research papers.

As a researcher, I can tell you about the research we are doing. We
currently have a paper under review about optimizing gemm, where we
reach 85% of the performance of a vendor-provided BLAS implementation,
which is 20x the speed of the same program compiled by clang without
Polly.

We know Samsung, Qualcomm and Xilinx are using Polly on a regular basis.

Polly can automatically generate OpenMP and CUDA code. The benefits
depend a lot on what you are using it for, in particular on whether
your code consists of for-loop nests over dense arrays. In other cases
you only get increased compilation time.
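To make that concrete, a kernel of roughly this shape is what Polly
can analyse and transform (a sketch with invented names and sizes, not
code from any real project): affine loop bounds, dense array accesses,
and no calls or unpredictable control flow inside the nest.

    #define N 2048

    /* A = alpha*A + B over a dense 2D array: a "static control part"
     * of the kind Polly can model exactly. */
    void scale_add(float A[N][N], const float B[N][N], float alpha) {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                A[i][j] = alpha * A[i][j] + B[i][j];
    }

Loops that chase pointers, call opaque functions or have
data-dependent bounds generally fall outside this model, which is why
the pay-off varies so much from one codebase to another.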

> What I'd like to get an idea of is what benefits Polly brings, under what conditions, for what cost and how (= any special compiler options needed?).

> Also, given I'm installing clang via MacPorts: does clang pick up Polly's presence automatically after I add the libpolly binary (i.e. port:llvm with the +polly install variant) or do I need to rebuild clang too?

I don't have a Mac, so I don't know how it works there; I can only
explain how to do it from source:

Check out the Polly source into LLVM's tools directory, then
recompile opt and clang. Add "-mllvm -polly" to the clang command line
to enable Polly.
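Roughly, something like the following; this is only a sketch, and the
repository URL, build setup and paths depend on your LLVM version, so
treat them as assumptions rather than exact instructions:

    # fetch Polly into the LLVM tree and rebuild
    cd llvm/tools
    git clone http://llvm.org/git/polly.git
    cd ../../build && cmake ../llvm && make

    # compile with Polly enabled
    clang -O3 -mllvm -polly test.c -o test

There are further "-mllvm -polly-*" options (e.g. for generating
parallel code), but which ones are available depends on the Polly
version you have.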

As Polly is currently a research project, I would not expect a sudden
improvement in execution time. Performance-critical "real-world" code
is often already optimized manually, simply because general-purpose
compilers do not optimize code aggressively enough on their own. Many
such manual optimizations are incompatible with Polly, e.g. parts
written in assembler.

Michael

Sorry, but please don't:
1) Provide numbers when comparing against a weak baseline.

Please do:
2) If you do have a valid performance comparison or claim, provide
enough information so that a complete picture is presented.

Your statement just came across as saying either that LLVM's loop
optimizer is so bad that Polly is required, or that you are hitting a
corner case which happens to be a sweet spot for Polly.

To my knowledge, Polly is not in use in any production setting. It is used for research purposes, but I don't believe it has been productionized at this time.

Philip

In applications like linear algebra, a lot of performance comes from optimizing loop nests for cache locality. Doing things like loop interchange, loop nest distribution, unroll-and-jam, etc. helps a lot with that, and to the best of my knowledge LLVM does none of it. There is some basic support for loop fusion and distribution, but I don't think it works at the nest level. Given how important this is in high-performance computing, the 20x difference sounds believable.
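As an illustration of loop interchange (a made-up example, not Polly output): swapping the loop order changes the inner loop from striding across rows to walking memory contiguously, which on a large array can make a dramatic difference by itself.

    #define N 2048
    double a[N][N];

    /* Before: the inner loop varies i, so consecutive accesses are
     * N doubles apart and nearly every one misses the cache. */
    double sum_strided(void) {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    /* After interchange: the inner loop varies j, so accesses are
     * contiguous and hardware prefetching works. */
    double sum_contiguous(void) {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

Doing this legally requires proving that both orders compute the same result, which is exactly the kind of dependence analysis a polyhedral framework like Polly provides.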

I don't know what the general plan is: whether there is any interest in loop nest optimizations in LLVM itself, or whether this task will be delegated to Polly at some point. In any case, without those optimizations there is only so much that can be done.

-Krzysztof

We implemented it recently, but only for gemm-like kernels, basically
the techniques from
http://www.cs.utexas.edu/users/flame/pubs/TOMS-BLIS-Analytical.pdf
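Very roughly, and heavily simplified (the tile size and names here
are invented, this is not the code from the paper): the loops are
blocked so that each sub-problem's working set stays in cache, and the
innermost block becomes the target for register tiling and
vectorization.

    #define N 1024
    #define T 64   /* tile size picked arbitrarily for the example */

    void matmul_tiled(const double A[N][N], const double B[N][N],
                      double C[N][N]) {
        for (int ii = 0; ii < N; ii += T)
            for (int kk = 0; kk < N; kk += T)
                for (int jj = 0; jj < N; jj += T)
                    /* one block of the product; its data stays
                     * cache-resident while it is being computed */
                    for (int i = ii; i < ii + T; i++)
                        for (int k = kk; k < kk + T; k++)
                            for (int j = jj; j < jj + T; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }

The analytical model in the paper linked above is about deriving good
block sizes from the cache parameters instead of tuning them
empirically.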

We sent a paper for review to ACM TACO. As it is under review, and I
am not the main author, I think I cannot just share it publicly (yet).

Michael