Loop-specific optimizations

Hi all,

At our lab we're using LLVM to optimize and compile code to be run on a
CGRA processor, capable of executing parts of an application (mostly
loops) very efficiently. Since we are talking about a VLIW processor,
this code is generally processed quite differently than code for an
OoO processor would be (e.g. it is modulo scheduled). As a result,
otherwise performance-enhancing optimizations can wreck our schedule.

We try to cope with this by selectively applying transformations to
loops. Initially we annotated loops with a pragma, and modified passes
to honour this pragma, but this proved to be cumbersome. It also didn't
work very well with function passes, which we then disabled altogether.

Recently we've been looking into outlining the relevant loop bodies to a
new function, and selecting transformations for this function. This
would only require modifying the pass manager, and would enable us to
specialize function passes as well.
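
To make the outlining idea concrete, here is a minimal sketch in plain C. It is not our actual code: all function names are made up for illustration, and a real implementation would do this on the IR rather than by hand. It mainly shows that every value the loop reads from its surroundings turns into an argument of the new function.

```c
#include <stddef.h>

/* Before outlining: the loop lives inside its enclosing function. */
int sum_scaled_inline(const int *in, size_t n, int k)
{
    int bias = 2 * k;   /* computed by the surrounding code */
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += in[i] * k + bias;
    return sum;
}

/* After outlining: the loop is a function of its own (hypothetical name),
 * so a per-function pipeline could be selected for it. Every value the
 * loop reads from its surroundings (in, n, k, bias) becomes an argument,
 * and the value it produces (sum) becomes the return value. */
static int sum_scaled_loop(const int *in, size_t n, int k, int bias)
{
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += in[i] * k + bias;
    return sum;
}

/* The caller keeps the non-loop code and only calls the outlined loop. */
int sum_scaled_outlined(const int *in, size_t n, int k)
{
    int bias = 2 * k;
    return sum_scaled_loop(in, n, k, bias);
}
```

Both variants compute the same result; the only difference is the function boundary around the loop.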

Does this seem like a good way to specialize transformations for loop
bodies, or are there better ways to accomplish this? Some of the issues
I can already think of:
* overhead caused by argument passing -- can be fixed by inlining the
function again before register allocation?
* some optimizations (e.g. licm) won't be possible any more
* merging/rearranging loops won't be possible (I'm thinking of Polly here)
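
The licm concern can be sketched as follows, assuming only the loop body (not the whole loop) is outlined. The names are hypothetical, and the call counter exists only to make the number of evaluations observable; a real loop-invariant computation would be side-effect free.

```c
#include <stddef.h>

/* Instrumentation only: counts how often the invariant value is computed. */
static int call_count = 0;

/* Hypothetical stand-in for a computation that does not depend on i. */
static int expensive(int x)
{
    ++call_count;
    return x * x + 1;
}

/* Outlined loop body: seen in isolation, this function contains no loop,
 * so there is nothing to hoist the invariant call out of. */
static void body_outlined(size_t i, int x, const int *in, int *out)
{
    out[i] = in[i] + expensive(x);
}

/* The loop calls the outlined body; the invariant is recomputed each time. */
void loop_with_outlined_body(int x, const int *in, int *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        body_outlined(i, x, in, out);
}

/* What hoisting would give when loop and body are visible together. */
void loop_hoisted(int x, const int *in, int *out, size_t n)
{
    const int f = expensive(x);   /* evaluated once, before the loop */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] + f;
}
```

For n iterations, the first version evaluates the invariant n times and the second only once, while both write the same values to out.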

Thanks,

Hi Tim,

we at Saarland University are working on something similar to what you are describing. In principle, we extend Clang with an attribute that lets you specify which transformation phases should be run on the annotated construct (currently functions, compound statements, or loops) and in what order.
Will you be at the LLVM Euro Conference? We will have a lightning talk and poster on the topic there.

Cheers,
Ralf

Hi Ralf,

> we at Saarland University are working on something similar to what you
> are describing. In principle, we extend Clang with an attribute that
> lets you specify which transformation phases should be run on the
> annotated construct (currently functions, compound statements, or loops)
> and in what order.

That definitely sounds interesting. Do you add these attributes to the
bitcode, or how else does opt decide which transformations to apply? I
ask because our approach, attaching metadata to loop latches, is quite
fragile; loop latches can easily get destroyed or transformed during
optimization.

> Will you be at the LLVM Euro Conference? We will have a lightning talk
> and poster on the topic there.

Sadly no, I missed the call for participation. Are the talks going to be
videotaped, or will proceedings be published? Or could you, if possible,
send over a working paper or something similar?

Thanks,

Hi Tim,

> That definitely sounds interesting. Do you add these attributes to the
> bitcode, or how else does opt decide which transformations to apply? I
> ask because our approach, attaching metadata to loop latches, is quite
> fragile; loop latches can easily get destroyed or transformed during
> optimization.

We do this in the front end already:

void test_noise(int x, int* in, int* out)
{
     __attribute__((noise("inline(fn) licm wfv-vectorize(4) unroll(4)")))
     for (int i=0; i<16; ++i)
     {
       out[i] = in[i] + fn(x);
     }
}

This results in IR similar to this code:

void test_noise_result(int x, int* in, int* out)
{
   // Code of "fn" (assuming it was loop invariant).
   int F = ...

   out[0-3] = <in[0],in[1],in[2],in[3]> + <F,F,F,F>;
   out[4-7] = <in[4],in[5],in[6],in[7]> + <F,F,F,F>;
   out[8-11] = <in[8],in[9],in[10],in[11]> + <F,F,F,F>;
   out[12-15] = <in[12],in[13],in[14],in[15]> + <F,F,F,F>;
}
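
For readers who want something compilable, the effect described above can be approximated in plain C as below. This is only a sketch: the definition of fn is a made-up placeholder, the 4-wide vector operations are emulated with a scalar helper (add4, a hypothetical name), and real output would be vector IR rather than C.

```c
/* Placeholder definition; the original example does not show fn. */
static int fn(int x)
{
    return 2 * x + 1;
}

/* Reference: the loop as written in the source. */
void test_reference(int x, const int *in, int *out)
{
    for (int i = 0; i < 16; ++i)
        out[i] = in[i] + fn(x);
}

/* Emulates one 4-wide vector add of in[0..3] with the splat <f,f,f,f>. */
static void add4(const int *in, int *out, int f)
{
    for (int lane = 0; lane < 4; ++lane)
        out[lane] = in[lane] + f;
}

/* Approximation of the result: fn inlined and hoisted (inline + licm),
 * then four 4-wide operations (vectorize by 4, unroll by 4), no loop left. */
void test_transformed(int x, const int *in, int *out)
{
    const int F = fn(x);   /* computed once, assuming it is loop invariant */

    add4(in + 0,  out + 0,  F);
    add4(in + 4,  out + 4,  F);
    add4(in + 8,  out + 8,  F);
    add4(in + 12, out + 12, F);
}
```

Both functions fill out[0..15] with identical values; the transformed one just makes the hoisting and the 4x4 structure explicit.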

> Sadly no, I missed the call for participation. Are the talks going to be
> videotaped, or will proceedings be published? Or could you, if possible,
> send over a working paper or something similar?

I don't think that the lightning talks will be videotaped since they are only 5 minutes long - but I may be wrong. There are also no proceedings, and we don't have anything ready except some examples.
However, if you are interested, we could talk about giving you access to an alpha version - it will be open source eventually, anyway.

Cheers,
Ralf

Hi Ralf,

> I don't think that the lightning talks will be videotaped since they are only 5
> minutes long - but I may be wrong. There are also no proceedings, and we don't
> have anything ready except some examples.

the lightning talks will be videotaped. We would also like to put any slides on
the conference web-page (as well as integrating them into the video). If people
have an electronic version of their posters then we would like to put it on the
web-page too.

Ciao, Duncan.

Hi Duncan,

wow, that's a huge effort, great!

Cheers,
Ralf