LLVM and loop optimizations

Hello,

We are currently evaluating LLVM as a compiler technology against Open64 and have a few questions regarding its current status, in particular with respect to optimizations.

Control flow is often expensive on the architectures we are considering, and as such loop optimizations, e.g. unroll-and-jam, can play an important role. For example, consider the following program:

int a[4][4];
int b[4][4];
int c[4][4];

int main(void)
{
    int i, j, k;

    for (i = 0; i < 4; i++)
    {
        for (j = 0; j < 4; j++)
        {
            c[i][j] = 0;
            for (k = 0; k < 4; k++)
                c[i][j] = a[i][k] * b[k][j] + c[i][j];
        }
    }

    return c[2][3];
}

If I compile this with -std-compile-opts, the inner loops are completely unrolled and thus unroll-and-jam is not important here, but I'm assuming this is down to the fact that the loop bounds are known and small. If that were not the case, what I would like to be able to do is unroll the loop iterating over "j" some number of times and then fuse the resulting instances of the inner loop, e.g.:

for (i = 0; i < 4; i++)
    for (j = 0; j < 4; j += 2)
    {
        c[i][j] = 0;
        c[i][j+1] = 0;
        for (k = 0; k < 4; k++)
        {
            c[i][j]   = a[i][k] * b[k][j]   + c[i][j];
            c[i][j+1] = a[i][k] * b[k][j+1] + c[i][j+1];
        }
    }

Of course, this is the well-known unroll-and-jam optimization described by Carr and Kennedy*, and I was wondering whether there are plans to add this and other loop optimizations to LLVM in the near future.

Thanks for any information you can provide around this.

Ben

*S. Carr and K. Kennedy, “Improving the ratio of memory operations to floating-point operations in loops,” ACM Transactions on Programming Languages and Systems, vol. 16, no. 6, pp. 1768-1810, 1994.


Benedict,

LLVM has a reasonably nice loop optimization framework and supporting code such as loop extraction and SCEV, but relatively few existing loop optimizations. I think it would be fairly easy for you to add the ones you want. In addition, we have just begun developing a new parallelizing compiler infrastructure based on LLVM at Illinois, and I expect more loop optimizations for both parallelism and locality to be added in the near future.

–Vikram

http://llvm.org/~vadve

Hello Dr. Adve,

Can you please tell us more about the new parallelizing compiler infrastructure? This sounds very interesting, and I would like to contribute to such an effort.

Thank you,
-Nadav Rotem

Hi Benedict,

As others have said, LLVM does not have much in the way of high-level, dependence-analysis-based loop transformations. There are several groups interested in this, so I expect it to start improving over the coming months.

Pragmatically speaking, the right choice for you probably depends on your goals. If you need to get something running as soon as possible and need these high-level loop xforms, Open64 is probably the way to go. If you are interested in the long term and in contributing to make this happen, I strongly believe LLVM is a better choice than Open64. We would obviously welcome any input and contributions from AMD.

-Chris