Hello,
We are currently evaluating LLVM as a compiler technology against Open64 and have a few questions regarding its currently status with respect to optimizations in particular.
Control flow is often expensive on the architectures we are considering and as such loop optimizations, e.g. unroll and jam, can play an important role. For example, consider the following program
int a[4][4];
int b[4][4];
int c[4][4];
int main(void)
{
int i, j, k;
for(i = 0; i < 4; i++)
{
for(j = 0; j < 4; j++)
{
c[i][j] = 0;
for (k = 0; k < 4; k++)
c[i][j] = a[i][k] * b[k][j] + c[i][j];
}
}
return c[2][3];
}
If I compile this with std-compile-opts the inner loops are completely unrolled and thus unroll-and-jam is not important here but I’m assuming this is down to the fact that the loop bounds are known and small. Assuming this was not the case what I would like to be able to do is unroll the loop iterating over “j” some number of times and then fuse the resulting instances of the inner loop, e.g:
for(i=O; i<4; i++)
for(j=O; j<4; j+=2)
{
c[i][j] = 0;
c [i][j+1]= 0;
for(k=O; k<4; k++)
{
c[i] [j] = a[i][k]*b[k] [j]+c[i][j];
c[i][j+1]= a[i][k]*b[k] [j+1]+c[i] [j+1];
}
}
Of course this is the well known unroll-and-jam optimization, described by Car and Kennedy*, and I was wondering if there are plans to add this and other additional loop optimizations to LLVM in the near future?
Thanks for any information you can provide around this.
Ben
*S. Carr and K. Kennedy, “Improving the ratio of memory operations to floating-point operations in loops,” ACM Transactions on Programming Languages and Systems, vol. 16, no. 6, pp. 1768-1810, 1994.