Is llvm capable of doing loop interchange optimization?

Hello,

I've been playing around with the really simple example of not cache
friendly loop like this:

#define N 100

void foo(int** __restrict__ a,
         int** __restrict__ b)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            a[j][i] += b[j][i];
}

link to compiler explorer:
Compiler Explorer’)),version:4

I was thinking that modern compilers are optimizing such cases very
aggressively. I quickly checked gcc, there is special option
-floop-interchange, but it requires properly configured gcc compiler
with Graphite infrastructure enabled. I didn't have that at hand, so I
just stopped right there.

Loop has no side effects and arrays do not alias, so from my point of
view it's safe to do interchange here. I tried different march options
as well with no success.
I'm interested if llvm is capable of doing loop interchange in
general, at least the most simple cases like the mentioned one?

Hi Denis,

Hello,

I've been playing around with the really simple example of not cache
friendly loop like this:

#define N 100

void foo(int** __restrict__ a,
          int** __restrict__ b)
{
     for (int i = 0; i < N; ++i)
         for (int j = 0; j < N; ++j)
             a[j][i] += b[j][i];
}

link to compiler explorer:
Compiler Explorer’)),version:4

I was thinking that modern compilers are optimizing such cases very
aggressively. I quickly checked gcc, there is special option
-floop-interchange, but it requires properly configured gcc compiler
with Graphite infrastructure enabled. I didn't have that at hand, so I
just stopped right there.

Loop has no side effects and arrays do not alias, so from my point of
view it's safe to do interchange here. I tried different march options
as well with no success.
I'm interested if llvm is capable of doing loop interchange in
general, at least the most simple cases like the mentioned one?

There is a (relatively basic) loop interchange pass in LLVM. It is not enabled by default however, but you can enable it by passing `-mllvm -enable-loopinterchange` to Clang.

LoopInterchange in Clang does not trigger in your example, because a[j] and b[i] may alias. I am no expert on restrict, but it seems both GCC and Clang cannot prove that the access do not alias with the annotations you provided (also `int *restrict *restrict` does not do the trick).

In the code below, accesses to aa and bb cannot alias, and both GCC (with -O3) and Clang (recent master build, with `-O3 -mllvm -enable-loopinterchange`) interchange the 2 loops.

int aa[N][N];
int bb[N][N];

void foo1() {
     for (int i = 0; i < N; ++i)
         for (int j = 0; j < N; ++j)
             aa[j][i] += bb[j][i];
}

Cheers,
Florian

Hi Florian,
Thank you very much for answering the question.
Looks like for now gcc does a better job with the example you provided: