LLVM Matrix Multiplication Loop Vectorizer

Hello,
i am trying to vectorize a simple matrix multiplication in llvm;
here is my code;

#include <stdio.h>
#define N 1000

// This function multiplies A and B, and stores
// the result in C
void multiply(int A[N], int B[N], int C[N])
{
int i, j, k;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
C[i][j] = 0;
for (k = 0; k < N; k++)
C[i][j] += A[i][k]*B[k][j];
}
}
}

here are the commands;

clang -S -emit-llvm mat.c -march=knl -O3 -mllvm -disable-llvm-optzns -o mat.ll

opt -S -O3 mat.ll -o mat_o3.ll

llc -x86-asm-syntax=intel mat_o3.ll -o mat_intel.s

with this command i got the below error

opt -S -O3 -force-vector-width=16 mat.ll -o mat_o3.ll

remark: :0:0: loop not vectorized: value that could not be identified as reduction is used outside the loop

it is unable to vectorize the matrix multiplication and in .ll and .s files i see the scalar instructions.

Why is that so? What is my mistake?? Kindly correct me.

Looking forward to your reply

Thank You

Hello,

The message basically means that llvm failed to recognize C[i][j] as valid reduction. In order to make C[i][j] valid reduction is to be privatized into some scalar value for the innermost loop. In you case aliasing analysis fails to prove that C[i][j] never aliases with A and B and this seems correct. So you need something like this to make loop vectorizable:

be explicit:

#include <stdio.h>
#define N 1000

// This function multiplies A and B, and stores
// the result in C
void multiply(int A[N], int B[N], int C[N])
{
int i, j, k;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
int res = 0;
for (k = 0; k < N; k++)
res += A[i][k]*B[k][j];

C[i][j] = res;

}
}
}

or just add restrict to arguments:

// This function multiplies A and B, and stores
// the result in C
void multiply(int A[restrict][N], int B[restrict][N], int C[restrict][N])
{
int i, j, k;
for (i = 0; i < N; i++)
{
for (j = 0; j < N; j++)
{
for (k = 0; k < N; k++)
C[i][j] += A[i][k]*B[k][j];

}

}
}

On the practical side of things though the following loops reordering should provide much better performance when vectorizsed because in your case you have gather operation (strided load) from B + costly reduce operation in j-loop.

#include <stdio.h>
#define N 1000

// This function multiplies A and B, and stores
// the result in C
void multiply(int A[N], int B[N], int C[N])
{
int i, j, k;
for (i = 0; i < N; i++)
{

for (j = 0; j < N; j++)
C[i][j] = 0;

for (k = 0; k < N; k++) {
for (j = 0; j < N; j++)
{
C[i][j] += A[i][k]*B[k][j];

}
}

}
}

Hello,
i am trying to vectorize a simple matrix multiplication in llvm;
here is my code;

#include <stdio.h>
#define N 1000

// This function multiplies A and B, and stores
// the result in C
void multiply(int A[N], int B[N], int C[N])
{
    int i, j, k;
    for (i = 0; i < N; i++)
    {
        for (j = 0; j < N; j++)
        {
            C[i][j] = 0;
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k]*B[k][j];
        }
    }
}

here are the commands;

clang -S -emit-llvm mat.c -march=knl -O3 -mllvm -disable-llvm-optzns -o
mat.ll

opt -S -O3 mat.ll -o mat_o3.ll

llc -x86-asm-syntax=intel mat_o3.ll -o mat_intel.s

with this command i got the below error
opt -S -O3 -force-vector-width=16 mat.ll -o mat_o3.ll

remark: <unknown>:0:0: loop not vectorized: value that could not be
identified as reduction is used outside the loop

it is unable to vectorize the matrix multiplication and in .ll and .s
files
i see the scalar instructions.

Why is that so? What is my mistake?? Kindly correct me.

You might also try Polly. We detect and optimize this code into very
high-performance code.

Best,
Tobias

I’d advised Hameeza to file a bug report for this. We should be able to vectorize this without the restrict by emitting runtime checks. -Hal

Thank You