loop vectorizer issue

Hello,

I was trying to trace the LLVM loop vectorizer, so I wrote a simple loop with a clear dependency.

However, the debug output says ‘We can vectorize this loop’.

Here is my loop with the dependency:

for(int k=20;k<50;k++)
     dataY[k] = dataY[k-1];

And the debug prints:

LV: Checking a loop in “main”

LV: Found a loop: for.body4

LV: Found an induction variable.

LV: Found a write-only loop!

LV: We can vectorize this loop!

LV: Vectorization is possible but not beneficial.

The LLVM IR for the loop contains only one ‘store’ instruction, which stores ‘%.pre’. It seems that the absence of a ‘load’ instruction prevented the vectorizer from detecting the dependency.

Is that a bug, or am I missing something? Please advise.

Here is the IR for the loop:

for.body4:                                        ; preds = %for.body4, %for.cond2.preheader
  %k.030 = phi i32 [ 20, %for.cond2.preheader ], [ %inc8, %for.body4 ]
  %arrayidx6 = getelementptr inbounds i32* %0, i32 %k.030
  store i32 %.pre, i32* %arrayidx6, align 4, !tbaa !0
  %inc8 = add nsw i32 %k.030, 1
  %exitcond32 = icmp eq i32 %inc8, 50
  br i1 %exitcond32, label %for.cond10.preheader, label %for.body4

Thanks in advance,

Sara Elshobaky

Notice that the code you provided, for globals and stack allocations, at least,
is semantically equivalent to:

int a = dataY[19];
for(int k = 20; k < 50; k++)
    dataY[k] = a;

So the load you expected to see was redundant; it was probably hoisted out of
the loop by GVN/PRE and replaced with “%.pre”.
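
One way to picture how the load disappears is as a two-step rewrite: the value loaded in iteration k is exactly the value stored in iteration k-1, so it can be carried in a scalar instead of being reloaded, and that scalar then never changes. The C sketch below only illustrates that reasoning; it is not the actual GVN/PRE implementation. The function names and ARRAY_LEN are made up for the example, and it assumes dataY is an ordinary array that nothing else aliases.

```c
#include <assert.h>
#include <string.h>

enum { ARRAY_LEN = 50 };  /* hypothetical size, matching the loop bound */

/* Original form: each iteration reloads what the previous one stored. */
static void copy_with_reload(int *dataY) {
    for (int k = 20; k < ARRAY_LEN; k++)
        dataY[k] = dataY[k - 1];
}

/* Forwarded form: the stored value is carried in a scalar, so the loop
 * body no longer reads memory. The carried value is never modified,
 * which is why it can be treated as loop-invariant. */
static void copy_with_forwarding(int *dataY) {
    int carried = dataY[19];      /* the only load, done before the loop */
    for (int k = 20; k < ARRAY_LEN; k++) {
        dataY[k] = carried;       /* store */
        /* the next iteration would load dataY[k], which is 'carried' */
    }
}

/* Returns 1 when both forms leave identical arrays, 0 otherwise. */
int forwarding_matches_reload(void) {
    int a[ARRAY_LEN], b[ARRAY_LEN];
    for (int i = 0; i < ARRAY_LEN; i++)
        a[i] = b[i] = 3 * i + 1;  /* arbitrary distinct test values */
    copy_with_reload(a);
    copy_with_forwarding(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling assert(forwarding_matches_reload()) from a small test program is enough to check that the two forms agree on this input.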

H.

What I actually meant is that my original loop has a dependency between every two consecutive iterations. So how can the loop vectorizer say ‘we can vectorize this loop’?

for(int k=20;k<50;k++)
     dataY[k] = dataY[k-1];

Hi Sara,

The loop vectorizer runs not on the C code but on the LLVM IR that this C code was lowered to. Before the loop vectorizer runs, many other optimizations change the shape of this IR.

As you can see in the LLVM IR you posted, a preceding LLVM IR transformation has changed your loop from:

for(int k=20;k<50;k++)
     dataY[k] = dataY[k-1];

to

  int a = dataY[19];
  for(int k = 20; k < 50; k++)
    dataY[k] = a;

which is allowed because the two are semantically equivalent, and beneficial because we save many loads.

We can vectorize the latter loop. You can see in the debug output that there is no load in the loop by the time the loop vectorizer sees it:
LV: Checking a loop in "main"
LV: Found a loop: for.body4
LV: Found an induction variable.
LV: Found a write-only loop! // <<<< Write-only.
LV: We can vectorize this loop!
...
LV: Vectorization is possible but not beneficial.

This is not a bug but a great example of how one optimization can enable another.

Best,
Arnold

> What I actually meant is that my original loop has a dependency
> between every two consecutive iterations. So how can the loop vectorizer
> say ‘we can vectorize this loop’?
>
> for(int k=20;k<50;k++)
>      dataY[k] = dataY[k-1];

The reason that this is equivalent to

> int a = dataY[19];
> for(int k = 20; k < 50; k++)
> dataY[k] = a;

may not be immediately clear. But if you manually unroll the original loop it may make more sense. So your loop, fully unrolled, would look like

dataY[20] = dataY[19];
dataY[21] = dataY[20];
dataY[22] = dataY[21];
  ...
dataY[49] = dataY[48];

From this you should be able to see that each statement copies the value written by the one before it, so after all 30 iterations have run, every LHS element (that is, dataY[20] through dataY[49]) ends up holding the value that was in dataY[19]. That is why the compiler can make this optimization.
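
If it helps, the claim can also be checked empirically. The following is a minimal C sketch, with made-up function names and a made-up DATA_LEN constant, that runs both forms of the loop on local arrays and compares the results. It assumes, as noted earlier in the thread, that dataY is a plain global or stack array with no other aliases.

```c
#include <assert.h>
#include <string.h>

enum { DATA_LEN = 50 };  /* hypothetical size, matching the loop bound */

/* The loop as written: element k receives element k-1. */
static void shift_copy(int *dataY) {
    for (int k = 20; k < DATA_LEN; k++)
        dataY[k] = dataY[k - 1];
}

/* The loop after the load is hoisted: one load, then only stores. */
static void splat(int *dataY) {
    int a = dataY[19];
    for (int k = 20; k < DATA_LEN; k++)
        dataY[k] = a;
}

/* Returns 1 if both loops produce the same array and every element in
 * [20, 50) equals the original dataY[19]; returns 0 otherwise. */
int splat_matches_shift_copy(void) {
    int x[DATA_LEN], y[DATA_LEN];
    for (int i = 0; i < DATA_LEN; i++)
        x[i] = y[i] = 7 * i + 3;   /* arbitrary distinct test values */
    int seed = x[19];              /* value expected to be propagated */

    shift_copy(x);
    splat(y);

    for (int k = 20; k < DATA_LEN; k++)
        if (x[k] != seed)
            return 0;
    return memcmp(x, y, sizeof x) == 0;
}
```

A test program can call assert(splat_matches_shift_copy()); elements 0 through 19 are untouched by both loops, so the memcmp over the whole array also confirms that nothing outside the written range changed.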

Does that make sense?

Greg

I got it, thank you.