Autovectorization questions

irishrover · March 12, 2014, 10:34am

Hi,

I’m reading “http://llvm.org/docs/Vectorizers.html” and have a few questions. I hope someone can answer them.

The Loop Vectorizer can vectorize code that becomes a sequence of scalar instructions that scatter/gathers memory. (http://llvm.org/docs/Vectorizers.html#scatter-gather)

int foo(int *A, int *B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i*7] += B[i*k];
}

I replaced “int *A”/“int *B” with “double *A”/“double *B” and then compiled the sample with

$> ./clang -Ofast -ffast-math test.c -std=c99 -march=core-avx2 -S -o bb.S -fslp-vectorize-aggressive

and the loop body looks like:

.LBB1_2:                                # %for.body
                                        # =>This Inner Loop Header: Depth=1
        cltq
        vmovsd  (%rsi,%rax,8), %xmm0
        movq    %r9, %r10
        sarq    $32, %r10
        vaddsd  (%rdi,%r10,8), %xmm0, %xmm0
        vmovsd  %xmm0, (%rdi,%r10,8)
        addq    %r8, %r9
        addl    %ecx, %eax
        decl    %edx
        jne     .LBB1_2

So only the scalar forms of the AVX instructions (vaddsd, vmovsd) were used in the loop, and no real gather/scatter was emitted.

The question is: why was this loop not vectorized? Is it a typo in the docs?

Nadav_Rotem1 · March 12, 2014, 8:54pm

Hi Zinovy,

The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold.

Thanks,
Nadav

Arnold · March 12, 2014, 10:50pm

In order to vectorize code like this, LLVM needs to prove that “A[i*7]” does not wrap in the address space. It currently fails to do so, so LLVM doesn’t vectorize this loop even if we try to force it.

The following loop will be vectorized if we force it:

int foo(int * A, int * B, int n, int k) {
  for (int i = 0; i < 1024; ++i)
    A[i] += B[i*k];
}

So will this loop:

int foo(int * restrict A, int * restrict B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i] += B[i*k];
}

I will update the example.

Thanks,
Arnold
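For anyone who wants to try the restrict variant above, here is a minimal sketch. The function name add_strided is made up for illustration, and the compile line in the comment is only a suggestion: -mllvm -force-vector-width is the forcing flag described in the vectorizer docs, and -Rpass=loop-vectorize is available in newer clang releases only. Whether the loop is actually vectorized depends on the compiler version and target.

```c
#include <assert.h>

/*
 * Strided-read loop from the thread, with restrict-qualified pointers.
 * `restrict` is a promise from the caller that A and B do not alias.
 *
 * Suggested (not guaranteed) way to ask clang to force vectorization
 * and report what it did:
 *   clang -O3 -std=c99 -mllvm -force-vector-width=4 \
 *         -Rpass=loop-vectorize -c test.c
 */
void add_strided(int *restrict A, const int *restrict B, int n, int k) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        A[i] += B[i * k];   /* unit-stride store, stride-k load */
}
```

Calling it with n = 4 and k = 2 adds B[0], B[2], B[4], B[6] to A[0..3].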

Chandler_Carruth · March 12, 2014, 11:05pm

But, why?

I'm moderately sure that neither C nor C++ allow wrapping around the end of
the address space. If they do, we will fix C++ at least to disallow this.
'i' is a signed integer, so we can't wrap in the index space either. So why
can't LLVM prove this?

Arnold · March 12, 2014, 11:45pm

The loop vectorizer relies on SCEV’s no-wrap flags. We need to improve SCEV for this.

  %conv = sext i32 %k to i64
  --> (sext i32 %k to i64)
  %i.06 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
  --> {0,+,1}<nuw><nsw><%for.body> Exits: 1023
  %mul1 = mul nsw i64 %i.06, 7
  --> {0,+,7}<%for.body> Exits: 7161
  %arrayidx2 = getelementptr inbounds i32* %A, i64 %mul1
  --> {%A,+,28}<%for.body> <== we want to see a nw flag here.

SCEV sometimes drops no-wrap flags for safety (canonicalization can make them invalid if the same expression is used in different contexts). See past discussions on this.

We are thinking about doing something like what is described here: http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/67476 or in this thread: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20131007/190703.html.
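As a reading aid for the SCEV dump above: an expression of the form {start,+,step} has the value start + step * iteration. The sketch below just redoes that arithmetic in C. The helper names addrec_at and byte_offset_at are hypothetical and are not LLVM APIs; the 4-byte element size assumes i32, as in the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Value of the affine add recurrence {start,+,step} at iteration `iter`. */
int64_t addrec_at(int64_t start, int64_t step, int64_t iter) {
    assert(iter >= 0);
    return start + step * iter;
}

/* Byte offset from %A accessed at iteration `iter`.
 * %mul1 is {0,+,7}; the getelementptr scales it by sizeof(i32) == 4,
 * which gives the {%A,+,28} recurrence shown in the dump. */
int64_t byte_offset_at(int64_t iter) {
    int64_t index = addrec_at(0, 7, iter);
    return index * (int64_t)sizeof(int32_t);
}
```

With a trip count of 1024 the last iteration is 1023, so {0,+,1} exits at 1023 and {0,+,7} exits at 7 * 1023 = 7161, matching the "Exits:" values printed by SCEV.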

Chandler_Carruth · March 12, 2014, 11:48pm

Sure, but I think it's really important to clarify that the *example* is
fine, and there is nothing fundamental about it that prevents vectorization.

We simply need to fix SCEV.

Arnold · March 12, 2014, 11:59pm

> Sure, but I think it's really important to clarify that the *example* is fine, and there is nothing fundamental about it that prevents vectorization.

Sure. I thought this was clear in my answer (obviously not :). Rereading, I should probably have added that the code is vectorizable (I assumed Zinovy knows this).

Arnold · March 13, 2014, 12:01am

Zinovy,

to clarify: the code is vectorizable. But LLVM currently fails to prove it is.

Raul_Silvera · March 13, 2014, 12:26am

Even without wrapping around the end of the address space, without restrict you still have to worry about A and B overlapping in interesting ways. Will LLVM do some runtime dependence checks to rule out such potential overlap? Just curious…

Arnold · March 13, 2014, 12:40am

Yes, LLVM will insert runtime checks.
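To make the idea of a runtime check concrete, here is a hand-written sketch of the same pattern in C. It is only an illustration of the concept, not the code LLVM emits: the names ranges_disjoint and add_strided_checked are hypothetical, k is assumed non-negative, and comparing pointers through uintptr_t assumes a flat address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if byte ranges [a, a+a_len) and [b, b+b_len) share no byte. */
static bool ranges_disjoint(const void *a, size_t a_len,
                            const void *b, size_t b_len) {
    uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
    return a0 + a_len <= b0 || b0 + b_len <= a0;
}

void add_strided_checked(int *A, const int *B, int n, int k) {
    assert(n >= 0 && k >= 0);
    if (n == 0)
        return;
    /* A touches elements 0..n-1; B touches elements 0..(n-1)*k. */
    size_t a_bytes = (size_t)n * sizeof(int);
    size_t b_bytes = ((size_t)(n - 1) * (size_t)k + 1) * sizeof(int);

    if (ranges_disjoint(A, a_bytes, B, b_bytes)) {
        /* Checked at run time: no overlap, so the no-alias promise holds
         * and this copy of the loop is free to be vectorized. */
        int *restrict a = A;
        const int *restrict b = B;
        for (int i = 0; i < n; ++i)
            a[i] += b[i * k];
    } else {
        /* Possible overlap: fall back to the loop in strict scalar order. */
        for (int i = 0; i < n; ++i)
            A[i] += B[i * k];
    }
}
```

Both branches compute the same result whenever the ranges are disjoint; the check only decides which copy of the loop is allowed to run.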
