The loop vectorizer probably decided that it was not profitable to vectorize the function. You can force the vectorization of the function by setting a low threshold.
To vectorize code like this, LLVM needs to prove that “A[i*7]” does not wrap in the address space. It fails to prove this, so LLVM doesn’t vectorize the loop even if we try to force it.
The following loop will be vectorized if we force it:
void foo(int *A, int *B, int n, int k) {
  for (int i = 0; i < 1024; ++i)
    A[i] += B[i*k];
}
So will this loop:
void foo(int * restrict A, int * restrict B, int n, int k) {
  for (int i = 0; i < n; ++i)
    A[i] += B[i*k];
}
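As an aside, newer Clang releases also accept a loop pragma that requests vectorization from the source, which is a different mechanism from the threshold option mentioned above. The sketch below is only an illustration: foo_forced is a hypothetical name, the asserted preconditions are an assumption added here, and the pragma is only a request that skips the profitability decision. It does not make an unprovable loop legal to vectorize.

```c
#include <assert.h>

/* Hypothetical variant of the restrict loop above with a Clang loop hint.
   Assumption: n and k are non-negative and B holds at least (n-1)*k + 1 ints. */
void foo_forced(int *restrict A, const int *restrict B, int n, int k) {
    assert(n >= 0 && k >= 0);
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#endif
    for (int i = 0; i < n; ++i)
        A[i] += B[i * k];
}
```

Other compilers skip the pragma because of the __clang__ guard, so the function behaves the same everywhere and only the optimization hint differs.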
I'm moderately sure that neither C nor C++ allows wrapping around the end of
the address space. If they do, we will fix C++ at least to disallow this.
'i' is a signed integer, so we can't wrap in the index space either. So why
can't LLVM prove this?
The loop vectorizer relies on SCEV’s nowrap flags. We need to improve SCEV for this.
%conv = sext i32 %k to i64
--> (sext i32 %k to i64)
%i.06 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
--> {0,+,1}<nuw><nsw><%for.body> Exits: 1023
%mul1 = mul nsw i64 %i.06, 7
--> {0,+,7}<%for.body> Exits: 7161
%arrayidx2 = getelementptr inbounds i32* %A, i64 %mul1
--> {%A,+,28}<%for.body> <== we want to see a nw flag here.
SCEV sometimes drops nowrap flags for safety (canonicalization can make them invalid if the same expression is used in different contexts). See past discussions on this.
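The numbers in the dump can be checked by hand. An add recurrence {start,+,step} has the value start + step * iter on iteration iter, and the byte stride of the pointer recurrence is the index step times the element size. The sketch below just redoes that arithmetic. The helper names are made up for illustration and are not LLVM APIs, and the 4-byte element size assumes the i32 elements shown in the getelementptr.

```c
#include <assert.h>
#include <stdint.h>

/* Value of the add recurrence {start,+,step} on iteration `iter`. */
int64_t addrec_at(int64_t start, int64_t step, int64_t iter) {
    return start + step * iter;
}

/* Byte stride of a pointer that advances `index_step` elements per iteration. */
int64_t byte_stride(int64_t index_step, int64_t elem_size) {
    return index_step * elem_size;
}

/* Re-derive the values printed in the dump (last iteration is 1023). */
void check_against_dump(void) {
    assert(addrec_at(0, 1, 1023) == 1023); /* {0,+,1} Exits: 1023 */
    assert(addrec_at(0, 7, 1023) == 7161); /* {0,+,7} Exits: 7161 */
    assert(byte_stride(7, 4) == 28);       /* {%A,+,28}: 7 i32 elements */
}
```

None of these values comes anywhere near overflowing 64 bits. The missing flag on the last line is about what SCEV can prove for the expression, not about these particular numbers.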
Sure, but I think it's really important to clarify that the *example* is fine, and there is nothing fundamental about it that prevents vectorization.
Sure. I thought this was clear in my answer (obviously not :). Rereading, I should probably have added that the code is vectorizable (I assumed that Zinovy knows this).
Even without wrapping around the end of the address space, without restrict you still have to worry about A and B overlapping in interesting ways. Will LLVM do some runtime dependence checks to rule out such potential overlap? Just curious…
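To make the question concrete, a runtime overlap check of the kind being asked about could look like the hand-written sketch below. This is not the code LLVM generates. ranges_overlap and foo_checked are hypothetical names, and the sketch assumes k >= 1 so that the accessed part of B spans (n-1)*k + 1 elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the byte ranges [a, a+a_bytes) and [b, b+b_bytes) overlap. */
int ranges_overlap(const void *a, size_t a_bytes,
                   const void *b, size_t b_bytes) {
    uintptr_t a0 = (uintptr_t)a, b0 = (uintptr_t)b;
    return a0 < b0 + b_bytes && b0 < a0 + a_bytes;
}

/* A[i] += B[i*k], guarded by a runtime check instead of restrict. */
void foo_checked(int *A, int *B, int n, int k) {
    assert(k >= 1); /* assumption made for this sketch */
    if (n <= 0)
        return;
    size_t a_bytes = (size_t)n * sizeof(int);
    size_t b_bytes = ((size_t)(n - 1) * (size_t)k + 1) * sizeof(int);
    if (!ranges_overlap(A, a_bytes, B, b_bytes)) {
        /* No overlap: iterations are independent, so a compiler is free
           to vectorize this copy of the loop. */
        for (int i = 0; i < n; ++i)
            A[i] += B[i * k];
    } else {
        /* Possible overlap: keep the original scalar order. */
        for (int i = 0; i < n; ++i)
            A[i] += B[i * k];
    }
}
```

The two loops are identical in the source. The point is only that the first one runs under a condition that rules out aliasing, so it is the copy that may be transformed.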