I have the following C++ code, which evaluates a degree-n Chebyshev series at m points using Clenshaw’s algorithm:
void cheby_eval(double *coeffs,int n,double *xs,double *ys,int m)
{
#pragma omp simd
    for (int i=0;i<m;i++){
        double x = xs[i];
        double u0=0,u1=0,u2=0;
        for (int k=n;k>=0;k--){
            u2 = u1;
            u1 = u0;
            u0 = 2*x*u1-u2+coeffs[k];
        }
        ys[i] = 0.5*(coeffs[0]+u0-u2);
    }
}
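For anyone who wants to reproduce this, here is a self-contained sketch of the same function, differing only by restored pointer types and operators and added comments. It comes with a hypothetical reference function, cheby_direct, that sums c_k * T_k(x) using the definition T_k(x) = cos(k * acos(x)), which is valid only for -1 <= x <= 1. The last line of cheby_eval uses the Clenshaw identity f(x) = c_0 + x*b_1 - b_2 = (c_0 + b_0 - b_2)/2, where b_0 and b_2 are the final values of u0 and u2.

```cpp
#include <cassert>
#include <cmath>

// The function from the question. It computes
// ys[i] = sum_{k=0}^{n} coeffs[k] * T_k(xs[i]) with Clenshaw's recurrence
// b_k = 2*x*b_{k+1} - b_{k+2} + c_k, starting from b_{n+1} = b_{n+2} = 0.
void cheby_eval(double *coeffs, int n, double *xs, double *ys, int m)
{
#pragma omp simd
    for (int i = 0; i < m; i++) {
        double x = xs[i];
        double u0 = 0, u1 = 0, u2 = 0;
        for (int k = n; k >= 0; k--) {
            u2 = u1;
            u1 = u0;
            u0 = 2 * x * u1 - u2 + coeffs[k];
        }
        // On exit u0 = b_0 and u2 = b_2, so c_0 + x*b_1 - b_2 = (c_0 + b_0 - b_2) / 2.
        ys[i] = 0.5 * (coeffs[0] + u0 - u2);
    }
}

// Hypothetical reference: T_k(x) = cos(k * acos(x)), only valid for -1 <= x <= 1.
double cheby_direct(const double *coeffs, int n, double x)
{
    assert(x >= -1.0 && x <= 1.0);
    double s = 0;
    for (int k = 0; k <= n; k++)
        s += coeffs[k] * std::cos(k * std::acos(x));
    return s;
}
```

If the code is built without -fopenmp (or an equivalent flag), the #pragma omp simd line has no effect, and some compilers warn about it as an unrecognized pragma.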
I’m hoping the outer loop (over i) will be autovectorized, so that the inner loop (over k) operates on vectors of points.
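The transformation being asked for can also be written out by hand, which makes it easier to see what the compiler has to do. The sketch below strip-mines the point loop into blocks of W points and makes a fixed-length loop over the W lanes the innermost loop. Each step of the recurrence then updates W independent (u0, u1, u2) triples. Several parts are assumptions for illustration: the name cheby_eval_blocked, the choice W = 4 (four doubles per 256-bit AVX register), and the __restrict__ qualifiers, which are a GCC/clang extension rather than standard C++. Nothing here guarantees that a particular clang version will vectorize the lane loops.

```cpp
#include <cassert>

// Points processed together; 4 is an assumption matching four doubles per
// 256-bit AVX register.
constexpr int W = 4;

// Hypothetical hand-transformed variant of cheby_eval (strip-mined outer loop).
void cheby_eval_blocked(const double *__restrict__ coeffs, int n,
                        const double *__restrict__ xs,
                        double *__restrict__ ys, int m)
{
    assert(n >= 0 && m >= 0);
    int i = 0;
    for (; i + W <= m; i += W) {
        double two_x[W], u0[W], u1[W], u2[W];
        for (int j = 0; j < W; j++) {
            two_x[j] = 2 * xs[i + j];
            u0[j] = u1[j] = u2[j] = 0;
        }
        for (int k = n; k >= 0; k--) {
            const double c = coeffs[k];     // same coefficient for every lane
            for (int j = 0; j < W; j++) {   // fixed trip count: the SIMD candidate
                u2[j] = u1[j];
                u1[j] = u0[j];
                u0[j] = two_x[j] * u1[j] - u2[j] + c;
            }
        }
        for (int j = 0; j < W; j++)
            ys[i + j] = 0.5 * (coeffs[0] + u0[j] - u2[j]);
    }
    // Scalar loop for the remaining m % W points.
    for (; i < m; i++) {
        double x = xs[i], u0 = 0, u1 = 0, u2 = 0;
        for (int k = n; k >= 0; k--) {
            u2 = u1;
            u1 = u0;
            u0 = 2 * x * u1 - u2 + coeffs[k];
        }
        ys[i] = 0.5 * (coeffs[0] + u0 - u2);
    }
}
```

Precomputing 2*x per lane matches how the original expression 2*x*u1 is grouped, namely (2*x)*u1.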
When compiled with
clang++ -O3 -march=haswell -Rpass-analysis=loop-vectorize -S chebyshev.cc
using clang++ 3.8.1-23, the loop is not vectorized and I get the following remarks:
chebyshev.cc:19:18: remark: loop not vectorized: cannot identify array bounds
[-Rpass-analysis=loop-vectorize]
ys[i] = 0.5*(coeffs[0]+u0-u2);
^
chebyshev.cc:21:1: remark: loop not vectorized: value that could not be
identified as reduction is used outside the loop
[-Rpass-analysis=loop-vectorize]
On the same code, icc vectorizes the outer loop as expected.
I was wondering whether there are small changes I could make to my code to help LLVM’s autovectorizer succeed. I would also appreciate any pointers to documentation or LLVM source that can help me better understand how autovectorization of outer loops works.
Regards,
Jyotirmoy Bhattacharya
P.S. The interesting part of icc’s assembler output is:
..B1.4: # Preds ..B1.8 ..B1.3
        xorl %r15d, %r15d #14.5
        xorl %ebx, %ebx #14.21
        testq %rsi, %rsi #14.21
        vmovupd (%rdx,%r9,8), %ymm3 #12.16
        vxorpd %ymm5, %ymm5, %ymm5 #13.14
        vmovdqa %ymm1, %ymm4 #13.19
        vmovdqa %ymm1, %ymm2 #13.24
        jl ..B1.8 # Prob 2% #14.21
..B1.5: # Preds ..B1.4
        vaddpd %ymm3, %ymm3, %ymm3 #17.14
..B1.6: # Preds ..B1.6 ..B1.5
        vmovapd %ymm4, %ymm2 #20.3
        incq %r15 #14.5
        vmovapd %ymm5, %ymm4 #20.3
        vfmsub213pd %ymm2, %ymm3, %ymm5 #17.19
        vbroadcastsd (%r11,%rbx,8), %ymm6 #17.22
        decq %rbx
        vaddpd %ymm5, %ymm6, %ymm5 #17.22
        cmpq %r10, %r15 #14.5
        jb ..B1.6 # Prob 82% #14.5
..B1.8: # Preds ..B1.6 ..B1.4
        vbroadcastsd (%rdi), %ymm3 #19.18
        vaddpd %ymm3, %ymm5, %ymm4 #19.28
        vsubpd %ymm2, %ymm4, %ymm2 #19.31
        vmulpd %ymm2, %ymm0, %ymm5 #19.31
        vmovupd %ymm5, (%rcx,%r9,8) #19.5
        addq $4, %r9 #11.3
        cmpq %r8, %r9 #11.3
        jb ..B1.4 # Prob 82% #11