Loop vectorization of a loop containing a bitcast

Hi,

The following loop fails to vectorize because the load of c[i] is cast to i64 while the store to c[i] uses double. Loop access analysis gives up because the two accesses have different types.

Since these two memory operations have the same size, I believe loop access analysis should report a forward dependence, and thus the loop can be vectorized.

Any comments?

Thanks,

Jin

#define N 1000

double a[N], b[N], c[N];

void foo() {
  for (int i = 0; i < N; i++) {
    b[i] = c[i];
    c[i] = 0.0;
  }
}

for.body:                                         ; preds = %for.body, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds [1000 x double], [1000 x double]* @c, i64 0, i64 %indvars.iv
  %0 = bitcast double* %arrayidx to i64*
  %1 = load i64, i64* %0, align 8, !tbaa !1
  %arrayidx2 = getelementptr inbounds [1000 x double], [1000 x double]* @b, i64 0, i64 %indvars.iv
  %2 = bitcast double* %arrayidx2 to i64*
  store i64 %1, i64* %2, align 8, !tbaa !1
  store double 0.000000e+00, double* %arrayidx, align 8, !tbaa !1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 1000
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
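
To illustrate why the bitcast should be harmless here, the following is a minimal C sketch (not part of any patch) that runs the loop twice: once on doubles as written in the source, and once moving c[i] to b[i] as a 64-bit integer, roughly as the IR above does. The names foo_double, foo_i64, same_result and the arrays b1, c1, b2, c2 are made up for this illustration, and the sketch assumes that double is 64 bits wide, as the IR implies.

```c
#include <stdint.h>
#include <string.h>

#define N 1000

/* Assumption: double is 64 bits wide, matching the i64 used in the IR. */
_Static_assert(sizeof(uint64_t) == sizeof(double), "double must be 64 bits");

static double b1[N], c1[N]; /* processed with double loads and stores */
static double b2[N], c2[N]; /* processed with 64-bit integer copies */

/* The loop as written in the source. */
static void foo_double(void) {
  for (int i = 0; i < N; i++) {
    b1[i] = c1[i];
    c1[i] = 0.0;
  }
}

/* Roughly what the IR does: copy c[i] to b[i] as an i64, then store 0.0. */
static void foo_i64(void) {
  for (int i = 0; i < N; i++) {
    uint64_t bits;
    memcpy(&bits, &c2[i], sizeof bits); /* load i64 through the bitcast */
    memcpy(&b2[i], &bits, sizeof bits); /* store i64 through the bitcast */
    c2[i] = 0.0;                        /* store double to the same address */
  }
}

/* Fill both inputs identically, run both loops, and compare byte-wise.
   Returns 1 when the two versions produce identical arrays. */
static int same_result(void) {
  for (int i = 0; i < N; i++)
    c1[i] = c2[i] = i * 0.5 - 3.0;
  foo_double();
  foo_i64();
  return memcmp(b1, b2, sizeof b1) == 0 && memcmp(c1, c2, sizeof c1) == 0;
}
```

Calling same_result() from a test driver is expected to return 1: both accesses to c[i] touch exactly the same 8 bytes, so the load always reads the value from before the store in the same iteration, whichever type is used to name those bytes.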

LAA: Found a loop in foo: loop.17
LAA: Processing memory accesses…
AST: Alias Set Tracker: 2 alias sets for 3 pointer values.
AliasSet[0x9508b80, 1] must alias, No access Pointers: (<4 x i64>* %1, 18446744073709551615)
AliasSet[0x95f8a70, 2] must alias, No access Pointers: (<4 x double>* %2, 18446744073709551615), (<4 x i64>* %0, 18446744073709551615)
LAA: Accesses(3):
%1 = bitcast double* %arrayIdx11 to <4 x i64>* (write)
%2 = bitcast double* %arrayIdx to <4 x double>* (write)
%0 = bitcast double* %arrayIdx to <4 x i64>* (read-only)
Underlying objects for pointer %1 = bitcast double* %arrayIdx11 to <4 x i64>*
@b = common local_unnamed_addr global [1000 x double] zeroinitializer, align 16
Underlying objects for pointer %2 = bitcast double* %arrayIdx to <4 x double>*
@c = common local_unnamed_addr global [1000 x double] zeroinitializer, align 16
Underlying objects for pointer %0 = bitcast double* %arrayIdx to <4 x i64>*
@c = common local_unnamed_addr global [1000 x double] zeroinitializer, align 16
LAA: Found a runtime check ptr: %1 = bitcast double* %arrayIdx11 to <4 x i64>*
LAA: Found a runtime check ptr: %2 = bitcast double* %arrayIdx to <4 x double>*
LAA: Found a runtime check ptr: %0 = bitcast double* %arrayIdx to <4 x i64>*
LAA: We need to do 0 pointer comparisons.
LAA: We can perform a memory runtime check if needed.
LAA: Checking memory dependencies
LAA: Src Scev: {@c,+,32}<%loop.17>Sink Scev: {@c,+,32}<%loop.17>(Induction step: 1)
LAA: Distance for %gepload = load <4 x i64>, <4 x i64>* %0, align 16, !tbaa !1 to store <4 x double> zeroinitializer, <4 x double>* %2, align 16, !tbaa !1: 0
LAA: Zero dependence difference but different types
Total Dependences: 1
LAA: unsafe dependent memory operations in loop
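
The log stops at "Zero dependence difference but different types". A minimal sketch of the relaxation being suggested, written in plain C rather than against the LLVM sources, might compare the sizes of the two accesses instead of their types. The enum dep_kind and the function classify_zero_distance are hypothetical names for this illustration and do not correspond to actual LLVM code.

```c
#include <stddef.h>

/* Hypothetical result kinds, loosely named after the terms in this thread. */
typedef enum { DEP_FORWARD, DEP_UNKNOWN } dep_kind;

/* Classify a pair of accesses whose dependence distance is zero.
   src_bytes and sink_bytes are the sizes in bytes of the two accessed
   types, e.g. 32 for both <4 x i64> and <4 x double>. */
static dep_kind classify_zero_distance(size_t src_bytes, size_t sink_bytes) {
  /* Same address and same size: both accesses cover exactly the same
     bytes, so the earlier one completes before the later one in every
     iteration, which is a forward dependence. */
  if (src_bytes == sink_bytes)
    return DEP_FORWARD;

  /* Different sizes at distance zero overlap only partially, so stay
     conservative. */
  return DEP_UNKNOWN;
}
```

For the accesses in the log, classify_zero_distance(32, 32) gives DEP_FORWARD, while a pair such as an 8-byte load and a 4-byte store stays DEP_UNKNOWN.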

Hi Jin,

I agree, this looks wrong. The bitcasts are fallout from r226781, and we should be able to look through them if the size is the same. Can you please file a PR?

Thanks,
Michael

Hi Michael,

Many thanks for your quick response. PR 29021 has been filed to address this issue.

Jin

Oh, sorry, I didn’t realize you already have a patch for this.

Can you submit it for review using the regular process? See http://llvm.org/docs/Phabricator.html for details.
Note that you’ll need a test included in the patch.

Thanks,
Michael