Consider this prefix-sum function written in C:
void prefix_sum(int *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
The function is transformed into the following LLVM IR before the GVN pass runs:
define dso_local void @prefix_sum(ptr nocapture noundef %a, i32 noundef %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 1
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader: ; preds = %entry
  %wide.trip.count = zext i32 %n to i64
  br label %for.body

for.cond.cleanup.loopexit: ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
  ret void

for.body: ; preds = %for.body.preheader, %for.body
  %indvars.iv = phi i64 [ 1, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  %0 = getelementptr i32, ptr %a, i64 %indvars.iv
  %arrayidx = getelementptr i8, ptr %0, i64 -4
  %1 = load i32, ptr %arrayidx, align 4
  %2 = load i32, ptr %0, align 4
  %add = add nsw i32 %2, %1
  store i32 %add, ptr %0, align 4
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64 %indvars.iv.next, %wide.trip.count
  br i1 %exitcond, label %for.body, label %for.cond.cleanup.loopexit
}
I found that the GVN pass does not perform load PRE for %1, because PHI translation of %0 fails. The load does get turned into a phi by the LoopLoadElimination pass, but this made me curious: one of the GVN/PRE tests (see test5 in llvm/test/Transforms/GVN/PRE/rle-phi-translate.ll) uses a similar pattern and checks that GVN eliminates the load of the (i - 1)-th element of the array. Is this intended behavior for the pass, or a missed optimization opportunity?
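
To make the transformation in question concrete, here is a minimal source-level sketch of what eliminating the load of a[i - 1] amounts to. It is a hand-written illustration, not compiler output, and the name prefix_sum_carried is hypothetical. The value stored in one iteration is kept in a scalar (the role a phi would play in the IR), and the only remaining load of the previous element happens once before the loop.

```c
/* Hand-written sketch (assumption: not produced by any LLVM pass).
 * Shows the loop after the load of a[i - 1] has been replaced by a
 * value carried across iterations. */
void prefix_sum_carried(int *a, int n) {
    /* Match the original: touch no memory when the loop would not run. */
    if (n <= 1)
        return;

    /* The single remaining load of the "previous" element,
     * placed where the loop preheader would be. */
    int prev = a[0];

    for (int i = 1; i < n; i++) {
        /* prev holds a[i - 1], so only a[i] is loaded here. */
        prev += a[i];
        a[i] = prev;
    }
}
```

The carried scalar is valid because the value stored to a[i] in one iteration is exactly the value the next iteration would otherwise reload from a[i - 1].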
