Help: Question on Epilog Vectorization

Hi,

I wrote a small test case and tried to force epilog vectorization for the loop.

#include <math.h>

void foo(double * restrict a, double * restrict b, int N) {
  for (int i = 0; i < N; ++i)
    a[i] = sin(i);
}

clang -O3 -mavx2 -fveclib=libmvec sin.c -mllvm -epilogue-vectorization-minimum-VF=4 -S -emit-llvm -fno-unroll-loops

However, epilog vectorization fails at the following check in the function “isCandidateForEpilogueVectorization”.

-- Snip llvm/lib/Transforms/Vectorize/LoopVectorize.cpp --

// Induction variables that are widened require special handling that is
// currently not supported.
if (any_of(Legal->getInductionVars(), [&](auto &Entry) {
      return !(this->isScalarAfterVectorization(Entry.first, VF) ||
               this->isProfitableToScalarize(Entry.first, VF));
-- Snip --

I understand that when the VPlan widens induction variables, such loops are not currently supported for epilog vectorization.
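To make "widened induction variable" concrete, here is a rough hand-written C emulation of what I understand the vector loop to do. It is only a sketch, not LLVM code: the names `f`, `foo_scalar`, `foo_widened` and `VF` are made up for illustration, and `f` is a stand-in for `sin` so the example does not depend on libm. Because `i` is used as a value (not only as an address), each lane needs its own copy of `i`, which is kept in a vector that starts at <0, 1, 2, 3> and steps by VF.

```c
#define VF 4 /* vectorization factor, matching the <4 x i32> phi */

/* Hypothetical stand-in for sin(); chosen so results are exact. */
static double f(int i) { return 0.5 * (double)i; }

/* Scalar reference loop, same shape as the test case. */
void foo_scalar(double *a, int n) {
    for (int i = 0; i < n; ++i)
        a[i] = f(i);
}

/* Emulates a vector loop with a widened IV, followed by a scalar
 * remainder. Returns the number of elements the vector loop handled. */
int foo_widened(double *a, int n) {
    int n_vec = n - n % VF;   /* vector trip count in elements */
    int vec_iv[VF];           /* the widened induction variable */

    for (int l = 0; l < VF; ++l)
        vec_iv[l] = l;        /* start value <0, 1, 2, 3> */

    for (int idx = 0; idx < n_vec; idx += VF) {
        for (int l = 0; l < VF; ++l)
            a[idx + l] = f(vec_iv[l]); /* one "vector" call per iteration */
        for (int l = 0; l < VF; ++l)
            vec_iv[l] += VF;           /* step every lane by VF */
    }

    /* Scalar remainder loop resumes from the vector trip count. */
    for (int i = n_vec; i < n; ++i)
        a[i] = f(i);

    return n_vec;
}
```

The scalar index `idx` and the vector `vec_iv` advance in lockstep here, so any loop that picks up after this one needs a correct start value for both.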

But can someone please explain the “special handling” we need to do here?

If I remove the check from the source, epilog vectorization does happen, but the generated LLVM IR seems to be wrong.

-- Snip --

12: ; preds = %12, %10
%13 = phi i64 [ 0, %10 ], [ %19, %12 ]
%14 = phi <4 x i32> [ <i32 0, i32 1, i32 2, i32 3>, %10 ], [ %20, %12 ]
%15 = sitofp <4 x i32> %14 to <4 x double>
%16 = call <4 x double> @_ZGVdN4v_sin(<4 x double> %15)
%17 = getelementptr inbounds double, double* %0, i64 %13
%18 = bitcast double* %17 to <4 x double>*
store <4 x double> %16, <4 x double>* %18, align 8, !tbaa !3
%19 = add nuw i64 %13, 4
%20 = add <4 x i32> %14, <i32 4, i32 4, i32 4, i32 4>
%21 = icmp eq i64 %19, %11
br i1 %21, label %22, label %12, !llvm.loop !7

22: ; preds = %12
%23 = icmp eq i64 %11, %6
br i1 %23, label %44, label %24

24: ; preds = %22
%25 = and i64 %6, 2
%26 = icmp eq i64 %25, 0
br i1 %26, label %42, label %27

27: ; preds = %8, %24
%28 = phi i64 [ %11, %24 ], [ 0, %8 ]
%29 = and i64 %6, 4294967294
br label %30

30: ; preds = %30, %27
%31 = phi i64 [ %28, %27 ], [ %37, %30 ]
%32 = phi <2 x i32> [ <i32 0, i32 1>, %27 ], [ %38, %30 ] <== Resume value seems to be wrong.
%33 = sitofp <2 x i32> %32 to <2 x double>
%34 = call <2 x double> @_ZGVbN2v_sin(<2 x double> %33)
%35 = getelementptr inbounds double, double* %0, i64 %31
%36 = bitcast double* %35 to <2 x double>*
store <2 x double> %34, <2 x double>* %36, align 8, !tbaa !3
%37 = add nuw i64 %31, 2
%38 = add <2 x i32> %32, <i32 2, i32 2>
%39 = icmp eq i64 %37, %29
br i1 %39, label %40, label %30, !llvm.loop !11
-- Snip --

I see that the resume value for the widened phi node in the epilog loop is not updated correctly.
Are there any other issues here apart from handling the widened induction variable’s resume value?

Regards,
Venkat.

The resume value for the widened induction is the only problem I’m aware of.

The issue is that scalar induction resume values are normally created/updated as part of skeleton creation. However, for widened inductions in the epilogue loop, we have corresponding recipes in the VPlan that haven’t been executed at the time of skeleton creation. We either have to find the related phis after the fact and fix them up, or change the VPlan to update the incoming values of the widened IVs before executing it. Florian demonstrated the latter idea in https://reviews.llvm.org/D92132, so maybe he has more details to share.
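As a rough illustration of what a correct resume value means here, the following is a hand-written C emulation of a VF=4 main loop followed by a VF=2 epilogue loop. It is a sketch only and does not reflect how LoopVectorize is implemented; `f` (a stand-in for `sin`), `foo_scalar`, `foo_epilogue`, `MAIN_VF`, `EPI_VF` and the `fix_resume` switch are all hypothetical names. With `fix_resume` set, the epilogue's widened IV starts at <main_end, main_end + 1>; without it, the IV restarts at <0, 1> as in the IR quoted above, while the scalar index still resumes from the main loop's trip count.

```c
#define MAIN_VF 4 /* VF of the main vector loop */
#define EPI_VF 2  /* VF of the epilogue vector loop */

/* Hypothetical stand-in for sin(); chosen so results are exact. */
static double f(int i) { return 0.5 * (double)i; }

/* Scalar reference loop. */
void foo_scalar(double *a, int n) {
    for (int i = 0; i < n; ++i)
        a[i] = f(i);
}

/* fix_resume != 0: the epilogue's widened IV resumes where the main
 * loop stopped. fix_resume == 0: it restarts at <0, 1> (the bug). */
void foo_epilogue(double *a, int n, int fix_resume) {
    int main_end = n - n % MAIN_VF; /* elements covered by the main loop */
    int epi_end = n - n % EPI_VF;   /* elements covered after the epilogue */

    int iv4[MAIN_VF] = {0, 1, 2, 3};
    for (int idx = 0; idx < main_end; idx += MAIN_VF)
        for (int l = 0; l < MAIN_VF; ++l) {
            a[idx + l] = f(iv4[l]);
            iv4[l] += MAIN_VF;
        }

    /* The scalar index always resumes from main_end; only the widened
     * IV's start value depends on fix_resume. */
    int start = fix_resume ? main_end : 0;
    int iv2[EPI_VF] = {start, start + 1};
    for (int idx = main_end; idx < epi_end; idx += EPI_VF)
        for (int l = 0; l < EPI_VF; ++l) {
            a[idx + l] = f(iv2[l]);
            iv2[l] += EPI_VF;
        }

    /* Scalar remainder. */
    for (int i = epi_end; i < n; ++i)
        a[i] = f(i);
}
```

For example, with n = 7 the main loop writes a[0..3], the epilogue writes a[4..5], and the remainder writes a[6]. Without the fix, a[4] and a[5] receive f(0) and f(1) instead of f(4) and f(5).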

Bardia Mahjour
Compiler Optimizations
IBM Toronto Software Lab

(In reply to “Venkataramanan Kumar”, 2021/10/06 12:33:00 PM)

All required patches for https://reviews.llvm.org/D92132 landed a while ago. I just updated it and it should be ready for review now.

Cheers
Florian