Reassociation is blocking a vectorization

Hi Devs,

I am looking at the bug
https://bugs.llvm.org/show_bug.cgi?id=43953

and found that following piece of ir

%arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom
%0 = load float, float* %arrayidx, align 4, !tbaa !2
%arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom
%1 = load float, float* %arrayidx2, align 4, !tbaa !2
%sub = fsub fast float %0, %1
%add = fadd fast float %sum.0, %sub

is transformed into

%arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom
%0 = load float, float* %arrayidx, align 4, !tbaa !2
%arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom
%1 = load float, float* %arrayidx2, align 4, !tbaa !2
%.neg = fsub fast float -0.000000e+00, %1
%sub = fadd fast float %0, %sum.0
%add = fadd fast float %sub, %.neg

which again by instcombiner transformed into

Please ignore above the mail.

Hi Devs,
I am looking at the bug
https://bugs.llvm.org/show_bug.cgi?id=43953

and found that following piece of ir

%arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom
%0 = load float, float* %arrayidx, align 4, !tbaa !2
%arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom
%1 = load float, float* %arrayidx2, align 4, !tbaa !2
%sub = fsub fast float %0, %1
%add = fadd fast float %sum.0, %sub

is transformed into

%arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom
%0 = load float, float* %arrayidx, align 4, !tbaa !2
%arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom
%1 = load float, float* %arrayidx2, align 4, !tbaa !2
%.neg = fsub fast float -0.000000e+00, %1
%sub = fadd fast float %0, %sum.0
%add = fadd fast float %sub, %.neg

which again by instcombiner transformed into

%arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom
%0 = load float, float* %arrayidx, align 4, !tbaa !2
%arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom
%1 = load float, float* %arrayidx2, align 4, !tbaa !2
%sub = fadd fast float %0, %sum.012
%add = fsub fast float %sub, %1

Now at the time of SLP Vectorization for reported testcase

SLP Vectorization visits instructions in bottom up fashion
so this instruction is first one to be analyzed
%add = fsub fast float %sub, %1

When this piece of code is called for the above instruction by HorizontalReduction::matchAssociativeReduction(SLPVectorizer.cpp)
/// Checks if the reduction operation can be vectorized.
bool isVectorizable(Instruction *I) const {
return isVectorizable() && isAssociative(I);
}
isAssociative returns false for the above instruction ans vectorization fails.
If we disable the reassociation the reported testcase is vectorized.
Like to community thought on this whether reassociation is doing any good here?

./Kamlesh

We can’t answer whether the reassociation pass is good/bad in isolation. Clearly, on this example it’s doing harm…but it’s probably positive overall? You could disable it and run some benchmarks.

But there are more basic questions raised here.

  1. The reassociate pass in combination with instcombine linearizes add/sub integer/FP sequences - forms that have parallelism end up with none. So we will transform as follows:

Name: reassociate minimizes parallelism

%wx = sub i8 %w, %x
%yz = sub i8 %y, %z
%r = add i8 %wx, %yz
=>

%wx = sub i8 %w, %x
%wxy = add i8 %wx, %y
%r = sub i8 %wxy, %z

There’s a proposal to limit that:
https://reviews.llvm.org/D65614

There’s also a proposal for a new pass that reassociates to reduce tree-height (increase parallelism):
https://reviews.llvm.org/D67383

Neither of those has been updated in a while, so we could reverse-ping them to check status.

  1. There’s a potential missed canonicalization within instcombine that might serve as a work-around for the motivating problem:

Name: canonicalize sub before add
%a = add i8 %x, %y
%r = sub i8 %a, %z
=>
%s = sub i8 %y, %z
%r = add i8 %s, %x

https://rise4fun.com/Alive/VAm