LLVM 10: Why is float experimental_vector_reduce_fmin not tried?

LLVM vectorizes this same function for floating-point addition just fine (it uses experimental_vector_reduce_v2_fadd), but refuses to do the same for minf(). Does anyone have any insight into why that would be? I’m using -ffast-math, but that doesn’t seem to help.

From grep’ing the sources, the best I can figure is that some logic exists for Instruction::FCmp but perhaps not for Intrinsic::minnum. Is that the case?

; Function Attrs: norecurse nounwind readonly
define float @f(float addrspace(4)* noalias nocapture readonly %a, float addrspace(4)* noalias nocapture readonly %b, float %m) local_unnamed_addr #0 {
entry:
  br label %for.body

for.cond.cleanup: ; preds = %for.body
  ret float %3

for.body: ; preds = %entry, %for.body
  %m.addr.024 = phi float [ %m, %entry ], [ %3, %for.body ] ; [#uses=1 type=float]
  %i.023 = phi i32 [ 0, %entry ], [ %inc, %for.body ] ; [#uses=3 type=i32]
  %arrayidx = getelementptr inbounds float, float addrspace(4)* %a, i32 %i.023 ; [#uses=1 type=float addrspace(4)*]
  %0 = load float, float addrspace(4)* %arrayidx, align 4, !tbaa !3 ; [#uses=1 type=float]
  %arrayidx1 = getelementptr inbounds float, float addrspace(4)* %b, i32 %i.023 ; [#uses=1 type=float addrspace(4)*]
  %1 = load float, float addrspace(4)* %arrayidx1, align 4, !tbaa !3 ; [#uses=1 type=float]
  %2 = tail call fast float @llvm.minnum.f32(float %0, float %1) ; [#uses=1 type=float]
  %3 = tail call fast float @llvm.minnum.f32(float %m.addr.024, float %2) ; [#uses=2 type=float]
  %inc = add nuw nsw i32 %i.023, 1 ; [#uses=2 type=i32]
  %cmp = icmp ult i32 %inc, 8192 ; [#uses=1 type=i1]
  br i1 %cmp, label %for.body, label %for.cond.cleanup, !llvm.loop !7
}

LV: Checking a loop in "f" from /path/to/x.c
LV: Loop hints: force=enabled width=0 unroll=0 optspace=0
LV: Found a loop: for.body
LV: Not vectorizing: Found an unidentified PHI %m.addr.024 = phi float [ %m, %entry ], [ %3, %for.body ] ; [#uses=1 type=float]
LV: Interleaving disabled by the pass manager
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing: Cannot prove legality.

I agree with your guess: the loop vectorizer doesn’t know how to match the ‘minnum’ intrinsics into a reduction yet. The SLP vectorizer is missing that functionality too. We need to update/consolidate both to recognize the FP min/max intrinsics as well as the recently added integer min/max intrinsics (http://llvm.org/docs/LangRef.html#llvm-smax-intrinsic).
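Since the question mentions that min/max logic exists for Instruction::FCmp, one thing that might be worth trying in the meantime is spelling the reduction as an explicit compare-and-select instead of going through the minnum intrinsic. The following is only a sketch under assumptions: the function name f_select is made up, the pointers are plain (no address space), the trip count is copied from the IR above, and the loop body is a guess at what x.c computes. Whether LLVM 10 actually vectorizes it depends on the fast-math flags and on the optimizer keeping the fcmp/select form, so it needs to be checked against the LV debug output rather than taken as a confirmed fix.

```c
#define N 8192u /* trip count taken from the icmp in the IR above */

/* Hypothetical compare-and-select spelling of the min reduction.
 * Assumes the inputs contain no NaNs (which -ffast-math already assumes);
 * with NaNs present the result can differ from a minnum-based version. */
float f_select(const float *restrict a, const float *restrict b, float m)
{
    for (unsigned i = 0; i < N; ++i) {
        /* min(a[i], b[i]) written as a compare plus select */
        float t = a[i] < b[i] ? a[i] : b[i];
        /* running minimum, again as a compare plus select */
        m = t < m ? t : m;
    }
    return m;
}
```

If the loop vectorizer accepts this form, the "unidentified PHI" message for the running minimum should no longer appear in the debug output; if it still does, the pattern is being rewritten or rejected somewhere and this workaround does not apply.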

cc’ing Craig to see if anything has happened since:
https://reviews.llvm.org/rGc195ae2

I just changed the x86 cost model to remove what could have been another roadblock:
https://reviews.llvm.org/rG136f98e52365
We may need to extend that kind of cost model fix-up to other targets.