As reported in "[clang][LoopVectorize] Inccorrect code generation in case of vectorize_predicate(enable)" (Issue #76069, llvm/llvm-project on GitHub), tail folding by masking in the loop vectorizer causes a miscompilation.
The problem is that when foldTailByMasking() is true, the trip count is calculated differently: it is rounded up to a multiple of the vectorization factor (2 in this example).
vector.ph: ; preds = %vector.scevcheck
%n.rnd.up = add i32 %limit, 1
%n.mod.vf = urem i32 %n.rnd.up, 2
%n.vec = sub i32 %n.rnd.up, %n.mod.vf
%n.vec is used outside the loop body, and in the middle block we simply use this value with no fixup, so the escaping value is derived from the rounded-up trip count.
middle.block: ; preds = %pred.store.continue5
%ind.escape = sub i32 %n.vec, 1
br i1 true, label %for.cond.for.cond.cleanup_crit_edge, label %scalar.ph
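To make the effect of the missing fixup concrete, here is a small sketch that replays the arithmetic from the two IR snippets above in Python. It is only an illustration, not vectorizer code: the function names are hypothetical, and it assumes a canonical induction variable that starts at 0, steps by 1, and whose last in-loop value (limit - 1) is what the scalar loop would leave behind.

```python
MASK32 = (1 << 32) - 1  # emulate i32 wrap-around arithmetic


def n_vec_with_tail_folding(limit: int, vf: int = 2) -> int:
    """Mirror vector.ph: round the trip count up to a multiple of VF."""
    n_rnd_up = (limit + (vf - 1)) & MASK32   # %n.rnd.up = add i32 %limit, VF-1
    n_mod_vf = n_rnd_up % vf                 # %n.mod.vf = urem i32 %n.rnd.up, VF
    return (n_rnd_up - n_mod_vf) & MASK32    # %n.vec = sub i32 %n.rnd.up, %n.mod.vf


def ind_escape(limit: int, vf: int = 2) -> int:
    """Mirror middle.block: %ind.escape = sub i32 %n.vec, 1 (no fixup)."""
    return (n_vec_with_tail_folding(limit, vf) - 1) & MASK32


def expected_escape(limit: int) -> int:
    """Assumed scalar semantics: the IV's value in the last iteration."""
    return (limit - 1) & MASK32


if __name__ == "__main__":
    for limit in range(1, 7):
        got, want = ind_escape(limit), expected_escape(limit)
        status = "ok" if got == want else "MISMATCH"
        print(f"limit={limit}: ind.escape={got}, expected={want} ({status})")
```

Under these assumptions the two values agree only when limit is already a multiple of VF; for any other limit, %ind.escape overshoots because it is computed from the rounded-up %n.vec.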
It is not clear to me that we can fix this in the middle block. For now, however, we can bail out during the legality check for tail folding by masking when induction variables are used outside the loop. Here is the PR: "[LV] Disable fold tail by masking - when induction vars used outside" by niwinanto (Pull Request #81609, llvm/llvm-project on GitHub).