Hi!
I was playing with the MLIR vectorization pass for the Affine dialect and ran into an issue.
I have a test IR that simply computes an element-wise division of two 1D memref values.
The affine-super-vectorize pass replaces the affine.load operations with vector.transfer_read operations that use zero padding, and emits the non-masked version of the division operation. After lowering to LLVM I end up with maskedload operations that produce zeros for the out-of-bounds elements, followed by a non-masked vectorized llvm.sdiv operation. This leads to a runtime "integer divide by zero" exception.
func.func @test(%arg0: memref<2xi32>, %arg1: memref<2xi32>) {
  // This constant doesn't contain zero values
  %0 = memref.get_global @__constant_2xi32 : memref<2xi32>
  affine.for %arg2 = 0 to 2 step 128 {
    %c0_i32 = arith.constant 0 : i32
    %1 = vector.transfer_read %arg0[%arg2], %c0_i32 : memref<2xi32>, vector<128xi32>
    %2 = vector.transfer_read %0[%arg2], %c0_i32 : memref<2xi32>, vector<128xi32>
    %3 = arith.divsi %1, %2 : vector<128xi32> // divide by zero due to padding
    vector.transfer_write %3, %arg1[%arg2] : vector<128xi32>, memref<2xi32>
  }
  return
}
Locally I worked around this with a separate pass that rewrites such patterns to use non-zero padding. But I'm curious: is this a bug in MLIR, or am I missing something in the optimization and lowering pipeline?
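For reference, my workaround amounts to padding the divisor's vector.transfer_read with 1 instead of 0, so out-of-bounds lanes compute x / 1 and never trap. A sketch of the rewritten loop body (using the names from the IR above; this is just the pattern my local pass produces, not an upstream transformation):

```mlir
affine.for %arg2 = 0 to 2 step 128 {
  %c0_i32 = arith.constant 0 : i32
  %c1_i32 = arith.constant 1 : i32
  // Dividend may keep zero padding: 0 / x is well defined for x != 0.
  %1 = vector.transfer_read %arg0[%arg2], %c0_i32 : memref<2xi32>, vector<128xi32>
  // Divisor is padded with 1 so padded lanes never divide by zero.
  %2 = vector.transfer_read %0[%arg2], %c1_i32 : memref<2xi32>, vector<128xi32>
  %3 = arith.divsi %1, %2 : vector<128xi32>
  vector.transfer_write %3, %arg1[%arg2] : vector<128xi32>, memref<2xi32>
}
```

The out-of-bounds results are garbage either way, but vector.transfer_write masks them out on the store, so only the in-bounds lanes reach %arg1.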