Vector.transfer_read padding and division operations

Hi!

I was playing with the MLIR vectorization pass for the Affine dialect and ran into an issue.

I have a test IR that just computes an element-wise division between two 1D memref values.
The affine-super-vectorize pass replaces the affine.load operations with vector.transfer_read using zero padding, and uses the non-masked version of the division operation. After lowering to LLVM, I end up with maskedload operations that produce zeros for out-of-bounds elements, feeding a non-masked vectorized llvm.sdiv operation. This leads to a runtime "integer divide by zero" exception.

```mlir
func.func @test(%arg0: memref<2xi32>, %arg1: memref<2xi32>) {
  // This constant doesn't contain zero values
  %0 = memref.get_global @__constant_2xi32 : memref<2xi32>
  affine.for %arg2 = 0 to 2 step 128 {
    %c0_i32 = arith.constant 0 : i32
    %1 = vector.transfer_read %arg0[%arg2], %c0_i32 : memref<2xi32>, vector<128xi32>
    %2 = vector.transfer_read %0[%arg2], %c0_i32 : memref<2xi32>, vector<128xi32>
    %3 = arith.divsi %1, %2 : vector<128xi32> // divide by zero due to padding
    vector.transfer_write %3, %arg1[%arg2] : vector<128xi32>, memref<2xi32>
  }
  return
}
```

Locally I worked around this with a separate pass that rewrites such patterns to use non-zero padding. But I'm curious: is this a bug in MLIR, or am I missing something in the optimization and lowering pipeline?
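For reference, a minimal sketch of the kind of rewrite I mean (the constant name `%c1_i32` is mine, not produced by the pass): pad the divisor's transfer_read with 1 instead of 0, so any out-of-bounds lane computes `x / 1` instead of trapping, while in-bounds lanes are unchanged.

```mlir
func.func @test(%arg0: memref<2xi32>, %arg1: memref<2xi32>) {
  %0 = memref.get_global @__constant_2xi32 : memref<2xi32>
  affine.for %arg2 = 0 to 2 step 128 {
    %c0_i32 = arith.constant 0 : i32
    // 1 is a neutral padding value for the divisor: padded lanes divide by 1.
    %c1_i32 = arith.constant 1 : i32
    %1 = vector.transfer_read %arg0[%arg2], %c0_i32 : memref<2xi32>, vector<128xi32>
    %2 = vector.transfer_read %0[%arg2], %c1_i32 : memref<2xi32>, vector<128xi32>
    %3 = arith.divsi %1, %2 : vector<128xi32> // padded lanes now compute 0 / 1
    vector.transfer_write %3, %arg1[%arg2] : vector<128xi32>, memref<2xi32>
  }
  return
}
```

The out-of-bounds lanes of the result are dead anyway, since vector.transfer_write masks them off; the padding only needs to keep the intermediate arithmetic free of undefined behavior.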

Yes, this is a limitation in the implementation: the constant 0 is likely hard-coded somewhere (this is very old code).

This needs to evolve into a `NeutralOf`-style interface, so each operation can supply the padding value that is neutral for it.

Nothing very hard, but it needs to be done.