Vectorization failure

This is my GEMM kernel in MLIR (gemm.mlir):

module {
  func.func @_Z4gemmPA32_iS0_S0_(%arg0: memref<32x32xi32>, %arg1: memref<32x32xi32>, %arg2: memref<32x32xi32>)  {
    affine.for %arg3 = 0 to 32 {
      affine.for %arg4 = 0 to 32 {
        %c0_i32 = arith.constant 0 : i32
        %0 = affine.for %arg5 = 0 to 32 iter_args(%arg6 = %c0_i32) -> (i32) {
          %1 = affine.load %arg0[%arg3, %arg5] : memref<32x32xi32>
          %2 = affine.load %arg1[%arg5, %arg4] : memref<32x32xi32>
          %3 = arith.muli %1, %2 : i32
          %4 = arith.addi %arg6, %3 : i32
          affine.yield %4 : i32
        }
        affine.store %0, %arg2[%arg3, %arg4] : memref<32x32xi32>
      }
    }
    return
  }
}

Command: mlir-opt gemm.mlir -affine-super-vectorize="virtual-vector-size=8 test-fastest-varying=0 vectorize-reductions=true"
Why does the pass vectorize the second loop (%arg4) but fail to vectorize the innermost loop (%arg5)?
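
For reference when comparing the two loops, here is a small Python sketch that tabulates how each memory access in the kernel behaves as each loop induction variable advances by one. It is only an illustration: the names `ACCESSES`, `classify`, and `access_table` are made up here, the subscripts are copied by hand from the IR, and the default row-major memref layout is assumed (the last subscript is the fastest-varying dimension). It does not reproduce the matching logic of the vectorizer.

```python
# Hand-copied from the IR: the subscripts of each access (outermost to
# fastest-varying dimension) and the loops that enclose the access.
ALL_LOOPS = ("%arg3", "%arg4", "%arg5")
ACCESSES = {
    "load %arg0": {"subscripts": ("%arg3", "%arg5"), "loops": ALL_LOOPS},
    "load %arg1": {"subscripts": ("%arg5", "%arg4"), "loops": ALL_LOOPS},
    # The store sits after the %arg5 loop, so only two loops enclose it.
    "store %arg2": {"subscripts": ("%arg3", "%arg4"), "loops": ("%arg3", "%arg4")},
}


def classify(iv, access):
    """Describe how `access` moves through memory as `iv` advances by one."""
    if iv not in access["loops"]:
        return "outside loop"
    subscripts = access["subscripts"]
    if iv not in subscripts:
        return "invariant"
    # Position counted from the fastest-varying (last) dimension.
    pos = len(subscripts) - 1 - subscripts.index(iv)
    return "contiguous" if pos == 0 else "strided"


def access_table():
    """Map each induction variable to the behaviour of every access."""
    return {
        iv: {name: classify(iv, acc) for name, acc in ACCESSES.items()}
        for iv in ALL_LOOPS
    }


for iv, row in access_table().items():
    print(iv, row)
```

Along %arg4, the load from %arg1 and the store to %arg2 are contiguous and the load from %arg0 is invariant. Along %arg5, the load from %arg0 is contiguous but the load from %arg1 is strided. If test-fastest-varying=0 restricts vectorization to loops whose accesses vary along the fastest-varying memory dimension (or are invariant), which is my reading of the option, this difference is the first thing to check.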