Vector transfer read when permutation_map is not a minor identity

Recently, I saw wrong behaviour from vector.transfer_read when I tried to read the second and third dimensions of a 4D memref into a vector, i.e. memref<?x?x?x?xf32> to vector<3x3xf32> with permutation_map : (d0, d1, d2, d3) -> (d1, d2). Together with Alex, I observed that it works correctly only if the vector dimensions are powers of 2.

This is the full code that reproduces the error:

#map0 = affine_map<(d0, d1, d2, d3) -> (d1, d2)>
func @print_memref_f32(memref<*xf32>)
func @alloc_4d_filled_f32(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: f32) -> memref<?x?x?x?xf32> {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c10 = constant 10 : index
  %c100 = constant 100 : index
  %0 = alloc(%arg0, %arg1, %arg2, %arg3) : memref<?x?x?x?xf32>
  scf.for %arg5 = %c0 to %arg0 step %c1 {
    scf.for %arg6 = %c0 to %arg1 step %c1 {
      scf.for %arg7 = %c0 to %arg2 step %c1 {
        scf.for %arg8 = %c0 to %arg3 step %c1 {
          %arg66 = muli %arg6, %c100 : index
          %arg77 = muli %arg7, %c10 : index
          %tmp1 = addi %arg5, %arg66 : index
          %tmp2 = addi %arg77, %arg8 : index
          %tmp3 = addi %tmp1, %tmp2 : index
          %tmp4 = index_cast %tmp3 : index to i32
          %tmp5 = sitofp %tmp4 : i32 to f32
          store %tmp5, %0[%arg5, %arg6, %arg7, %arg8] : memref<?x?x?x?xf32>
        }
      }
    }
  }
  return %0 : memref<?x?x?x?xf32>
}

func @main() {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c3 = constant 9 : index
  %cst = constant 1.000000e+01 : f32
  %cst_1 = constant 0.000000e+00 : f32
  %0 = call @alloc_4d_filled_f32(%c1, %c3, %c3, %c3, %cst_1) : (index, index, index, index, f32) -> memref<?x?x?x?xf32>
  %converted = memref_cast %0 : memref<?x?x?x?xf32> to memref<*xf32>
  call @print_memref_f32(%converted): (memref<*xf32>) -> ()

  %1 = vector.transfer_read %0[%c0, %c0, %c0, %c0], %cst {permutation_map = #map0} : memref<?x?x?x?xf32>, vector<9x9xf32>
  vector.print %1 : vector<9x9xf32>
  dealloc %0 : memref<?x?x?x?xf32>
  return
}
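
As a point of comparison, the result one would expect from the read above can be modelled in plain Python. This is only a sketch, assuming the documented transfer_read semantics: each vector dimension walks the memref dimension named in the permutation map, the remaining indices stay at the start indices, and out-of-bounds elements take the padding value. The helper names fill_value and reference_transfer_read are made up for illustration and are not part of MLIR.

```python
def fill_value(d0, d1, d2, d3):
    # Mirrors the fill loop in @alloc_4d_filled_f32: d0 + 100*d1 + 10*d2 + d3.
    return float(d0 + 100 * d1 + 10 * d2 + d3)


def reference_transfer_read(shape, start, vec_shape, perm, padding):
    """Scalar model of a 2-D transfer_read (hypothetical helper).

    perm[k] is the memref dimension that vector dimension k iterates over,
    so perm = (1, 2) models (d0, d1, d2, d3) -> (d1, d2).
    """
    result = []
    for i in range(vec_shape[0]):
        row = []
        for j in range(vec_shape[1]):
            idx = list(start)
            idx[perm[0]] += i
            idx[perm[1]] += j
            # Elements outside the memref are replaced by the padding value.
            in_bounds = all(0 <= x < s for x, s in zip(idx, shape))
            row.append(fill_value(*idx) if in_bounds else padding)
        result.append(row)
    return result


# Same sizes, start indices, and padding (10.0) as in @main above.
expected = reference_transfer_read((1, 9, 9, 9), (0, 0, 0, 0), (9, 9), (1, 2), 10.0)
```

Under these assumptions, element [i][j] of the vector should be 100*i + 10*j, so the first printed row would run from 0 to 80 in steps of 10 and the second row would start at 100.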

And this is how I run it:
mlir-opt %s -convert-vector-to-scf -convert-linalg-to-llvm | mlir-cpu-runner -e main -entry-point-result=void -shared-libs=<path-to-libs>/runtime-support.so

Alex told me that he has already experienced or heard about a similar problem, so I want to ask whether this is a known issue, a bug, or whether I just managed to trigger some unsupported behaviour. Thanks. @nicolasvasilache @aartbik

Sorry for the delay in responding on this medium …
This should be fixed now I believe?