About the semantics of vectors with unit dimensions

After reading the documentation and spending a considerable amount of time with the lowering of vector.transfer_read, I still don’t understand the semantics of vector.transfer_read and vector.transfer_write. Let’s assume we have this:

%0 = vector.transfer_read %collapsed[%1, %2, %3], %cst {in_bounds = [true, true, true]} : tensor<2048x7x3xf32>, vector<8x1x1xf32>

Documentation says: “A vector.transfer_read can be lowered to a simple load if all dimensions are specified to be within bounds and no mask was specified”. So in this example it should be lowered to a vector.load. However, after running --convert-vector-to-scf pass, this is lowered to a bunch of ops (vector.broadcast, vector.insert, and more).

Then, if I instead write this:

%0 = vector.transfer_read %collapsed[%1, %2, %3], %cst {in_bounds = [true]} : tensor<2048x7x3xf32>, vector<8xf32>

then it gets lowered into a single vector.load.

The former code gives me correct results in my kernel, while the latter gives me wrong results. This means that those two are not equivalent, which is something that I don’t understand. I was expecting vector<8x1x1xf32> to be semantically equivalent to vector<8xf32> (in the end it’s just an buffer containing 8 floats).