After reading the documentation and spending a considerable amount of time with the lowering of vector.transfer_read, I still don’t understand the semantics of vector.transfer_read and vector.transfer_write. Let’s assume we have this:
%0 = vector.transfer_read %collapsed[%1, %2, %3], %cst {in_bounds = [true, true, true]} : tensor<2048x7x3xf32>, vector<8x1x1xf32>
The documentation says: “A vector.transfer_read can be lowered to a simple load if all dimensions are specified to be within bounds and no mask was specified”. So in this example it should be lowered to a vector.load. However, after running the --convert-vector-to-scf pass, this is lowered to a bunch of ops (vector.broadcast, vector.insert, and more).
Then, if I instead write this:
%0 = vector.transfer_read %collapsed[%1, %2, %3], %cst {in_bounds = [true]} : tensor<2048x7x3xf32>, vector<8xf32>
then it gets lowered into a single vector.load.
The former code gives me correct results in my kernel, while the latter gives me wrong results. This means that the two are not equivalent, which is something that I don’t understand. I was expecting vector<8x1x1xf32> to be semantically equivalent to vector<8xf32> (in the end it’s just a buffer containing 8 floats).
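One way to compare the two forms is to write down which row-major offsets each read would touch. The sketch below is plain Python rather than MLIR. It rests on an assumption, not on anything verified in this post: that a transfer_read without an explicit permutation_map maps the vector dimensions onto the trailing dimensions of the source, so vector<8x1x1xf32> advances along the outermost dimension while vector<8xf32> advances along the innermost one and becomes one contiguous load. The helper names are hypothetical.

```python
# Shape of the source tensor from the example: tensor<2048x7x3xf32>.
SHAPE = (2048, 7, 3)


def linear_offset(i, j, k, shape=SHAPE):
    """Row-major offset of element [i, j, k]."""
    _, d1, d2 = shape
    return (i * d1 + j) * d2 + k


def offsets_8x1x1(i, j, k):
    # Assumption: vector<8x1x1xf32> covers indices [i..i+7, j, k],
    # i.e. the 8 elements advance along the outermost dimension.
    return [linear_offset(i + n, j, k) for n in range(8)]


def offsets_8(i, j, k):
    # Assumption: vector<8xf32> with in_bounds = [true] becomes a single
    # contiguous load of 8 floats starting at [i, j, k].
    return [linear_offset(i, j, k) + n for n in range(8)]


print(offsets_8x1x1(0, 0, 0))  # [0, 21, 42, 63, 84, 105, 126, 147]
print(offsets_8(0, 0, 0))      # [0, 1, 2, 3, 4, 5, 6, 7]
```

If that assumption holds, the first form is a strided gather (stride 7 * 3 = 21 elements), which would explain why it cannot become a single vector.load, and the second form reads different elements altogether.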