[MLIR] broadcasting vector.transfer_read

Hi,

Does MLIR ever transform vector.transfer_read into a vector broadcast? Specifically, I’m curious about the case where the read loads a single scalar value from a 0-d memref:

func @foo(%arg1: memref<f32>) -> memref<?x784xf32> {

  %cst_0 = constant 0.000000e+00 : f32
  %5 = vector.transfer_read %arg1[], %cst_0 {permutation_map = #map3} : memref<f32>, vector<128xf32>

}

The docs make it sound like it’d be reasonable to transform the above into a broadcast (VectorTransferReadOp docs):

Alternatively, if a notional vector broadcast operation were available, the lowered code would resemble…

But when I inspect the codegen coming out of --convert-vector-to-scf I’m seeing a scalar loop:

scf.for %arg4 = %c0 to %c128 step %c1 {
  %10 = load %arg1[] : memref<f32>
  store %10, %6[%arg4] : memref<128xf32>
}

What I’d like to see is a single load of the scalar followed by a broadcast. Is this transform implemented somewhere else that I’m not looking? And if it isn’t implemented yet, is there a deeper reason behind that, or is it simply something nobody has gotten to?
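
To make that concrete, here is roughly the IR I was hoping for. This is a hand-written sketch, not output from any pass. The value names %s and %v are placeholders of mine, and I’m assuming the scalar-to-vector form of vector.broadcast is the right op to target here:

```mlir
// Hand-written sketch of the desired lowering (not produced by any pass).
// %s and %v are placeholder names, not taken from real output.
%s = load %arg1[] : memref<f32>                    // one scalar load from the 0-d memref
%v = vector.broadcast %s : f32 to vector<128xf32>  // replicate the scalar across all 128 lanes
```

That would replace the 128-iteration scf.for loop with one load and one broadcast.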

Any guidance is appreciated!