Efficiently loading vectors from memrefs

I am trying to understand the interface between “regular” memref-based code (like the one I can generate today from linalg) and the vector dialect.

On one hand, it would be appealing to use vector types everywhere
instead of scalar types. This probably imposes some alignment requirements, but it would facilitate the synthesis of vector code. I
guess that in doing so one loses the bufferization passes and possibly other lowerings and optimizations (unless bufferization is already capable of handling such cases, which I don’t know - please let me know).

On the other hand, using scalar-based memrefs and constantly converting between the two representations raises the question of how to do it efficiently. Is there some example of how to efficiently load vectors from memrefs? Do I have to do the packing of scalars into vectors “by hand” (loading the scalars one by one and then storing them as a vector), or is there something better that can be done?
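To make the question concrete, here is a sketch of the two alternatives I have in mind, assuming `vector.transfer_read` is the intended tool for the second one (the memref shape, names, and the small vector width are made up for illustration):

```mlir
// Alternative 1: pack scalars into a vector "by hand".
%s0 = memref.load %A[%i0] : memref<128xf32>
%s1 = memref.load %A[%i1] : memref<128xf32>
%v0 = vector.insertelement %s0, %vinit[%c0 : index] : vector<8xf32>
%v1 = vector.insertelement %s1, %v0[%c1 : index] : vector<8xf32>
// ... six more load/insert pairs ...

// Alternative 2: a single transfer op that reads 8 contiguous f32s,
// with %pad used for out-of-bounds lanes.
%pad = arith.constant 0.0 : f32
%v = vector.transfer_read %A[%i0], %pad : memref<128xf32>, vector<8xf32>
```

Is the second form the recommended way, and does it lower to efficient vector loads?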

I have read the (very helpful) PDF documents on the vector dialect, but these documents only show the fully vectorized inner loops, not the integration code around them. This is why I’m asking.


@dpotop there is a bunch of ongoing work going from tensor-land to vectors that makes the packing transformations much easier than starting from buffer-land (thanks to SSA use-def chains).

Most transformations live in core but the end-to-end story is, for now, in a separate repository.
This is mostly because of the differences in bufferization and will take some time to resolve.

There are still many things in flight and open problems, but if you want to see where this is going, feel free to poke at the sandbox. This test is useful to look at, to dump intermediate IR from, and to play with; but expect many sharp corners.