Converting between vector and scalar memrefs

I can’t find a way to write code that sees the same memory region as both a memref<256xvector<4xf32>> and a memref<1024xf32>. I don’t want two separate objects with copy-conversion from one to the other; the two views must share the same memory locations. For instance, to (efficiently) receive data from input devices, the vector-based code would allocate the memory as a vector memref, pass the well-aligned location as a scalar memref to a library-supplied input loading routine (which knows nothing about vectors), and then continue working on the data as vectors without making any copies.
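To make the intended usage concrete, here is a rough C analogue of what I would like to express in MLIR. It is only a sketch: the names vec4f, alloc_vectors, as_scalars, load_input and scale_in_place are hypothetical, and the 16-byte alignment is an assumption that stands in for whatever the target actually requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LANES 4         /* stands in for vector<4xf32> */
#define VEC_COUNT 256   /* stands in for the 256 in memref<256x...> */

/* Hypothetical 4-lane vector type. The 16-byte alignment is an assumption. */
typedef struct { _Alignas(16) float lane[LANES]; } vec4f;

/* Vector side: the buffer is allocated with vector alignment. */
vec4f *alloc_vectors(size_t count) {
    return aligned_alloc(_Alignof(vec4f), count * sizeof(vec4f));
}

/* Scalar view of the same storage: same address, no copy. */
float *as_scalars(vec4f *v) {
    assert(v != NULL);
    return (float *)(void *)v;
}

/* Hypothetical library routine: it only knows about scalar floats. */
void load_input(float *dst, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = (float)i;   /* placeholder for data read from a device */
}

/* Vector side again: works on the loaded data in place, one vector at a time. */
void scale_in_place(vec4f *v, size_t count, float k) {
    for (size_t i = 0; i < count; ++i)
        for (size_t j = 0; j < LANES; ++j)
            v[i].lane[j] *= k;
}
```

The intended call sequence is alloc_vectors(VEC_COUNT), then load_input(as_scalars(v), VEC_COUNT * LANES), then scale_in_place(v, VEC_COUNT, ...), with a single allocation and no copy at any point. This is the pattern I would like to reproduce with memrefs.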

As far as I can tell, the only operation that converts between the two kinds of memrefs is vector.type_cast. However, this operation seems to go the wrong way: from a scalar memref without alignment requirements to a vector memref with alignment requirements. If I understand correctly, this implies that an implementation of vector.type_cast will fall into one of two cases:

  • may require a copy of the data to ensure the correct alignment
  • may result in a crash if the scalar memref is not correctly aligned.
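In C terms, the dilemma looks roughly like the sketch below. It is not a description of how vector.type_cast is actually lowered; is_aligned and view_or_copy are hypothetical helpers, and the alignment value is supplied by the caller as an assumption about the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Nonzero when p is a multiple of `alignment` bytes (a power of two). */
int is_aligned(const void *p, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: return n floats that are safe to read as vectors.
   If src is already aligned, it is returned as-is (*copied = 0).
   Otherwise the data is copied into a fresh aligned buffer (*copied = 1)
   that the caller must free(). Returns NULL if the allocation fails. */
float *view_or_copy(float *src, size_t n, size_t alignment, int *copied) {
    if (is_aligned(src, alignment)) {
        *copied = 0;
        return src;
    }
    /* aligned_alloc wants a size that is a multiple of the alignment. */
    size_t bytes = n * sizeof(float);
    size_t padded = (bytes + alignment - 1) & ~(alignment - 1);
    float *dst = aligned_alloc(alignment, padded);
    if (dst == NULL) {
        *copied = 0;
        return NULL;
    }
    memcpy(dst, src, bytes);
    *copied = 1;
    return dst;
}
```

Skipping the check and using src directly corresponds to the second case: it works only as long as the scalar buffer happens to be aligned. Neither branch gives me the shared, copy-free view I am after when the alignment is not known up front.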

I assume a solution exists; I just don’t know it. I did search for it, both in the definition of the vector dialect and in posts like these ones.

Addendum: I realize that memref.alloc allows specifying an alignment on the scalar memref. However, this does not solve my problem: extracting vectors from an aligned memref would require memref.subview (memref.view seems to be restricted to type i8), which introduces a layout map in the output memref type, followed by vector.type_cast, which does not accept a map in the input memref…

Furthermore, even if this solution somehow worked in spite of the map that is introduced, the code would remain cumbersome:

  • vector.type_cast only applies to full objects (you can cast memref<1024xf32> into memref<vector<1024xf32>>, but not into memref<256xvector<4xf32>>), meaning that one has to insert memref.view and vector.type_cast operations everywhere in the processing loop.
  • vector-related alignment may depend on architecture, but here I have to provide the alignment directly, instead of delegating it to the back-end.