I have looked at various memref casts and have not succeeded in performing the following operation, namely casting to a vector type that covers a subset of the last dimension:
%vB = vector.type_cast %B: memref<64x128xf32> to memref<64x32xvector<4xf32>>
The conversion above is not supported. It is critically important when combining tiling with SIMD and buffers: we want to copy large panels for cache benefits and then SIMDize subsets of each large panel.
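To make the motivation concrete, here is a minimal Python sketch (not MLIR) of the pattern described above: a tile of a larger row-major matrix is packed into a contiguous panel, and the panel is then walked in 4-wide pieces of each row, which are the elements a memref<64x32xvector<4xf32>> view of a memref<64x128xf32> panel would expose. The helper names pack_panel and simd_chunks are hypothetical, and the sketch assumes a plain row-major layout.

```python
def pack_panel(A, row0, col0, rows, cols):
    # Copy a rows x cols tile of the 2-D list A into a contiguous,
    # row-major buffer (the "large panel" copied for cache locality).
    return [A[row0 + r][col0 + c] for r in range(rows) for c in range(cols)]


def simd_chunks(panel, rows, cols, lanes=4):
    # Walk the packed panel in `lanes`-wide pieces of each row, i.e. the
    # vector elements a rows x (cols / lanes) x vector<lanes x T> view would hold.
    assert cols % lanes == 0, "panel width must be a multiple of the vector width"
    for r in range(rows):
        for j in range(cols // lanes):
            start = r * cols + j * lanes
            yield r, j, panel[start:start + lanes]


# Stand-in 64x128 matrix whose values encode their own row-major position.
A = [[r * 128 + c for c in range(128)] for r in range(64)]
panel = pack_panel(A, row0=8, col0=16, rows=4, cols=8)
chunks = list(simd_chunks(panel, rows=4, cols=8))
```

Each chunk here is a contiguous slice of the packed panel, which is why a vector view of the panel is attractive in the first place.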
I looked at increasing the memref rank of %B first, e.g.
%B1 = some_cast %B: memref<64x128xf32> to memref<64x32x4xf32>
since that would let me use the existing vector.type_cast, but I have not succeeded in doing so. So is there currently a way to perform such a rank-increasing cast (same total size, same element type, splitting only the last dimension)?
You could just use the code from this PR: https://reviews.llvm.org/D85885
It should rebase easily on whichever master you have, and it is the simplest, most direct way to achieve what you want here. It gives you exactly the memref you want in one operation, and its LLVM lowering exists and has been tested all the way through execution.
%MV = memref_vector_cast %M : memref<8x16xf32> to memref<8x2xvector<8xf32>>
%AV = memref_vector_cast %A : memref<?x?xf32> to memref<?x?xvector<8xf32>>
As of today, a memref cast that changes the element type is still a dangerous and unsupported operation. This has been documented and discussed in multiple posts. With the recent discussions and work on data layout representation that @ftynse has started landing, a bitcast-like operation on memrefs becomes less of a footgun.
You could just use the code from this PR:
Thanks, I got the code and it works perfectly for our needs.
As of today, memref cast that changes element type is still a dangerous and unsupported operation.
While I am certainly not fully aware of the latest discussions about data layout, I fail to see how splitting the last dimension into vectors is dangerous. This is essentially all that SIMDization does: overlaying a vector load on memory that is traditionally seen as scalar.
I would venture that offering a very conservative cast
memref<a0 x a1 x a2 x ... x an x T> to memref<a0 x a1 x a2 x ... x (an/N) x vector<N x T>>
and restricting it to cases where an (the last dimension) and N are compile-time constants with an mod N == 0 is an extremely well-behaved case and would be quite beneficial.
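The proposed restriction is easy to state as a shape rule. The following Python sketch is a hypothetical legality check (the function name and the use of None for a dynamic '?' dimension are assumptions for illustration): it splits only the last dimension and rejects the cast unless that dimension is a compile-time constant divisible by N.

```python
def vector_cast_shape(shape, n):
    """Leading shape for a0 x ... x an x T -> a0 x ... x (an/N) x vector<N x T>.

    `shape` is a tuple of dimension sizes; None marks a dynamic ('?') size.
    Only the last dimension is split; it must be static and a multiple of n.
    The returned tuple excludes the vector<n x T> element type itself.
    """
    if not shape:
        raise ValueError("a rank-0 memref has no last dimension to split")
    last = shape[-1]
    if last is None:
        raise ValueError("the last dimension must be a compile-time constant")
    if n <= 0 or last % n != 0:
        raise ValueError(f"last dimension {last} is not a multiple of {n}")
    return shape[:-1] + (last // n,)
```

With N = 8, the (8, 16) shape maps to (8, 2), matching the %MV example from D85885 above; the dynamic memref<?x?xf32> example from that PR falls outside this conservative rule.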
I have implemented this in our internal dialect based on Uday’s code, and I can contribute it back if others are interested.
I think @nicolasvasilache might be referring to this post: “MemRef type and data layout”. Converting memref<64x128xf32> to memref<64x32xvector<4xf32>> is not a simple bitcast operation, since the allocation of a vector type might have alignment constraints on a particular target, which would require introducing padding between the vector elements of the memref.
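A rough way to see the padding issue is to compute the distance between consecutive vector elements when each element's allocation size is rounded up to its alignment. The Python sketch below assumes that rounding rule and a 16-byte vector alignment; the real sizes and alignments come from the target's data layout, not from this code, and the helper names are hypothetical.

```python
def round_up(x, align):
    # Smallest multiple of `align` that is >= x.
    return (x + align - 1) // align * align


def element_stride(lanes, scalar_bytes, align):
    # Bytes between consecutive vector elements of a memref, assuming each
    # element's allocation size is rounded up to its alignment.
    return round_up(lanes * scalar_bytes, align)


def can_overlay_scalar_buffer(lanes, scalar_bytes, align):
    # The vector view only lines up with the scalar buffer if no padding
    # is inserted between vector elements.
    return element_stride(lanes, scalar_bytes, align) == lanes * scalar_bytes
```

Under these assumptions, vector<4xf32> (16 bytes) needs no padding, while a vector<3xf32> element would occupy 16 bytes instead of 12, so a memref of such vectors could not overlay the original f32 buffer.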
Note that you don’t need a memref with a vector element type to perform a vector load/store. Vector transfer ops and the recently introduced vector.load and vector.store ops (see the 'vector' Dialect documentation) allow you to perform a vector load/store on a memref with a scalar element type. For example:
%result = vector.load %base[%i, %j] : memref<100x100xf32>, vector<8xf32>
I don’t know the details of your particular case, but it looks like you should be able to do something like this (pseudo-code):
%cast = some_cast %B: memref<64x128xf32> to memref<64x32x4xf32>
%vload = vector.load %cast[%i, %j, 0] : memref<64x32x4xf32>, vector<4xf32>
or even without the cast:
%j4 = affine.apply affine_map<(d0) -> (d0 * 4)>(%j)
%vload = vector.load %B[%i, %j4] : memref<64x128xf32>, vector<4xf32>
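The two forms above should read the same four floats. Here is a minimal Python model of a contiguous, row-major memref and a 1-D vector load along its innermost dimension. The helpers row_major_offset and vector_load are hypothetical, and the model checks bounds as a simplification; it compares %cast[%i, %j, 0] on the 64x32x4 view against %B[%i, 4 * %j] on the original 64x128 memref.

```python
def row_major_offset(shape, indices):
    # Linear offset of `indices` in a contiguous, identity-layout memref.
    offset = 0
    for dim, idx in zip(shape, indices):
        assert 0 <= idx < dim, "index out of bounds"
        offset = offset * dim + idx
    return offset


def vector_load(buffer, shape, indices, lanes):
    # Model of a 1-D vector load: `lanes` consecutive elements along the
    # innermost dimension, starting at `indices` (bounds checked for simplicity).
    assert indices[-1] + lanes <= shape[-1], "vector runs past the last dimension"
    start = row_major_offset(shape, indices)
    return buffer[start:start + lanes]


B = [float(x) for x in range(64 * 128)]  # stand-in contents of memref<64x128xf32>

i, j = 5, 7
via_cast = vector_load(B, (64, 32, 4), (i, j, 0), 4)  # %cast[%i, %j, 0]
direct = vector_load(B, (64, 128), (i, 4 * j), 4)     # %B[%i, 4 * %j]
```

Both calls start at offset 668 in the flat buffer, because splitting the last dimension of a contiguous row-major memref does not move any element.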
Hopefully that helps!
Thanks, I will try vector load/store; I had missed the new support for loading from memrefs with non-vector element types.
On top of what @dcaballe said about padding constraints, I don’t think we have a general guarantee that the last dimension of a memref is contiguous: can’t you form a subview of a 1-D memref that yields another 1-D memref with a non-unit stride?
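The contiguity concern can be illustrated with a 1-D strided view, where view element k lives at offset + k * stride in the underlying buffer. In this Python sketch (the helpers and the every-other-element subview starting at offset 1 are hypothetical), four consecutive view elements are not four consecutive buffer elements, so reinterpreting them as one vector<4xf32> would read the wrong data. A unit stride in the last dimension is the assumed precondition for such a cast.

```python
def strided_positions(offset, stride, size):
    # Buffer positions touched by a 1-D strided memref view:
    # view element k lives at offset + k * stride.
    return [offset + k * stride for k in range(size)]


def groups_are_contiguous(positions, lanes):
    # True only if every `lanes`-sized group of view elements occupies
    # `lanes` consecutive buffer slots, as a vector element would require.
    for g in range(0, len(positions) - lanes + 1, lanes):
        group = positions[g:g + lanes]
        if group != list(range(group[0], group[0] + lanes)):
            return False
    return True


unit = strided_positions(offset=0, stride=1, size=8)
every_other = strided_positions(offset=1, stride=2, size=8)  # hypothetical subview
```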