Restrict `vector.type_cast` to only cast to alignable multi-dim vectors

In LLVM IR, the type [a... x <b x T>] has an alignment requirement on the vector type inside. By default, <b x T> is aligned to the next power of 2 of its size (e.g., <3 x T> is aligned, and therefore padded, to sizeof(<4 x T>)). So [a... x b x T] generally does not have the same layout as [a... x <b x T>] unless b is a power of 2.
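As a concrete illustration, here is a hand-written LLVM IR sketch (the globals and the function are made up for illustration; the offsets assume LLVM's default data layout on a typical 64-bit target, where <3 x i32> has size 12 but alloc size 16):

; 24 bytes in total; the second row starts at byte offset 12.
@flat = global [2 x [3 x i32]] zeroinitializer
; 32 bytes in total; <3 x i32> is padded to 16 bytes, so the second row starts at byte offset 16.
@padded = global [2 x <3 x i32>] zeroinitializer

define void @offsets() {
  %row1_flat = getelementptr [2 x [3 x i32]], ptr @flat, i64 0, i64 1      ; @flat + 12
  %row1_padded = getelementptr [2 x <3 x i32>], ptr @padded, i64 0, i64 1  ; @padded + 16
  ret void
}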

This is where the problem with the vector.type_cast operation comes in. It casts a memref<a... x b x T> into memref<vector<a... x b x T>> without touching the underlying data, and in MLIR the vector<a... x b x T> is currently translated into [a... x <b x T>] in LLVM. So if b is NOT a power of 2, this cast causes very strange behavior.

Consider the following example:

memref.global "private" @gv0 : memref<2x3xi32> = dense<[[1, 2, 3], [4, 5, 6]]> 

func.func @main() {
  %mem0 = memref.get_global @gv0 : memref<2x3xi32>
  %mem1 = vector.type_cast %mem0 : memref<2x3xi32> to memref<vector<2x3xi32>>

  %v1 = memref.load %mem1[] : memref<vector<2x3xi32>>
  
  %p12 = vector.extract %v1[1, 2] : vector<2x3xi32>
  vector.print %p12 : i32

  return
}

It will print out 0 instead of 6 because, when translated to LLVM IR, LLVM assumes that in a [2 x <3 x i32>] (%v1 here) each <3 x i32> is aligned to 16 bytes (4 * sizeof(i32)). The vector.extract in the example is therefore translated into an access to the second row at @gv0 + 16 instead of @gv0 + 12, so element [1, 2] is read from @gv0 + 24, past the end of the global, and a wrong value is printed. I put the translated LLVM IR into Godbolt to inspect the generated assembly and confirmed this.
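Hand-written LLVM IR illustrating the behavior (a sketch of what the lowering amounts to, not the exact output of the conversion passes; the function name is made up):

@gv0 = private global [2 x [3 x i32]] [[3 x i32] [i32 1, i32 2, i32 3], [3 x i32] [i32 4, i32 5, i32 6]]

define i32 @repro() {
  ; The cast reinterprets @gv0 as [2 x <3 x i32>]; since <3 x i32> has alloc size 16,
  ; this load reads 32 bytes from a 24-byte global.
  %v1 = load [2 x <3 x i32>], ptr @gv0
  ; Row 1 of %v1 holds the bytes from @gv0 + 16 onward, i.e. 5, 6 and then out-of-bounds
  ; data, instead of the bytes from @gv0 + 12 (4, 5, 6).
  %row = extractvalue [2 x <3 x i32>] %v1, 1
  ; Element 2 of that row came from @gv0 + 24, past the end of the data, rather than
  ; from @gv0 + 20 where the value 6 lives.
  %elt = extractelement <3 x i32> %row, i64 2
  ret i32 %elt
}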

IMO, vector.type_cast should restrict its argument type to memref<a... x b x T> where b is a power of 2, instead of accepting arbitrary memrefs. But since I am not very familiar with the design and expected usage of vector.type_cast, I decided to post here for more opinions on this strange behavior before adding such a check. Once we reach a consensus, I am happy to write and submit a patch to fix it.
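To make the proposed restriction concrete, here is roughly what it would mean in practice (the SSA names and shapes are just illustrative):

// Would be rejected: the innermost dimension (3) is not a power of 2, so the element
// layout of memref<2x3xi32> does not match that of [2 x <3 x i32>] in LLVM.
%bad = vector.type_cast %m0 : memref<2x3xi32> to memref<vector<2x3xi32>>

// Would still be accepted: the innermost dimension (4) is a power of 2, so
// memref<2x4xi32> and [2 x <4 x i32>] share the same contiguous layout.
%ok = vector.type_cast %m1 : memref<2x4xi32> to memref<vector<2x4xi32>>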

I have submitted a patch, D142280, to perform the check I described.


Thanks for surfacing this; the op documentation is wrong and not in line with its usage. I will clean it up next week.

This op in isolation is, however, just a symptom of a larger problem that needs a deeper resolution.
There have been various discussions / posts about why a memref is “not a pointer”, with detailed explanations of the type of behavior you describe.

The issue you are seeing is not limited to vector.type_cast; it also applies to other things at the memref / value interface where we do not yet model the data layout in MLIR; here is an example.
Interestingly, we recently had an internal conversation about the i1 case.

The TL;DR is that we are not taking the LLVM data layout into account yet, and this is wrong.
The proper way to make progress here is to finally start making use of the DataLayout support in MLIR that @ftynse introduced a while back.

A solution that I would consider good would need to:

  1. be retargetable to various HW (e.g., some HW requires 1KB alignment to do things implicitly, i.e. without explicit memref.reshape / vector.reshape / vector.shape_cast)
  2. work with sub-byte granularity (i4, i2, i1)
  3. be retargetable across backends: LLVM is but one backend we use, SPIR-V is another, and there may be others.

The “power-of-2” constraint is a simple approximation of LLVM alignment; it may make sense in the short term to reduce the element of surprise once the op doc is fixed.

This may work as a stop-gap solution to avoid buggy behavior, but we need something more robust that doesn’t propagate the expectations of LLVM IR beyond low levels of the dialect stack.