Tensor Ops with dynamic sizes - which behaviour is "more correct"?

Hi,

I’ve been exploring the behaviour of tensor.extract_slice and tensor.pack and I’m encountering some confusion when working with dynamic sizes. Specifically, I’m unsure about the behaviour when a dynamic value (like %c4) is used for the “size” in these operations.

Here’s what’s correct/incorrect for tensor.extract_slice today (note the dynamic “size” from %c4):

%c4 = arith.constant 4 : index
// CORRECT: Output tensor: tensor<?xf32>
%0 = tensor.extract_slice %t[0][%c4][1] : tensor<?xf32> to tensor<?xf32>
// INCORRECT: This leads to an error: "expected type to be 'tensor<?xf32>' or a rank-reduced version. (size mismatch)"
%1 = tensor.extract_slice %t[0][%c4][1] : tensor<?xf32> to tensor<4xf32>

A similar experiment with tensor.pack shows a slightly different result:

%c4 = arith.constant 4 : index
// Output tensor: tensor<?x?x4x2xf32>
%0 = tensor.pack %source padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c4, 2] into %dest : tensor<?x?xf32> -> tensor<?x?x4x2xf32>
// Output tensor: tensor<?x?x?x2xf32> (also accepted)
%1 = tensor.pack %source padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c4, 2] into %dest : tensor<?x?xf32> -> tensor<?x?x?x2xf32>

Intuitively, I expected that using %c4 (a constant, but still dynamic at this level) would always result in ? in the output tensor. On the other hand, I also see the logic in allowing 4 instead of ? since we know the exact value of %c4.

So my question is: Which behavior is more correct? Should %c4 always lead to ? in the output tensor, or is there merit to allowing 4 in cases where the value is known?

General Case

I’m also working on more complex cases, such as this genuinely dynamic one:

%c4 = arith.constant 4 : index
%vs = vector.vscale
%c4_vs = arith.muli %c4, %vs : index
%0 = tensor.pack %source padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c4_vs, 2] into %dest : tensor<?x?xf32> -> tensor<?x?x?x2xf32>

In this case, using 4 instead of ? would definitely be incorrect since %c4_vs is truly dynamic. However, before I get to this case, I need to resolve the simpler cases mentioned above.

Thanks!

– Andrzej

Was f32 → i8 intended? The correct extract_slice example doesn’t compile for me: Compiler Explorer

This aside, why not just add a canonicalization here to use static size when known?


Sorry, that was a typo. I’ve now fixed it. Here’s a working example: Compiler Explorer

This aside, why not just add a canonicalization here to use static size when known?

That would definitely make sense, and I agree that it should be done. But for me, that’s step 2 :slight_smile: My current focus is on ensuring consistency across all Tensor Ops regarding how they handle “sizes”-like attributes.
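
To make "step 2" concrete, here is a minimal sketch of what I would expect such a canonicalization to produce (just the usual fold-to-static-plus-cast idiom, not an actual pattern from the codebase):

%c4 = arith.constant 4 : index
// Before: dynamic tile size, fully dynamic inner tile in the result type.
%0 = tensor.pack %source padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c4, 2] into %dest : tensor<?x?xf32> -> tensor<?x?x?x2xf32>
// After (sketch): %c4 is folded into a static tile size, and tensor.cast ops keep the surrounding IR typed as before.
%dest_static = tensor.cast %dest : tensor<?x?x?x2xf32> to tensor<?x?x4x2xf32>
%packed = tensor.pack %source padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [4, 2] into %dest_static : tensor<?x?xf32> -> tensor<?x?x4x2xf32>
%1 = tensor.cast %packed : tensor<?x?x4x2xf32> to tensor<?x?x?x2xf32>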

Before we consider canonicalization, I’d like confirmation that:

  1. Tensor Ops should consistently handle “sizes” across the board (right now, they don’t).
  2. Mixing dynamic “sizes” with static “shapes” should be an error, as it introduces ambiguity.

For example, this feels wrong:

//  dynamic "size" (%c4), static "shape" (4)
tensor.pack ... inner_tiles = [%c4, 2] into ... -> tensor<?x?x4x2xf32>

Instead, I think it should be either:

//  static "size" (4), static "shape" (4)
tensor.pack ... inner_tiles = [4, 2] into ... -> tensor<?x?x4x2xf32>
//  dynamic "size" (%c4), dynamic "shape" (?)
tensor.pack ... inner_tiles = [%c4, 2] into ... -> tensor<?x?x?x2xf32>

Does that make sense? Am I missing something?

Ok, that makes sense. I tripped up over “correct” and missed that the underlying motivation is addressing consistency.

Makes sense to me, mixing dynamic sizes and static shapes seems very messy.

I’m in favour of verification flagging this as an error only where it is a known error. Verification is also required to be local, so it is conservative by nature, and type inference should be similarly conservative. But if a user created the op with a specific type, and they have the application knowledge to do so, then that seems fine. I don’t want to restrict high-level frameworks from encoding knowledge here, nor does that mean the lowering of these tensor ops needs to codegen additional verification (that should be on the framework above). E.g., I see the static shape in this locally-unknown case as an additional assert in the analysis/logical sense, but not in the runtime sense.

The verifier of tensor.pack is inconsistent with the rest of the tensor/memref dialects. We generally require a dynamic dimension for each (dynamic) SSA value and a static dimension for each static size (attribute).

Example: %0 = tensor.empty(%s0, %s1) : tensor<?x2x?xf32>
It is clear that %s0 corresponds to the first ? and %s1 corresponds to the second ?. If the number of ?s and the number of operands were different, it would not be obvious which SSA value corresponds to which dimension.

Ops like tensor.extract_slice, tensor.insert_slice, tensor.pad, memref.subview are well-defined even without this property (because they encode mixed sizes with an attribute and SSA values). But for consistency reasons (I think), they still follow this principle. Another benefit is that the result type can be uniquely inferred from the operands + attributes. (But for some reason we do not implement InferTypeOpInterface…)
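
For example, here is a minimal sketch of the mixed static/dynamic encoding (assuming a hypothetical %src : tensor<?x?xf32>):

%c4 = arith.constant 4 : index
// The sizes mix an SSA value (%c4) with an attribute entry (8); the dynamic entry maps to the single '?' in the result type, so the result type is uniquely inferable from operands + attributes.
%0 = tensor.extract_slice %src[0, 0] [%c4, 8] [1, 1] : tensor<?x?xf32> to tensor<?x8xf32>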

Some interfaces such as the ReifyRankedShapedTypeOpInterface also follow the “SSA value if and only if dynamic dimension” principle.

Another problem is with the current tensor.pack op verifier. It calls getConstantIntValue on the inner_tiles. This violates MLIR guidelines: only local properties of an op should be verified, but here we are looking at the definition of an operand. However, this kind of verification is necessary to accurately verify ops such as %0 = tensor.pack ... inner_tiles = [%c4, 2] ... : tensor<?x?xf32> -> tensor<?x?x4x2xf32>.
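
For illustration (a made-up variant, not from the original post), consider a tile size operand that contradicts the static size in the result type; only chasing the operand’s defining op can reveal the mismatch:

%c8 = arith.constant 8 : index
// A purely local verifier cannot tell whether the static inner tile (4) in the result type is consistent with %c8.
%0 = tensor.pack %source padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c8, 2] into %dest : tensor<?x?xf32> -> tensor<?x?x4x2xf32>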

I’d recommend tightening the verifier to disallow ops such as %0 = tensor.pack ... inner_tiles = [%c4, 2] ... : tensor<?x?xf32> -> tensor<?x?x4x2xf32>.

But if user created it with specific type, and they have application knowledge to do so, then that seems fine. I don’t want to restrict high level frameworks from encoding knowledge here

Users can still encode the same static information by putting inner_tiles = [4, 2] instead of inner_tiles = [%c4, 2].


+1 - this is the rule we should have for all of them. I’m not aware of any intentional/needed exceptions.

Thank you all, this confirms my understanding of the overall design. Here’s a patch to restrict the verifier for tensor.pack and tensor.unpack:

Matthias has already effectively replied to this and I assume that restricting the verifier shouldn’t limit anyone. But please let me know if I’m missing something :slight_smile:

Btw, @matthias-springer, thank you for elaborating :pray: That extra context makes the overall goal much clearer!

The invariant helps many things. We’ve used the trick before in frontends to pun through a fully dynamic tensor cast when convenient or needed (and then rely on folders or optimization passes to safely reduce it to the strict form).
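
For the record, a minimal sketch of that trick (illustrative only, reusing the names from the earlier examples): the frontend emits the conservative, fully dynamic form plus a cast that records what it knows, and folders can later tighten it to the strict form.

// Conservative form: dynamic tile size, fully dynamic result type.
%0 = tensor.pack %source padding_value(%pad : f32) inner_dims_pos = [0, 1] inner_tiles = [%c4, 2] into %dest : tensor<?x?xf32> -> tensor<?x?x?x2xf32>
// The cast "puns" in the frontend's knowledge that the inner tile is 4; folders/canonicalization can later reduce this to the strict static form.
%1 = tensor.cast %0 : tensor<?x?x?x2xf32> to tensor<?x?x4x2xf32>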