[RFC] More OpFoldResult and "mixed indices" in ops that deal with Shaped Values

What would the API look like? You cannot have a build API taking a ValueRange and doing type inference on matching constantop, because what if the values aren’t constantop? That means that the API must require the result type to be provided…

More than type inference we are doing shape inference for tensors and memrefs so for non-constant values the inferred dim would be dynamic.

The argument against constant ops is that we could infer the shape at build time and we could replace the constant op with something else later on and the initial inferred shape would become invalid. This is a valid point but I only see two ways of replacing a constant op: 1) We set a new operand from the op itself, which should trigger the shape inference again; 2) We blindly replace the constant op with another op, impacting all the users of the constant op. I have never seen something like #2. When a constant op is replaced, it’s done within the context of the user op and only impacting that specific user. Maybe I’m missing something.

It’s not clear to me how shape inference affects a builder API? Do you have an example?

The InferShapeOpInterface and associated mechanism should be able to pattern match a constant op I believe, and the interface allows to express the concept of “compatible” types for the inference, so that “replacing the constant op with something else” would not break anything.
We’ve been supporting this “forever” I believe: Add compatible query method to infer type interface · llvm/llvm-project@7af61f6 · GitHub

This is what we have today for tensor.extract_slice using attributes:

%1 = tensor.extract_slice %0[0, 0, 0][1, 16, 4][1, 1, 1] :
  tensor<8x16x4xf32> to tensor<16x4xf32>

With constant ops, it would be something like:

%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c4 = arith.constant 4 : index
%c16 = arith.constant 16 : index
%1 = tensor.extract_slice %0[%c0, %c0, %c0][%c1, %c16, %c4][%c1, %c1, %c1] :
  tensor<8x16x4xf32> to tensor<16x4xf32>

and tensor<16x4xf32> is the type of the slice we are extracting, whose shape is inferred from the constant offsets, sizes and strides. Hopefully this helps!

I know all this, but you’re losing me in how this fits the flow of the discussion here.

We can infer shaped both from “constant op” or from attributes, however when we have “constant op” shape inference becomes an optimization (we can’t just have the InferTypeOpInterface), which changes the API exposed to build the operation in C++.
This is what I wrote with:

Back to the generic topic of “mixed indices”, something I was told at some point is that having some APIs taking ArrayRef<int> instead of ValueRange for building these ops allows for type inference in a way that you can’t get relying on ConstantOp

to which you answered:

You can infer the type from the ConstantOp but you can’t verify if the inferred type is correct

Which led to the current confusion where I don’t know what you’re trying to say…

Touched on this topic with @matthias-springer @dcaballe @qcolombet and @mehdi_amini before leaving on vacation but did not have time to flush out a reply, sorry for being late to the party!

My past navigation of the tradeoffs discussed here (over a few iterations for the tensor.extract_slice, tensor.insert_slice and memref.subview cases) leads me to believe the interleaved “value or type” abstraction (a.k.a OpFoldResult) is a better compromise.

Analogies with LLVM are a red herring IMO: if we omit the simple trunc/ext-like ops, LLVM only has the bitcast operation to meaningfully convert between structured (pointer) types. AFAIK, there aren’t cases where a “more static” type is constructed from the knowledge of whether an operand is static or dynamic.

I can see the GEP operation has some API contorsions as the op semantics require static attributes for indexing into structs (using non-constant values here would always be invalid). I also understand @dcaballe’s argument about the added complexity. IMO, such pains are a direct consequence of not having a uniform way of manipulating “value or attr” (a.k.a OpFoldResult) variables and arrays thereof, across the codebase.

Here are some examples where I believe complexity is actually higher if we don’t mix “value or attr” in the op semantics.

%0 = arith.constant 0 : index
%1 = arith.constant 1 : index
%2 = memref.subview %a[%0, %0][%1, %1][%1, %1] : memref<?x?xf32> to xxx

what should we require (i.e. “op-verify”) xxx to be here ?

Case 1: memref<1x1xf32>

First, this breaks the localization properties of verifiers (to which I subscribe: the op should be verifiable without having to jump around thousands of lines of IR).

But assume we throw out the localization requirements, any user of this op now must take into account both the operands and the memref<1x1xf32> type to perform transformations etc. So in practice the only thing we have done is move our APIs from ArrayRef<OpFoldResult> to ArrayRef<Value> + type.getShape(). This is a net increase in complexity IMO.

Case 2: memref<?x?xf32, offset: ?, strides: [?, 1]>

This is what the current verifier of the op requires.
However, in the absence of mixed value or attr, we are either forced to:

  1. always use the dynamic case and give up on static information (non-starter)
  2. go to case 3
Case 3: Polymorphic with anything 2-D of elemental type f32

If we want to be able to represent the memref<1x1xf32> case with only SSA values, we have to relax the verifier to never check for more information than the rank and elemental type, since all the following types would be technically valid: memref<1x1xf32>, memref<1x?xf32>, memref<?x1xf32>, memref<?x?xf32>, … I believe this is overly restrictive.

Additionally, the argument of “case 1” remains: users of these ops will still have to manipulate ArrayRef<Value> + type.getShape(). But this is further complexified by the fact that the type itself may not be the most static possible type, because of polymorphism. So the actual information every user of the APIs is exposed to is even more complex than ArrayRef<Value> + type.getShape()

In contrast, with the current design, WYSIWYG:

  1. an attribute forces a static value in the type when appropriate
  2. a value forces a ? in the type where appropriate
  3. canonicalization / folding turns an SSA value into an attribute and a more static type where appropriate
  4. everything composes with makeComposedFoldedAffineApply which is powerful
  5. the “added complexity” of a vector of “value or attr” is really inherent to the op semantics: if one restricts the API to vector of values only, the problem is only displaced and the cognitive price will actually be higher.

I’d prefer that we start recognizing that ValueOrAttr is fundamental to higher-level IRs than pointers + structs and have a single thing to represent it uniformly across the codebase (including in the GEP case because we should not have multiple names and implementations for the same thing).

1 Like

I don’t quite get what’s “restrictive” here?
This model seems like just fine to me, and is quite commonly used! As mentioned above in his thread this is why the InferTypeOpInterface supports checking for “type compatibility”.

This is “nice to have”, but I still fail to see how fundamental this is actually.

By the way you have to check all the verification before doing this kind folding (since folding should not break the IR verified), this is another complexity here.
The other side of it, is that then the consumer can expect it to be “correct”

Yes, this is part of op design and is indeed subject to tradeoffs (e.g. what is guaranteed at definition does not need to become a switch at all users). I have found it easier to work with a tight verifier that does not accept “whatever” type but I would certainly love to see an alternative design and its implications on transformations.

True, currently this is achieved by having a tight coupling between the static/dynamic nature of operands and the return type. I.e. if inferReturnType returns null then canonicalization fails.

Note however that this is an illustration that is orthogonal to the question of using APIs and printing/parsing with “value or attr” vs operand values + type information + other one-off attributes information, which has also surfaced in the GEP case. I believe where possible / makes sense lists of “value or attr” are significantly nicer to work with.

In the limit, it is possible there is nothing fundamental: as long as the information is in the IR, it can be recovered with custom C++, for every op independently (e.g. GEP). I believe standardizing common patterns does help reduce cognitive load and reduce the risk of error introduced by reimplementing the same thing multiple times.

Actually, there may be something deeper here: forcing SSA values everywhere seems it would also skew the op design towards Case 3 if we keep the localized verifier requirements.