Here are some high-order bits that have been tricky when using Linalg on tensors.
This relates to the ongoing discussion on Discourse as well as the IREE discussion.
@stellaraccident @jpienaar @_sean_silva @benvanik @hanchung
Buffer land
All inputs and outputs are materialized SSA values of type memref<?x4x?x42xf32, layout>. At this point, transformations have enough information to create subsets of data and computations using existing infrastructure that composes and canonicalizes properly.
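For illustration, a minimal buffer-land sketch, using the op spellings mentioned elsewhere in this post (subview, linalg.generic); the exact subview result layouts and assembly syntax are indicative:

```mlir
// Take subsets of the data with subview and apply the computation to the subsets.
#strided = affine_map<(d0)[s0] -> (d0 + s0)>
#id = affine_map<(d0) -> (d0)>

func @process_tile(%A: memref<?xf32>, %O: memref<?xf32>, %i: index) {
  // Subsets of the data: 128-element windows starting at %i.
  %sA = subview %A[%i] [128] [1] : memref<?xf32> to memref<128xf32, #strided>
  %sO = subview %O[%i] [128] [1] : memref<?xf32> to memref<128xf32, #strided>
  // Subset of the computation: a pointwise op restricted to the window.
  linalg.generic {indexing_maps = [#id, #id], iterator_types = ["parallel"]}
    ins(%sA : memref<128xf32, #strided>)
    outs(%sO : memref<128xf32, #strided>) {
  ^bb0(%a: f32, %o: f32):
    %0 = mulf %a, %a : f32
    linalg.yield %0 : f32
  }
  return
}
```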
Note that certain operations use a “fake” memref / a hollow memref whose data payload is ignored: linalg.max_pooling ins(%I: memref<?x?x8x16xf32>, %W: memref<2x3xf32>), outs(%O: memref<?x?x7x14xf32>) (here %W only conveys the 2x3 pooling window). This essentially captures shape information, after observing that a pooling op is a conv op with a functional kernel.
Tensor land
Currently, all inputs are materialized but only a subset of outputs are (i.e. the init_tensor operands). These are represented as SSA values of type tensor<?x4x?x42xf32>.
Linalg ops often (but not always…) have enough information to derive/reify/infer the output shape from the inputs. Examples: linalg ops that represent the computation O(i,j) <- f(A(i, j), B(j, i)), pointwise + permute ops, certain broadcast semantics O(i, j) <- f(A(i), B(j, i, k)), etc.
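Concretely, for the O(i, j) <- f(A(i, j), B(j, i)) case, every loop dimension (and hence every output extent) can be read off an input. A minimal sketch using the std dim op (exact syntax indicative):

```mlir
// Sizes of the loops (and thus of O) recovered from the inputs only.
%c0 = constant 0 : index
%di = dim %A, %c0 : tensor<?x?xf32>  // loop i: A's dimension 0
%dj = dim %B, %c0 : tensor<?x?xf32>  // loop j: B's dimension 0 (B is indexed as B(j, i))
// O(i, j) therefore has sizes (%di, %dj), e.g. a tensor<?x?xf32> with those extents.
```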
Certain ops may look like they have enough information but they really don’t. Examples:
- linalg.constant(%0) : tensor<4x8xf32> “works” but linalg.constant(%0) : tensor<?x8xf32> lacks information.
- linalg.broadcast(%0) : tensor<4xf32> into tensor<4x8xf32> “works” but linalg.broadcast(%0) : tensor<4xf32> into tensor<4x?xf32> lacks information.

* It is left to the reader to derive the full linalg.generic semantics needed to implement constant and broadcast; these implementation details are not relevant here.
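Without spelling out a full implementation, a hedged sketch of the broadcast case via its indexing maps (maps are indicative, assuming a generic-style lowering) shows why the information is missing:

```mlir
// O(i, j) = A(i): broadcast of a 1-D tensor along a new dimension j.
indexing_maps = [
  affine_map<(i, j) -> (i)>,    // A: constrains only the extent of i
  affine_map<(i, j) -> (i, j)>  // O: the only map in which j appears
]
// With O : tensor<4x8xf32>, the extent of j is 8 by construction; with
// O : tensor<4x?xf32>, nothing in the input determines it.
```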
The above creates a landscape in which:
- many but not all ops have enough information to implement a simple inference and not worry about specifying output shapes.
- it feels unsatisfactory to have to specify, say, 10 output shapes (which can easily happen after fusion of pointwise ops) that could all be derived automatically.
- it is unreasonable to define op semantics / the verifier based on whether the shape inference procedure (a Gaussian elimination process) succeeds or not (i.e. whether the map between shapes and loops depends on any output dimension). We already had this problem in Tensor Comprehensions land and did not come to a general and unsurprising solution that could be exposed to a user.
Shape dialect
The shape dialect type is rank-, elemental-type- and dimension-erased: !shape.shape. All static information is captured via op semantics + IR structure (not via type or attribute information). This is factored out via InferTypeOpInterface.
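For illustration, a rough sketch using shape dialect op names (exact assembly syntax is indicative): a fully static and a partially dynamic shape both carry the same erased type, with the static extents living in the ops and the surrounding IR rather than in the type:

```mlir
// Both values have the same erased type !shape.shape.
%static  = shape.const_shape [4, 8] : !shape.shape              // extents encoded by the op
%dynamic = shape.shape_of %t : tensor<?x8xf32> -> !shape.shape  // extents implied by IR structure
```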
Stepping forward
linalg.generic is a common lower-level form that has to support all cases, so it must not be concerned with implicit shape inference rules: explicit is better than implicit. linalg.generic needs to require all output information to be available in order to support all cases. It is counterproductive to try to be “smart” or “nice to use” at this level of abstraction; shape inference is a concern that must already have been resolved by then. If one wants nicer user-facing semantics, it is the role of named ops to provide a form in which only the necessary information needs to be spelled out and in which custom shape inference behavior can be specified.
An easy path forward is to simply force every linalg.generic to take an output tensor for all output shapes, and to have an op that creates a tensor out of a list of SSA values representing the shape. This will allow unifying the structured ops interface and make it less surprising in tensor land.
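A minimal sketch of this path, assuming a hypothetical op (spelled linalg.init_tensor here) that creates a tensor from a list of SSA extents; op spellings and exact syntax are indicative:

```mlir
// Create the output "shape carrier" from SSA values (hypothetical op), then
// pass it explicitly: no implicit shape inference in linalg.generic.
%c0 = constant 0 : index
%d0 = dim %A, %c0 : tensor<?x4xf32>
%init = linalg.init_tensor [%d0, 4] : tensor<?x4xf32>

%res = linalg.generic {
    indexing_maps = [affine_map<(i, j) -> (i, j)>,
                     affine_map<(i, j) -> (i, j)>],
    iterator_types = ["parallel", "parallel"]}
  ins(%A : tensor<?x4xf32>)
  outs(%init : tensor<?x4xf32>) {
^bb0(%a: f32, %o: f32):
  %0 = mulf %a, %a : f32
  linalg.yield %0 : f32
} -> tensor<?x4xf32>
```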
A big advantage is that all transformations, ops and canonicalizations will work out of the box operating on tensor<?x4xf32>. This will also unify the tensor and memref behaviors, which currently have subtle OpInterface differences.
This is by no means enough: every tensor will be bufferized to a new alloc even when there is no need to. In the short term, a simple analysis can traverse the IR, determine whether allocations are indeed necessary, and try to avoid them. But this hints at making the “hollow memref” type mentioned in the Buffer land paragraph a first-class citizen so that the information is not lost.
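To illustrate the allocation issue, a hand-written sketch (not actual pass output) of what a naive bufferization of the generic above would produce:

```mlir
// The init_tensor turns into a fresh allocation, even when the result could
// have been written in place into an existing buffer.
%c0 = constant 0 : index
%d0 = dim %A, %c0 : memref<?x4xf32>
%out = alloc(%d0) : memref<?x4xf32>   // new alloc with no proven need
linalg.generic {
    indexing_maps = [affine_map<(i, j) -> (i, j)>,
                     affine_map<(i, j) -> (i, j)>],
    iterator_types = ["parallel", "parallel"]}
  ins(%A : memref<?x4xf32>)
  outs(%out : memref<?x4xf32>) {
^bb0(%a: f32, %o: f32):
  %0 = mulf %a, %a : f32
  linalg.yield %0 : f32
}
```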
Such a hollow type could be a new ShapedType, e.g. ranked_shape, or more simply an extra bit on the existing shaped types, and would ensure aligned_ptr == alloc_ptr == nullptr.
E.g. with just a bit we could have tensor<?x4xf32, shape_only=1> and memref<8x?xf32, shape_only=1>. Values of such types could be used in the shape dialect and bridge the gap to linalg / subview / subtensor / subtensor_insert. A memref<..., shape_only=1> may not be loaded from or stored into. The nice thing is that such a type extension would compose naturally with all ops, canonicalizations and transformations. But this now gets into shape dialect territory. The downside is that it is a static form of shape that the shape dialect does not have (yet), and it is unclear whether this is a desirable addition.
A better path forward at that point may be, as usual, to use OpInterfaces and factor out the properties needed for linalg op representation and transformations. It is unlikely, however, that linalg will want to evolve into supporting rank-erased ops at this point, so we would probably need something like:
```mlir
shape.to_hollow %s : !shape.shape, tensor<?x4xf32, shape_only=1>
shape.to_hollow %s : !shape.shape, memref<8x?xf32, shape_only=1> // no layout
```
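Continuing the hypothetical spelling, such a hollow value could then feed the existing subset ops directly, carrying shape information only:

```mlir
// Hypothetical continuation: the hollow memref composes with existing ops.
%h = shape.to_hollow %s : !shape.shape, memref<8x?xf32, shape_only=1>
%v = subview %h[0, %i] [8, 4] [1, 1]
    : memref<8x?xf32, shape_only=1> to memref<8x4xf32, shape_only=1>
// %v may be used for shape reasoning or as an output "shape carrier", but
// never loaded from or stored into.
```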
Many longer-term design alternatives are possible. They relate to the traditional question of representing static information with (a) types, (b) attributes, (c) op semantics, or (d) structure in the IR, as well as to canonicalizations, shape_cast propagation, how that mixes with transformations, and how much we want to traverse the IR to retrieve the information necessary for transformations vs. encode it in types.