Best way to model transpose op "permutation"

Consider an op:

    %1 = "foo.transpose"(%0) {permutation = [0, 2, 1]} : (tensor<3x4x5xf32>) -> tensor<3x5x4xf32>

This is a common operation in most “ML dialects”. For more details, see e.g. XLA’s description of the semantics.

We currently model this in the xla_hlo dialect downstream as an I64ElementsAttr.

However, I’ve found that to be pretty clunky, and I find myself constantly using helpers like:

    // Copy a DenseIntElementsAttr's values into a SmallVector<int64_t>.
    auto extract1DVector = [](DenseIntElementsAttr elements) {
      SmallVector<int64_t, 6> ret;
      for (const APInt &element : elements) {
        ret.push_back(element.getLimitedValue());
      }
      return ret;
    };
    // Build a 1-D i64 DenseIntElementsAttr from an ArrayRef<int64_t>.
    auto make1DElementsAttr = [&rewriter](ArrayRef<int64_t> integers) {
      auto type = RankedTensorType::get({static_cast<int64_t>(integers.size())},
                                        rewriter.getIntegerType(64));
      return DenseIntElementsAttr::get(type, integers);
    };

I’d like to reduce the boilerplate associated with this. I was thinking of just adding similar helpers to DenseIntElementsAttr proper, but was wondering if folks had any thoughts on better modeling of this before adding special-casey stuff to DenseIntElementsAttr for this.

Thoughts?

  • extract1DVector
    If it is known to be an I64 just do elements.getValues<int64_t>().

  • make1DElementsAttr
    The Builder already has a helper for getI32VectorAttr, we can just add more helpers there.

Hi Sean,

On my end, I was thinking of extending Linalg.transpose to work on tensors. Would that work for you?

Transpose is one of those special ops that can be implemented in metadata only or can be materialized in a Linalg.generic that moves data.

The idea is one can go back and forth between the version that performs copies or not for the purpose of tiling and taking subviews (with potential overlap) and then get back in metadata-only form inside each tile.

As usual it depends on the type of transformations you want. If you only care about transporting indexing logic in the type system at the tensor level, plus rewrite patterns and canonicalizations, you don’t need to worry about mapping back and forth to generic.

Still, I’m suggesting that if we use the same abstraction it will benefit everyone. Since I have laid out some of the design considerations at the buffer transformation level, I think the semantics I described are useful. Care to take a look at Linalg.transpose and give generalizing it a shot?

It probably also needs some TLC on the verifier front…

If you see issues with the semantics it is very open to evolution/change. Would be great to consider the buffer world if changes are required.

Thanks!

Any reason you are using RankedTensorType instead of VectorType here? It looks like you always need a (1-d) vector here.

We probably don’t need a builder method here, but just something like DenseIntElementsAttr::getVector<T>(ArrayRef<int64_t> elements), where T could be an elt type like i32 or i64 - which is what @_sean_silva is suggesting, I think. In fact, Builder::getI32VectorAttr could be removed then.

You still need to pass in the context, which makes the builder API still useful. Also, the type of ‘elements’ should be enforced to match ‘T’.

That’s right - we’ll need to send the context as well this way.

Hi Nicolas, I’m really excited to see how this all can be represented in Linalg, but unless we are rewriting IREE’s VMLA reference backend to lower via linalg instead of xla_hlo/std ops, it’s orthogonal to my concerns now.

Maybe the most actionable thing I could ask you to do is to participate in the Tensor Compute Primitives discussion (if you’re not already) so we can move that forward and break our (IREE’s) effectively historic dependence on xla_hlo as “tensor compute primitives”, which is part of the reason I end up having to touch this stuff in xla_hlo’s representation so much.

VectorType supports multidimensional vectors and has the same API as RankedTensorType w.r.t. construction.

To expand on that, “VectorType” does not mean “vector” in the sense of a mathematical rank-1 tensor, but rather as a “CPU vector register”; it’s a ShapedType just like TensorType since we don’t want to close the door to hardware targets with multidimensional vector registers.

Right - I understood that :slight_smile: But RankedTensorType and VectorType differ in two ways as you already know: (1) you can have a tensor of vector elt type, but you can’t have a vector of vector elt type (enforced in its ‘get’ method), (2) vector shapes are always static/constant. I guess these aren’t relevant for your purpose and you could use either, but vector came to my mind first because the number of elements you have in the vector is fixed.

Perfect. Thanks River. I made a patch adding getI64VectorAttr: D75883 (Add Builder::getI64VectorAttr).

+1! I think VectorType totally makes more sense for a small static list like this. Sadly, it looks like a lot of the code I’m looking at happens to use a tensor type.

Heh, actually VectorType is not the right type here, since VectorType doesn’t allow zero-element vectors, whereas a transpose permutation and other similar uses (such as broadcast dimensions) can frequently result in degenerate cases with empty lists.

So I’m adding Builder::get{I32,I64}TensorAttr
https://reviews.llvm.org/D76403

Is there any reason to have both the Tensor and Vector builders? Should we just remove get{I32,I64}VectorAttr? This seems like a big gotcha…

I see - I completely missed that. Does RankedTensorType allow zero size along a dimension? The doc comment says -1 or strictly positive.

Probably a typo; it should say “non-negative” for clarity. LangRef says for tensor: “Each dimension may be a static non-negative decimal constant or be dynamically determined (indicated by ?).”

Yes, shapes like 0xi32 work fine. The doc comment needs to be fixed. You are right that you need a RankedTensorType because of the zero element case.

I would still like to see the vector builders. They are used in SPIR-V CodeGen to convey information like workgroup size, among other things. I don’t feel that using tensor there is conceptually cleaner.

Wouldn’t it make sense to allow zero element vectors?

Tensor and Vector model different things. Tensor can carry the same information as Vector, but the two types represent different things: Vector represents a machine vector register, while Tensor is a completely abstract value that need never be materialized during execution and so can carry mostly arbitrary types/functors as values. Vector should be used towards lowering, and ops supporting it work with more specialized types and restricted inputs; having those ops verify the supported element type and rank of tensors seems unfortunate. Conceptually a tensor of rank 0 is a scalar, so tensor could be used to model a lot of other things, but I don’t think that adds value; it obscures uses and ops. So while there is overlap, they model very different things to me, and the use should correspond to what is being modelled.

As to the original question: doesn’t transpose in general need to take the permutation as operand? And so in the move to dynamic types we have to revisit that op.

While on this, I wanted to ask: what’s the difference between Builder::getI32ArrayAttr (which exists) and the getI32TensorAttr that you are adding? Both take the same args. (I know that ArrayAttr can store elements of different types in the array, but that’s not what getI32ArrayAttr gives us.) Is there duplication here? (in fact triplication for a majority of cases given getI32VectorAttr) The implementation details might be different, but these look functionally equivalent (ArrayAttr, DenseIntElementsAttr) and a user can’t tell which one to use.