Best way to model transpose op "permutation"

Consider an op:

    %1 = "foo.transpose"(%0) {permutation = [0, 2, 1]} : (tensor<3x4x5xf32>) -> tensor<3x5x4xf32>

This is a common operation in most “ML dialects”. For more details, see e.g. XLA’s description of the semantics.

We currently model this in the xla_hlo dialect downstream as an I64ElementsAttr.

However, I’ve found that to be pretty clunky, and I find myself constantly using helpers like:

    // Copy a DenseIntElementsAttr's values into a SmallVector<int64_t>.
    auto extract1DVector = [](DenseIntElementsAttr elements) {
      SmallVector<int64_t, 6> ret;
      for (const APInt &element : elements) {
        ret.push_back(element.getLimitedValue());
      }
      return ret;
    };
    // Build a 1-D i64 DenseIntElementsAttr from an ArrayRef<int64_t>.
    auto make1DElementsAttr = [&rewriter](ArrayRef<int64_t> integers) {
      auto type = RankedTensorType::get({static_cast<int64_t>(integers.size())},
                                        rewriter.getIntegerType(64));
      return DenseIntElementsAttr::get(type, integers);
    };

I’d like to reduce the boilerplate associated with this. I was thinking of just adding similar helpers to DenseIntElementsAttr proper, but was wondering if folks had any thoughts on better modeling of this before adding special-casey stuff to DenseIntElementsAttr for this.

Thoughts?

  • extract1DVector
    If it is known to be an I64 just do elements.getValues<int64_t>().

  • make1DElementsAttr
    The Builder already has a helper for getI32VectorAttr, we can just add more helpers there.

Hi Sean,

On my end, I was thinking of extending Linalg.transpose to work on tensors. Would that work for you?

Transpose is one of those special ops that can be implemented in metadata only or can be materialized in a Linalg.generic that moves data.

The idea is one can go back and forth between the version that performs copies or not for the purpose of tiling and taking subviews (with potential overlap) and then get back in metadata-only form inside each tile.

As usual it depends on the type of transformations you want. If you only care about transporting indexing logic in the type system at the tensor level, plus rewrite patterns and canonicalizations, you don’t need to worry about mapping back and forth to generic.

Still, I’m suggesting that if we use the same abstraction it will benefit everyone. Since I have laid out some of the design considerations at the buffer transformation level, I think the semantics I described are useful. Care to take a look at Linalg.transpose and give generalizing it a shot?

It probably also needs some TLC on the verifier front…

If you see issues with the semantics it is very open to evolution/change. Would be great to consider the buffer world if changes are required.

Thanks!

Any reason you are using RankedTensorType instead of VectorType here? It looks like you always need a (1-d) vector here.

We probably don’t need a builder method here, but just something like DenseIntElementsAttr::getVector<T>(ArrayRef<int64_t> elements), where T could be an elt type like i32 or i64 - which is what @_sean_silva is suggesting, I think. In fact, Builder::getI32VectorAttr could be removed then.

You still need to pass in the context, which makes the builder API still useful. Also, the type of ‘elements’ should be enforced to match ‘T’.

That’s right - we’ll need to send the context as well this way.

Hi Nicolas, I’m really excited to see how this all can be represented in Linalg, but unless we are rewriting IREE’s VMLA reference backend to lower via linalg instead of xla_hlo/std ops, it’s orthogonal to my concerns now.

Maybe the most actionable thing I could ask you to do is to participate in the Tensor Compute Primitives discussion (if you’re not already) so we can move that forward and break our (IREE’s) effectively historic dependence on xla_hlo as “tensor compute primitives”, which is part of the reason I end up having to touch this stuff in xla_hlo’s representation so much.

VectorType supports multidimensional vectors and has the same API as RankedTensorType w.r.t. construction.

To expand on that, “VectorType” does not mean “vector” in the sense of a mathematical rank-1 tensor, but rather as a “CPU vector register”; it’s a ShapedType just like TensorType since we don’t want to close the door to hardware targets with multidimensional vector registers.

Right - I understood that :slight_smile: But RankedTensorType and VectorType differ in two ways as you already know: (1) you can have a tensor of vector elt type, but you can’t have a vector of vector elt type (enforced in its ‘get’ method), (2) vector shapes are always static/constant. I guess these aren’t relevant for your purpose and you could use either, but vector came to my mind first because the number of elements you have in the vector is fixed.

Perfect. Thanks River. I made a patch adding getI64VectorAttr: D75883 (Add Builder::getI64VectorAttr).

+1! I think VectorType totally makes more sense for a small static list like this. Sadly, it looks like a lot of the code I’m looking at happens to use a tensor type.

Heh, actually VectorType is not the right type here, since VectorType doesn’t allow zero-element vectors, whereas a transpose permutation and other similar uses (such as broadcast dimensions) can frequently result in degenerate cases with empty lists.

So I’m adding Builder::get{I32,I64}TensorAttr
https://reviews.llvm.org/D76403

Is there any reason to have both the Tensor and Vector builders? Should we just remove get{I32,I64}VectorAttr? This seems like a big gotcha…

I see - I completely missed that. Does RankedTensorType allow zero size along a dimension? The doc comment says -1 or strictly positive.

Probably a typo; it should say “non-negative” for clarity. LangRef says for tensor: “Each dimension may be a static non-negative decimal constant or be dynamically determined (indicated by ?).”

Yes, shapes like 0xi32 work fine. The doc comment needs to be fixed. You are right that you need a RankedTensorType because of the zero element case.

I would still like to see the vector builders. They are used in SPIR-V CodeGen to convey information like workgroup size, among other things. I don’t feel that using tensor there is conceptually cleaner.

Wouldn’t it make sense to allow zero element vectors?

Tensor and Vector model different things. Tensor can carry the same information as Vector, but the two types represent different things: Vector represents a machine vector register, while Tensor is a completely abstract value that need never be materialized during execution and so can carry mostly arbitrary types/functors as values. Vector should be used towards lowering, and ops supporting it work with more specialized types and restricted inputs; having those ops verify the supported element type and rank of tensors seems unfortunate. Conceptually a tensor of rank 0 is a scalar, so tensor could be used to model a lot of other things, but I don’t think that adds value; it obscures uses and ops. So while there is overlap, they model very different things to me, and the use should correspond to what is being modelled.

As to the original question: doesn’t transpose in general need to take the permutation as operand? And so in the move to dynamic types we have to revisit that op.

While on this, I wanted to ask: what’s the difference between Builder::getI32ArrayAttr (which exists) and the getI32TensorAttr that you are adding? Both take the same args. (I know that ArrayAttr can store elements of different types in the array, but that’s not what getI32ArrayAttr gives us.) Is there duplication here? (in fact triplication for a majority of cases given getI32VectorAttr) The implementation details might be different, but these look functionally equivalent (ArrayAttr, DenseIntElementsAttr) and a user can’t tell which one to use.