[RFC] Transpose attribute for Linalg matmul operations

We’re currently looking into a linalg.batch_reduce_matmul that can have transposed shapes (i.e. a non-leading reduction dimension), due to how shapes are packed differently for weights and inputs.

My first attempt was to add an attribute specifying which dimension is the batch dimension. However, upon looking at other examples in opdsl, I realised it could instead be a transpose argument, which would also remove the need for the other matmul variations.

So, here’s the simplification plan:

Merge matmul and matmul_unsigned

It seems the current matmul can already have a different cast type as an attribute, so front-ends can already lower an unsigned matmul with the attribute.
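
For illustration, roughly what that looks like with the existing attribute (shapes here are made up, and the exact printed form may differ slightly):

// unsigned matmul expressed via the existing cast attribute, no separate op
%0 = linalg.matmul {cast = #linalg.type_fn<cast_unsigned>}
       ins(%a, %b : tensor<64x32xi8>, tensor<32x128xi8>)
       outs(%acc : tensor<64x128xi32>) -> tensor<64x128xi32>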

Why do we have the unsigned version?

Add transpose attribute to matmul and remove matmul_transpose_(a|b)

We could create a new FnAttrDef for other Linalg ops (like transpose), defaulting to identity, and then add attributes for transposing A and/or B.

Implementation would be something like:

    cast=TypeFnAttrDef(default=TypeFn.cast_signed),
    t_a=PermutationAttrDef(default=[0, 1]),
    t_b=PermutationAttrDef(default=[0, 1]),
    ...
    domain(D.m, D.n, D.k)
    implements(ContractionOpInterface)
    # Note `cast` and `t_(a|b)` are defined above
    C[D.m, D.n] += cast(U, t_a(A[D.m, D.k])) * cast(U, t_b(B[D.k, D.n]))

Moving on to batch_reduce_matmul

If the above is reasonable, then we can also add transpose support to batch_reduce_matmul and support things like (m, b, k) x (b, k, n) -> (m, n) in addition to the current (b, m, k) x (b, k, n) -> (m, n) shape by transposing A as [1, 0, 2].
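
A sketch of the corresponding opdsl change, reusing the hypothetical permutation attribute from above (the default keeps today’s (b, m, k) layout for A):

    t_a=PermutationAttrDef(default=[0, 1, 2]),
    ...
    domain(D.b, D.m, D.n, D.k)
    implements(ContractionOpInterface)
    # With t_a=[1, 0, 2], A is read as (m, b, k) instead of (b, m, k)
    C[D.m, D.n] += cast(U, t_a(A[D.b, D.m, D.k])) * cast(U, B[D.b, D.k, D.n])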

Rationale

On CPUs, for TLB-friendly access, we pack the weight matrix with a block transpose ([M][N] -> [BM][bm][BN][bn] -> [BM][BN][bm][bn]), but then we also have to pack the input and unpack the result to return the output.
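
For illustration, the weight packing above can be written with tensor.pack (tile sizes and shapes are made up):

// [M][N] -> [BM][BN][bm][bn]: tile both dims, outer blocks stay leading
%packed = tensor.pack %weights
            inner_dims_pos = [0, 1] inner_tiles = [32, 16]
            into %dest : tensor<128x256xf32> -> tensor<4x16x32x16xf32>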

If we could have a more flexible contraction, we could avoid packing other tensors at run time.

Alternatives

New op

We could add a new linalg.batch_reduce_matmul_transpose_a but this would be going in the wrong direction.

Operand specification

Our proposal last year was to not rely on attributes but to have other linalg operations annotate each other. This is my long-term goal, but for now I’m content to de-duplicate the implementations of the existing matmul operations and get the batched version working.

Furthermore, to have a true DAG representation, we’d need to introduce a DAG-structured matcher API to replace the current practice of matching the named op variations and checking their attributes, which is a non-trivial effort.

We want to get there, but via a route that can work before we have the right solution.

Next steps

If this is a good direction, then we should discuss how we implement the transpose in opdsl, but in parallel, we can also start de-duplicating the existing matmul operations.

@ftynse @nicolasvasilache @MaheshRavishankar @mehdi_amini @asiemien @banach-space


Thanks for bringing this up!

There might be a couple or more RFCs hiding here :slight_smile: Let me quickly comment on one specific item that’s been a pain point for us:

Merging the two makes sense to me (“less is more” in this case). In fact, I suggest being a bit more ambitious here and adding other Ops to the mix as well. In particular Convs:

https://discourse.llvm.org/t/numerical-casting-in-linalg-conv-2d/

In general, I think that it would be beneficial if we converged towards a consistent interface for casting.

-Andrzej


Ah, yes, but that’s a much tougher fight. :slight_smile:

I agree we should parametrize all linalg operations and this RFC is just the beginning. If this is an agreeable general direction, then we can even do these in parallel.

I’d be happy with some attributes initially (as proposed by @MaheshRavishankar earlier), then move to, or complement them with, DAG matchers later.

But my understanding of the opdsl is limited and I believe some support for more complex operation definitions is in order. Hopefully @ftynse can give some pointers there.

I have a high level question. Is this RFC just about the opdsl implementations of Linalg matmul ops or are you also suggesting we merge (e.g. matmul/matmul_unsigned) into one/fewer operations with attributes to control behavior? I am definitely +1 if this is about the latter (with details to be discussed).

Both. Simplify the linalg dialect by drastically reducing the number of operations where we can easily replace the op explosion with some simpler mechanism.

The first step is to do that with attributes, which is what this RFC recommends.

The second step is to do that with “op annotations” plus DAG matchers (i.e. an API that matches a matmul(transpose(a), b) DAG “as if” it were a matmul(a, b) #transpose=a, as our current matmul_transpose_a(a, b) is today).

Here, the “annotation” is simply other operations (like casts, transposes, fills) forming a DAG, which don’t need to be linalg only. This should be an API that works across dialects. But that’s certainly for a future RFC.

If we are going to make changes to Linalg named ops, I’d rather forgo opdsl completely and build a set of named operations that can encompass a wider range of named ops currently generated through opdsl.

I think it would be useful to start adding operations that capture contractions and convolutions, and designing it in a way where it can encompass all the variants of matmuls and convolutions that exist today. We can start with matmuls and then go to convolutions.

What I am thinking of is essentially to turn the isContractionOp check into an actual operation, and to use isa<ContractionOp> instead of this method.

Adding a transpose attribute by itself is a bit short-term IMO. What if we have a broadcast and we want to represent broadcast + matmul as an operation? I think what we really need is a named op, call it linalg::ContractionOp, with:

  • Indexing maps for each of the operands
  • Contraction dimensions list
  • An attribute that represents the signedness or unsignedness of each operand. (I must admit I don’t know what the story would be for float types, though.)

Thanks @rengolin for opening up the discussion! Cleaning up the landscape of named ops is long overdue and I’m excited to see the ball rolling.

What I am thinking of is essentially to turn the isContractionOp check into an actual operation, and to use isa<ContractionOp> instead of this method.

  • Indexing maps for each of the operands
  • Contraction dimensions list
  • An attribute that represents the signedness or unsignedness of each operand. (I must admit I don’t know what the story would be for float types, though.)

+1 to this approach. I don’t think the Contraction dimensions list is necessary, though; all of that is encoded in the indexing maps. If we wanted to add iterator types or dimension classifications, that would just be for convenience (same as vector.contraction). For floating-point element types, the only case that I see as invalid is an unsigned fp → fp extension, and that can just be encoded in the verifier. So, as an example:

// mixed signedness linalg.matmul
linalg.contraction {
  indexing_maps = [
    affine_map<(d0, d1, d2) -> (d0, d2)>,
    affine_map<(d0, d1, d2) -> (d2, d1)>,
    affine_map<(d0, d1, d2) -> (d0, d1)>],
  cast_fns = [
    #linalg.type_fn<cast_unsigned>,
    #linalg.type_fn<cast_signed>]
  } ins(%0, %1 : tensor<?x?xi8>, tensor<?x?xf16>)
    outs(%fill : tensor<?x?xf32>) -> tensor<?x?xf32>

We could opt to omit cast_fns in the most common (signed, signed) case as well.

To me this eliminates the worst part of matching on a linalg.generic, which is matching the body, and also gives basic guarantees about the form of the indexing maps. We could also opt to disallow cases that do not include any contracting dimensions, such as a basic outer product:

// outer product expressed as a linalg.contraction
linalg.contraction {
  indexing_maps = [
    affine_map<(d0, d1) -> (d0)>,
    affine_map<(d0, d1) -> (d1)>,
    affine_map<(d0, d1) -> (d0, d1)>]
  } ins(%0, %1 : tensor<?xf16>, tensor<?xf16>)
    outs(%fill : tensor<?x?xf32>) -> tensor<?x?xf32>

but I’d like to hear others opinions on that.

Does this proposed operation adequately handle your matmul(transpose(a), b) example? By carrying the indexing maps on the matmul operation, reshapes, transposes, and casts can all be almost trivially folded into a linalg.contraction; no need for complex DAG matching of such cases. I see that you are asking for something that goes beyond linalg, but I’m not following how such a proposal fits into this RFC. IMO some sort of planned DAG matching infrastructure shouldn’t affect how we plan to pare down the set of named operations. I would rather first focus on designing new operations that strike an effective balance between generality and ease of matching.
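
Concretely, with the hypothetical syntax sketched above, folding a transpose of A only permutes A’s indexing map:

// matmul(transpose(a), b) folded into the proposed op: A is read as (k, m)
linalg.contraction {
  indexing_maps = [
    affine_map<(d0, d1, d2) -> (d2, d0)>,
    affine_map<(d0, d1, d2) -> (d2, d1)>,
    affine_map<(d0, d1, d2) -> (d0, d1)>]
  } ins(%a, %b : tensor<?x?xf32>, tensor<?x?xf32>)
    outs(%fill : tensor<?x?xf32>) -> tensor<?x?xf32>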

If we could have a more flexible contraction, we could avoid packing other tensors at run time.

The proposed linalg.contraction op does seem very close to what you’re asking for here also.

Fly on the wall feedback: I hardly see a program these days that doesn’t need all of the representational things that Quinn and Mahesh mention above. I’ve even had to start emitting these directly from frontends to cover all of the mixed precision and broadcast cases that show up in the wild these days. That may be well known, but I just wanted to underline that all of these variants are very mainstream, whereas maybe a couple of years ago, they were not.

Also +1 for not being bound by opdsl for these hero ops: get the representation right, hand coding if needed, versus trying to live within the bounds transcribed by opdsl.

(I wrote opdsl, and I think it is a bridge too far for these advanced cases.)


We discussed linalg.contract last year, but the consensus was that it was too generic trying to cater to too many cases at the same time. I’m not against it, but I think we may have to hop through a few iterations (of other less generic named ops) until we have a good representation.

Matching a complex DAG of multiple operations is not that much more complex than matching a complex generic operation with indexing types, affine maps, payload ops and lots of attributes. But that’s not something I’m going to work on in the near term, so I’d be happy to have an interim state where we have fewer named ops, but more complex ones.

Exactly. This is our main motivation behind commoning up the linalg operations.

I don’t have a strong preference here, other than being a bit lost on the Python code. But in the interest of forward progress, I’d prefer to at least remove the redundant ops in opdsl right now, like matmul_unsigned, matmul_transpose_(a|b), etc., before we start redesigning somewhere else.

We are using some of those in production. While I agree that the needs here justify breaking API changes, I’d like to see us do this in a way that allows a smooth upgrade. So I’d prefer that we define the replacement, give a little time, and then delete the old unless there is a strong reason to invert that.

So maybe separate concerns here: let’s get the new representations defined in parallel with cleanup. If anyone is using a to-be-deleted op in a way that would need the replacement in order to have something equivalent, we hold off on deleting until the replacement is available.

For my part, of the matmuls, I know we would like this treatment for the transpose variants. I’d need to go search through torch-mlir and a couple of other repos to verify others.

I’m all for cleaning this stuff up. Just want it done in a supportable fashion.

Can you remind me who “we” was and which forum that was in? (I’m trying to place the context and the specifics.)

I think I do have a strong preference: if opdsl is pushing us into a representation that we wish could be better, we shouldn’t use it. Some of the limitations here are what led to the op explosion you’re trying to address (thank you). A lot of times people don’t realize how limiting it is for these complex cases until too late, and then we’ve got weirdly unusable ops.

In the fullness of time, I wouldn’t mind seeing us drop it entirely, but my primary concern is that we not repeat prior mistakes by using it.

Sorry, I missed the discussion on this and am probably out of the loop. Is there a Discourse thread to read up on regarding “annotating” ops?

I am missing why you would prefer to match a sequence of operations over one sufficiently expressive one. I agree that “catch-all” operations, if left to grow unbounded, add complexity in the form of matchers and pattern writing; however, excessively atomic operations are subject to a combinatorial explosion of local program structures. For example:

%0 = linalg.transpose : tensor<?x?xf32>
%1 = linalg.matmul ins(%lhs, %0)
%2 = linalg.matmul ins(%0, %rhs)

Here the same transpose “annotates” the operands of two different matmuls. It’s not clear to me whether a matcher would want to match each as a matmul_transpose_b and matmul_transpose_a respectively, or opt to “materialize” the transpose once and keep it separate. IME it’s easier to write a pattern that looks at a DAG like this once and makes a decision up front about what to do with the transpose (fuse vs not) and then lets later patterns be more local.

If you want to disallow multi-use annotating operations for such a matching infrastructure, then I question why not just go straight for a more general operation because, as you say, it is at least a little bit less complex.

Apologies if I’m appearing oppositional, I’m just commenting from my experiences in IREE and haven’t done my homework on what you’re asking for (If you have a link to a discussion I’m happy to read :slight_smile: )

indexing types, affine maps, payload ops and lots of attributes

My hope would be that, in redesigning named ops, we keep the best part about them, which is the implicit body (we should drop the hidden region, though). Otherwise, indexing maps and two attributes to annotate casting semantics do not seem too difficult to match against to me. linalg::inferContractionDims already exists for taking a list of affine maps and describing each dimension type, and we could add further helpers given how critical-path contraction-like operations are. vector.contraction has existed like this for quite some time and has appeared to be successful to me. Also, as Stella said, frontends are actually starting to want to emit the more general form.

For matmul_unsigned there has always been a replacement; I really have no idea why anyone would want to create such a named op in the first place. For transpose, I guess we can create the mechanism and let downstream users migrate, but in practice it should be a very simple syntactic change (just like unsigned), so it shouldn’t take a lot of code changes.

I merged the PR before reading this. Feel free to revert it for now, and we can reapply it when you’re no longer using matmul_unsigned.

I know I have discussed this with a lot of different people, but I’m not sure there’s any trace of it in the forums. If we start something upstream I’ll remember all the details, so for now, let’s just hash out what each of us is thinking.

Hindsight is 20/20. :slight_smile:

Let’s use that hindsight and not throw the baby out with the bathwater.

An annotation isn’t always materialized. Sometimes the matmul algorithm can do the transpose “for free”. In your example, depending on how you want to lower, that’s just two matmuls (one with A transposed, the other with B transposed), or you actually materialize the transpose (because one transpose is cheaper than two), and then those are two identity matmuls.

That’s the power of this notation, but it is more complex to understand, yes. We should not have the one true form, but a mix of different forms, and use what works best for each case.

That’s how it begins… To @stellaraccident’s point above, hindsight and all that.

But I’d rather discuss the contraction side on a separate RFC, as we’re already going way out of focus for this cleanup RFC.

I don’t have any issue removing this one. For the record, opdsl has, over its history, been hard to adapt with respect to casting and signedness. It’s a little better today but has a lot of footguns. I think this op was added when it was basically impossible to represent it another way. It’s an example of the line I don’t think we should cross again.

Let’s see the replacement code and then decide… Some of this stuff has things leaning on it, and 3am, with the person on call applying the latest LLVM patches and trying to grok things, is not the best way to manage changes like this.

Good or bad, these things are an API that has been around for a while, and I’d like to see a sequence of intelligible change points for such things, preferably without atomic changes to semantics that have no replacement. Forgive my tired eyes, but it looks to me like in this RFC the new forms will come about in a sequence of steps – which is good, but it’s always hard to adapt to an API that is changing like that. Better to get what we want in place, tweak the design if need be, and then switch.

If this looks like I’m making a mountain out of a mole hill when seeing the actual patches, I’ll moderate my opinion based on actuals. I really just want to see us treating this like an API break and signal accordingly.

I’m not sure that’s a baby you are seeing in the bathwater :slight_smile: In any case, I think we’re agreed on the principle (don’t compromise the representation or the C++ API to appease opdsl), and we can debate the identity of the things swimming in the bathtub later.

I’m not going to add to this debate except to say that I agree with Renato on the principle, but my intuition tells me that contraction is special… This “specialness” happens to also be encoded in the ContractionOpInterface. I’ve spent a lot of time over many years trying to spell contractions better, but the truth tends to win out: at the bottom, there are almost always a few “thick” representations for it.

Can be done either way. The choice is based on experience and practicality.

+1.

My plan is to reduce the duplication in opdsl before we speak of a new design. This will make it easier to work on the new infrastructure. I’d also not try to have ops in this new mechanism that already exist in opdsl (e.g. some convolutions there, some here).

I’d create a new op, like contraction, that can replace the other ops in opdsl (with appropriate syntax), and then when we’re using that successfully, we remove the ones from opdsl.

I’d also create a way in the new mechanism to create aliases, so that we can have atomic implementation changes (moving logic) but no syntactic / semantic changes in the IR. For example, alias convolution and matmul into the new contraction operation as we remove them from opdsl, so that linalg.matmul maps to the new implementation, with semantic guarantees on the switch.

Thanks for bringing this up, Renato! This has been a pain for quite some time. I don’t recall the details but the signed variant or the cast attributes could be the result of unblocking progress while still trying to get consensus on how to go about this.

Encoding the signed/unsigned cast conversion in the op looks promising. However, the complexity could escalate quickly if we consider that distinct signed/unsigned information is needed per operand, plus any rounding-mode, fast-math, or other cast modifier that each of the potential casting operations may have. Some targets also apply a conversion to the final accumulated result, which would increase the encoding overhead even more if we decided to also make that part of the matmul op. For all these reasons I have mixed feelings about this, as matching 2x operand cast op + matmul [+ result cast op] seems relatively simple while keeping the representation simple as well. No strong opinion, though.

Regarding removing the transpose ops, IMO Linalg is a level of abstraction where we start applying optimization strategies that are specific to each matmul flavor. From this point of view, a “regular” matmul is significantly different from a matmul with a or b transposed. Encoding everything into a single operation would lead to transformations analyzing generic matmul ops and implementing dispatchers for the specific variants. This, IMO, indicates that the intermediate representation is not serving its purpose well. I wonder if a new MatmulOpInterface inheriting from ContractionOpInterface could give the generality you are looking for while still preserving the transpose matmul ops to discern between the specific variants for cases where we still need to do so. WDYT?

My two cents :slight_smile:

@dcaballe That is exactly the reason why we need to start matching DAGs and not increase the number of ops or attributes, but I’m trying to find a common ground so that we can iterate into a more generic form.

The explosion of named ops (and generic forms) for all the variations we need will not go away by replacing more ops with attributes, but at least matching them for codegen becomes more manageable. From a very long switch statement for each op (quadratic complexity for single-op matching, combinatorial for multi-op matching) we move to a pruned search (ops, then attributes), which is still quadratic, but less bad. We can also do smarter things, like picking one anchor op (matmul, convolution) and spanning out into def-use chains. This is harder to do when the “sequence” can be one-out-of-many at each step.

But that’s just matching a DAG, so why stop there? Why encode an attribute for each cast and each transpose on each operand of a matmul when you can just have a cast and a transpose op as producers of the operand? Instead of checking ops first and then attributes, you’re just checking ops.
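
For example (shapes made up), the DAG form of a transposed matmul is just:

// the transpose is an ordinary producer op rather than an attribute
%at = linalg.transpose ins(%a : tensor<64x32xf32>)
        outs(%at_init : tensor<32x64xf32>) permutation = [1, 0]
%c  = linalg.matmul ins(%at, %b : tensor<32x64xf32>, tensor<64x128xf32>)
        outs(%fill : tensor<32x128xf32>) -> tensor<32x128xf32>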

The hard part is the rewriter: how to replace an arbitrary DAG with another arbitrary DAG and how to prove correctness with the surrounding code? At the very least you need to guarantee equivalence of input/output values (like inlining/outlining), but that’s the easy part.

My point is that:

  1. We already have such attributes, so before we create something “better” (new contraction op, DAG rewriters), we can at least reduce the cost of matching endless op variations.
  2. It helps prune the matching, so at least we can talk about “finding a matmul” like it’s just a contraction (parametrized by the transposes, casts, accumulation types).
  3. It’ll serve as a baseline for when we start looking into DAG matchers. In the end, we need to be able to represent each of today’s op variants with a DAG, and that DAG needs to be easily replaceable with another DAG equivalent to the current result of a transform (for example, tiling, fusing, or distributing).

LLVM has a lot of problems with those, so I’m not saying it will be easy or even doable for all cases. But I think we can mix and match, and pick strategies. Same argument as the canonicalization discussion elsewhere: forms are usually not globally canonical (for all purposes); they often depend on usage.

It’s a balance. Attribute laden ops are basically a shorthand for a certain set of canonical forms that all other equivalent DAGs must be reduced to. Where to draw the line for something as storied as mm is a question that mixes taste and experience at a point in time.

Regardless, for the set of everything that needs to materialize in terms of hardware intrinsics or a specific distribution pattern for these ops, a strong canonical form for the DAG that comprises that variant is a requirement for sane optimizations. That will undoubtedly involve a combination of attribute-laden ops and modifier ops. You try to draw the line in a way that is practical and may age well, but the two forms are basically equivalent.

For matmul specifically, the current window of expertise would bias me towards defining a good, mixed precision matmul op via attributes as the core and then having a few modifier sequences defined as canonical forms for the rest. In my experience, that could entail (in order of preference):

  1. Attributes to control the transposition.
  2. Attribute to control the casting behavior of the result.
  3. Attributes to control the casting behavior of operands.
  4. Encoding of restricted broadcasting semantics necessary for the correctness of the IR.
  5. Encoding of scaling parameters necessary for mm of scaled values.

Very subjectively, I would say that the first three fall well within the scope of any 2024-era matmul op. I could be convinced on 4, and 5 probably pushes us into another op family.


The place where we have hit roadblocks downstream is the fixed-rank nature of the linalg.(batch_)matmul* operations. Browsing the dialect documentation, most other named operations have restrictions on the form of the indexing maps, but otherwise allow arbitrary-rank inputs (linalg.(elementwise_op), linalg.broadcast, linalg.reduce, linalg.transpose). The exceptions to this are the contraction and convolution variants. So for us to use the contraction variants in tandem with other named ops requires reshapes that introduce non-trivial complexity to transformations like tiling, distribution, and operator fusion.

The other place we have struggled with the limitations of named matmul operations is layout transformations (packing). There is one path to go from linalg.matmul to linalg.mmt4d by introducing tensor.pack operations on the inputs, but that is a very narrow path that requires a jump to a linalg.generic for any of:

  1. A different desired inner tile layout
  2. Further packing of the inner dimensions
  3. Partially broadcasted variants like @MaheshRavishankar mentioned

And compounding that with the desire for frontends to emit higher-dimensional math, as @stellaraccident mentions with 5), has led me to the conclusion that fixed-rank named operations aren’t going to scale.

We’ve already started moving towards more general contraction operations in IREE for this reason here: iree/compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUOps.td at a56975ddba226748b3e59efbc879a250b02147fa · iree-org/iree · GitHub

That is part of the reason why I was advocating for a more general contraction operation above, but at this point I’m wondering if linalg named ops are just a different (earlier) level of the stack. My assumption was that if the ostensibly lower-level operation, vector.contraction, opts for a more general representation, then we would want the higher-level version to express at least as much information, but perhaps there is room for contraction-like operations in linalg with varying levels of generality.