Deprecate `(batch_)matmul_transpose_{a|b}` linalg operations

rengolin · July 8, 2025, 7:21pm

As planned last year, I’m working to remove the transpose variants from OpDSL. Most tests are redundant (transpose variations on the non-transposed tests), but a few of them are not, so I’d like to ask what’s the need and if we can convert to matmul with affine maps.

The non-trivial tests are…

transform-op-pad.mlir, on #145058, by @fabianmc (reviews @nicolasvasilache, @matthias-springer):

Here the dynamic padding test doesn’t seem to need it to be a transposed op. Can we just change it to a simple matmul and keep the original semantics of the test?

matmul-transpose-a.mlir, on #71644, by @c-rhodes (review @banach-space, @MacDue):

This was introduced as an end-to-end test to ArmSME, so I’m wary of assuming I can just replace it with a flat matmul, or even with a matmul + affine maps. Looking for feedback.

Other feedback on the ability to remove those ops would also be welcome.
@MaheshRavishankar @jpienaar @nicolasvasilache @ftynse @mshahid

fabianmc · July 8, 2025, 7:33pm

You might need to update one of the CHECK-DAG, but beside that, yes, you can change it.

banach-space · July 8, 2025, 9:20pm

Thanks!

Yes, we can replace it with matmul + affine maps. The variant with transposition is much more attractive to us, hence the dedicated test.

I can update and verify it for you - some tweaking might be required (we haven’t tested the matmul + affine maps path yet).

-Andrzej

rengolin · July 8, 2025, 9:48pm

I’m doing it now, so perhaps easier if I do and you double check that I did it as expected in your test.

In case you have some time to test on an Arm machine before me, here’s my change:

github.com/rengolin/llvm-project

mlir/test/Integration/Dialect/Linalg/CPU/ArmSME/matmul-transpose-a.mlir

linalg-cleanup

// RUN: mlir-opt %s \
// RUN:   -transform-interpreter -test-transform-dialect-erase-schedule \
// RUN:   -one-shot-bufferize="bufferize-function-boundaries" \
// RUN:   -test-lower-to-arm-sme -test-lower-to-llvm | \
// RUN: %mcr_aarch64_cmd \
// RUN:   -e=main -entry-point-result=void \
// RUN:   -march=aarch64 -mattr="+sve,+sme" \
// RUN:   -shared-libs=%native_mlir_runner_utils,%native_mlir_c_runner_utils,%native_arm_sme_abi_shlib | \
// RUN: FileCheck %s

func.func @matmul_transpose_a(%A : tensor<?x?xf32>, %B : tensor<?x?xf32>, %C : tensor<?x?xf32>) {
  %res = linalg.matmul
            indexing_maps = [
                    affine_map<(d0, d1, d2) -> (d2, d0)>,
                    affine_map<(d0, d1, d2) -> (d2, d1)>,
                    affine_map<(d0, d1, d2) -> (d0, d1)>]
            ins(%A, %B: tensor<?x?xf32>, tensor<?x?xf32>)
                                   outs(%C: tensor<?x?xf32>) -> tensor<?x?xf32>
  %xf = tensor.cast %res : tensor<?x?xf32> to tensor<*xf32>
  call @printMemrefF32(%xf) : (tensor<*xf32>) -> ()

This file has been truncated. show original

banach-space · July 8, 2025, 10:10pm

Your change works after relaxing this condition: llvm-project/mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp at 5f1141ded7845f1879cbf43bc8540f3d6a8dea5e · llvm/llvm-project · GitHub

Relaxing should be safe, but we’ll need to add a test

ftynse · July 8, 2025, 10:16pm

No problem on removing those from me. It has been discussed previously.

nicolasvasilache · July 8, 2025, 10:20pm

Good to remove in my end.

rengolin · July 9, 2025, 12:59pm

@mshahid @asiemien: that clause was introduced to make sure the passes were not caught off guard by the new affine maps. Now we need to move the matmul_transposed logic into the matmul passes.

I can look into it soon, but if you (or anyone else) beat me to it, let me know.

MaheshRavishankar · July 10, 2025, 11:20pm

I know this has been discussed and that we want to move to a “unified” matmul, but the removal of these operations is going to cause a lot of churn in upstream and downstream projects. I’d really hope we can provide some sort of deprecation path for these rather than just dropping support for this in tree. Here are some typical changes needed as far as I can see.

Projects like Torch-MLIR/StableHLO/TOSA will be lowering to batch_matmul/matmul` transpose variants. Those need to be modified to lower to these operations. We probably need some builder methods in-tree to allow for easy migration.
Downstream compilers will be matching on these named ops. We will need recognizer methods for these operations to allow for easy migration there.
I know it is not prescribed, but people do use a lot of Linalg based MLIR in testing. All those will need to eb updated. Linalg is not a stable format, but still while making such heavy changes we will need to provide a glide path for these things to help with migration. Could be a simple IR rewrite passes that move from old path to new path.

With these in place we can then provide a deprecation timeline, and explicitly state how we can have different projects move from all named ops to the new ones. Right now it seems like we are just breaking everyone by dropping the op definitions. That is going to be hugely disruptive to the entire eco system. This needs a clear deprecation plan/timeline/utilities to help with the transition. We dont know who all are using MLIR, but who ever is should be able to work out for themselves broad steps they need to take (even if they are not Linalg/compiler experts).

rengolin · July 11, 2025, 10:13am

There may be some misunderstandings, here. This thread and the PR are not trying to move to a unified matmul (though this is one small step towards that). Here, I’m just performing the cleanup that we are planning for a long time. The removal of these operations has been agreed by those in the discussions and that included you, so maybe a refresh would help.

We first proposed refactoring the matmul operations exactly 1 year ago. That was a long discussion (42 replies, ~1 month), and we agreed to a few steps forward:

Move the operations out of OpDSL, which was done for matmul, batch_matmul, batch_reduce_matmul.
Introduction of linalg contract and a new elementwise.
Remove the conflicting ops, like the old elemwise variants and the transposed variants.

There were other discussions in this topic as well. The explosion of ops in Linalg, following the multiple convolution alternatives being added, culminating with the linalg operation tree proposal. We also had long discussions on both contract and elementwise, all of which you either participated directly, or people in your team did. These were all over the last year, as recent as this month, and all long discussions with many replies and long duration.

So I’m a little surprised when you say:

Now, for the practical side of things, let me try to clarify some points below:

First, let’s focus on the transposed versions, since this thread/PR do not mention the batched versions, nor I have the intention to deprecate them soon.

Here are some findings:

Torch-MLIR has no mention of the transposed variants, so it should need no changes.
TOSA is in-tree and does not use the transposed version, no changes needed.
StableHLO doesn’t seem to use it either.

We’re also discussing the builder methods as specializations, which I’m looking into, as proposed by @qed.

We’re also discussing this on the PR, but here it starts to get fuzzy the upstream / downstream responsibilities. As proposed on the PR, IREE can have those matchers in its own tree, as well as an addition to linalgx for the time being. None of these interim solutions are upstream, but a longer term structured matcher for this cases has been proposed a long time ago by our team when it comes to linalg.generic and op DAG pattern matching.

If this is something that will be beneficial to all upstream, we can start working on it, but this work should not hold the cleanup of Linalg that has been agreed and discussed for a whole year, including heavily with your team.

To be clear, this is not a heavy change at all. And the guide is in the PR already. Once it gets merged, with all the additions we’re discussing in the PR, it will have the necessary tools to let people know how to do that on their downstreams. There is no pass I can create to convert an operation that no longer exists.

There is no deadline, we’re (and here you’re included) changing this very slowly and moving forward. This is one small step in a list of small steps that we’ve been doing together for a whole year. We’ll continue doing these steps, as the community continues to agree (only positive responses so far).

Unlike LLVM IR, MLIR has no deprecation policy because it’s still too young. We’re trying to move into a state where the dialects reach a reasonably stable form to be able to bring a deprecation policy, but right now, this would only slow down progress. That’s why I always include you in the discussions, and why you have been participating on all of them.

Furthermore, we cannot (and should not) have to consider all unknown downstream projects that use MLIR, if they do not participate in the discussions upstream. Whoever has a project based in MLIR should be able to work out for themselves how to move at the upstream speed, as has been the case for LLVM for the last 20 years.

kuhar · July 11, 2025, 3:33pm

I think the suggestion is to have a tool to convert from the deprecated ops to the new format before these get deleted. For simpler IR changes, it’s often customary to include a sed pattern to upgrade existing IR/tests in the PR description. Here, we could have a very short-lived pass to do the upgrade and delete it once the deprecated ops go away. This is what we did when IREE switched from MHLO to StableHLO.

rengolin · July 11, 2025, 4:29pm

This seems like a simple hack that doesn’t even need to live on the repository, since it’s a one-off thing, and people will replace their usages in different ways (ex. additional attributes). Asking for a generic apply-all regex-replacer upstream is not reasonable, IMHO. Is there precedent for this?

Note: I used VSCode regexp-replace for LLVM’s tests. It was very simple.

Such a pass has the same problem as the script above: it depends on usage, and downstream projects use them differently. I don’t think there is a generic pass that would do what everyone wants. This only makes sense to live in the downstream repos.

That is a much larger change than this. Here, we’re going slow precisely to avoid this change-all scenario. A change-all solution would be overkill here.

fabianmc · July 11, 2025, 4:38pm

This is a bit off thread and likely needs it’s own conversation. But I’ve been thinking we need something like clang: clang::Rewriter Class Reference in MLIR to be able to do source to source changes specifically for breaks in tests, and rollout breaking changes more smoothly.

nicolasvasilache · July 11, 2025, 7:32pm

Can’t we mostly genAI this rewrite?

ftynse · July 11, 2025, 9:22pm

This approach sounds like a reasonable compromise to me: provide precise instructions on how to handle transitions downstream. Whoever is doing the upstream transition must be doing something already, so it’s only a matter of documentation, and we are always lacking documentation.

Yes, there is a precedent of providing transition tooling: [RFC] First-Party Support for LLVM Types - #16 by ftynse. I could suggest doing something similar here: have a variant of mlir-opt accepting the syntax of the deprecated op but constructing an equivalent remaining op instead.

I don’t think there is a case for scope-creeping this for a generic IR modernizer as a condition for this specific change to go through.

rengolin · July 11, 2025, 9:31pm

This is exactly what I did in the PR, thanks to @qed’s suggestion.

MaheshRavishankar · July 15, 2025, 12:34pm

I am responding here (after a lot of offline chat on this topic) just to record somethings from my perspective.

I dont see an advantage of landing the transition helpers after the fact? I think it is useful to have transition helpers in tree and remove that in the end.

Of course there will be positive responses if you ignore the negative responses. I have been saying from the start we should double track this. We dont need to delete these operations right now as far as I can see. We can basically have all downstream projects move to not use these operations and then delete them after a notice period with clear dates and agreed schedule. We have been looking at this “for a year”, but with no real deadlines/dates. Its hard to plan things around that. I understand its hard to come up with dates. I know your team has been pushing on the replacement functionality for a while, and thanks for doing that, but as far as I know there wasnt really a timeline associated with this since it was not clear how much time this work would take, and whether we had dedicated developer time to land things. I agree, the functionality exists today, but I dont think that can be counted towards a “deprecation timeline”.

I dont see why that has to be the case. MLIR can have a deprecation policy. This is as good a time to start with as any. Thank you for including me in all the discussions, but there needs to be clearer guidelines of how to adapt to changes. I/We can adapt even without this, cause we are deep into these parts of the codebase in MLIR. Some good examples I can think of for deprecation is

IMO this kind of stance is detrimental to the project. Yes, upstream reserves the right to make breaking changes with good reason, but we do need to be mindful of cost of breakages. If MLIR is being used in “real world”, I guarantee you most people are doing it “on top of their regular job”. So the cost of transition is borne by developers. We need to be extremely mindful of developer time IMO.

fabianmc · July 15, 2025, 2:23pm

We do have some sort of minimal policy: Breaking Changes - Developer Guide - MLIR (though we maybe should revisit with the new governance structure).
Tradition (and the policy) suggest that such changes should have a clear timeline and plan for the depreciation on discourse, which I’d poise wasn’t fully elaborated on for this case. Furthermore, there’s an usual best effort notification added to Deprecations & Current Refactoring - MLIR .

See PSA: `OpTy::create` now with 100% more tab complete for such a PSA example.

Now, on the patch I’d argue that I view it as somewhat problematic that so many tests are getting removed, this seems to indicate that we are temporarily removing functionality, which AFAIK is not what it was proposed in the beginning, so maybe this should receive a bit more explanation or confirmation.

rengolin · July 15, 2025, 2:51pm

I believe I have followed that guide to the letter, since we started this exactly 1 year ago and everyone was involved in every step of the way. Similar to this vector change.

This is meant for much harder problems, spanning a wider net across the MLIR code than removing operations that were already agreed to be deprecated a year ago.

The removed tests were mechanical testing, which no longer apply because the operation doesn’t exist anymore. The non mechanical tests were moved to the new format.

fabianmc · July 15, 2025, 3:11pm

Ok I should have been more explicit with my comment, take for example this test that was removed:

#map = affine_map<(d0, d1, d2) -> (d2, d0)>
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)>
func.func @matmul_transpose_a(%arg0: memref<5x3xf32>, %arg1: memref<5x7xf32>, %arg2: memref<3x7xf32>) {
  linalg.generic
     {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "reduction"]}
     ins(%arg0, %arg1 : memref<5x3xf32>, memref<5x7xf32>) outs(%arg2 : memref<3x7xf32>) {
      ^bb0(%in: f32, %in_0: f32, %out: f32):
       %0 = arith.mulf %in, %in_0 : f32
       %1 = arith.addf %out, %0 : f32
       linalg.yield %1 : f32
  }
  return
}

// CHECK-LABEL: @matmul_transpose_a
// CHECK-SAME: %[[ARG0:.+]]: memref<5x3xf32>, %[[ARG1:.+]]: memref<5x7xf32>, %[[ARG2:.+]]: memref<3x7xf32>) {
// CHECK-NOT: linalg.generic
// CHECK: linalg.matmul_transpose_a ins(%[[ARG0]], %[[ARG1]] : memref<5x3xf32>, memref<5x7xf32>) outs(%[[ARG2]] : memref<3x7xf32>)

Is there an existing test that verifies that the above specializes to the correct matmul op? Same thing for all the folder, canonicalizations, etc. Do they work as expected with the new matmul ops?

I think there was agreement and I won’t argue that. I also think this change it’s a good step forward. But, AFAICT there was no timeline or notice that removal would happen at a particular date or with a particular timeline in advance, but maybe I missed it.

Not necessarily, see the depreciation notice of Removal of vector.reshape in the page. I would also argue we should do a better effort with putting important depreciations there, however, this is not policy and it’s best effort atm.

Topic		Replies	Views
[RFC] Transpose attribute for Linalg matmul operations MLIR linalg	41	1477	August 13, 2024
[PSA] Deprecating linalg named elementwise ops Deprecation & Important Refactoring linalg	6	440	June 23, 2025
MLIR News, 74rd Edition (1st March 2025) Newsletter llvm-weekly	0	349	March 1, 2025
[RFC] Move `tensor.pack` and `tensor.unpack` into Linalg MLIR linalg	27	1190	January 22, 2025
[RFC] Primitive Ops: add MapOp, ReductionOp, TransposeOp, BroadcastOp to Linalg MLIR	33	2542	August 8, 2022

Deprecate `(batch_)matmul_transpose_{a|b}` linalg operations

Related topics