PSA: Retire `tileAndFuseLinalgOps` method

Patch D129901 removes the tileAndFuseLinalgOps method from Linalg. Concretely, this means:
a) The method that automatically tiles and fuses a sequence of untiled linalg operations with buffer semantics is deprecated. The core functionality to fuse an untiled producer with a tiled consumer is still available if needed. What is removed is the method that works on a sequence of untiled operations and tries to “automagically” create the tiled and fused code. This logic turns out to be extremely complex in general, cannot account for all use cases, and is inherently limited.
b) The same method performed similar transformations on a sequence of untiled linalg operations with tensor semantics. There are other ways in Linalg/MLIR to achieve the same result. The preferred way is the TilingInterface-based approach to tile and fuse a sequence of untiled operations. Instead of being automatic, it expects the caller to set the correct options to ensure the generated code respects all dependencies. To show that this is indeed equivalent, the tests from the deprecated method (on ops with tensor semantics) have been moved over to tests that check the tile + fuse pattern that relies on TilingInterface.
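
For readers unfamiliar with the terminology, here is a minimal sketch in plain C++ (not MLIR code) of what "fuse an untiled producer with a tiled consumer" means. The function names, the element-wise producer and consumer, and the handling of a zero tile size are all assumptions made for illustration; none of this is the Linalg or TilingInterface API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy producer: tmp[i] = in[i] * 2. Toy consumer: out[i] = tmp[i] + 1.
// Unfused reference: materialize the whole producer result, then consume it.
std::vector<int> unfused(const std::vector<int> &in) {
  std::vector<int> tmp(in.size()), out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) tmp[i] = in[i] * 2;
  for (std::size_t i = 0; i < in.size(); ++i) out[i] = tmp[i] + 1;
  return out;
}

// Tiled and fused: the caller chooses the tile size explicitly. For each
// consumer tile, only the producer slice that the tile needs is computed.
// Assumption for this sketch: a tile size of 0 means "do not tile".
std::vector<int> tiledAndFused(const std::vector<int> &in,
                               std::size_t tileSize) {
  if (tileSize == 0) tileSize = std::max<std::size_t>(in.size(), 1);
  std::vector<int> out(in.size());
  std::vector<int> tile(tileSize);  // per-tile scratch for the producer
  for (std::size_t lb = 0; lb < in.size(); lb += tileSize) {
    std::size_t ub = std::min(lb + tileSize, in.size());
    // Producer, restricted to the slice [lb, ub).
    for (std::size_t i = lb; i < ub; ++i) tile[i - lb] = in[i] * 2;
    // Consumer, reading the freshly produced slice.
    for (std::size_t i = lb; i < ub; ++i) out[i] = tile[i - lb] + 1;
  }
  return out;
}
```

For an element-wise pair like this the slice needed by each tile is trivial to compute. The difficulty mentioned above comes from doing this automatically for arbitrary sequences of ops, which is why the remaining approach leaves the choice of options to the caller.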

Really glad to see these cleanups landing. Deprivileging linalg and encapsulating the behavior needed for these transformations in interfaces that anyone can implement is a really important step to generalizing all of this. I know that this has taken a lot of work. Thank you for doing it.

Hi @MaheshRavishankar. Is there a timeline for deprecating the tileConsumerAndFuseProducers API? Do you plan to provide a similar API based on TilingInterface that covers both Linalg and non-Linalg ops? Thanks --Mohan

Hi Mohan,

Yes. The linalg::tileConsumerAndFuseProducersGreedily and linalg::tileLinalgOp need to be deprecated. In terms of core functionality, scf::tileConsumerAndFuseProducerGreedilyUsingSCFForOp and scf::tileUsingSCFForOp should be on par with those methods. The first target is to deprecate linalg::tileConsumerAndFuseProducerGreedily in favor of its TilingInterface counterpart. CheYu from the IREE team is looking into it for now, but we don't have a timeline yet. Once we make some headway on a patch for it, we will post a PSA on that as well.

It would help if I could get some idea of which part of this is of interest to you.

Thanks Mahesh. We will go ahead and use scf::tileConsumerAndFuseProducerGreedilyUsingSCFForOp.

Before the older functionality was retired, it was possible to tile simple linalg ops into scf.for loops with a convenient rewrite pattern:

///...
LinalgTilingOptions tilingOptions;
tilingOptions = tilingOptions.setTileSizes(tileSizes);
patterns.add<LinalgTilingPattern>(opName, ctx, tilingOptions, kLinalgTransFilter);

This file had an example: llvm-project/Transforms.h at 83c65fbc2842909444bfe0a74ed083d164381078 · llvm/llvm-project · GitHub

To implement similar behavior now, is the preferred approach to use or write a pass similar to TestTileUsingSCFForOp?

I think the preferred approach is to use the transform dialect to write these transformations.
With respect to TestTilingInterface.cpp: first, using the pattern-based approach is definitely not preferable. It’s on my to-do list to drop the patterns and instead just walk the function and call the core transformation method directly. If you can’t use the transform dialect, that approach would be better. The pattern-based approach works best for fixed-point/worklist-based algorithms. Tiling and similar transformations are not fixed-point: they just need to be applied once or a fixed number of times. So a walk that calls the core transformation would do the trick.
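
To make the distinction concrete, here is a toy model in plain C++ (not the MLIR pattern driver or any MLIR API). `ToyOp`, `tileOnce`, `walkAndTile`, and `greedyTile` are hypothetical names invented for this sketch. It assumes that a tiled op still matches the same predicate as the untiled one, so a driver that reapplies until nothing matches keeps going until it hits its iteration cap, unless some extra marker filters out already-transformed ops.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an operation; timesTiled counts how often it was tiled.
struct ToyOp {
  std::string name;
  int timesTiled;
};

// Hypothetical stand-in for calling a core tiling transformation directly.
void tileOnce(ToyOp &op) { ++op.timesTiled; }

// Walk-based application: collect the candidates first, then transform each
// one exactly once. Returns the number of ops that were tiled.
std::size_t walkAndTile(std::vector<ToyOp> &func, const std::string &target) {
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < func.size(); ++i)
    if (func[i].name == target) candidates.push_back(i);
  for (std::size_t idx : candidates) tileOnce(func[idx]);
  return candidates.size();
}

// Toy fixed-point driver: reapply while anything still matches, capped by
// maxIterations. Returns the number of sweeps it performed.
std::size_t greedyTile(std::vector<ToyOp> &func, const std::string &target,
                       std::size_t maxIterations) {
  std::size_t iterations = 0;
  bool changed = true;
  while (changed && iterations < maxIterations) {
    changed = false;
    for (ToyOp &op : func) {
      if (op.name == target) {  // still matches after tiling
        tileOnce(op);
        changed = true;
      }
    }
    ++iterations;
  }
  return iterations;
}
```

In this model the walk tiles each matching op once and stops, while the fixed-point loop never converges on its own. That is the sense in which a one-shot transformation is a poor fit for a worklist-style driver.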

Thank you for the reply. I have not played much with the transform dialect yet and will further investigate it.

On initial investigation I am trying to understand how an optimization+lowering end-to-end pipeline would be represented.

Do transform transformations happen greedily at any point of a lowering pipeline?

Are the steps below the correct way to use the transform dialect?

  1. Translate into MLIR
  2. Tell the pass pipeline to use transform transformations greedily
  3. Parse (from external MLIR file) the transform dialect information that does not require analysis
  4. PM.addPass(...) all the other lowering and optimization passes
  5. Lower > Analysis > builder.create new transform ops that are IR dependent
  6. The pipeline is executed, and transform transformations happen whenever a pattern matches at any abstraction level as the IR is being lowered

Is there an example in tree that uses transform in an end-to-end (tensor to llvm IR) scenario?

I found this example: llvm-project/test-conv-3d-ndhwc-dhwcf-call.mlir at main · llvm/llvm-project · GitHub

This file already contains the transform.sequence ops in the IR, which I will be missing in an end-to-end pipeline.

  transform.sequence failures(propagate) {
  ^bb0(%arg0: !pdl.operation):
    %0 = transform.structured.match ops{["linalg.conv_3d_ndhwc_dhwcf"]} in %arg0
    %tiled_linalg_op, %loops:3 = transform.structured.tile %0[0, 5, 5, 5]
  }

@MaheshRavishankar, sorry if this is a trivial question…
When translating and lowering from other languages, e.g., tf->tosa->linalg, is the recommended way to use a builder to create the transform op in a custom pass (probably after the TosaToLinalg pass) and then run the TestTransformDialectInterpreterPass, followed by the TestTransformDialectEraseSchedulePass?

There is no notion of pass manager greediness or any sort of automagic with the transform dialect. Transform dialect application happens in a pass. We expect downstreams to roll their own pass for this purpose (we should probably mention this in the documentation) so they can register additional transform operations and their dependencies, e.g., the dialects that can be produced in the payload IR. For experimentation, you can use -test-transform-dialect-interpreter, which takes all top-level transform operations, such as transform.sequence, and applies them to the module in which they are contained. Other setups are possible: for example, one could load the top-level transform operations from a separate file or construct them on the fly. IREE has an example of loading from an external file here: iree/TransformInterpreterPassBase.h at main · iree-org/iree · GitHub, and an example of on-the-fly construction here: https://github.com/iree-org/iree/pull/11886. This is entirely up to the client pass; the infrastructure takes over with the call to applyTransforms.

You can think of transform dialect operations as an IR representation of Linalg*Options, with the additional benefit of being able to say something like “tile and then fuse the result of tiling” using only “tile” and “fuse” primitives. There is no need to introduce “tileAndFuseAndMapAndVectorize” to keep targeting the same op.
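
As a rough mental model only, here is a toy sketch in plain C++ (not the transform dialect implementation or its API) of "transformations as data applied by an interpreter". `PayloadOp`, `Step`, and `applyScript` are hypothetical names for this illustration. A script is a list of small primitives, applied in order and once each, where a later step operates on the handle produced by an earlier one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy payload operation: a name plus the tile sizes recorded on it.
struct PayloadOp {
  std::string name;
  std::vector<int> tileSizes;
};

// A toy "transform script" is data: a list of primitive steps.
enum class StepKind { Match, Tile };
struct Step {
  StepKind kind;
  std::string opName;      // used by Match
  std::vector<int> sizes;  // used by Tile
};

// Toy interpreter: applies each step once, in order. The "handle" is the set
// of payload ops selected by the most recent Match. Returns false when a Tile
// step has nothing to act on, loosely modeling a propagated failure.
bool applyScript(const std::vector<Step> &script,
                 std::vector<PayloadOp> &payload) {
  std::vector<std::size_t> handle;
  for (const Step &step : script) {
    if (step.kind == StepKind::Match) {
      handle.clear();
      for (std::size_t i = 0; i < payload.size(); ++i)
        if (payload[i].name == step.opName) handle.push_back(i);
    } else {  // StepKind::Tile
      if (handle.empty()) return false;
      for (std::size_t idx : handle) payload[idx].tileSizes = step.sizes;
    }
  }
  return true;
}
```

Adding a new primitive such as "fuse" would mean adding one more step kind that consumes a handle, rather than adding a combined option struct for every combination of steps.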

Why would you miss it? It can be created at any point.

This is a supported way of using transform dialect, but not the only one. If the payload IR allows for transform dialect to be attached, e.g., at the end of the module, without breaking the verifier, this is probably the easiest one. Note that TestTransformDialect* passes are intended for testing as their name indicates. We expect clients to roll their own with proper registration.

Thank you for the reply. This information and your open meeting presentation helped me form a better mental picture of the motivations and of how to use this dialect. Having transformations represented as IR text, without recompiling the tool, does indeed provide more DSE opportunities that would be interesting to explore.

Got it. I tried to convey this idea in my example, the transform.sequence ops would not exist yet. Indeed, they would have to be created.

Noted, I am already spinning up my own version of the interpreter pass. Thank you for the pointer.

I do wonder what the intended way is for someone downstream to create and maintain their transformations/optimizations in standalone tools. It seems that you are advocating for a database of transform.sequence files instead of using the builder to create the transformations. At first I thought that .mlir-based transforms would be less permanent and more prone to changes compared to C++ code, but after reflecting on it a bit, I think dialect operations tend to change less than MLIR C++ code.
Excited to see where this is going.

Not necessarily files and likely with more top-level combiners than just transform.sequence, but separating transformation strategies/decisions from implementation details is indeed the goal. An important aspect here is the discipline in maintaining that separation that is enforced by ops.

Personally, I’m on the side of strategies being mostly static and mostly declarative because there are significant benefits to this (analyzability, composability, etc.). For storage, the generic op syntax is quite stable (though op semantics and verifiers may change over time), and the bytecode is designed to be even more stable. In any case, it is much easier to write a dialect-modernize tool on IR that would update the “database” than it is to automatically update the C++ code generating it. That being said, I don’t hold this opinion strongly and think that the infrastructure should support all use cases.