Transform Dialect Cheatsheet

I constantly find myself digging through older scripts and struggling to find the information I need, so I thought I’d start a living cheatsheet of the essential transformations I regularly use.

This is by no means complete and is meant to evolve over time.

The list also includes transforms we use on the IREE side, prefixed by transform.iree., that are not available upstream for various reasons.

This is only meant to be a quick reference for people who already know how to use these properly.
This list should also help increase scrutiny on these ops and trigger refactorings/ergonomics improvements/type propagation/cleanups and ideally deduplication where appropriate.
For proper details please see the various op docs.

If it helps me, I expect this will also help others: @qcolombet @chelini:

/// Tiling ops implementing TilingInterface to parallel loops on tensors.
%foreach_thread_l2, %matmul_l2 =
  transform.structured.tile_to_foreach_thread_op %matmul_l1 tile_sizes [16, 16]
    ( mapping = [#gpu.thread<y>, #gpu.thread<x>] )
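
To make the effect of the tile sizes concrete, here is a minimal Python sketch (not MLIR, and not how the op is implemented) of how `tile_sizes [16, 16]` splits a 2-D iteration space into one tile per thread. The name `tiles_for_threads` and the 40x24 problem size are hypothetical, and I am assuming the first tiled dimension goes to `thread<y>` and the second to `thread<x>`, following the order of the `mapping` attribute.

```python
from math import ceil

def tiles_for_threads(dims, tile_sizes):
    """Map each (thread_y, thread_x) to the half-open row/col ranges it owns."""
    rows, cols = dims
    tile_r, tile_c = tile_sizes
    num_y = ceil(rows / tile_r)  # threads along y
    num_x = ceil(cols / tile_c)  # threads along x
    tiles = {}
    for y in range(num_y):
        for x in range(num_x):
            # Boundary tiles are clamped to the problem size.
            row_range = (y * tile_r, min((y + 1) * tile_r, rows))
            col_range = (x * tile_c, min((x + 1) * tile_c, cols))
            tiles[(y, x)] = (row_range, col_range)
    return tiles

tiles = tiles_for_threads((40, 24), (16, 16))
```

With these illustrative sizes the sketch produces a 3x2 grid of threads, and the last thread in each dimension gets a partial 8-wide tile.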

// Tile the reduction dimension at position `1` to `5` threads and chunks padded to `3` to guarantee
// static sizes. The step size will be `5 * 3 = 15`.
// Map the tiled parallel dimension to `#gpu.thread<x>`.
// This is a more dynamic way of implementing split-K with a foreach_thread.
%foreach_thread, %fill, %more_parallel_generic, %combiner =
  transform.structured.tile_reduction_using_foreach_thread %0 
    by num_threads = [0, 5], tile_sizes = [0, 3], mapping = [#gpu.thread<x>]
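
The `5 * 3 = 15` step can be seen in a small Python model of this split-K scheme. This is only a sketch under my own assumptions (a 1-D sum reduction, zero as the neutral element, chunks handed out to threads cyclically); `split_k_sum` is a hypothetical name, and the three stages are labeled after the `%fill`, `%more_parallel_generic` and `%combiner` handles.

```python
def split_k_sum(values, num_threads=5, chunk=3):
    # Pad with the neutral element (0 for a sum) so every chunk is exactly
    # `chunk` long, which is what keeps the sizes static.
    padded = list(values) + [0] * (-len(values) % chunk)
    step = num_threads * chunk  # 5 * 3 = 15

    # "fill": one partial accumulator per thread, set to the neutral element.
    partials = [0] * num_threads

    # "more parallel" part: each thread t visits chunks t*chunk, t*chunk + step, ...
    for t in range(num_threads):  # conceptually runs in parallel
        for base in range(t * chunk, len(padded), step):
            for v in padded[base:base + chunk]:
                partials[t] += v

    # "combiner": final reduction over the per-thread partial results.
    return partials, sum(partials)

partials, total = split_k_sum(range(40))
```

Thread 0 here reads elements 0-2, 15-17 and 30-32, thread 1 reads 3-5, 18-20 and 33-35, and so on, so no two threads ever touch the same chunk.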

// IREE's first level of tiling must connect to the "workgroup_count_region".
%foreach_thread_l1, %matmul_l1 =
        %matmul tile_sizes [128, 128]
    ( mapping = [#gpu.block<y>, #gpu.block<x>] )

/// Tiling ops implementing TilingInterface to sequential loops on tensors.
%matmul_l2, %loops:3 = 
  transform.structured.tile_to_scf_for %matmul_l1 [16, 16, 16] 
    { interchange = [1, 0, 2] }
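
As a rough mental model of what the tile sizes and `interchange` do to the loop nest, here is a Python sketch of a tiled matmul. It is not the MLIR implementation; `tiled_matmul` is a hypothetical helper, and I am assuming that entry `p` of `interchange` names the iteration dimension (0 = i, 1 = j, 2 = k) placed at loop position `p`, outermost first. For `[1, 0, 2]` the permutation is its own inverse, so the opposite convention gives the same loop order.

```python
def tiled_matmul(a, b, tile_sizes=(16, 16, 16), interchange=(1, 0, 2)):
    m, kdim, n = len(a), len(b), len(b[0])
    sizes = (m, n, kdim)  # matmul iteration space, in (i, j, k) order
    c = [[0] * n for _ in range(m)]
    d0, d1, d2 = interchange  # dimension iterated by each loop, outermost first
    visited = []  # tile offsets in the order the loop nest reaches them
    for o0 in range(0, sizes[d0], tile_sizes[d0]):
        for o1 in range(0, sizes[d1], tile_sizes[d1]):
            for o2 in range(0, sizes[d2], tile_sizes[d2]):
                offsets = [0, 0, 0]
                offsets[d0], offsets[d1], offsets[d2] = o0, o1, o2
                i0, j0, k0 = offsets
                visited.append((i0, j0, k0))
                # The tile-sized matmul that remains inside the loops.
                for i in range(i0, min(i0 + tile_sizes[0], m)):
                    for j in range(j0, min(j0 + tile_sizes[1], n)):
                        for k in range(k0, min(k0 + tile_sizes[2], kdim)):
                            c[i][j] += a[i][k] * b[k][j]
    return c, visited

a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 0, 2, 1], [0, 1, 1, 3]]
c, visited = tiled_matmul(a, b, tile_sizes=(2, 2, 2))
```

With the interchange applied, the `j` tiles advance in the outermost loop, so `visited` walks down a column of tiles before moving to the next one.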

// Tile a reduction dimension by [0, 5] and pad to guarantee static sizes.
// This is a more dynamic way of implementing split-K with a loop.
%loop, %fill, %more_parallel_generic, %combiner = 
  transform.structured.tile_reduction_using_scf %0 by tile_sizes = [0, 5]
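
The sequential variant can be modeled the same way, with the thread dimension replaced by a `5`-wide partial accumulator per row. Again, this is a Python sketch under my own assumptions (a row-wise sum over a 2-D input, zero padding as the neutral element), and `tiled_row_sum` is a hypothetical name rather than anything in MLIR.

```python
def tiled_row_sum(matrix, tile=5):
    # Pad the reduction dimension with the neutral element so every step
    # of the loop sees exactly `tile` columns.
    pad = -len(matrix[0]) % tile
    padded = [list(row) + [0] * pad for row in matrix]
    ncols = len(padded[0])

    # "fill": a rows x tile buffer of partial sums set to the neutral element.
    partials = [[0] * tile for _ in padded]

    # The sequential loop over the reduction dimension, with step `tile`.
    for base in range(0, ncols, tile):
        # "more parallel" body: an elementwise add into the partial buffer.
        for r, row in enumerate(padded):
            for lane in range(tile):
                partials[r][lane] += row[base + lane]

    # "combiner": reduce each row of partials to the final value.
    return [sum(p) for p in partials]

result = tiled_row_sum([[1] * 12, list(range(12))])
```

The loop body no longer carries a reduction across columns, only an elementwise update, which is what makes it "more parallel" than the original op.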

/// Controlled pattern application.
%func = transform.structured.match ops{["func.func"]} in %variant_op 
  : (!pdl.operation) -> !pdl.operation
%func_2 = transform.iree.apply_patterns %func 
  { rank_reducing_linalg, rank_reducing_vector }

Feel free to send suggestions in comments, or even better: send patches to fix the ergonomics/cleanup of existing ops directly, and I’ll update accordingly.