[RFC] Data layout in Transform dialect and CuTe library

CuTe:
Abstractions for defining and operating on hierarchically multidimensional layouts
• Each layout is a composition of a shape and a stride
• Shapes and strides can be fully static, dynamic, or have mixed static/dynamic components
• Layout and Tensor objects compactly package the type, shape, memory space, and layout of data
• CuTe performs the complicated indexing for the user

We came across CuTe, a framework that is part of CUTLASS:

CuTe is a collection of C++ CUDA template abstractions for defining and operating on hierarchically multidimensional layouts of threads and data. CuTe provides Layout and Tensor objects that compactly package the type, shape, memory space, and layout of data while performing the complicated indexing for the user. This lets programmers focus on the logical descriptions of their algorithms while CuTe does the mechanical bookkeeping for them. With these tools, we can quickly design, implement, and modify all dense linear algebra operations.
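To make "a layout is a shape plus a stride" concrete, here is a minimal sketch in plain Python (not CuTe code): a layout maps a logical coordinate to a memory offset by taking the inner product of the coordinate with the strides. The helper names `offset` and `offsets` are made up for illustration, and the loop visits coordinates in Python's `itertools.product` order, which is not necessarily the order CuTe uses.

```python
from itertools import product

def offset(coord, stride):
    """Inner product of a logical coordinate with the strides."""
    return sum(c * s for c, s in zip(coord, stride))

def offsets(shape, stride):
    """Offsets of every coordinate of `shape`, rightmost index varying fastest."""
    return [offset(c, stride) for c in product(*(range(n) for n in shape))]

shape = (2, 3)
row_major = (3, 1)  # consecutive columns are adjacent in memory
col_major = (1, 2)  # consecutive rows are adjacent in memory

row_offsets = offsets(shape, row_major)
col_offsets = offsets(shape, col_major)
```

The same 2x3 logical shape reaches the same six memory cells under both stride choices, only in a different order; that is the sense in which the layout, not the algorithm, carries the indexing.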

Transform Dialect:

The Transform dialect is an MLIR dialect whose operations control the transformation of one piece of IR (the payload IR) from a separate piece of IR (the transform IR). The main idea is to let domain experts, compiler writers, or users apply code transformations without changing the computation captured in the payload IR; the user writes the transform sequences in the transform IR.

Our interest in the Transform dialect is to enable data layout transformations and optimize data movement using the transform IR.

Transform on memref:

The memref type supports data layouts, but it is too low-level for a user. We want layout changes to be driven through the Transform dialect, letting users specify whether to copy the data or to get a view of it without moving it. To do so, we would like input on how CuTe could map to the Transform dialect and to the MLIR framework in general. The CuTe examples we looked at use shapes and strides as the primary input for the mappings, which seems similar to the layout attribute of the memref type. We want to understand the feasibility of mapping parts of CuTe through a Transform dialect operation that targets a memref operation and changes its data layout as specified by the user in the transform op. Our proposed operation is shown in the following example:

Payload IR:
...

Transform IR:

transform.sequence failures(propagate) {
  ^bb1(%arg1: !pdl.operation):
  %0 = transform.structured.match ops{["memref.alloc"]} in %arg1
  %1 = transform.memref.data_layout %0 {stride = [2, 2], shape = [1, 2]} // Proposed operation "data_layout"
}
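As a rough illustration of the similarity mentioned above, a flat shape/stride pair can be spelled as a memref type with a `strided` layout attribute. The sketch below only formats the type string; `memref_type` is a hypothetical helper, not an MLIR API, and `None` is used here as an assumed stand-in for a dynamic (`?`) size, stride, or offset.

```python
def memref_type(shape, stride, elem="f32", offset=0):
    """Format a flat shape/stride pair as a memref type with a strided layout."""
    def dyn(value):
        # None marks a value only known at run time, printed as '?' in MLIR.
        return "?" if value is None else str(value)

    dims = "x".join(dyn(d) for d in shape)
    strides = ", ".join(dyn(s) for s in stride)
    return f"memref<{dims}x{elem}, strided<[{strides}], offset: {dyn(offset)}>>"

static_type = memref_type((2, 3), (3, 1))
mixed_type = memref_type((None, 4), (None, 1), offset=None)
```

Here `static_type` is `memref<2x3xf32, strided<[3, 1], offset: 0>>` and `mixed_type` is `memref<?x4xf32, strided<[?, 1], offset: ?>>`, which mirrors the fully static and mixed static/dynamic cases listed for CuTe layouts.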

We need feedback and comments from the community to understand the support for this idea in MLIR, specifically in the Transform dialect. (@ftynse @nicolasvasilache)

Thank you!

I took a quick look at the library. Most of the layouts can indeed be expressed as strided layouts on the memref type. We don’t have the hierarchical approach, but it is likely to be expressible as an affine map or a composition thereof.
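To illustrate the affine-map point, here is a small sketch assuming a CuTe-style nested layout with shape `((2, 2), 2)` and stride `((1, 4), 2)`, and assuming that a single integer index into a nested mode is split with the leftmost sub-mode varying fastest. The evaluator `offset` is a made-up helper, not CuTe or MLIR code. The hand-written `affine_offset` is the same mapping as one quasi-affine expression, roughly `(i, j) -> (i mod 2 + (i floordiv 2) * 4 + j * 2)`.

```python
from math import prod

def size(shape):
    """Number of elements covered by a (possibly nested) shape."""
    return shape if isinstance(shape, int) else prod(size(s) for s in shape)

def offset(coord, shape, stride):
    """Evaluate a nested shape/stride layout at a coordinate."""
    if isinstance(shape, int):      # leaf mode: plain multiply
        return coord * stride
    if isinstance(coord, int):      # one index into a nested mode: split it
        total = 0
        for sub_shape, sub_stride in zip(shape, stride):
            total += offset(coord % size(sub_shape), sub_shape, sub_stride)
            coord //= size(sub_shape)
        return total
    return sum(offset(c, s, d) for c, s, d in zip(coord, shape, stride))

def affine_offset(i, j):
    """The same mapping as a single expression using mod and floordiv."""
    return i % 2 + (i // 2) * 4 + j * 2

shape, stride = ((2, 2), 2), ((1, 4), 2)
table = [[offset((i, j), shape, stride) for j in range(2)] for i in range(4)]
```

Along `i` the offsets are 0, 1, 4, 5, which no single stride can produce; that is why the hierarchical case needs `mod`/`floordiv` terms rather than a plain strided layout.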

A big chunk of the functionality is operations on the layout object itself (tile, flatten, etc.). Those can be expressed in the transform dialect, but you need to decide on the (payload) IR modeling of the layout. The other big chunk is primitive computational operations on multidimensional arrays (called tensors in CuTe, but with different semantics than what MLIR understands for tensors) that we either already have or can derive using Linalg (copy is available, so is pure matmul, but no gemm AFAIK).
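For a sense of what operations on the layout object look like, here is a sketch of two of them in plain Python, written in the spirit of CuTe's `flatten` and `coalesce` but not taken from the library: `flatten` removes the nesting, and `coalesce` merges neighbouring modes that together describe one contiguous run. Representing layouts as tuples is an assumption made only for this example.

```python
def flatten(x):
    """Turn a nested tuple of ints into a flat tuple of ints."""
    if isinstance(x, int):
        return (x,)
    return tuple(leaf for sub in x for leaf in flatten(sub))

def coalesce(shape, stride):
    """Flatten a layout, then merge adjacent modes wherever the next stride
    equals the previous mode's size times its stride."""
    modes = []  # list of (size, stride) pairs
    for n, d in zip(flatten(shape), flatten(stride)):
        if n == 1:
            continue  # size-1 modes never move the offset
        if modes and modes[-1][0] * modes[-1][1] == d:
            modes[-1] = (modes[-1][0] * n, modes[-1][1])
        else:
            modes.append((n, d))
    return tuple(n for n, _ in modes), tuple(d for _, d in modes)

contiguous = coalesce(((2, 2), 2), ((1, 2), 4))
strided = coalesce(((2, 2), 2), ((1, 4), 2))
```

The first layout collapses to a single mode of size 8 with stride 1, while the second cannot be merged and only loses its nesting. Neither operation touches the data, which is why such rewrites are candidates for the transform IR once the payload-side modeling of the layout is settled.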

Could you clarify what specifically you would like to achieve with the transform dialect here? I can’t immediately understand what the proposed data_layout operation would do. One can’t just assign a data layout to a memref: some layouts can be obtained naturally as a subview of an existing memref, others will require copying.
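The view-versus-copy distinction can be sketched in plain Python as follows. `View` and `relayout_copy` are hypothetical names, and a Python list stands in for the underlying allocation. A transpose only reinterprets the existing buffer with permuted strides, whereas asking for the same logical 2x3 array to be stored column-major forces a new buffer and data movement.

```python
from itertools import product

class View:
    """A shape/stride/offset interpretation of a shared flat buffer."""
    def __init__(self, buf, shape, stride, offset=0):
        self.buf, self.shape, self.stride, self.offset = buf, shape, stride, offset

    def _at(self, coord):
        return self.offset + sum(c * s for c, s in zip(coord, self.stride))

    def __getitem__(self, coord):
        return self.buf[self._at(coord)]

    def __setitem__(self, coord, value):
        self.buf[self._at(coord)] = value

def relayout_copy(src, new_stride):
    """Materialize `src` into a fresh buffer that uses `new_stride`."""
    length = 1 + sum((dim - 1) * s for dim, s in zip(src.shape, new_stride))
    dst = View([None] * length, src.shape, new_stride)
    for coord in product(*(range(dim) for dim in src.shape)):
        dst[coord] = src[coord]
    return dst

buf = list(range(6))
a = View(buf, (2, 3), (3, 1))    # row-major 2x3
t = View(buf, (3, 2), (1, 3))    # transpose: same buffer, permuted strides
c = relayout_copy(a, (1, 2))     # column-major storage: needs a new buffer

a[(0, 1)] = 42                   # visible through the view, not the copy
```

After the write, `t[(1, 0)]` reads 42 because `t` aliases `buf`, while `c[(0, 1)]` still holds the old value. An operation that "sets" a layout would have to say which of these two behaviours it means.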