CuTe:
Abstractions for defining and operating on hierarchically multidimensional layouts
• Each layout is a composition of a shape and a stride
• Shapes and strides can be fully static, dynamic, or have mixed static/dynamic components
• Layout and Tensor objects compactly package the type, shape, memory space, and layout of data
• CuTe performs the complicated indexing for the user
We came across a framework in CUTLASS called CuTe.
CuTe is a collection of C++ CUDA template abstractions for defining and operating on hierarchically multidimensional layouts of threads and data. CuTe provides Layout and Tensor objects that compactly package the type, shape, memory space, and layout of data while performing the complicated indexing for the user. This lets programmers focus on the logical descriptions of their algorithms while CuTe does the mechanical bookkeeping for them. With these tools, we can quickly design, implement, and modify all dense linear algebra operations.
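To make the shape/stride idea concrete, here is a minimal sketch in plain C++. It does not use CuTe's actual API; the `Layout` struct and the `col_major` / `row_major` names are hypothetical, and the sketch covers only flat (non-hierarchical) layouts. It shows the core mapping: a logical coordinate is turned into a linear offset by taking the inner product of the coordinate with the strides.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, NOT part of CuTe: a flat rank-N layout made of
// a shape and a stride, as described in the bullets above.
template <std::size_t N>
struct Layout {
  std::array<std::size_t, N> shape;
  std::array<std::size_t, N> stride;

  // Map a logical coordinate to a linear offset:
  //   offset = sum over k of coord[k] * stride[k]
  std::size_t operator()(const std::array<std::size_t, N>& coord) const {
    std::size_t offset = 0;
    for (std::size_t k = 0; k < N; ++k) {
      offset += coord[k] * stride[k];
    }
    return offset;
  }
};

// Two layouts for the same logical 4x8 shape (illustrative values).
const Layout<2> col_major{{4, 8}, {1, 4}};  // consecutive rows are adjacent
const Layout<2> row_major{{4, 8}, {8, 1}};  // consecutive columns are adjacent
```

With these definitions, the logical coordinate (2, 3) maps to offset 2*1 + 3*4 = 14 under `col_major` and to 2*8 + 3*1 = 19 under `row_major`. The algorithm addresses elements by logical coordinate, and only the layout object decides where they live in memory.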
Transform dialect:
The Transform dialect is an MLIR dialect whose operations control the transformation of the IR from a separate portion of the IR. The main idea is to let domain experts, compiler writers, or users apply code transformations without changing the computation captured in the payload IR; the user writes the transformation sequences in the transform IR.
Our interest in the Transform dialect is to enable data layout transformations and to optimize data movement using the transform IR.
Transform on memref:
The memref type supports data layouts, but manipulating it directly is too low-level for a user. We want layout changes to be expressed through the Transform dialect, letting users specify whether to copy the data or to get a view of it without moving it. To do so, we want input on how CuTe could map to the Transform dialect work and to the MLIR framework in general. Looking at CuTe, we see an example that uses shapes and strides as the primary input for its mappings. This seems similar to the layout attribute of the memref type. We want to understand the feasibility of mapping parts of CuTe onto a Transform dialect operation that targets a memref operation and changes its data layout as specified by the user in the transform op. Our proposed operation is shown in the following example:
Payload IR:
...
Transform IR:
transform.sequence failures(propagate) {
^bb1(%arg1: !pdl.operation):
%0 = transform.structured.match ops{["memref.alloc"]} in %arg1
%1 = transform.memref.data_layout %0 {stride : [2,2], shape : [1,2]} // Operation "data_layout"
}
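The copy-versus-view distinction mentioned above can be illustrated with a small plain C++ sketch. This is only an analogy for what the proposed transform would have to choose between, not an implementation of it; `View2D`, `transposed_view`, and `materialize_row_major` are hypothetical names. A view changes only the shape and strides used to address an existing buffer, while a copy moves the elements into a new buffer with the requested layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2-D strided view over a buffer it does not own.
struct View2D {
  const float* data;
  std::size_t rows, cols;
  std::size_t row_stride, col_stride;

  // Element (i, j) lives at offset i * row_stride + j * col_stride.
  float at(std::size_t i, std::size_t j) const {
    return data[i * row_stride + j * col_stride];
  }
};

// View: transposing swaps the shape and the strides; no element moves
// and the result still points at the original buffer.
inline View2D transposed_view(const View2D& v) {
  return View2D{v.data, v.cols, v.rows, v.col_stride, v.row_stride};
}

// Copy: materialize the view into a new, compact row-major buffer.
inline std::vector<float> materialize_row_major(const View2D& v) {
  std::vector<float> out(v.rows * v.cols);
  for (std::size_t i = 0; i < v.rows; ++i) {
    for (std::size_t j = 0; j < v.cols; ++j) {
      out[i * v.cols + j] = v.at(i, j);
    }
  }
  return out;
}
```

In memref terms, the first function roughly corresponds to producing a memref with a different strided layout over the same allocation, and the second to allocating a new memref and copying into it (for example with `memref.copy`). Which of the two a `data_layout`-style transform should emit is the choice we would like users to control.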
We would appreciate feedback and comments from the community on whether this idea fits into MLIR, and specifically into the Transform dialect. (@ftynse @nicolasvasilache)
Thank you!