Hello all!
I would like to ask about the tilePerfectlyNested() function. It does not seem to work correctly in my case.
I have the following MLIR code:
affine.for %arg3 = 0 to 2048 step 480 {
  affine.for %arg4 = 0 to 2088 step 3 {
    affine.for %arg5 = 0 to 2048 step 16 {
      affine.for %arg6 = #map0(%arg3) to min #map1(%arg3) {
        affine.for %arg7 = #map0(%arg4) to #map2(%arg4) {
          affine.for %arg8 = #map0(%arg5) to #map3(%arg5) {
            %1 = affine.load %0[%arg7, %arg8] : memref<2088x2048xf32>
            %2 = affine.load %arg0[%arg7, %arg6] : memref<2088x2048xf32>
            %3 = affine.load %arg1[%arg6, %arg8] : memref<2048x2048xf32>
            %4 = mulf %2, %3 : f32
            %5 = addf %4, %1 : f32
            affine.store %5, %0[%arg7, %arg8] : memref<2088x2048xf32>
          }
        }
      }
    }
  }
}
I need to tile the second loop, “affine.for %arg4 = 0 to 2088 step 3”, with a tile size of 330.
The following code attempts this tiling:
SmallVector<AffineForOp,3> oneLoop({tiled_nest[1]});
SmallVector<unsigned, 3> oneLoopTileSize({330});
SmallVector<AffineForOp, 8> tiled_nest2;
tilePerfectlyNested(oneLoop, oneLoopTileSize, &tiled_nest2);
After tilePerfectlyNested() completes, I see the following result:
#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (d0 + 330, 2088)>
#map2 = affine_map<(d0) -> (d0 + 480, 2048)>
#map3 = affine_map<(d0) -> (d0 + 3)>
#map4 = affine_map<(d0) -> (d0 + 16)>
affine.for %arg3 = 0 to 2048 step 480 {
  affine.for %arg4 = 0 to 2088 step 330 {
    affine.for %arg5 = #map0(%arg4) to min #map1(%arg4) {
      affine.for %arg6 = 0 to 2048 step 16 {
        affine.for %arg7 = #map0(%arg3) to min #map2(%arg3) {
          affine.for %arg8 = #map0(%arg5) to #map3(%arg5) {
            affine.for %arg9 = #map0(%arg6) to #map4(%arg6) {
              %1 = affine.load %0[%arg8, %arg9] : memref<2088x2048xf32>
              %2 = affine.load %arg0[%arg8, %arg7] : memref<2088x2048xf32>
              %3 = affine.load %arg1[%arg7, %arg9] : memref<2048x2048xf32>
              %4 = mulf %2, %3 : f32
              %5 = addf %4, %1 : f32
              affine.store %5, %0[%arg8, %arg9] : memref<2088x2048xf32>
            }
          }
        }
      }
    }
  }
}
That is, my original loop “affine.for %arg4 = 0 to 2088 step 3” was split into a pair of loops:
affine.for %arg4 = 0 to 2088 step 330 {
  affine.for %arg5 = #map0(%arg4) to min #map1(%arg4) {
The second (intra-tile) loop has the default step of 1, which is not correct, because the original loop had a step of 3. Why was the original step of 3 ignored during tiling?
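To make the discrepancy concrete, here is a minimal sketch in plain C++ (no MLIR dependency) that counts how many values of the induction variable each form of the loop visits. The helper names countOriginal and countTiled are hypothetical and only model the loop bounds shown above; they are not part of any MLIR API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of iterations of: for (i = 0; i < ub; i += step).
int64_t countOriginal(int64_t ub, int64_t step) {
  assert(step > 0 && "step must be positive");
  int64_t n = 0;
  for (int64_t i = 0; i < ub; i += step)
    ++n;
  return n;
}

// Number of iterations of a tiled form: the outer loop advances by tileSize,
// and the inner loop walks [t, min(t + tileSize, ub)) with innerStep.
int64_t countTiled(int64_t ub, int64_t tileSize, int64_t innerStep) {
  assert(tileSize > 0 && innerStep > 0 && "steps must be positive");
  int64_t n = 0;
  for (int64_t t = 0; t < ub; t += tileSize)
    for (int64_t i = t; i < std::min(t + tileSize, ub); i += innerStep)
      ++n;
  return n;
}
```

With these helpers, countOriginal(2088, 3) gives 696 iterations, while countTiled(2088, 330, 1), which models the output I get, gives 2088. An inner step of 3, i.e. countTiled(2088, 330, 3), gives 696 again, which is what I expected from the tiling.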
BR, Oleg.