Bug in `PadOpTransformationPattern`

Hi there,

I think I found a bug in the current implementation of `PadOpTransformationPattern`. The lowering strategy it implements is invalid when there is no padding in the lower dimensions, because the implicit iteration domain of the `linalg.generic` used for copying forces the input and output shapes to match. Consider the following input:

```mlir
module {
    func.func @test(%in : tensor<1x128x23x23xf32>) -> tensor<1x128x24x24xf32> {
        %cst0 = arith.constant 0.0 : f32
        %out = tensor.pad %in low [0, 0, 0, 0] high [0, 0, 1, 1] {
            ^bb0(%n : index, %c : index, %y : index, %x : index):
                tensor.yield %cst0 : f32
        } : tensor<1x128x23x23xf32> to tensor<1x128x24x24xf32>
        return %out : tensor<1x128x24x24xf32>
    }
}
```

On 598c5ddba (a recent head), you will receive the following error:

```
test.mlir:15:17: error: 'linalg.generic' op inferred input/output operand #1 has shape's dimension #2 to be 23, but found 24
        %copy = linalg.generic {
                ^
test.mlir:15:17: note: see current operation: %3 = "linalg.generic"(%arg0, %2) ({
^bb0(%arg1: f32, %arg2: f32):
  "linalg.yield"(%arg1) : (f32) -> ()
}) {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"], operand_segment_sizes = dense<1> : vector<2xi32>} : (tensor<1x128x23x23xf32>, tensor<1x128x24x24xf32>) -> tensor<1x128x24x24xf32>
```

Indeed, the generated lowering, written out standalone, fails verification:

```mlir
func.func @test2(%in : tensor<1x128x23x23xf32>) -> tensor<1x128x24x24xf32> {
    %cst0 = arith.constant 0.0 : f32
    %init = linalg.init_tensor [1, 128, 24, 24] : tensor<1x128x24x24xf32>
    %fill = linalg.fill
        ins(%cst0 : f32)
        outs(%init : tensor<1x128x24x24xf32>)
        -> tensor<1x128x24x24xf32>
    %copy = linalg.generic {
        indexing_maps = [
            affine_map<(n,c,y,x) -> (n,c,y,x)>,
            affine_map<(n,c,y,x) -> (n,c,y,x)>
        ],
        iterator_types = ["parallel", "parallel", "parallel", "parallel"]
    }
        ins(%in : tensor<1x128x23x23xf32>)
        outs(%fill : tensor<1x128x24x24xf32>) {
        ^bb0(%arg0 : f32, %arg1 : f32):
            linalg.yield %arg0 : f32
    } -> tensor<1x128x24x24xf32>
    return %copy : tensor<1x128x24x24xf32>
}
```

I looked at the code, and the alternative proposed there, using `tensor.insert_slice`, seems to work fine:

```mlir
func.func @test3(%in : tensor<1x128x23x23xf32>) -> tensor<1x128x24x24xf32> {
    %cst0 = arith.constant 0.0 : f32
    %init = linalg.init_tensor [1, 128, 24, 24] : tensor<1x128x24x24xf32>
    %fill = linalg.fill
        ins(%cst0 : f32)
        outs(%init : tensor<1x128x24x24xf32>)
        -> tensor<1x128x24x24xf32>
    %copy = tensor.insert_slice %in into %fill
        [0, 0, 0, 0] [1, 128, 23, 23] [1, 1, 1, 1] :
        tensor<1x128x23x23xf32> into tensor<1x128x24x24xf32>
    return %copy : tensor<1x128x24x24xf32>
}
```

I have never used Phabricator before and I don’t have the time to steward this fix now, but I wanted to let people know. Maybe @agostini01 can keep an eye on this one if they decide to do another pass on this implementation.

I just noticed that there is already a `GeneralizePadOpPattern` that implements the desired behavior. I'm now wondering what the difference between the two is meant to be: both only handle constant padding values, except that the lowering to `tensor.insert_slice` actually works.

I also noted that the `linalg.init_tensor` and `linalg.fill` pair could now probably be replaced with a single `tensor.splat`.
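
For reference, here is a sketch of that simplification (untested; it assumes `tensor.splat` is available for this fully static result type):

```mlir
func.func @test4(%in : tensor<1x128x23x23xf32>) -> tensor<1x128x24x24xf32> {
    %cst0 = arith.constant 0.0 : f32
    // tensor.splat materializes the padding value directly,
    // replacing the init_tensor + fill pair.
    %fill = tensor.splat %cst0 : tensor<1x128x24x24xf32>
    %copy = tensor.insert_slice %in into %fill
        [0, 0, 0, 0] [1, 128, 23, 23] [1, 1, 1, 1] :
        tensor<1x128x23x23xf32> into tensor<1x128x24x24xf32>
    return %copy : tensor<1x128x24x24xf32>
}
```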