Hi there,
I think I found a bug in the current implementation of PadOpTransformationPattern. The lowering strategy it implements is invalid when the low padding is zero: the linalg.generic used for copying then gets identity indexing maps on both operands, so its implicit iteration domain forces the input and output shapes to be identical. Consider the following input:
module {
  func.func @test(%in : tensor<1x128x23x23xf32>) -> tensor<1x128x24x24xf32> {
    %cst0 = arith.constant 0.0 : f32
    %out = tensor.pad %in low [0, 0, 0, 0] high [0, 0, 1, 1] {
    ^bb0(%n : index, %c : index, %y : index, %x : index):
      tensor.yield %cst0 : f32
    } : tensor<1x128x23x23xf32> to tensor<1x128x24x24xf32>
    return %out : tensor<1x128x24x24xf32>
  }
}
On 598c5ddba (a recent HEAD), you get the following error:
test.mlir:15:17: error: 'linalg.generic' op inferred input/output operand #1 has shape's dimension #2 to be 23, but found 24
%copy = linalg.generic {
^
test.mlir:15:17: note: see current operation: %3 = "linalg.generic"(%arg0, %2) ({
^bb0(%arg1: f32, %arg2: f32):
"linalg.yield"(%arg1) : (f32) -> ()
}) {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>], iterator_types = ["parallel", "parallel", "parallel", "parallel"], operand_segment_sizes = dense<1> : vector<2xi32>} : (tensor<1x128x23x23xf32>, tensor<1x128x24x24xf32>) -> tensor<1x128x24x24xf32>
Indeed, this lowering is not valid; written out by hand, it fails verification the same way:
func.func @test2(%in : tensor<1x128x23x23xf32>) -> tensor<1x128x24x24xf32> {
  %cst0 = arith.constant 0.0 : f32
  %init = linalg.init_tensor [1, 128, 24, 24] : tensor<1x128x24x24xf32>
  %fill = linalg.fill
    ins(%cst0 : f32)
    outs(%init : tensor<1x128x24x24xf32>)
    -> tensor<1x128x24x24xf32>
  %copy = linalg.generic {
      indexing_maps = [
        affine_map<(n,c,y,x) -> (n,c,y,x)>,
        affine_map<(n,c,y,x) -> (n,c,y,x)>
      ],
      iterator_types = ["parallel", "parallel", "parallel", "parallel"]
    }
    ins(%in : tensor<1x128x23x23xf32>)
    outs(%fill : tensor<1x128x24x24xf32>) {
  ^bb0(%arg0 : f32, %arg1 : f32):
    linalg.yield %arg0 : f32
  } -> tensor<1x128x24x24xf32>
  return %copy : tensor<1x128x24x24xf32>
}
I looked at the code, and the proposed alternative of using tensor.insert_slice seems to work fine:
func.func @test3(%in : tensor<1x128x23x23xf32>) -> tensor<1x128x24x24xf32> {
  %cst0 = arith.constant 0.0 : f32
  %init = linalg.init_tensor [1, 128, 24, 24] : tensor<1x128x24x24xf32>
  %fill = linalg.fill
    ins(%cst0 : f32)
    outs(%init : tensor<1x128x24x24xf32>)
    -> tensor<1x128x24x24xf32>
  %copy = tensor.insert_slice %in into %fill
    [0,0,0,0][1,128,23,23][1,1,1,1] :
    tensor<1x128x23x23xf32> into tensor<1x128x24x24xf32>
  return %copy : tensor<1x128x24x24xf32>
}
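For whoever picks this up, here is a minimal sketch of what the fix inside the pattern could look like: keep the existing init_tensor + fill, and build a tensor::InsertSliceOp whose offsets are the low padding amounts, instead of the linalg.generic copy. This is written from my reading of the tree, not a tested patch; in particular, accessor names like getSource and getMixedLowPad are assumptions that may not match what exists at 598c5ddba.

#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Sketch: replace the linalg.generic copy emitted by
// PadOpTransformationPattern with a tensor.insert_slice. `fill` is the
// result of the linalg.fill the pattern already creates.
static void replaceCopyWithInsertSlice(tensor::PadOp padOp, Value fill,
                                       PatternRewriter &rewriter) {
  Location loc = padOp.getLoc();
  RankedTensorType sourceType = padOp.getSourceType();
  int64_t rank = sourceType.getRank();

  // Offsets into the padded tensor are exactly the low padding amounts,
  // so nonzero low padding is handled for free.
  SmallVector<OpFoldResult> offsets = padOp.getMixedLowPad();

  // Sizes are the source dimensions; use tensor.dim for dynamic ones.
  SmallVector<OpFoldResult> sizes;
  for (int64_t dim = 0; dim < rank; ++dim) {
    if (sourceType.isDynamicDim(dim))
      sizes.push_back(
          rewriter.create<tensor::DimOp>(loc, padOp.getSource(), dim)
              .getResult());
    else
      sizes.push_back(rewriter.getIndexAttr(sourceType.getDimSize(dim)));
  }

  // Unit strides in every dimension.
  SmallVector<OpFoldResult> strides(rank, rewriter.getIndexAttr(1));

  rewriter.replaceOpWithNewOp<tensor::InsertSliceOp>(
      padOp, padOp.getSource(), fill, offsets, sizes, strides);
}

Unlike the generic with identity maps, insert_slice does not tie the destination shape to the source shape, so the 23 -> 24 case above verifies fine.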
I have never used Phabricator before and I don’t have the time to steward this fix now, but I wanted to let people know. Maybe @agostini01 can keep an eye on this one if they decide to do another pass on this implementation.