We are looking at implementing padding support for lowering an `aten.conv2d` operation through `tcf.conv2d_nchw`, and wanted to revisit our initial discussion about adding a pad operation to the reference backend.
Currently we are using the following lowering pattern for basic conv support:

```mlir
aten.conv2d(input, filter, bias, stride, padding, dilations, groups)
// -convert-aten-to-tcf
tcf.conv2d_nchw(input, filter, bias, stride, padding, dilations, groups)
// -convert-tcf-to-linalg
// (plus a bunch of shape-related error handling)
linalg.conv_2d_nchw(...)
```
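For reference, the shape error handling boils down to runtime guards emitted ahead of the `linalg` op. A minimal sketch, assuming dynamic NCHW input / OIHW filter shapes (the specific check shown is illustrative, not the actual pass output):

```mlir
// Illustrative guard only: input channels (dim 1 of the NCHW input) must
// match the filter's input-channel dim (dim 1 of an OIHW filter).
%c1 = constant 1 : index
%in_c = dim %input, %c1 : tensor<?x?x?x?xf32>
%fil_c = dim %filter, %c1 : tensor<?x?x?x?xf32>
%ok = cmpi "eq", %in_c, %fil_c : index
assert %ok, "input and filter channel counts must match"
```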
Looking back at our original npcomp conv2d discussion, we had initially discussed adding a `tcp.pad` op (if padding != 0) that does a deep copy into a padded buffer with the correct dimensions and then passes the whole thing off to `linalg.conv_*` (that way the linalg ops only need to know how to handle VALID padding). The lowering flow for a conv2d would then look like this:
```mlir
tcf.conv2d_nchw(input, filter, bias, stride, padding, dilations, groups)
// -convert-tcf-to-linalg
// if padding != 0
%0 = tcp.pad(input, padding, fill_value/*=0*/)
%1 = linalg.conv_2d_nchw(%0, ...)
// else
%0 = linalg.conv_2d_nchw(input, ...)
```
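For concreteness, here is a sketch of the padded case with static shapes (the `tcp.pad` assembly below is hypothetical, since the op doesn't exist yet; padding of 1 on each spatial edge of a 1x3x32x32 NCHW input yields a 1x3x34x34 tensor):

```mlir
// Hypothetical tcp.pad assembly; assumes padding = [1, 1] on H and W.
%fill = constant 0.0 : f32
%padded = tcp.pad %input, %fill {padding = [1, 1]}
    : (tensor<1x3x32x32xf32>, f32) -> tensor<1x3x34x34xf32>
// The conv then only ever sees VALID padding.
%result = linalg.conv_2d_nchw(%padded, %filter, ...)
```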
With this approach, `tcp.pad` itself would lower to something like:

```mlir
tcp.pad(input, padding, fill_value)
// -convert-tcp-to-std
// compute the shape of the padded buffer
%pad_buf = std.alloc(<padded sizes>)    // allocate the padded buffer
linalg.fill(%pad_buf, %fill_value)      // initialize every element with fill_value
%view = subview %pad_buf[<padding offsets>] [<input sizes>] [<unit strides>]
linalg.copy(%input, %view)              // copy the input into the interior
%result = linalg.conv_2d_nchw(%pad_buf, ...)
```
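With the same static shapes as in the earlier example, the bufferized form would look roughly like this (a sketch only; the `#strided` layout on the subview result is elided for brevity, and the `%input`/`%cst0` names are assumptions):

```mlir
// Hedged sketch: pad a 1x3x32x32 NCHW input by 1 on H and W.
%cst0 = constant 0.0 : f32
%buf = alloc() : memref<1x3x34x34xf32>
linalg.fill(%buf, %cst0) : memref<1x3x34x34xf32>, f32
// Interior view offset by the padding, sized like the input.
%interior = subview %buf[0, 0, 1, 1] [1, 3, 32, 32] [1, 1, 1, 1]
    : memref<1x3x34x34xf32> to memref<1x3x32x32xf32, #strided>
linalg.copy(%input, %interior)
    : memref<1x3x32x32xf32>, memref<1x3x32x32xf32, #strided>
```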
A few points for discussion:
- Does this still sound like a good first-pass approach?
- TCP is a mostly unused dialect currently; do we still think this is the right place to add a PadOp, or should we just add it to TCF and then switch over to an upstream linalg lowering when one exists?
- Will lowering a PadOp to our own implementation that `alloc`'s break our current usage of the upstream bufferization passes? It looks like the TCPBufferize logic has stuck around for the SplatOp, but maybe there's also a way to express the padding operation without having to allocate a constant pad buffer to move data into. Maybe we can reuse the `refbackrt::AllocMemRefOp`?