Hi all-
I feel like I’m missing something. I want to bufferize the following code and convert the result to a memref argument:
func.func @forward(%arg0: tensor<5xi32>, %arg1: tensor<5xi32>) -> tensor<i32> {
  %c0_i32 = arith.constant 0 : i32
  %0 = tensor.empty() : tensor<i32>
  %1 = linalg.fill ins(%c0_i32 : i32) outs(%0 : tensor<i32>) -> tensor<i32>
  %2 = linalg.dot ins(%arg0, %arg1 : tensor<5xi32>, tensor<5xi32>) outs(%1 : tensor<i32>) -> tensor<i32>
  return %2 : tensor<i32>
}
I can do this simply enough with -one-shot-bufferize='allow-return-allocs bufferize-function-boundaries' -buffer-results-to-out-params -buffer-deallocation, but this creates a needless memref.alloc and memref.copy, which I can’t have.
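Spelled out as a full command, that is roughly the following. This is only a sketch: input.mlir is a file name I'm making up here for the function above, and it assumes an mlir-opt build that still accepts these exact flags, since option names have changed between MLIR versions.

```shell
# input.mlir is a hypothetical file holding the tensor-based @forward function.
# The three passes run in the order given on the command line.
mlir-opt input.mlir \
  -one-shot-bufferize='allow-return-allocs bufferize-function-boundaries' \
  -buffer-results-to-out-params \
  -buffer-deallocation
```

With that pipeline the result is: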
func.func @forward(%arg0: memref<5xi32, strided<[?], offset: ?>>, %arg1: memref<5xi32, strided<[?], offset: ?>>, %arg2: memref<i32>) {
  %c0_i32 = arith.constant 0 : i32
  %alloc = memref.alloc() {alignment = 128 : i64} : memref<i32>
  linalg.fill ins(%c0_i32 : i32) outs(%alloc : memref<i32>)
  linalg.dot ins(%arg0, %arg1 : memref<5xi32, strided<[?], offset: ?>>, memref<5xi32, strided<[?], offset: ?>>) outs(%alloc : memref<i32>)
  memref.copy %alloc, %arg2 : memref<i32> to memref<i32>
  memref.dealloc %alloc : memref<i32>
  return
}
I think there was once a -copy-removal pass which would handle this scenario? Either way, I’ve tried a huge number of permutations of relevant-sounding passes but I’m still not getting the result I need. Am I missing something? How should I accomplish this?
The output I want is something like:
func.func @forward(%arg0: memref<5xi32, strided<[?], offset: ?>>, %arg1: memref<5xi32, strided<[?], offset: ?>>, %arg2: memref<i32>) {
  %c0_i32 = arith.constant 0 : i32
  linalg.fill ins(%c0_i32 : i32) outs(%arg2 : memref<i32>)
  linalg.dot ins(%arg0, %arg1 : memref<5xi32, strided<[?], offset: ?>>, memref<5xi32, strided<[?], offset: ?>>) outs(%arg2 : memref<i32>)
  return
}