Consider a `linalg.batch_matmul` op like this:
```mlir
module {
  func.func @test_matmul(%arg0: tensor<1x8192x8192xf32>, %arg1: tensor<1x8192x8192xf32>) -> tensor<1x8192x8192xf32> {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = tensor.empty() : tensor<1x8192x8192xf32>
    %1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<1x8192x8192xf32>) -> tensor<1x8192x8192xf32>
    %2 = linalg.batch_matmul ins(%arg0, %arg1 : tensor<1x8192x8192xf32>, tensor<1x8192x8192xf32>) outs(%1 : tensor<1x8192x8192xf32>) -> tensor<1x8192x8192xf32>
    return %2 : tensor<1x8192x8192xf32>
  }
}
```
If `%1` were instead filled with some other `tensor<1x8192x8192xf32>` holding nonzero initial values, does `linalg.batch_matmul` just compute the matmul of `%arg0` and `%arg1` to produce `%2`, or does it compute the matmul and also add `%1` to produce `%2`?
The semantics here is of accumulation, not addition. Basically `C += A x B`.
If the init tensor (`%1`) is initialized as all zeroes (as in your case above), then the accumulation is on zero-initialized memory and the result is just `%2 = %arg0 x %arg1`, or `C = A x B` (technically `C += A x B` with `C = 0`).
If the init is non-zero, then the accumulation is done on non-zero memory, which "adds" (by accumulation) on top of the existing values, essentially doing `%2 = %1 + %arg0 x %arg1`, or `C += A x B`.
Note that there is no `add` op here after the `matmul`; it's just accumulation on a pre-existing tensor.
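To make the accumulation semantics concrete, here is a minimal plain-Python sketch (a toy 2x2 example, not the actual lowering): the init tensor `C` is read and accumulated into, so a zero init gives plain `A x B`, while a nonzero init effectively adds the pre-existing values.

```python
def batch_matmul_accumulate(A, B, C):
    """Accumulate A x B into C in place, mirroring C += A x B."""
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

# Zero init (like linalg.fill with 0.0): result is just A x B.
print(batch_matmul_accumulate(A, B, [[0, 0], [0, 0]]))
# -> [[19, 22], [43, 50]]

# Nonzero init: the existing values are accumulated into, i.e. init + A x B.
print(batch_matmul_accumulate(A, B, [[100, 100], [100, 100]]))
# -> [[119, 122], [143, 150]]
```

There is no separate add step anywhere above; the "+ init" effect falls out purely from `C[i][j] +=` reading whatever was already in the output buffer.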
Thanks for your reply!