Hello.
Could you please tell me how I can create a scalar variable in MLIR that I can use to sum-reduce (accumulate) over a memref with elements of type f16? Currently I am using a memref with a single element as the accumulator (see variable %0 in the code below):
// This sums up the elements of memref %1.
%0 = memref.alloc() : memref<1xf16> // %0 is a memref with only 1 element
%1 : memref<1x1000xf16> // %1 is the memref we want to sum-reduce over
%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16
memref.store %cst, %0[%c0] : memref<1xf16>
affine.for %arg1 = 0 to 1 {
  affine.for %arg2 = 0 to 1000 {
    %4 = affine.load %1[%arg1, %arg2] : memref<1x1000xf16>
    %c0_0 = arith.constant 0 : index
    %5 = affine.load %0[%c0_0] : memref<1xf16>
    %6 = arith.addf %5, %4 : f16
    affine.store %6, %0[%c0_0] : memref<1xf16>
  }
}
Instead of this memref %0, I would like to use a true scalar variable as the accumulator.
Could you please tell me whether this is possible, perhaps by using the LLVM dialect?
Thank you very much,
Alex
Hi Alex!
When I implemented a simple sum-reduce over a 2-D memref, I ended up with something like this:
%7 = memref.alloc() : memref<1024x1024xf16>
... fill with some values ...
%cst = arith.constant 0.0000e+00 : f16
%sum = affine.for %arg0 = 0 to 1024 iter_args(%arg1 = %cst) -> (f16) {
  %cst_0 = arith.constant 0.0000e+00 : f16
  %inner_sum = affine.for %arg2 = 0 to 1024 iter_args(%arg3 = %cst_0) -> (f16) {
    %14 = memref.load %7[%arg0, %arg2] : memref<1024x1024xf16>
    %15 = arith.addf %arg3, %14 : f16
    affine.yield %15 : f16
  }
  %13 = arith.addf %arg1, %inner_sum : f16
  affine.yield %13 : f16
}
This loops over the two dimensions of the memref. Since we are working with SSA values, you cannot initialize a scalar sum before the loop and then mutate it inside the loop body; the loop-carried iter_args together with AffineYieldOp give you exactly that pattern, so %sum is the scalar result of the sum-reduce.
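Applied to the original 1x1000 memref from your snippet, a minimal sketch (reusing your value names; a variation that seeds the inner loop directly with the outer accumulator, so no extra addf is needed after the inner loop) could look like:

```mlir
// %1 is the memref<1x1000xf16> to reduce; no one-element memref accumulator needed.
%cst = arith.constant 0.000000e+00 : f16
%sum = affine.for %arg1 = 0 to 1 iter_args(%acc = %cst) -> (f16) {
  // Seed the inner loop with the running total from the outer loop.
  %row = affine.for %arg2 = 0 to 1000 iter_args(%acc_i = %acc) -> (f16) {
    %v = affine.load %1[%arg1, %arg2] : memref<1x1000xf16>
    %s = arith.addf %acc_i, %v : f16
    affine.yield %s : f16
  }
  affine.yield %row : f16
}
// %sum : f16 is the scalar accumulator you asked for.
```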
You could also check out this helpful blog post from @Lewuathe.
This can also be done with linalg using reduction dimensions I believe.
Exactly. Below is an example taken from a ResNet softmax, with an element-wise exp followed by a reducing add.
#map4 = affine_map<(d0, d1) -> (d0, d1)>
#map5 = affine_map<(d0, d1) -> (d0)>
%1582 = tensor.empty() : tensor<1x1000xf32>
%1583 = linalg.generic {indexing_maps = [#map4, #map4], iterator_types = ["parallel", "parallel"]} ins(%1581 : tensor<1x1000xf32>) outs(%1582 : tensor<1x1000xf32>) {
^bb0(%in: f32, %out: f32):
  %1591 = math.exp %in : f32
  linalg.yield %1591 : f32
} -> tensor<1x1000xf32>
%1584 = tensor.empty() : tensor<1xf32>
%1585 = linalg.fill ins(%cst_0 : f32) outs(%1584 : tensor<1xf32>) -> tensor<1xf32>
%1586 = linalg.generic {indexing_maps = [#map4, #map5], iterator_types = ["parallel", "reduction"]} ins(%1583 : tensor<1x1000xf32>) outs(%1585 : tensor<1xf32>) {
^bb0(%in: f32, %out: f32):
  %1591 = arith.addf %out, %in : f32
  linalg.yield %1591 : f32
} -> tensor<1xf32>
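Note that the reduction result here is still a tensor<1xf32>, not a scalar. If you need a true f32 scalar afterwards, one way (a sketch, not tied to the code above) is to extract the single element:

```mlir
// Pull the single element out of the 1-element result tensor.
%c0 = arith.constant 0 : index
%scalar = tensor.extract %1586[%c0] : tensor<1xf32>
// %scalar : f32 is now a plain SSA scalar.
```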