Create a scalar accumulator variable in MLIR

Hello.
Could you please tell me how I can create a scalar variable in MLIR that can be used to sum-reduce (accumulate) over a memref with elements of type f16? Currently I am using a memref with only one element as the accumulator (see variable %0 in the code below):

// This sums up the elements of memref %1.
%0 = memref.alloc() : memref<1xf16> // %0 is a memref with only 1 element, used as the accumulator
// %1 : memref<1x1000xf16> is the memref we want to sum-reduce over
%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16
memref.store %cst, %0[%c0] : memref<1xf16> // initialize the accumulator to zero
affine.for %arg1 = 0 to 1 {
  affine.for %arg2 = 0 to 1000 {
    %4 = affine.load %1[%arg1, %arg2] : memref<1x1000xf16>
    %c0_0 = arith.constant 0 : index
    // load the running sum, add the element, store it back
    %5 = affine.load %0[%c0_0] : memref<1xf16>
    %6 = arith.addf %5, %4 : f16
    affine.store %6, %0[%c0_0] : memref<1xf16>
  }
}

Instead of this one-element memref %0, I would like to use a true scalar variable as the accumulator.
Could you please tell me whether this is possible, for example by using the LLVM dialect?

Thank you very much,
Alex

Hi Alex!

When I implemented a simple sum-reduce over a 2-D memref, I produced something like this:

%7 = memref.alloc() : memref<1024x1024xf16>

... fill with some values ...

%cst = arith.constant 0.0000e+00 : f16
%sum = affine.for %arg0 = 0 to 1024 iter_args(%arg1 = %cst) -> (f16) {
  %cst_0 = arith.constant 0.0000e+00 : f16
  %inner_sum = affine.for %arg2 = 0 to 1024 iter_args(%arg3 = %cst_0) -> (f16) {
    %14 = memref.load %7[%arg0, %arg2] : memref<1024x1024xf16>
    %15 = arith.addf %arg3, %14 : f16
    affine.yield %15 : f16
  }
  %13 = arith.addf %arg1, %inner_sum : f16
  affine.yield %13 : f16
}

This loops over the two dimensions of the memref. Since we are working with SSA values, you cannot initialize a scalar sum before the loop and then mutate it inside the loop body; instead, the accumulator is threaded through the loop as an iteration argument (iter_args). The AffineYieldOp passes the updated value on to the next iteration, so %sum is the scalar result of the sum-reduce, with no memref involved.
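
Regarding the LLVM dialect: you should not need it for this. As far as I know, scf.for supports the same iter_args idiom, and affine.for is typically lowered to scf.for on the way to LLVM anyway. Here is a minimal 1-D sketch of the pattern (the input %buf and its shape are made up for illustration):

%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c1000 = arith.constant 1000 : index
%init = arith.constant 0.000000e+00 : f16
// %buf : memref<1000xf16> is a hypothetical input buffer.
// The scalar accumulator %acc is carried through iter_args.
%sum = scf.for %i = %c0 to %c1000 step %c1 iter_args(%acc = %init) -> (f16) {
  %v = memref.load %buf[%i] : memref<1000xf16>
  %next = arith.addf %acc, %v : f16
  scf.yield %next : f16
}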

You could also check out this helpful blog post from @Lewuathe.

This can also be done with linalg using reduction dimensions, I believe.


Exactly. Below is an example taken from a ResNet softmax, with an element-wise exp followed by a reducing add.

#map4 = affine_map<(d0, d1) -> (d0, d1)>
#map5 = affine_map<(d0, d1) -> (d0)>

  // %1581 : tensor<1x1000xf32> is the input tensor; %cst_0 is the zero
  // used to initialize the reduction (added here so the snippet is
  // self-contained).
  %cst_0 = arith.constant 0.000000e+00 : f32

  // Element-wise exp.
  %1582 = tensor.empty() : tensor<1x1000xf32>
  %1583 = linalg.generic {indexing_maps = [#map4, #map4], iterator_types = ["parallel", "parallel"]} ins(%1581 : tensor<1x1000xf32>) outs(%1582 : tensor<1x1000xf32>) {
  ^bb0(%in: f32, %out: f32):
    %1591 = math.exp %in : f32
    linalg.yield %1591 : f32
  } -> tensor<1x1000xf32>

  // Reducing add over the second dimension (the "reduction" iterator).
  %1584 = tensor.empty() : tensor<1xf32>
  %1585 = linalg.fill ins(%cst_0 : f32) outs(%1584 : tensor<1xf32>) -> tensor<1xf32>
  %1586 = linalg.generic {indexing_maps = [#map4, #map5], iterator_types = ["parallel", "reduction"]} ins(%1583 : tensor<1x1000xf32>) outs(%1585 : tensor<1xf32>) {
  ^bb0(%in: f32, %out: f32):
    %1591 = arith.addf %out, %in : f32
    linalg.yield %1591 : f32
  } -> tensor<1xf32>
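
If you then want the result as a true scalar SSA value rather than a tensor<1xf32>, a tensor.extract should do it; a minimal sketch, reusing %1586 from above:

%c0 = arith.constant 0 : index
// %sum_scalar is a plain f32 SSA value holding the reduced sum.
%sum_scalar = tensor.extract %1586[%c0] : tensor<1xf32>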