I found that the output of mlir/test/Dialect/Linalg/drop-unit-extent-dims.mlir contains a linalg.generic that performs a summation, but its iterator type is still parallel:
This test case is a bit confusing in the sense that the remaining input dimension has size “?” while the output dimension has size 1. However, since both the input and the output tensor are one-dimensional, we cannot perform a reduction here. Instead, it is a normal parallel iteration, which implies the size of the input tensor is actually 1 (so “?” == 1).
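To illustrate (a hedged sketch of the kind of op being discussed, not the exact test output; all SSA names and shapes here are made up): both indexing maps are the identity, so the single iteration dimension is tied to both the dynamic input size and the static output size 1, which forces “?” == 1.

```mlir
// Sketch only: a 1-D "summation" body whose iterator is nevertheless
// parallel. Because both operands are indexed by the identity map, the
// iteration domain is shared, so the tensor<?xf32> must have size 1.
#map = affine_map<(d0) -> (d0)>
%res = linalg.generic
    {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
    ins(%in : tensor<?xf32>) outs(%out : tensor<1xf32>) {
  ^bb0(%a: f32, %b: f32):
    %sum = arith.addf %a, %b : f32
    linalg.yield %sum : f32
} -> tensor<1xf32>
```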
A reduction is only possible if the output tensor has fewer dimensions than the input tensor. In particular, we cannot index the output tensor with a reduction dimension.
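For contrast, a genuine reduction looks roughly like this (again a sketch with made-up names): the reduction dimension d0 appears in the input's indexing map but is absent from the output's map, so the output has fewer dimensions than the input.

```mlir
// Sketch only: a real 1-D -> 0-D sum reduction. The output map drops
// the reduction dimension d0 entirely, so %b accumulates across
// iterations instead of being indexed in parallel.
#id   = affine_map<(d0) -> (d0)>
#none = affine_map<(d0) -> ()>
%res = linalg.generic
    {indexing_maps = [#id, #none], iterator_types = ["reduction"]}
    ins(%in : tensor<?xf32>) outs(%acc : tensor<f32>) {
  ^bb0(%a: f32, %b: f32):
    %sum = arith.addf %a, %b : f32
    linalg.yield %sum : f32
} -> tensor<f32>
```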
I hope this helps clarify things?
PS: Also note that in the test input there are two reduction dimensions that both have size one and as a result are dropped…
> A reduction is only possible if the output tensor has fewer dimensions than the input tensor. In particular, we cannot index the output tensor with a reduction dimension.
It makes sense to me, but I am still not convinced, because the loop body adds %arg1 and %arg2, where the latter is the result of the previous iteration.
Or, is this a special case that is allowed when the input and output tensors have dimensions of size one?
You can read from an output tensor if it contains values (otherwise it is undefined behavior). In the absence of a reduction iterator, you will read the initial value stored in the output tensor. Otherwise, you read the initial value in the first iteration of the reduction and the accumulated value in later iterations. In the example, the output tensor is filled with ones, meaning the operation adds one to the value in the input tensor.
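A sketch of that scenario (names and shapes are assumptions, not the actual test IR): the output is filled with 1.0 before the generic runs, and with a parallel iterator the body reads that initial value, so the op effectively computes %in + 1.0 elementwise.

```mlir
// Sketch only: initialize the destination with 1.0, then run a
// parallel generic. %b in the body is the initial value from %init,
// not an accumulator, because no iterator is a reduction.
%c1   = arith.constant 1.0 : f32
%init = linalg.fill ins(%c1 : f32) outs(%out : tensor<1xf32>) -> tensor<1xf32>
#map = affine_map<(d0) -> (d0)>
%res = linalg.generic
    {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
    ins(%in : tensor<1xf32>) outs(%init : tensor<1xf32>) {
  ^bb0(%a: f32, %b: f32):          // %b reads the initial value 1.0
    %sum = arith.addf %a, %b : f32
    linalg.yield %sum : f32
} -> tensor<1xf32>
```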
However, the example is a bit contrived, since the original operation was actually performing a reduction of size one. A parallel operation would normally read from the input tensors only.