[RFC] OpenMP reduction support

Meinersbur · April 27, 2021, 9:46pm

I responded by email, but Discourse didn’t like it. Here it is again:

Thanks for the extensive write up. Here are some of my thoughts:

I’d find the value approach nicer for an SSA-based IR: a value has well-defined content that does not change when it is accessed. The possibility that the result is delayed might be a major problem, though, and would require an alternative way to express where the value is defined. This reminds me of a ‘future’, and we could base an SSA representation on that (please excuse my broken MLIR):

%parallel_reduction_future = omp.parallel ... reduction(@add_f32: %in -> %wsloop_reduction_result : f32) {
  %wsloop_reduction_future = omp.wsloop ... reduction(@add_f32: %in -> %out : f32) {
    %intermediate = @add_f32(%in, %value)
    %out = @add_f32(%intermediate, %another_value) ...
    yield %out
  }
  %wsloop_reduction_result = omp.reduction.get(%wsloop_reduction_future, %in)
  yield %wsloop_reduction_result
}
omp.barrier
%parallel_reduction_result = omp.reduction.get(%parallel_reduction_future, %in)

use(%parallel_reduction_result)
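
To make the ‘future’ idea concrete outside of MLIR, here is a minimal Python sketch. The names `ReductionFuture`, `wsloop_reduce`, and `get` are hypothetical stand-ins for the ops above, not an existing API, and thread-pool tasks play the role of OpenMP threads. The point is only that the reduced value is defined once, at the `get`, and is never mutated in place.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


class ReductionFuture:
    """Hypothetical stand-in for the value produced by a reduction-carrying op."""

    def __init__(self, combine, partials):
        self._combine = combine
        self._partials = partials  # one concurrent.futures.Future per chunk

    def get(self, initial):
        # Plays the role of omp.reduction.get: wait for every partial
        # result, then fold them into the incoming value.
        return reduce(self._combine, (p.result() for p in self._partials), initial)


def wsloop_reduce(pool, combine, neutral, chunks):
    """Start one partial reduction per chunk and return a future immediately."""
    partials = [pool.submit(reduce, combine, chunk, neutral) for chunk in chunks]
    return ReductionFuture(combine, partials)


def demo():
    chunks = [[1.0, 2.0], [3.0, 4.0], [5.0]]
    with ThreadPoolExecutor(max_workers=3) as pool:
        future = wsloop_reduce(pool, operator.add, 0.0, chunks)
        # The reduced value comes into existence only here, as a new value.
        return future.get(10.0)
```

Calling `demo()` folds the three partial sums into the incoming value `10.0`. A real lowering would of course not use futures at runtime; the sketch only models the value semantics.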

It might be beneficial to have built-in reductions (for add, min, max, …) instead of having to define the init/combine functions for them. Some architectures have dedicated support for them, and that support could not be used easily if the semantics are ‘hidden’ inside some function definitions.

You mention the runtime would decide whether to use atomics. To keep the overhead down, I think the involvement of the runtime should be minimal, and the majority of decisions should be made by the OpenMPIRBuilder.

A lot of OpenMP directives support reduction clauses, the newest being the scope construct. This might motivate defining reductions orthogonally to other constructs, maybe as independent operations instead of attributes, but I don’t know what that would look like.

In your suggested syntax the omp.reduction operation seems to be the only one carrying which reduction to apply. What if it is optimized away, e.g. by dead code elimination:

omp.wsloop ... reduction(%token1 -> %accum1 = ...,) {
  if (false) {
    omp.reduction %r1, @add_f32, %0 : f32
  }
}

The result of a reduction over 0 elements could be 0 (for addition), but also 1 (for multiplication), i.e. the construct itself needs to know the neutral element.
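
A small Python sketch of why the reduction kind has to be attached to the construct rather than only to the operations inside it. `ReductionKind`, `BUILTIN`, and `reduce_region` are hypothetical names for illustration; for min/max, plus and minus infinity stand in for the largest and smallest representable value of the type.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable
import math
import operator


@dataclass(frozen=True)
class ReductionKind:
    """A reduction declared on the construct: combiner plus neutral element."""
    name: str
    combine: Callable[[float, float], float]
    neutral: float


# Hypothetical table of built-in kinds with their usual identities.
BUILTIN = {
    "add": ReductionKind("add", operator.add, 0.0),
    "mul": ReductionKind("mul", operator.mul, 1.0),
    "min": ReductionKind("min", min, math.inf),
    "max": ReductionKind("max", max, -math.inf),
}


def reduce_region(kind_name, contributions):
    """Reduce whatever contributions survived optimization (possibly none)."""
    kind = BUILTIN[kind_name]
    return reduce(kind.combine, contributions, kind.neutral)
```

With the kind recorded on the construct, `reduce_region("add", [])` and `reduce_region("mul", [])` remain well defined even when every contributing operation has been eliminated. A table like this is also where a backend could recognize kinds it supports natively.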

A @sub_i32 reduction is weird to define. In OpenMP, a subtraction reduction is actually the same as addition, and the OpenMP committee was considering deprecating it.
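
A minimal model of why this is so, assuming the OpenMP rule that the `-` reduction identifier initializes each private copy to 0 and combines partial results by addition. The function names are made up for illustration, and each chunk stands in for the iterations of one thread.

```python
def minus_reduction(original, chunks):
    """Model of reduction(-: x): private copies start at 0, combined with +."""
    partials = []
    for chunk in chunks:        # each chunk stands in for one thread
        private = 0             # same initializer as for '+'
        for value in chunk:
            private -= value    # loop body: x -= value
        partials.append(private)
    return original + sum(partials)   # the combiner adds the partial results


def serial_minus(original, values):
    """Reference: the same loop executed serially."""
    for value in values:
        original -= value
    return original
```

The two functions agree, e.g. for `original = 100` and the values 1 to 5 split across three chunks. Combining the partial results by subtraction instead would not match the serial loop, which is why a combiner that literally subtracts is awkward to define.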

Michael
