[RFC] OpenMP reduction support

I responded by email, but Discourse didn’t like it. Here it is again:

Thanks for the extensive write-up. Here are some of my thoughts:

  1. I’d find the value approach nicer for an SSA-based IR: a value has well-defined content that does not change when it is accessed. The possibility that the result is delayed might be a major problem, though, and would require an alternative way to define the value as well. This reminds me of a ‘future’, and we could base an SSA representation on that (please excuse my broken MLIR):
%parallel_reduction_future = omp.parallel ... reduction(@add_f32: %in -> %wsloop_reduction_result : f32) {
  %wsloop_reduction_future = omp.wsloop ... reduction(@add_f32: %in -> %out : f32) {
    %intermediate = @add_f32(%in, %value)
    %out = @add_f32(%intermediate, %another_value) ...
    yield %out
  }
  %wsloop_reduction_result = omp.reduction.get(%wsloop_reduction_future, %in)
  yield %wsloop_reduction_result
}
omp.barrier
%parallel_reduction_result = omp.reduction.get(%parallel_reduction_future, %in)

use(%parallel_reduction_result)
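To make the ‘future’ analogy a bit more concrete outside of MLIR, here is a minimal Python sketch. It is only an analogy under my own assumptions: `reduce_chunk`, `parallel_add`, and the chunking scheme are hypothetical names made up for illustration, and `Future.result()` merely stands in for the `omp.reduction.get` idea above. Each partial result is a fixed value once it is read, and reading it is what forces the wait.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def reduce_chunk(chunk, neutral=0.0):
    # Thread-private accumulation: starts from the neutral element of addition
    # and touches no shared state.
    return reduce(operator.add, chunk, neutral)


def parallel_add(values, initial, workers=4):
    # Hypothetical stand-in for a parallel region with an add reduction.
    chunks = [values[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(reduce_chunk, chunk) for chunk in chunks]
        # Leaving the `with` block waits for all workers, loosely like a barrier.
    # Reading a future yields an immutable value; combine it with the incoming one.
    return reduce(operator.add, (f.result() for f in futures), initial)


result = parallel_add([1.0, 2.0, 3.0, 4.0, 5.0], initial=10.0)
```

The incoming value (`initial`, playing the role of `%in`) is only combined at the very end, so nothing observable changes until the result is actually requested.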
  1. It might be beneficial to have built-in reductions (for add, min, max, …) instead of having to define the init/combine functions for them. Some architectures have dedicated support for these operations, which could not be used easily if the semantics are ‘hidden’ inside function definitions.
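A rough Python sketch of why the visibility of the reduction kind matters for lowering. Everything here is hypothetical and for illustration only: `BuiltinReduction`, `HARDWARE_SUPPORTED`, `choose_lowering`, and the strategy names do not correspond to any existing MLIR or LLVM API.

```python
from enum import Enum


class BuiltinReduction(Enum):
    ADD = "add"
    MIN = "min"
    MAX = "max"


# Hypothetical target description: reduction kinds with dedicated hardware support.
HARDWARE_SUPPORTED = {BuiltinReduction.ADD, BuiltinReduction.MAX}


def choose_lowering(reduction):
    """Pick a lowering strategy for a reduction (illustrative only)."""
    if isinstance(reduction, BuiltinReduction):
        if reduction in HARDWARE_SUPPORTED:
            # The kind is visible, so a dedicated instruction can be selected.
            return "hardware"
        return "generic-builtin"
    # An arbitrary combiner function is opaque: only the generic path is safe.
    return "generic-call"


def user_add(lhs, rhs):
    # A user-defined combiner that happens to be an addition.
    return lhs + rhs
```

Here `choose_lowering(user_add)` falls back to the generic path even though the function is just an addition, because recognizing that would require analyzing the function body.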

  2. You mention that the runtime would decide whether to use the atomic. To keep the overhead down, I think the involvement of the runtime should be minimal, and the majority of decisions should be made by the OpenMPIRBuilder.

  3. A lot of OpenMP directives support reduction clauses, the newest being the Scope directive (scope Construct). This might motivate defining reductions orthogonally to other constructs, maybe as independent operations instead of attributes, but I don’t know what that would look like.

  4. In your suggested syntax, the omp.reduction operation seems to be the only one carrying the information about which reduction to apply. What if it is optimized away, e.g. by dead code elimination:

omp.wsloop ... reduction(%token1 -> %accum1 = ...,) {
  if (false) {
    omp.reduction %r1, @add_f32, %0 : f32
  }
}

The result of a reduction over zero elements could be 0 (for addition), but also 1 (for multiplication), i.e. the enclosing construct still needs to know the neutral element.
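A small Python sketch of the zero-element case. The `REDUCTIONS` table and both helper functions are hypothetical and only model the semantics; they are not part of the proposal.

```python
from functools import reduce
import operator

# Hypothetical reduction declarations: kind -> (neutral element, combiner).
REDUCTIONS = {
    "add": (0, operator.add),
    "mul": (1, operator.mul),
}


def private_result(kind, contributions):
    # A private accumulator starts at the neutral element of its reduction kind.
    neutral, combine = REDUCTIONS[kind]
    return reduce(combine, contributions, neutral)


def run_reduction(kind, contributions, incoming):
    # The private result is merged into the incoming value at the end.
    _, combine = REDUCTIONS[kind]
    return combine(incoming, private_result(kind, contributions))
```

With no contributions, `private_result("add", [])` is 0 and `private_result("mul", [])` is 1, and only with those values does `run_reduction` leave the incoming value unchanged. If the reduction kind is known only from an operation that has been removed, there is no way to pick between them.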

  1. A @sub_i32 reduction is weird to define. In OpenMP, a subtract reduction is actually the same as an addition, but the OpenMP committee was considering deprecating it.
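To illustrate why subtraction ends up being the same as addition, here is a Python model of the semantics as I understand them: private copies start at 0 and the partial results are combined by adding them. The function names are hypothetical, and the second function exists only to show what goes wrong otherwise.

```python
def subtract_reduction(initial, per_thread_values):
    partials = []
    for values in per_thread_values:
        private = 0  # same neutral element as for addition
        for v in values:
            private -= v  # the loop body itself uses subtraction
        partials.append(private)
    # Partial results are combined by addition, not by subtraction.
    return initial + sum(partials)


def naive_subtract_combine(initial, per_thread_values):
    # What a literal "combine with minus" would compute (not the serial result).
    result = initial
    for values in per_thread_values:
        private = 0
        for v in values:
            private -= v
        result -= private
    return result


serial = 100 - 1 - 2 - 3 - 4
```

Only the additive combiner reproduces the serial result, which is why a separate `@sub_i32` declaration would just duplicate `@add_i32`.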

Michael