Problem
Consider an example:
program test
  integer, parameter :: N = 5
  real :: x(N)
  !$omp parallel workshare num_threads(5)
  x = get_thread_num()
  !$omp end parallel workshare
  print *, x
contains
  impure elemental integer function get_thread_num()
    use omp_lib
    get_thread_num = omp_get_thread_num()
    ! print *, get_thread_num
  end function
end program test
Output:
$ flang-new x.f90 -fopenmp && ./a.out
0. 0. 0. 0. 0.
$ gfortran x.f90 -fopenmp && ./a.out
0.00000000 1.00000000 2.00000000 3.00000000 4.00000000
As you can see, with flang-new the assignment happens entirely on a single thread, whereas gfortran distributes it across the five threads.
Also, it seems a temporary workaround for workshare has already been implemented here: [Flang][OpenMP] : Add a temporary lowering for workshare directive by kiranchandramohan · Pull Request #78268 · llvm/llvm-project · GitHub
FIR
[...]
omp.parallel {
  omp.single {
    [...]
    omp.terminator
  }
  omp.terminator
}
[...]
The workshare construct allows only the following:
- Array and scalar assignments
- FORALL statements and constructs
- WHERE statements and constructs
- parallel, critical, and atomic constructs
Solution
- Scalar assignments: enclose the scalar assignments in a single construct.
!$omp parallel workshare
x = x + 1
y = y + 1
[...] ! Other statements
z = z + 1
!$omp end parallel workshare
omp.parallel {
  omp.single {
    [...] // x = x + 1
    [...] // y = y + 1
    omp.terminator
  }
  [...] // Other constructs
  omp.single {
    [...] // z = z + 1
    omp.terminator
  }
  omp.terminator
}
- Array assignments: enclose the array assignment within a wsloop construct.
!$omp parallel workshare
x = x + 1
!$omp end parallel workshare
omp.parallel {
  omp.wsloop ... {
    [...] // x = x + 1
  }
  omp.terminator
}
- Forall statements: no modification.
- Where statements: no modification.
- Critical and atomic constructs: no modification (these already serialize, so they execute on one thread at a time).
For items 3, 4, and 5, I still need to implement this and check that everything works as expected.
I’m new to writing RFCs; if any other details should be added, please let me know.
Thank you
Thanks @thirumalai for writing this RFC. As mentioned in the patch, the existing lowering to single is a temporary one until the real lowering comes in. Ivan, who has been working with @jdoerfert, was working on a lowering for the coexecute construct (OpenMP 6.0), which is similar to the workshare construct. We have to check whether Ivan has plans to upstream this.
Regarding the use of omp.wsloop, the previous issue was that we always had the loop control along with omp.wsloop. This was relaxed recently, but it still requires an omp.loop_nest that has the loop control. For array assignments in HLFIR, an hlfir.assign operation will be created, and this will further be lowered to a _FortranAAssign runtime call or a loop.
hlfir.assign %2#0 to %1#0 : !fir.box<!fir.array<?xi32>>, !fir.box<!fir.array<?xi32>>
So omp.wsloop will have to be modified to work with hlfir.assign, and the lowerings to the runtime call/loop will have to be modified suitably.
I have posted an RFC for my workdistribute proof-of-concept implementation here.
I think the idea of using high-level transformations in hlfir and then adding a custom lowering for those to OpenMP operations should work here as well. However, the scope of the allowed statements inside a workshare is larger, so further work may be needed for this than for workdistribute.
Other than assignments, array intrinsics should also be considered, as they also lower to runtime calls such as _FortranAAssign.
I believe one option here is to provide OpenMP-enabled implementations of these functions.
Most of the complexity in my workdistribute implementation stems from the inability to synchronize code in an omp distribute and the need to alter the host/target boundary, both of which are not a problem here, so it should be simpler in that way.
@thirumalai Would you be interested in going through @ivanradanov’s RFC and discussing its suitability for workshare?
Separately, ifx seems to be doing something similar to the temporary lowering in flang-new: Compiler Explorer
I see that OpenMP 5.2 has a few more restrictions that makes this particular example non-conforming.
The construct must not contain any user-defined function calls unless either the function is pure and elemental or the function call is contained inside a parallel construct that is nested inside the workshare construct.
I think workdistribute has similar restrictions, so we can have the same implementation for both.
Yes, I used the above example to check the thread usage. After implementing workshare, I will add the required semantic checks to catch this. :)
Overall, it looks great. Thank you very much.
I’m new to the llvm-project, so I’m leaving it to experts to provide suggestions.