This post covers the design issues in the design and implementation of the parallel operation in the OpenMP dialect. The post is written to layout all the issues and to discuss and take decisions.
The specification of the parallel construct can be found in Section 2.6 of the OpenMP 5.0 specification document. https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf
A trivial example of an OpenMP parallel construct in Fortran is given below.
integer :: a, b, c
!$omp parallel shared(a,b,c)
c = a + b
!$omp end parallel
The design issues to consider here are,
-
Representation of the Parallel Construct
The construct can be represented by an operation (omp.parallel) with one region. -
Representation of the clauses
i) Clauses whose values are constants
default and proc_bind clauses are optional and they take a constant argument. These clauses can be represented as Optional attributes, possibly StrEnumAttributes.
ii) Clauses with values as integer/logical expression
Clauses if and num_threads take in integer/logical expressions as input. These can be modelled as Operands with I1 and AnyInteger (or i32) type. These clauses are also optional and we should have a way of modelling the optional part.
iii) Clauses having a list of variables
Clauses private, firstprivate, shared, copyin, allocate take in a list of variables as input. These clauses are also optional.
a) These clauses should be modelled as operands since they are a list of variables. I don’t see a way of modelling as attributes.
b) The type of the variable
What should be the type of the variable?
b1) Should it be a ref type? Or the type of the value referenced?
b2) Should it be types from all co-existing dialect types like (fir, std, llvm)?
b3) Should it just be the standard dialect? We can mandate that there be conversion operations from the respective dialects to standard dialect.
c) How to model the list of variables?
→ Variadic (with segment length)
→ Tuples
Some related discussion Handling optional input/output arguments
d) Examples below
d1) using tuples, ref type, omp dialect co-existing with fir types.
module {
func @MAIN_() {
%0 = fir.alloca i32 {name = "c"}
%1 = fir.alloca i32 {name = "b"}
%2 = fir.alloca i32 {name = "a"}
%shared = tuple %0, %1, %2
omp.parallel(%shared : tuple<!fir.ref<i32>, !fir.ref<i32>, !fir.ref<i32>>) {
%3 = tuple_get %shared, 2
%4 = tuple_get %shared, 1
%5 = tuple_get %shared, 0
%6 = fir.load %3 : !fir.ref<i32>
%7 = fir.load %4 : !fir.ref<i32>
%8 = addi %6, %7 : i32
fir.store %8 to %5 : !fir.ref<i32>
omp.terminator
}
return
}
}
d2) using tuples, ref type, omp dialect co-existing only with standard type.
module {
func @MAIN_() {
%0 = fir.alloca i32 {name = "c"}
%1 = fir.alloca i32 {name = "b"}
%2 = fir.alloca i32 {name = "a"}
%std_0 = fir.convert %0
%std_1 = fir.convert %1
%std_2 = fir.convert %2
%shared = tuple %std_0, %std_1, %std_2
omp.parallel(%shared : tuple<memref<i32>, memref<i32>, memref<i32>>) {
%3 = tuple_get %shared, 2
%4 = tuple_get %shared, 1
%5 = tuple_get %shared, 0
%6 = std.load %3 : memref<i32>
%7 = std.load %4 : memref<i32>
%8 = addi %6, %7 : i32
std.store %8, %5[] : memref<i32>
omp.terminator
}
return
}
}
iv) Reduction clause
TBD. The GPU dialect has a reduction
-
Normalized operation
i) Define a normalized operation as one with minimal number of clauses and expanded to cover all variables. -
Terminator
i) There is an implicit barrier at the end of a parallel region and all threads in the parallel region will hit this barrier as the last operation. Should this implicit barrier be modelled explicitly?
ii) Should a dummy parallel operation be added for this operation? -
Where should transformations like privatisation be performed?
i) During lowering to MLIR from parse-tree.
ii) In MLIR.
iii) During translation to LLVM IR (using OpenMPIRBuilder). -
Optimisations with parallel region
i) Constant propagation into parallel region.
ii) Removal of adjacent barriers in a parallel region (unlikely to occur in practice).
iii) ? -
An example representation
def ClauseDefault : StrEnumAttr<
"ClauseDefault",
"default clause",
[ClauseDefaultPrivate, ClauseDefaultFirstPrivate, ClauseDefaultShared, ClauseDefaultNone]> {
let cppNamespace = "::mlir::omp";
}
def ClauseProcBind : StrEnumAttr<
"ClauseProcBind",
"procbind clause",
[ClauseProcMaster, ClauseProcClose, ClauseProcSpread]> {
let cppNamespace = "::mlir::omp";
}
def ParallelOp : OpenMP_Op<"parallel">,
Arguments<(ins I1:$if_expr_var,
AnyInteger:$num_threads_var,
OptionalAttr<ClauseDefault>:$default_val,
TupleOf<[AnyMemRef]>:$private_vars,
TupleOf<[AnyMemRef]>:$firstprivate_vars,
TupleOf<[AnyMemRef]>:$shared_vars,
TupleOf<[AnyMemRef]>:$copyin_vars,
OptionalAttr<ClauseProcBind>:$proc_bind_val)> {
let summary = "parallel construct";
let description = [{
The parallel construct includes a region of code which is to be executed
by a team of threads.
}];
let regions = (region AnyRegion:$region);
let parser = [{ return parseParallelOp(parser, result); }];
let printer = [{ printParallelOp(p, *this); }];
}