RFC for omp.target construct

This post is aimed at starting a discussion on the design and implementation of the target operation in the OpenMP dialect ('omp' Dialect - MLIR). The specification of the target construct can be found in Section 2.12 of the OpenMP 5.0 specification document:
https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-5.0.pdf.

Among the earlier OpenMP construct implementations in the OpenMP dialect, the omp.parallel operation is the closest and can be used as a template for the target operation.

The current effort for the OpenMP dialect is focused on Flang, and OpenMP target support has not been implemented in that front end yet. The community has adopted the following path for lowering OpenMP support to LLVM IR for Flang:

[Fortran code] → Parser → [AST] → Lowering → [FIR + OpenMP MLIR] → Conversion → [LLVM + OpenMP MLIR] → Translation (Use OpenMPIRBuilder) → [LLVM IR]

OpenMPIRBuilder does not support the OpenMP target construct yet. Based on the current support, one possible path to implementing the target operation is:

[C code] → Parser → [AST] → Lowering → [OpenMP MLIR] → Conversion → [LLVM + OpenMP MLIR] → [LLVM IR]

Since MLIR operations are front-end agnostic, the implementation can be reused by the Flang front end once it has target support.

Regarding design issues:

  1. Representation of the target Construct: The construct can be represented by an operation (omp.target) with one region.
  2. Representation of the clauses: It should be similar to the omp.parallel operation
    a. [OpenMP] Parallel Operation design issues
    b. 'omp' Dialect - MLIR

omp.target (::mlir::omp::TargetOp)

target construct

Map variables to a device data environment and execute the construct on that device.

The $if_expr_var parameter specifies the Boolean result of a conditional check. If this value evaluates to false (0), the target region is executed by the host device in the host data environment.

The $num_device_var parameter specifies the device ID, a non-negative integer value less than the value returned by omp_get_num_devices().

The $default_map_val attribute specifies the implicit data-mapping rule, e.g., changing the default for scalar variables from firstprivate to tofrom.

The $private_vars and $firstprivate_vars parameters are variadic lists of values that specify the data-sharing attribute of those values.

The $map_var attribute specifies the data variables in the list to be explicitly mapped from the original variables in the host data environment to the corresponding variables in the device data environment of the device specified by the construct.

The $in_reduction_var parameter specifies target task variables that are subject to a reduction operation at the end of the region. The GPU dialect has a reduction clause.

The $is_device_ptr_var indicates the data variables that are device pointers that exist within the device data environment.

The $depend_var parameter establishes scheduling dependences between the target task and sibling tasks that reference the same list items.

The $allocate_var parameter specifies an allocator, an integer expression of kind omp_allocator_handle_kind.

The $uses_allocators_var parameter specifies the allocators that are available in the target region.

The $ancestor_var specifies the parent device.

The $device_num_var indicates the device number.
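Putting the clauses above together, a hypothetical textual form of the operation might look as follows. This is only a sketch: the clause spellings, operand names (%if_cond, %dev, %a) and types are illustrative placeholders, not a finalized syntax.

```mlir
// Hypothetical syntax sketch for omp.target; operand names mirror the
// clause list above, and the types are placeholders pending the typing
// discussion below.
omp.target if(%if_cond : i1) device(%dev : i32) nowait
    map((tofrom -> %a : memref<10xf32>), (to -> %b : memref<10xf32>)) {
  // ... target region body ...
  omp.terminator
}
```

A one-region operation of this shape would parallel omp.parallel, with the map list carried as variadic operands plus attributes describing the map kind of each operand.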

Thanks @abidmalikwaterloo for this RFC. Great to see interest and progress in the OpenMP target side.

For the host-side constructs we are using the OpenMP IRBuilder during translation (LLVM dialect → LLVM IR) for lowering the OpenMP operations. As you might know, @clementval is working on the OpenACC dialect which lowers by making calls to the OpenMP runtime. The initial plan is to lower the data mapping constructs like enter_data which is similar to omp target enter data map. I believe Valentin is planning to implement it at the translation phase and use the OpenMP IRBuilder. Would you like to join efforts with Valentin and implement the data mapping constructs first and also implement using the OpenMP IRBuilder?

The representation of if, device and nowait clauses is straightforward. We are currently reviewing the way data-sharing clauses are represented and handled in the OpenMP dialect. It might be good to wait till we have clarity on these clauses (private, firstprivate, in_reduction). It might also be good to spend some time thinking about the representation of the other clauses (depend, map, alloc, is_device_ptr). For example, what should the representation of dependencies be? Should this have a specific type or can it be any type? The async dialect has value and token types and there is an operation async.await ('async' Dialect - MLIR).

FYI: @jdoerfert @clementval @kirankumartp @SouraVX

@kiranchandramohan Thank you, Kiran, for the comments and feedback. I am already in contact with Valentin about the target construct and how the two dialects’ operations and lowering paths can benefit from each other. Data mapping clauses seem the right first step in this direction. I am thinking of implementing the basic omp.target operation in MLIR first; by then the data mapping clauses for the OpenACC dialect will be done.

Thanks for the RFC! I have several comments.

The largest concern is typing. MLIR does not have the concept of a “variable” and at least half of the proposal uses it. Please clarify how OpenMP concepts map to MLIR concepts, namely the relation between variables and immutable SSA values as well as their types.

The second-largest is the relation with OpenMPIRBuilder. While I myself won’t oppose having a temporary flow that emits LLVM IR directly in the OpenMP dialect translation, we need a clear path and consensus on moving that functionality to OpenMPIRBuilder as soon as possible. Otherwise we create a double load on the maintainers of the translation who will have to understand and support both the custom translation code and OpenMPIRBuilder code. The alternative is to emit all of LLVM IR directly from MLIR (using an openmp-to-llvm dialect conversion, rather than translation) bypassing OpenMPIRBuilder, but this is against the current consensus. But please let’s not mix the two in the long run.

Global naming comment: I would consider dropping the _var suffix from op arguments unless it means something specific. For example, I can understand why private_vars has the suffix – it lists the equivalent of private variables, – but not why ancestor_var has it – it is merely the identifier of a device, which isn’t even mutable.

Which types are supported? There’s no “Boolean” in upstream MLIR.

Can this be an SSA value or is it always a constant (at which point it should be an attribute)?
omp_get_num_devices() does not exist in MLIR.
What types are supported here?

There are no variables in MLIR.
I suppose the “implicit-behavior” part of the clause is specified as an enum attribute; please clarify if it is something else.
It is unclear to me how the “variable-category” part is represented here and how it maps to the MLIR’s open type system, i.e. what qualifies as “scalar”, “aggregate”, “allocatable” and “pointer” and how these are made available to MLIR types other than those representing Fortran types.

Again, there are no variables in MLIR. What happens when somebody wants to map a variable “from” a device to the host? Do we define a new SSA value that is returned from omp.target? Do we expect a pointer-like value (define what pointer-like types are?) as an argument and modify the memory it points to? Something else?

There is no support for reductions in the OpenMP dialect so it’s unclear how this can identify reductions, where the results are stored (remember, no variables), etc.

The fact that GPU dialect has a reduction operation doesn’t sound relevant here. Several MLIR dialects have reduction-like constructs, including Affine, Linalg and SCF.

Can’t we just define such pointers in the region of omp.target?

I suppose this is another enum attribute; please clarify.

Specifies how? Another integer number?

The largest concern is typing. MLIR does not have the concept of a “variable” and at least half of the proposal uses it. Please clarify how OpenMP concepts map to MLIR concepts, namely the relation between variables and immutable SSA values as well as their types.

---------->

Thank you for the detailed reply and feedback. First, I misused the terminology: I meant to say “operands” instead of “variables”. I was using the following discussion as a reference for this operation:

We can consider “variadic operands” for this operation as well.

----------------->

The second-largest is the relation with OpenMPIRBuilder. While I myself won’t oppose having a temporary flow that emits LLVM IR directly in the OpenMP dialect translation, we need a clear path and consensus on moving that functionality to OpenMPIRBuilder as soon as possible. Otherwise we create a double load on the maintainers of the translation who will have to understand and support both the custom translation code and OpenMPIRBuilder code. The alternative is to emit all of LLVM IR directly from MLIR (using an openmp-to-llvm dialect conversion, rather than translation) bypassing OpenMPIRBuilder, but this is against the current consensus. But please let’s not mix the two in the long run.

----->

I brought up this discussion because OpenMPIRBuilder is still under development and does not fully support the target construct. However, I fully agree that we should not bypass the path that includes OpenMPIRBuilder. Kiran also suggested the same. Since the data scope operations for the OpenACC dialect will be supported soon, we can start by using them first.

---->

For the following comments, the main confusion arises because of the term “variable” used in the discussion. As I mentioned earlier, these are operands. I am reproducing your comments here on the parallel operation, which also apply to this operation:

“More importantly, OpenMP lives in core MLIR and should not be anyhow bound to FIR. I would suggest these clauses to accept any type since it cannot know whether a type defined in an external dialect has reference semantics (unless we implement something like type interfaces).”

I think it makes sense to make operands “anytype”. However, as Kiran mentioned, we need to think and discuss before adopting it for every operand.

Kiran’s comments:

“The representation of if, device and nowait clauses is straightforward.”

—>These operands can be “anytype”.

“We are currently reviewing the way data-sharing clauses are being represented and handled in the OpenMP dialect. It might be good to wait till we have clarity on these clauses (private, firstprivate, in_reduction).”

—> I will wait, although “anytype” might work. I would like to consult the group before moving ahead.

“It might also be good to spend some time thinking about the representation of the other clauses (depend, map, alloc, is_device_ptr). Like what should be the representation of dependencies? Should this have a specific type or can it be anytype? The async dialect has a value and token-type and there is an operation async.await (‘async’ Dialect - MLIR).”

—> I am not sure if there is a thread that discusses this already. I would like to see where the discussion is heading.

Global naming comment: I would consider dropping the _var suffix from op arguments unless it means something specific. For example, I can understand why private_vars has the suffix – it lists the equivalent of private variables, – but not why ancestor_var has it – it is merely the identifier of a device, which isn’t even mutable.

----> Thanks. I will do that.

abidmalikwaterloo:

The $if_expr_var parameter specifies a Boolean result of a conditional check.

Which types are supported? There’s no “Boolean” in upstream MLIR.

—> “anytype”

abidmalikwaterloo:

The $num_device_var parameter specifies the device ID, a non-negative integer value less than the value returned by omp_get_num_devices().

Can this be an SSA value or is it always a constant (at which point it should be an attribute)?

omp_get_num_devices() does not exist in MLIR.

What types are supported here?

----> The device number is of integer type but it is not a constant. omp_get_num_devices() is a runtime routine; it will not be called within the operation. OpenACC has the same service. I would like to see how the OpenACC dialect is handling it. As far as I know, acc.parallel does not support acc_set_device_num().

abidmalikwaterloo:

The $default_map_val attribute specifies the mapping rule from firstprivate to tofrom for scalar variables.

There are no variables in MLIR.

I suppose the “implicit-behavior” part of the clause is specified as an enum attribute, please clarify if it is something else.

It is unclear to me how the “variable-category” part is represented here and how it maps to the MLIR’s open type system, i.e. what qualifies as “scalar”, “aggregate”, “allocatable” and “pointer” and how these are made available to MLIR types other than those representing Fortran types.

—> It makes sense to treat it as ::mlir::StringAttr.

abidmalikwaterloo:

The $map_var attribute specifies the data variables in the list to be explicitly mapped from the original variables in the host data environment to the corresponding variables in the device data environment of the device specified by the construct.

Again, there are no variables in MLIR. What happens when somebody wants to map a variable “from” a device to host. Do we define a new SSA value that is return from omp.target? Do we expect a pointer-like (define what pointer-like types are?) value as argument and modify the memory it points to? Something else?

—> As suggested by Kiran, we need more thinking before finalizing it.

abidmalikwaterloo:

The $in_reduction_var specifies target task variables that are subject to reduction operation at the end of the regions. The GPU dialect has a reduction clause.

There is no support for reductions in the OpenMP dialect so it’s unclear how this can identify reductions, where the results are stored (remember, no variables), etc.

The fact that GPU dialect has a reduction operation doesn’t sound relevant here. Several MLIR dialects have reduction-like constructs, including Affine, Linalg and SCF.

—> Agreed. Currently it is not supported for the parallel operation either. I will defer this as suggested by Kiran.

abidmalikwaterloo:

The $is_device_ptr_var indicates the data variables that are device pointers that exist within the device data environment.

Can’t we just define such pointers in the region of omp.target?

—> As suggested by Kiran, we need more thinking before finalizing it.

abidmalikwaterloo:

The $allocate_var specifies an integer expression of omp_allocator_kind.

I suppose this is another enum attribute, please clarify.

—> Your observation is correct, but we need more thinking before finalizing it.

abidmalikwaterloo:

The $ancestor_var specifies the parent device.

Specifies how? Another integer number?

—> It’s an integer type.

Are we still unclear about the types of the private and firstprivate clauses, or about the scope of these clauses? alloc and is_device_ptr can be TupleOf<[AnyMemRef]>. The depend and map clauses need more thinking. Any suggestions or thoughts on these clauses?
At this stage, I can submit the target operation with the if_cond, device, and nowait clauses/operands.

We are considering whether:

  1. private should have a representation in MLIR or whether it should be dissolved to allocas/allocs by the frontend preceding MLIR.
  2. private-like clauses should work only with variables and not values, and how we will enforce this in the spec.

I don’t know whether AnyMemRef will cover FIR refs.

I think the locator-list items in depend can be Variadic:$dependVars. I guess there can be multiple depend clauses, each with a different dependence-type. So I guess we can model the dependence type as some kind of ArrayAttr, with each entry corresponding to an entry of dependVars. That leaves the modifiers which were introduced in OpenMP 5.0.
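A rough ODS sketch of this idea could pair the variadic operand list with an array attribute of dependence types. The definition below is illustrative only: the operand/attribute names and the AnyType constraint are assumptions, not a finalized design.

```tablegen
// Sketch only: each entry of $depends names the dependence-type
// (in, out, inout, ...) of the corresponding value in $dependVars.
def TargetOp : OpenMP_Op<"target"> {
  let arguments = (ins Variadic<AnyType>:$dependVars,
                       OptionalAttr<ArrayAttr>:$depends);
  let regions = (region AnyRegion:$region);
}
```

With this shape, several depend clauses with different dependence-types collapse into one operand list, and the verifier would check that the attribute and operand lists have matching lengths.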

OK with me.

Operands are still uses of SSA values. We need to be careful if the semantics of the operation mutates them; the types of such operands need to have some sort of generic memory-reference semantics that scales across dialects (probably a type trait or interface).

Sure, like I said, I’m fine with temporary working solutions as long as there is a viable path forward. Personally, I’ve added some OpenMP features in both the OpenMPIRBuilder and the MLIR dialects.

Yes, “anytype” will technically work but we need to be careful with mutation like I mentioned above and, apparently, in the comment on another thread. Short of the final solution, precisely documenting the contract that the types of these operations should respect is the way to go.

What happens if the condition operand is of type bf16? !llvm.struct<"foo", opaque>? Again, careful specification of what is expected of the type would be helpful.

I generally think tuple is not the right abstraction here. You just want a variadic operand list. Note that there is no “built-in” way of constructing a tuple type so the dialect will have to provide one just to be able to use the target operation.

The answer to this depends on whether you want to transform code within MLIR, at which point you are better off with having a special representation, or not, at which point you don’t. My preference is on having the special representation because it doesn’t assume there is a frontend immediately before the OpenMP dialect.

Would having a PointerLike or MemoryReference type interface be sufficient? From what I understand, the main issue is being able to mutate data, not even sure we actually need to take its address. This is the same trade-off that I discussed in the reduction modeling thread - it’s possible to introduce “variables” in the flow but it significantly complexifies any optimization flow because we need to deal with memory-induced dependencies, aliasing, etc.
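As a sketch, such an interface could be declared in ODS and then attached to memref (and, separately, to FIR reference types). The name PointerLikeType and the single method below are placeholders for illustration, not an existing interface.

```tablegen
// Placeholder sketch of a type interface that both memref and
// fir.ref-style types could implement, giving the OpenMP dialect a
// dialect-neutral notion of "points to mutable storage".
def PointerLikeType : TypeInterface<"PointerLikeType"> {
  let description = [{
    A type with reference semantics: values of this type refer to
    mutable storage holding an element of `getElementType()`.
  }];
  let methods = [
    InterfaceMethod<"Returns the pointee type.",
                    "::mlir::Type", "getElementType">
  ];
}
```

Operand constraints in the omp operations could then require the interface rather than any concrete type, keeping core MLIR free of FIR knowledge.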

It won’t; it has no knowledge of FIR or any other non-core dialect. Neither should it. Again, having an interface that is common to memref and FIR types sounds like a good compromise here.

I have posted an RFC to discuss this [RFC] Privatisation in OpenMP dialect

I think a reference or pointerlike interface should be sufficient. Can it just be a TypeConstraint?


Thanks!

I don’t think a TypeConstraint suffices. One, it doesn’t exist beyond ODS, it decays to ifs in the verifier. Two, a constraint needs to know about specific types (unless they implement an interface, at which point it only needs to know about the interface), and I don’t think core MLIR would accept code that has FIR types hardcoded.

How does FIR treat the map clause? Should it be treated as an operation?

I believe @clementval is working on data-mapping constructs/clauses in OpenACC. I think he is writing some conversion patterns to bring the FIR data types into a format suitable for lowering to LLVM.

Yes, I’m almost done with that. The idea is to extract the information we need for the runtime while we still have information about the original FIR type and we know the transformation applied. In the end the operands are normalized to LLVM types that the translation understands and can map to the runtime.
The same conversion for memref is done here as an example: D102170 [mlir][openacc] Conversion of data operand to LLVM IR dialect

Since the map clause is required on target data directives (enter/exit and the region one), I would not handle this as a separate operation. It should be one operation where the map variables are operands divided into segments. If you model your operation in a similar way to acc.enter_data, acc.exit_data and acc.data, you have almost no work to do for the conversion and translation, as you can reuse the code directly. We might need to put it in a better place and generalize some parts, but the translation should not differ that much besides the flag to apply to each operand.
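For illustration, acc.enter_data groups its data operands into per-kind segments; an OpenMP counterpart could mirror that shape. The operation name and clause spellings below are a sketch, not the actual acc or omp grammar:

```mlir
// Sketch modeled on acc.enter_data's operand segments: map operands
// grouped by map kind, so conversion/translation code can be shared.
omp.target_enter_data if(%cond : i1) device(%dev : i32)
    map_to(%a, %b : memref<10xf32>, memref<10xf32>)
    map_alloc(%c : memref<10xf32>)
```

Internally this would use variadic operand segments (one per map kind), which is what lets the existing OpenACC conversion patterns be reused with little more than a different per-segment flag.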