The OpenACC dialect currently keeps track of data operations in various operand
lists which encode semantics based on the name of the list.
acc parallel copyin(var)
===>
acc.parallel copyin(%var) {
// use %var
}
I would like to separate the data clauses into separate MLIR operations:
acc parallel copyin(var)
===>
%accvar = acc.copyin(%var)
acc.parallel dataOperands(%accvar) {
// use %accvar
}
The motivation:
- Friendlier to add attributes such as MemoryEffects to a single operation. This will better reflect semantics (like the fact that an acc.copyin operation only reads). Additionally, operations can be moved/optimized (eg CSE).
- Correctly represent dataflow. Currently, a write to %var in the acc region appears as a mutation to the original variable. But when compiling this code for GPU, a mutation to var inside acc region is not visible outside of region (unless there is a copyout)
- Easier to keep track of debug information. Line location can point to the text representing the data clause instead of the construct (as it does right now). Additionally, we can use attributes and keep track of variable names in clauses without having to walk the IR tree in attempt to recover the information (this makes acc dialect more agnostic with regards to what other dialect it is used with).
Data clause operations which have semantics at both entry and exit of region will be decomposed:
acc copy => acc.copyin (before region) + acc.copyout (after region)
acc copyout => acc.create (before region) + acc.copyout (after region)
acc attach => acc.attach (before region) + acc.detach (after region)
acc create => acc.create (before region) + acc.delete (after region)
Here is the format I would like to recommend for acc data entry operations (acc.copyin, acc.deviceptr, acc.present, acc.create):
let arguments = (ins OpenACC_PointerLikeTypeInterface:$varPtr,
Optional<OpenACC_PointerLikeTypeInterface>:$varPtrPtr,
Variadic<OpenACC_DataBoundsType>:$bounds,
DefaultValuedAttr<OpenACC_DataClauseEnum,clause>:$dataClause,
OptionalAttr<OpenACC_DataClauseEnum>:$decomposedFrom,
DefaultValuedAttr<BoolAttr, "true">:$structured,
DefaultValuedAttr<BoolAttr, "false">:$implicit,
OptionalAttr<StrAttr>:$name);
let results = (outs OpenACC_PointerLikeTypeInterface:$accPtr);
Explanation of the non-evident various pieces:
- varPtr - the address of the variable to copy.
- varPtrPtr - Used for data structures with pointers. This is important for OpenACC due to implicit attach semantics on data clauses (2.6.4).
- bounds - used when copying just slice of array or array’s bounds are not encoded in type
- decomposedFrom - keeps track of original user specified clause such as “copy”
- structured - flag to note whether this is associated with structured region (parallel, kernels, data) or unstructured (enter data, exit data). This is important due to spec specifically calling out structured and dynamic reference counters (2.6.7).
Data exit operations (acc.copyout, acc.delete, acc.detach) and acc.attach will have a similar format as the entry operations except they won’t produce a result.
let arguments = (ins Optional<OpenACC_PointerLikeTypeInterface>:$varPtr,
Optional<OpenACC_PointerLikeTypeInterface>:$accPtr, …
So an acc.copyout operation will be modeled like a “store”:
acc parallel copy(var)
===>
%accvar = acc.copyin(%var)
acc.parallel dataOperands(%accvar) {
// use %accvar
}
acc.copyout(%var, %accvar)
Currently the acc dialect also has no way of representing bounds information. When used with flang, a FIR descriptor is created to encode bounds and then the descriptor is placed in the copy operand. This coupling is not ideal since the bounds are an OpenACC language concept (2.7.1). Thus I would like to also recommend creating an acc.bounds operation (use of its result is shown in the data entry operation specification above):
let arguments = (ins Optional<AnyType>:$lowerbound,
Optional<AnyType>:$upperbound,
Optional<AnyType>:$extent,
Optional<AnyType>:$stride,
DefaultValuedAttr<BoolAttr, "false">:$strideInBytes,
Optional<AnyType>:$startIdx);
let results = (outs OpenACC_DataBoundsType:$result);
Explanation of non-evident pieces:
- Both/either upperbound and extent can be specified. In the OpenACC spec 2.7.1, for C++ extent is used in the acc clause, while for Fortran upperbound is used. That said, FIR uses extent in its descriptor. So flexibility is useful.
- The strideInBytes flag and startIdx are more useful for Fortran since a slice of derived type may be a stride that is not divisible by element size. Additionally, having startIdx allows the possibility of better error messages that match the user’s code (instead of assuming zero based array).
Let me know if you have any questions or suggestions.