Hi all,
We have been working on the OpenMP “worksharing loop” construct (aka “do”) for
MLIR and would like to start a discussion on a couple of design topics we have
encountered. A separate RFC on the lowering process for this construct will be posted soon.
The OpenMP worksharing loop represents a loop whose iterations can be
distributed across a number of threads and executed in parallel.
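As a rough illustration of what distributing iterations means, here is a
minimal Python sketch. It assumes a static schedule with no chunk size and a
positive step, and the helper names trip_count and static_chunks are
hypothetical, not part of any OpenMP runtime or MLIR API. The exact split a
real runtime chooses is implementation-defined; the point is only that the
lower bound, upper bound and step are the inputs needed to compute it.

```python
def trip_count(lb, ub, step):
    """Number of iterations of `for iv = lb; iv < ub; iv += step`."""
    if step <= 0:
        raise ValueError("this sketch only handles a positive step")
    return max(0, (ub - lb + step - 1) // step)


def static_chunks(lb, ub, step, num_threads):
    """Split the iteration space into one contiguous chunk per thread.

    Returns a list of (chunk_lb, chunk_ub) pairs, one per thread; thread t
    would run `for iv = chunk_lb; iv < chunk_ub; iv += step`.
    """
    total = trip_count(lb, ub, step)
    # Spread the remainder over the first `extra` threads so chunk sizes
    # differ by at most one iteration.
    base, extra = divmod(total, num_threads)
    chunks = []
    first = 0  # index of the first iteration owned by the current thread
    for tid in range(num_threads):
        count = base + (1 if tid < extra else 0)
        chunk_lb = lb + first * step
        # Clamp so the last chunk never reports a bound past the original ub.
        chunk_ub = min(lb + (first + count) * step, ub)
        chunks.append((chunk_lb, chunk_ub))
        first += count
    return chunks
```

For example, static_chunks(0, 10, 1, 4) gives [(0, 3), (3, 6), (6, 8), (8, 10)].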
When developing a construct for this in the OpenMP MLIR dialect, the question
arises whether to have this as a construct that contains another loop construct,
or whether to have it as a loop construct itself.
For example:
  omp.do {
    scf.for %iv = %lb to %ub step %step {
      // body
    }
  }
or:
  omp.do %iv = %lb to %ub step %step {
    // body
  }
The first lets us reuse optimisations that already exist for the scf.for
operation, or for other loop-style operations that we might want to nest inside
the omp operation (for example fir.do).
However, when the loop body is lowered to the LLVM dialect the loop bounds are
no longer present, since the LLVM dialect has no loop operation of its own. The
bounds and step would therefore need to be recorded somehow so that they can be
used when generating the relevant OpenMP runtime calls with OpenMPIRBuilder[1].
On the other hand, this approach is closer to how the OpenMP constructs behave
in higher-level languages and is more consistent with other operations such as
omp.parallel.
The second resolves this problem but doesn’t allow optimisations to be reused
from other dialects.
Another option would be to have both available, with a preference for MLIR to be
generated using the first form and a later lowering step adding the bounds and
step to the omp.do construct so that they can be used later. Alternatively, the
lower bound, upper bound and step could simply be replicated on both constructs
every time (this might be quite error prone if the two sets of values ever get
out of sync).
Does anyone have any input on this, or possibly any ideas that I haven’t thought
of for resolving this issue?
You can find the currently proposed patch, and some discussion in its review,
here: https://reviews.llvm.org/D86071
Thanks!
David Truby
[1] The OpenMP IRBuilder project generates the LLVM IR with runtime calls for an
OpenMP construct. It also aims to unify the OpenMP LLVM IR code generation for
Clang and Flang. This is achieved by refactoring the codegen for OpenMP directives
out of Clang and placing it in the llvm/frontend directory.