I think the main issue here is that omp.target does not define teams, so it’s not supposed to hold a num_teams clause. Conceptually, target just defines a target task executed by a single thread. It’s true that restrictions regarding the nesting of a teams construct inside of target and the host evaluation of certain teams clauses enable us to produce a single kernel call. But, even then, num_teams refers to the teams construct, so it logically belongs to the omp.teams operation.
If we imagine that omp.target allowed live-ins into the region, I think we both agree this would be the proper representation (the clause is evaluated in the host and attached to the applicable operation):
// #pragma omp target teams num_teams(x)
%num_teams = ... : i32
omp.target {
  omp.teams num_teams(%num_teams : i32) {
    ...
  }
  omp.terminator
}
So, I think the question here is not about where the clause information logically belongs, but rather how we address the limitation imposed by omp.target being IsolatedFromAbove.
The passthrough approach is about creating an MLIR-only mapping between host-evaluated outside values and values inside of the target region, represented as entry block arguments. This mapping, unlike the map, private and reduction clauses, is not intended to result in actual code generation to allocate, initialize or copy data. It would have the very limited use of linking host-evaluated values with clauses inside of the omp.target. So, the previous example would become this:
// #pragma omp target teams num_teams(x)
%num_teams = ... : i32
omp.target host_eval(%num_teams -> %num_teams_fwd : i32) {
  omp.teams num_teams(%num_teams_fwd : i32) {
    ...
  }
  omp.terminator
}
The verifier for omp.target would make sure that forwarded values are only used in the very restricted cases they are intended for: num_teams and thread_limit in omp.teams, loop bounds and step of a nested omp.loop_nest (and possibly the num_threads of an omp.parallel) if it’s representing a target SPMD kernel. No other uses of these values would be legal.
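To make the proposed verifier rule concrete, here is a minimal sketch of the check as a toy model in plain C++. It does not use the MLIR API: UseKind, ForwardedValue, isLegalUse and verifyHostEval are hypothetical names invented for illustration. It also assumes one particular reading of the rule above, namely that the loop_nest and parallel uses are only legal when the region represents a target SPMD kernel.

```cpp
#include <string>
#include <vector>

// Toy model only: none of these types exist in MLIR. They stand in for
// "where is a forwarded host_eval block argument used?".
enum class UseKind {
  TeamsNumTeams,      // num_teams clause of omp.teams
  TeamsThreadLimit,   // thread_limit clause of omp.teams
  LoopNestBound,      // lower or upper bound of a nested omp.loop_nest
  LoopNestStep,       // step of a nested omp.loop_nest
  ParallelNumThreads, // num_threads clause of omp.parallel
  Other               // any other use of the value
};

struct Use {
  UseKind kind;
};

struct ForwardedValue {
  std::string name;      // e.g. "%num_teams_fwd"
  std::vector<Use> uses; // every use of the entry block argument
};

// Assumption: loop_nest and parallel uses are accepted only for SPMD kernels.
bool isLegalUse(UseKind kind, bool isSPMD) {
  switch (kind) {
  case UseKind::TeamsNumTeams:
  case UseKind::TeamsThreadLimit:
    return true;
  case UseKind::LoopNestBound:
  case UseKind::LoopNestStep:
  case UseKind::ParallelNumThreads:
    return isSPMD;
  case UseKind::Other:
    return false;
  }
  return false;
}

// Returns true only if every use of every forwarded value is a legal one.
bool verifyHostEval(const std::vector<ForwardedValue> &values, bool isSPMD) {
  for (const ForwardedValue &value : values)
    for (const Use &use : value.uses)
      if (!isLegalUse(use.kind, isSPMD))
        return false;
  return true;
}
```

In this model, a forwarded value used only as num_teams passes regardless of the kernel kind, while a value used as a loop bound passes only when isSPMD is true.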
In the MLIR to LLVM IR translation of omp.target for the host, we would have an LLVM IR value for these host-evaluated clauses, since their MLIR initialization happened before the omp.target operation, and we would know what they are representing by looking at the uses of the corresponding forwarded value. With that, we are able to store the host values into the proper kernel arguments structure, pass them to the __tgt_target_kernel call or anywhere else we need to.
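As a rough illustration of that lookup, the following toy C++ sketch walks the uses of each forwarded value and copies the host value into the matching field of a kernel-arguments structure. Everything here is hypothetical: ClauseKind, HostEvalEntry, ToyKernelArgs and buildKernelArgs are made-up names, the structure is not the real layout expected by __tgt_target_kernel, and using 0 to mean "not specified" is an assumption of the sketch.

```cpp
#include <cstdint>
#include <vector>

// Toy model only: the clause a forwarded block argument is used for.
enum class ClauseKind { NumTeams, ThreadLimit };

struct HostEvalEntry {
  int32_t hostValue;            // value computed on the host before omp.target
  std::vector<ClauseKind> uses; // clauses that use the forwarded argument
};

// Hypothetical stand-in for the kernel arguments structure.
struct ToyKernelArgs {
  int32_t numTeams = 0;    // 0 means "not specified" in this sketch
  int32_t threadLimit = 0; // 0 means "not specified" in this sketch
};

// Decide where each host value goes by inspecting how it is used inside the
// target region, as described for the host translation above.
ToyKernelArgs buildKernelArgs(const std::vector<HostEvalEntry> &entries) {
  ToyKernelArgs args;
  for (const HostEvalEntry &entry : entries) {
    for (ClauseKind use : entry.uses) {
      switch (use) {
      case ClauseKind::NumTeams:
        args.numTeams = entry.hostValue;
        break;
      case ClauseKind::ThreadLimit:
        args.threadLimit = entry.hostValue;
        break;
      }
    }
  }
  return args;
}
```

The point of the sketch is only that the destination of each host value is derived from the uses of its forwarded counterpart, rather than from a clause attached to the target operation itself.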
The passthrough requires one more step than attaching these clauses directly to omp.target (finding out what values have been forwarded and what clauses they are for), but it keeps the representation consistent with the rules of the dialect.
I think we can leave the target teams representation discussion pending for now, until we close the main discussion. I can just point you to the discussion over composite construct representation, where adding operations for each of them was considered (the “composite operations” approach, which is the same thing you’re suggesting for target teams): [RFC] Representing combined/composite constructs in the OpenMP dialect.