[RFC] OpenMP dialect representation of num_teams, thread_limit and target SPMD

skatrak · October 1, 2024, 12:31pm

I think the main issue here is that omp.target does not define teams, so it’s not supposed to hold a num_teams clause. Conceptually, target just defines a target task executed by a single thread. It’s true that restrictions regarding the nesting of a teams construct inside of target and the host evaluation of certain teams clauses enable us to produce a single kernel call. But, even then, num_teams refers to the teams construct, so it logically belongs to the omp.teams operation.

If we imagine that omp.target allowed live-ins into the region, I think we both agree this would be the proper representation (the clause is evaluated in the host and attached to the applicable operation):

// #pragma omp target teams num_teams(x)
%num_teams = ... : i32
omp.target {
  omp.teams num_teams(%num_teams : i32) {
    ...
  }
  omp.terminator
}

So, I think the question here is not about where the clause information logically belongs, but rather how we address the limitation imposed by omp.target being IsolatedFromAbove.

The passthrough approach is about creating an MLIR-only mapping between values evaluated outside, in the host, and values inside of the target region, represented as entry block arguments. This mapping, unlike those of the map, private and reduction clauses, is not intended to result in actual code generation to allocate, initialize or copy data. Its only use would be linking host-evaluated values with clauses inside of the omp.target. So, the previous example would become this:

// #pragma omp target teams num_teams(x)
%num_teams = ... : i32
omp.target host_eval(%num_teams -> %num_teams_fwd : i32) {
  omp.teams num_teams(%num_teams_fwd : i32) {
    ...
  }
  omp.terminator
}

The verifier for omp.target would make sure that forwarded values are only used in the very restricted cases they are intended for: num_teams and thread_limit in omp.teams and, if the operation is representing a target SPMD kernel, the loop bounds and step of a nested omp.loop_nest (and possibly the num_threads of an omp.parallel). No other uses of these values would be legal.
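To make the verifier rule a bit more concrete, here is a rough Python model of the check. It is only a sketch of the idea, not how the MLIR verifier would actually be written: the `Use` record, the operand role names and the `is_spmd` flag are all made up for illustration, and whether num_threads of omp.parallel ends up in the list is still open.

```python
from dataclasses import dataclass

# Hypothetical (operation, operand role) pairs where a forwarded value may
# always appear, following the list given in the post.
ALWAYS_ALLOWED = {
    ("omp.teams", "num_teams"),
    ("omp.teams", "thread_limit"),
}

# Hypothetical pairs that are only legal when the omp.target represents a
# target SPMD kernel. The omp.parallel entry is an assumption ("possibly").
SPMD_ONLY = {
    ("omp.loop_nest", "lower_bound"),
    ("omp.loop_nest", "upper_bound"),
    ("omp.loop_nest", "step"),
    ("omp.parallel", "num_threads"),
}


@dataclass
class Use:
    op_name: str  # name of the operation using the forwarded value
    role: str     # which operand of that operation the value feeds


def verify_host_eval(uses_by_value, is_spmd):
    """Return a list of error messages; an empty list means the uses are legal."""
    allowed = ALWAYS_ALLOWED | (SPMD_ONLY if is_spmd else set())
    errors = []
    for value, uses in uses_by_value.items():
        for use in uses:
            if (use.op_name, use.role) not in allowed:
                errors.append(
                    f"{value}: illegal use as {use.role} of {use.op_name}"
                )
    return errors
```

With this model, a forwarded value feeding num_teams of omp.teams is always accepted, a loop bound is accepted only when `is_spmd` is true, and any other use (for example as an operand of an arithmetic operation) is reported as an error.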

In the MLIR to LLVM IR translation of omp.target for the host, we would have an LLVM IR value for each of these host-evaluated clauses, since their MLIR initialization happens before the omp.target operation, and we would know what each one represents by looking at the uses of the corresponding forwarded value. With that, we are able to store the host values into the proper kernel arguments structure and pass it to the __tgt_target_kernel call, or use them anywhere else we need to.
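The lookup described above could be modelled roughly like this. Again, this is a toy Python sketch under my own assumptions: the dictionaries stand in for the host_eval mapping and for the use lists of the block arguments, and the clause names used as keys are not the actual layout of the kernel arguments structure.

```python
def collect_launch_params(host_eval, clause_of):
    """Pair each host-evaluated value with the clause it feeds.

    host_eval: forwarded block argument name -> value computed on the host.
    clause_of: forwarded block argument name -> clause name found by
               inspecting the uses of that argument inside omp.target.
    Returns a mapping from clause name to host value.
    """
    params = {}
    for fwd_name, host_value in host_eval.items():
        clause = clause_of.get(fwd_name)
        if clause is None:
            # The forwarded value has no use inside the region, so there
            # is nothing to pass for it.
            continue
        if clause in params:
            raise ValueError(f"clause {clause!r} is fed by more than one value")
        params[clause] = host_value
    return params
```

For the example above, `collect_launch_params({"%num_teams_fwd": 8}, {"%num_teams_fwd": "num_teams"})` would give a single num_teams entry holding the host value, which is what the translation would then place in the kernel launch arguments.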

The passthrough approach requires one more step than attaching these clauses directly to omp.target (finding out what values have been forwarded and what clauses they are for), but it keeps the representation consistent with the rules of the dialect.

I think we can leave the target teams representation discussion pending for now, until we close the main discussion. I can just point you to the discussion on composite construct representation, where adding an operation for each of them was considered (the “composite operations” approach, which is the same as what you’re suggesting for target teams): [RFC] Representing combined/composite constructs in the OpenMP dialect.
