[RFC][OpenMP] Splitting combined/composite directives in lowering

kparzysz · January 24, 2024, 4:48pm

Background:
OpenMP allows certain combinations of directives to be listed in a single source code directive, for example PARALLEL and SECTIONS may be specified together as PARALLEL SECTIONS. Intuitively, these “aggregated” directives apply as if they were specified individually (although some combinations may not be possible to express that way in the source code). The OpenMP spec refers to them as “combined” or “composite” (depending on some details), but for the purpose of this RFC they are treated equally. The individual directives are referred to as “leafs”.

Such aggregated directives may also have a list of clauses, where each clause may apply only to a subset of the leaf directives. The exact rules are described in the OpenMP spec.

Current situation:
Each permissible combination of leaf directives is treated as an individual directive. Looking at the enumeration defining directive identifiers, there are OMPD_target, OMPD_teams, OMPD_distribute, and so on, but also OMPD_target_teams, OMPD_target_teams_distribute, OMPD_target_teams_distribute_parallel_do, etc.

When lowering an OpenMP construct, each construct-handling routine tries to deal with the aggregated directives on its own, creating implied OMP dialect operations as needed, and leaving applying clauses to these individual operations. For example, in genOMP for a loop construct, we pass the full clause list to gen<Operation>Op for every constituent operation recognized.

This approach makes handling of clauses complicated, since whether a given clause applies to a given directive may depend on whether it applies to other leaf directives in the aggregated directive.

Proposal:
I propose that each aggregated directive with a list of clauses is broken up into individual leaf directives, where each leaf directive will have a list of clauses that should apply to it according to the OpenMP spec. This way

The application of OpenMP rules would be implemented in one place.
Each operator-generating function would simply apply the clauses that were present without needing to be concerned about other directives. This would simplify data sharing, since the outerCombined flag would no longer be necessary.
Each construct/operator handling function would not need to check for other constituent directives, since each directive (after the break-up) will be a leaf directive.

kiranchandramohan · January 24, 2024, 5:02pm

While it might be OK for the purpose of this RFC. It is probably desirable to have separate operations for composite directives since their semantics is different from the individual application of the leaf directives.

How do you plan to split the clauses to the leaves? Are there a set of generic rules? Can you infer that from OMP.td (llvm-project/llvm/include/llvm/Frontend/OpenMP/OMP.td at main · llvm/llvm-project · GitHub)? Or should this be hardcoded?

outerCombined was a fallout of handling the structured and unstructured regions differently. How does it apply here?

kparzysz · January 24, 2024, 5:49pm

Right. I imagine this break-up to produce a queue of leafs, each with a list of clauses. This queue would be visible to each op generator function, so they can look forward in it to see if they the current queue item is a part of a composite directive. For operators that cannot be composed in that sense, their generator functions wouldn’t need to look further in the queue.

I actually added the list of leafs to the .td file for OpenMP/OpenACC directives (they use the same base .td file). It’s empty by default. For all directives with a non-empty set of leafs, the auto-generated function getLeafConstructs will return the ordered list (SmallVector) of leaf directives (i.e. their integer id’s).

It seemed to control privatization in the presence of certain combined constructs. What I meant is that the clause handling (including data sharing) will no longer have to pay attention to combinations of directives, I haven’t looked too deeply into the connection between outerCombined and unstructured regions.

clementval · January 24, 2024, 9:13pm

I’m not sure I follow where this is fundamentally different to what you have today? I guess a little refactoring of the lowering functions and you could achieve the same results without changing the parse tree representation and centralize the handling of each directive/clauses.

In OpenACC there is also some combined construct and they are represented as a single node in the parse tree.

!$acc parallel loop
// becomes
OpenACCCombinedConstruct -> directive = parallel_loop

Then in lowering we have a function that handle lowering of the parallel construct and one that handle the loop one. This is the same whether the loop construct is nested or combined or just a single directive. The full list of clauses is passed to both the parallel lowering and the loop lowering. Each of them just discard the one they don’t support.

In the long term we might need to carry the information done to MLIR whether the construct was combined or composite. Is your new parse tree representation still able to conserve this information?

If you have a combined PARALLEL SECTIONS vs. a SECTIONS nested in a PARALLEL, will you end up with different parse-tree representation and lowering?

kparzysz · January 24, 2024, 9:52pm

I’m not planning to change the parse tree. The separation would be entirely internal to the lowering of OpenMP.

That’s roughly what I’m proposing.

The OpenMP spec has two full pages detailing what happens with clauses on combined/composite directives. What happens with a given object may depend on whether was present on some other clause, and whether a specific leaf directive was a constituent of the aggregated one. I’m trying to implement it all in one place. The RFC is just about how I’m planning to do it.

clementval · January 24, 2024, 9:54pm

Ok, thanks for the clarification. This was not clear to me since you mention change to the .td files and these are mostly used by parsing/semantic for both flang and clang.

kparzysz · January 24, 2024, 9:57pm

I see. That’s the .td file in llvm/Frontend.

In “llvm/include/llvm/Frontend/Directive/DirectiveBase.td”

@@ -152,6 +152,9 @@ class Directive<string d> {
   // List of clauses that are required.
   list<VersionedClause> requiredClauses = [];

+  // List of names of leaf constituent directives.
+  list<string> leafs = [];
+
   // Set directive used by default when unknown.
   bit isDefault = false;
 }

Then in “llvm/include/llvm/Frontend/OpenMP/OMP.td”, for each aggregated directive I specify the list of leafs, for example:

@@ -771,6 +771,7 @@ def OMP_TargetParallel : Directive<"target parallel"> {
     VersionedClause<OMPC_OMPX_DynCGroupMem>,
     VersionedClause<OMPC_ThreadLimit, 51>,
   ];
+  let leafs = ["target", "parallel"];
 }
 def OMP_TargetParallelFor : Directive<"target parallel for"> {
   let allowedClauses = [

clementval · January 24, 2024, 10:02pm

Ok I see. I misread your RFC and thought you wanted to carry this change in the parse tree node.

Would having the corresponding directive from this file give you more information like the clauses that each “leaf” support.

list<Directive> leafs = [];

// ...

let leafs = [OMP_Target, OMP_Parallel];

kparzysz · January 24, 2024, 10:06pm

That may be better actually. I wasn’t sure if TableGen would allow it, so I made it use strings, but your form is nicer.

Edit: I’ll make it the way you propose before creating PRs for this.

clementval · January 24, 2024, 10:09pm

You can have list of class that are defined in the td file. The different list of allowed clauses are all list of VersionedClause class that is a Clause with a version information.

Maybe tablegen will not allow it because it’s list of Directive inside the Directive class. I have never tried.

kparzysz · January 30, 2024, 9:03pm

I have a draft PR that implements the core of this functionality (well, most of it). It could definitely use some feedback, so please take a look if you’re interested in this work.

clementval · January 31, 2024, 5:52pm

To follow up after today’s call.

This is the place where we have different semantic based on the combined construct vs. normal construct in OpenACC lowering.

github.com

llvm/llvm-project/blob/254287658f4b3e413d47870c1cca422071b398ff/flang/lib/Lower/OpenACC.cpp#L2295-L2311


      
          // A reduction clause on a combined construct is treated as if it appeared
          // on the loop construct. So don't generate a reduction clause when it is
          // combined - delay it to the loop. However, a reduction clause on a
          // combined construct implies a copy clause so issue an implicit copy
          // instead.
          if (!outerCombined) {
            genReductions(reductionClause->v, converter, semanticsContext, stmtCtx,
                          reductionOperands, reductionRecipes);
          } else {
            auto crtDataStart = dataClauseOperands.size();
            genDataOperandOperations<mlir::acc::CopyinOp>(
                std::get<Fortran::parser::AccObjectList>(reductionClause->v.t),
                converter, semanticsContext, stmtCtx, dataClauseOperands,
                mlir::acc::DataClause::acc_reduction,
                /*structured=*/true, /*implicit=*/true);
            copyEntryOperands.append(dataClauseOperands.begin() + crtDataStart,
                                     dataClauseOperands.end());

Basically when we have a !$acc parallel loop, we go through:

createComputeOp
createLoopOp

Each function knows what clauses to process. Of course you have to carry somehow the information about combined vs. not combined.
In that case the reduction clause has a different meaning on the parallel op than on the loop op. This works because we can represent both reduction and copyin in the IR so it doesn’t need to be handled while lowering.

kparzysz · January 31, 2024, 8:08pm

Thanks for the link.

This is the part that gets complicated in OpenMP. At the moment in llvm/Frontend/OpenMP we have 42 combined/composite constructs, and having an individual function that deals with each of them could be a bit cluttered (even if we cut it roughly by half by treating the presence of SIMD as a parameter).

There is also the alternative of just creating an op as-is, with the clauses attached more-or-less verbatim, and then rewriting it later. In such case we should treat all ops the same, that is apply the clauses in the same MLIR-level pass. This would fundamentally change how OpenMP lowering is done, but if there is a consensus that it’s the right way, we could definitely do it.

In the shorter term I’ll do something in-between (i.e. having clause representations that are easier to work with), then we can discuss it some more when it’s closer to being ready.

clementval · January 31, 2024, 9:46pm

We don’t have a function per combination but a single function to lower parallel and a single to lower loop.

That would be the better approach IMO.

kiranchandramohan · January 31, 2024, 10:21pm

Among the combined constructs which ones are you facing difficulty with handling the same clause multiple times? If the SymbolMap is pushed for each of the leaf constructs and they get different MLIR bindings, would it work? You might have to pull out some of the privatization code also into processPrivate, processFirstPrivate etc.

You will have to make a list of all the clauses that are required and see what is the support at the OpenMP dialect. Is it only the data-sharing clauses that are required? You will have to wait atleast till there is support for data-sharing clauses in the dialect.

kparzysz · February 1, 2024, 12:32am

Yeah, got it now. I misinterpreted your comment.

kparzysz · February 13, 2024, 4:56pm

I created a stack for the intermediate representation of clauses. It introduces a set of classes for clauses, and updates the rest of the code to use them instead of the parser classes. There are quite a few changes, but they are mostly mechanical. There are a couple of functions shared between OpenMP and OpenACC which took parser::[Omp|Acc]Object, which I converted to use evaluate::MaybeExpr.

[flang][OpenMP] Implement flexible OpenMP clause representation by kparzysz · Pull Request #81621 · llvm/llvm-project · GitHub
[flang][OpenMP] Convert unique clauses in ClauseProcessor by kparzysz · Pull Request #81622 · llvm/llvm-project · GitHub
[flang][OpenMP] Convert repeatable clauses (except Map) in ClauseProc… by kparzysz · Pull Request #81623 · llvm/llvm-project · GitHub
[flang][Lower] Convert OMP Map and related functions to evaluate::Expr by kparzysz · Pull Request #81626 · llvm/llvm-project · GitHub
[flang][OpenMP] Convert processTODO and remove unused objects by kparzysz · Pull Request #81627 · llvm/llvm-project · GitHub
[flang][OpenMP] Convert DataSharingProcessor to omp::Clause by kparzysz · Pull Request #81629 · llvm/llvm-project · GitHub

kiranchandramohan · February 13, 2024, 7:11pm

Thanks @kparzysz for the update.

Could you reply to this comment? Or did I misunderstand the requirement here. An example will be useful.

kparzysz · February 13, 2024, 7:46pm

Your comment you quoted above was in response to my post here. I thought that Valentin was saying there that they have a function for each combination of leaf directives that knows how to process the clauses attached to that combination. That interpretation wasn’t correct, but my reply (“this is where it gets complicated”), was regarding the fact that OpenMP has too many combinations to handle each of them separately.

As for having the same clause multiple times—do you mean the exact same clause (i.e. with the same symbols, etc.), or a clause with the same id, but different parameters. The first one we should filter out, but the second one shouldn’t create any problems, as long as the clause should be applied. The issue with clauses on combined/composite directives is that sometimes a given clause should only be applied to a specific leaf, even if it’s valid for other leafs. I don’t have an example on hand, but section 17.2 gives some ideas: the firstprivate clause could be applied to teams, but not if distribute is present.

I’m not aware of any problems that could be caused by changing bindings for a given symbol. As a matter of fact this is what we will need to do in some cases (e.g. in “target”). If each binding correctly represents the semantics of the symbol in its construct, then everything should work fine.

kiranchandramohan · February 14, 2024, 12:39pm

I was thinking that the issue you are facing is that it is difficult to process clauses with names/symbols attached to it multiple times for combined constructs and hence it is desirable to create copies of clauses. If that is not the case, why do we need to create copies of clauses, is it just not enough to tag for each leaf construct whether it should be applied or not using an auxillary Map datastructure?

Topic		Replies	Views
[RFC] Representing combined/composite constructs in the OpenMP dialect MLIR openmp	15	602	March 11, 2024
[RFC] Compound constructs in OpenMP Clang Frontend clang , openmp	1	60	June 24, 2024
OpenMP Semantic checks for combined constructs Flang	2	103	November 27, 2019
Flang Technical Call : Summary of presentation on OpenMP for Flang Flang	12	121	September 27, 2019
Proposal: Resolve combined directives in parsing phase OpenMP	12	105	June 9, 2017

[RFC][OpenMP] Splitting combined/composite directives in lowering

Related Topics