OpenMP lowering from PFT to FIR

What is the motivation behind how OpenMP constructs are lowered to FIR? For those constructs that contain executable code (like loops or critical), we seem to create some skeleton in Lower/OpenMP.cpp, then fill it out in Lower/Bridge.cpp, e.g.

Why not have the OpenMPConstruct be lowered entirely (edit: inside of genOMPConstruct), instead of doing it in parts? Was there a reason for that, or did it just happen this way?

Because the Lower/OpenMP.cpp file mainly deals with the creation of OpenMP operations from the OpenMP dialect. Constructs that have a region need to use the FIR lowering to lower the Fortran code inside those regions.

Right, but why doesn't the lowering code in OpenMP.cpp use FIR lowering from Bridge? The OMP and non-OMP code is intertwined, so it would make sense to lower the body of a block from an OpenMP construct that contains that block.

I'm not sure I get what you mean here. The Fortran code contained in an OpenMP construct is lowered by the Bridge. It is possible that the code in Lower/OpenMP.cpp produces some FIR operations but that's OpenMP specific. The OpenACC lowering has the same design.

Right, but why doesn't the lowering code in OpenMP.cpp use FIR lowering from Bridge?

It does and it is done in the bridge itself.

What I mean is that instead of doing

  void genFIR(const Fortran::parser::OpenMPConstruct &omp) {
    mlir::OpBuilder::InsertPoint insertPt = builder->saveInsertionPoint();
    localSymbols.pushScope();
    genOpenMPConstruct(*this, bridge.getSemanticsContext(), getEval(), omp);
    [...]
    for (Fortran::lower::pft::Evaluation &e : curEval->getNestedEvaluations())
      genFIR(e);
    [...]
  }

have genOpenMPConstruct generate the FIR for the nested evaluations.

That would mean all the genFIR functions need to be made visible, which is not the case now.

Visible in OpenMP.cpp (and other dialect-specific files), yes.

That would help avoid the saving/restoring/resetting of the insertion point. It would also provide a natural context for lowering nested constructs (specifically in target, which requires special handling of host symbols, for example).

That would help avoid the saving/restoring/resetting of the insertion point.

You would do that anyway, because you need to move the builder inside the region when you generate an operation with a region, and then put the builder after the op once the region has been generated.

Some of it is unavoidable, but some cases aren't:

  // Reset the insert point to before the terminator.
  resetBeforeTerminator(firOpBuilder, storeOp, block);

Edit: At least it would look less like a state machine, and more like local save/restores:

   x = get(...)
   do_something()
   set(x)

For this particular case, I think this is needed because privatization is done during lowering and it has no representation in the dialect. In OpenACC we don't have such stuff.

Anyway, I don't have a strong opinion on this, but if you want to change this you probably need to open an RFC and get feedback from the people contributing to the OpenMP lowering.


Basically, if we create anything inside the region then we will have to set/reset the insertion point. For example, I think omp.target operations are implemented as isolated from above with block arguments. The block arguments are the new bindings for symbols and have hlfir.declares generated for them in the region. If our representation for privatisation is to use block arguments in the region of privatisation, they will have the same issue I think. For privatisation specifically, we will move to some higher level representation for tasks.

@kparzysz I don't know whether you are facing a specific issue or whether this was just a question for information.

I want to change it to do what I described above: for each construct, generate the whole thing recursively, instead of doing it one part at a time. I think it would make the code much clearer, but I wanted to see if there are any compelling reasons for the current approach.

I'm actually working on dealing with firstprivate inside of target, and the privatization inside the DataSharingProcessor implementation doesn't handle it right (at least in my simple testcase), specifically due to the "isolation from above".

One reason is to keep the lowerings separate and to disallow OpenMP code generation from generating Fortran constructs.

What is your proposal here?
→ Move the genOmp functions to FirConverter?
→ Derive from FirConverter to create an OpenMPFIRConverter, and create the right converter based on whether the -fopenmp flag is set?
→ Or something else?

Will recursive generation fix the issue fully for you? Or will you have to make large scale changes to the DataSharingProcessor? Since the 'isolated from above' situation leads to the usage of block-arguments, you could also consider representing this in the dialect. Trivial types (scalars, static-arrays) can be alloca'd later while interfacing with the OpenMPIRBuilder, others will need more information which can be based on the OpenACC recipe generation or wrappers to createHostAssociateVarClone and copyHostAssociateVar procedures.

  omp.target private(%a -> %a_pvt) {
  ^bb0(%a_pvt):
  }

I've thought of the second approach, i.e. a converter subclass, or perhaps some other mechanism to allow combining additional converters in the future.

I don't have a specific proposal yet; my first goal here is to understand why the code is written this way, to make sure I'm not missing something that is not evident in the code. In the first sentence I quoted above you wrote that disallowing OpenMP lowering from generating Fortran constructs was intentional. What was the motivation for this?

The way I see generating code for omp target is something like this (roughly):

  genOpenMPTarget(OpenMPConstruct &omp) {
    localSymbols.pushScope(IsolatedFromAbove);  // indicate that it's isolated
    process-directives();
    genOp<omp::TargetOp>(...)
    genFIR(eval);                               // generate the contents recursively
    localSymbols.popScope();
  }

If we overload symbol handling (lookup, cloning, etc.) in the extended converter, we could generate appropriate code on demand, instead of having to pre-collect lists of symbols. The handling would then do the right thing for the context (i.e. generating symbol clones across target boundary, vs. on the same host).

The (meta) benefit of doing lowering "recursively" is that the code organization follows the structure of the IR being lowered.

That follows from the intention to keep the OpenMP lowering separate from the Fortran lowering. If we think about the other implementation options, which are lumping all OpenMP codegen into FIRConverter, or having further derived classes that need to be selected, I think the current organization is a good choice.

I will tag @schweitz to check whether the reasons are different.

Ok, what are your reasons for this?

We have FirOpBuilder that extends mlir::OpBuilder. At the same time we're keeping the building of FIR separate from building the OpenMP extensions to it. It's the same principle in both cases.

FWIW (I'm not super involved), you should keep in mind OpenMP might not be only used by Flang/FIR in the future. Whatever that implies.

True, but we're lowering a post-processed Fortran parse tree into [HL]FIR. The OpenMP lowering still relies on functions/types from both, so it's not "pure" in that sense.

What I want is to call the FIR builder from the OMP lowering code. The FIR generation would still be encapsulated in the FirOpBuilder, it would just be called from OpenMP.cpp.

The dialect must be generic to support more than just FIR/HLFIR, but the lowering in this case is specific to flang.

FWIW, we have no issue with handling firstprivate for OpenACC with the current lowering, and this is mainly because we have a representation of it in the dialect, so the heavy lifting is not done during lowering. So you might want to explore this path too.