[RFC] Disambiguation between loop and block associated omp::ParallelOp

Thank you @kiranchandramohan for sharing your thoughts.

The main problem is that omp.parallel implements both the OutlineableOpenMPOpInterface and the LoopWrapperInterface, which are functionally incompatible: the first marks blocks where allocas can be inserted during lowering and certain transforms (e.g. FirOpBuilder::getAllocaBlock() or AllocMemConversion::findAllocaInsertionPoint()), while the second disallows exactly that kind of use.
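To make the conflict concrete, here is a rough sketch of the two uses (syntax simplified, clauses and types omitted, not verifier-exact):

```mlir
// As an outlineable op, the entry block of the omp.parallel region is a
// legal alloca insertion point, e.g. for FirOpBuilder::getAllocaBlock():
omp.parallel {
  %tmp = fir.alloca i32  // temporary materialized inside the region
  // ... parallel region body ...
  omp.terminator
}

// As a loop wrapper (like omp.wsloop below), the region may contain only
// a single nested wrapper or loop op, so no allocas can be inserted here:
omp.wsloop {
  omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
    // ... loop body ...
    omp.yield
  }
}
```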

So, this solution removes the loop wrapper interface, which is a “static property” of the operation. That forces us to hoist omp.parallel out of an omp.distribute + omp.wsloop wrapper nest, because leaving it between the two wrappers would be invalid MLIR. In practice, this means the “hoisted omp.parallel representation” would become the only way to represent distribute parallel do/for. To pretend omp.parallel were still a loop wrapper while constructing an initial MLIR representation, we would have to initially produce IR that fails op verification. My guess is that this wouldn’t work with handwritten MLIR, since I assume it would have to pass verification as a first step.
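For illustration, the two shapes for distribute parallel do/for would look roughly like this (simplified, clauses omitted):

```mlir
// Invalid once omp.parallel loses the LoopWrapperInterface: it sits
// between two wrappers, so the nest no longer verifies.
omp.distribute {
  omp.parallel {
    omp.wsloop {
      omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
        // ... loop body ...
        omp.yield
      }
    }
  }
}

// Hoisted representation: omp.parallel becomes the parent of the
// whole wrapper nest instead of a wrapper itself.
omp.parallel {
  omp.distribute {
    omp.wsloop {
      omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
        // ... loop body ...
        omp.yield
      }
    }
  }
  omp.terminator
}
```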

I think the way to achieve something like that would involve creating a separate parallel wrapper op, which would later be hoisted and swapped for the non-wrapper version. The only difference is that temporary allocas created during lowering would materialize in the parent function rather than in the parallel op’s region. However, if we were to add that operation, there wouldn’t be a need to hoist the parallel region out at all; we could simply make that the canonical representation (this is option 1 in my other comment above).
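To sketch what that intermediate form might look like, using “omp.parallel_wrapper” as a made-up name for the hypothetical wrapper op (it does not exist today; syntax simplified):

```mlir
func.func @foo() {
  %tmp = fir.alloca i32  // temporaries land in the parent function,
                         // not in the parallel op's region
  omp.distribute {
    omp.parallel_wrapper {  // hypothetical wrapper op, illustrative name
      omp.wsloop {
        omp.loop_nest (%i) : index = (%lb) to (%ub) step (%step) {
          // ... loop body ...
          omp.yield
        }
      }
    }
  }
  return
}
```

If this form were made canonical instead of an intermediate step, the later hoist-and-swap pass would be unnecessary, which is what option 1 amounts to.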