For the sake of correctness, I will disallow it for now, similar to how the target construct avoids this. I think we can later allow it under OpenMP constructs, but not between the omp.canonical_loop and the omp.structured_region.
- Since a canonical loop is not an OpenMP construct per se, the wording might have to be changed a bit.
- The operation will increase nesting. I don’t think there is any issue here.
- The motivation for this operation could also be specified as modelling the structured block concept in the OpenMP standard.
I will edit the post to reflect this.
I had another concern, though. Consider the following code:
#pragma omp for
for (i = 0; i < N; i++) {
  foo();
  #pragma omp tile sizes(4)
  for (j = 0; j < M; j++) {
    bar();
  }
  baz();
}
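For reference, here is a rough sketch of what the tile directive does to the inner loop, written out by hand in C. In OpenMP terms, tiling one loop generates two loops: a floor loop that steps over tiles and a tile loop that walks the iterations inside each tile. The names `tiled_inner`, `bar_at`, and `trace` are made up for illustration; `bar_at` only records the iteration order so the result can be inspected.

```c
#include <stddef.h>

/* Hypothetical stand-in for bar(): records which iteration ran, in order. */
static int trace[64];
static size_t trace_len = 0;

static void bar_at(int j) {
    if (trace_len < 64)
        trace[trace_len++] = j;
}

/* Hand-written equivalent of
 *     #pragma omp tile sizes(4)
 *     for (j = 0; j < M; j++) bar();
 * This is a sketch of the loop structure, not compiler output. */
static void tiled_inner(int M) {
    for (int jt = 0; jt < M; jt += 4) {        /* floor loop: one iteration per tile */
        int end = (jt + 4 < M) ? jt + 4 : M;   /* clamp the partial last tile */
        for (int j = jt; j < end; j++)         /* tile loop: iterations within a tile */
            bar_at(j);
    }
}
```

The iteration order is unchanged for a single loop; what changes is that there are now two loops that later directives can refer to.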
I think the MLIR code for this should look like the following:
%outer, %tiled1, %tiled2 = omp.canonical_loop [1, N) {
  omp.structured_region {
    "foo"() : () -> ()
    omp.terminator
  }
  %inner = omp.canonical_loop [1, M) {
    omp.structured_region {
      "bar"() : () -> ()
      omp.terminator
    }
    omp.yield
  }
  %tiled:2 = omp.tile(%inner) { tile_sizes = [4] }
  omp.structured_region {
    "baz"() : () -> ()
    omp.terminator
  }
  omp.yield(%tiled#0, %tiled#1)
}
This is because we want to be able to reference %inner (or some transform of it) in the yield operation. In the snippet I wrote in the first post, the value %inner is out of scope for omp.yield.
During codegen, we will have to enforce that the transforms for an inner loop occur before the next structured region under it, and we will generate the code for the loop as soon as we encounter the next structured region.
This means that:
- omp.canonical_loop can have omp.canonical_loop and omp.structured_region under it.
- All the transforms for an omp.canonical_loop must be declared before another omp.structured_region or any other operation.
- omp.canonical_loop will have only one basic block, with only one yield instruction.
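To make these rules concrete, here is a minimal sketch of a checker over a toy model of the loop body, written in plain C rather than against the real MLIR verifier API. Everything in it is an assumption for illustration: the `OpKind` enum, the flat array of ops, and the name `verify_loop_body`. It also reads the rules strictly, so any op other than a nested loop, a transform, a structured region, or the final yield is rejected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified op kinds; not the real MLIR data structures. */
typedef enum {
    OP_STRUCTURED_REGION,
    OP_CANONICAL_LOOP,
    OP_TRANSFORM,   /* e.g. omp.tile applied to a preceding nested loop */
    OP_YIELD,
    OP_OTHER
} OpKind;

/* Returns true if `ops` is an acceptable body for one canonical loop:
 *  - the last op is the single yield,
 *  - transforms appear only directly after a nested loop (or after another
 *    transform of it), before the next structured region,
 *  - no other kind of op appears. */
static bool verify_loop_body(const OpKind *ops, size_t n) {
    bool transforms_allowed = false;

    if (n == 0 || ops[n - 1] != OP_YIELD)
        return false;

    for (size_t i = 0; i + 1 < n; i++) {
        switch (ops[i]) {
        case OP_CANONICAL_LOOP:
            transforms_allowed = true;    /* transforms may follow this loop */
            break;
        case OP_TRANSFORM:
            if (!transforms_allowed)
                return false;             /* transform declared too late */
            break;
        case OP_STRUCTURED_REGION:
            transforms_allowed = false;   /* the nested loop is now closed */
            break;
        case OP_YIELD:                    /* a second yield */
        case OP_OTHER:                    /* unsupported op */
            return false;
        }
    }
    return true;
}
```

With this model, the sequence region, loop, transform, region, yield is accepted, while moving the transform after the second region is rejected. That matches the codegen constraint above: the nested loop is emitted as soon as the next structured region is seen.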
Does this look okay? Any comments, concerns or suggestions for this are welcome.