I think using scf.execute_region makes perfect sense. Thanks for the suggestion! I will keep multiblock bodies in mind from the start.
I am leaning towards keeping the stores and allocating the iteration variables outside the loop. This is closer to what OpenMP (and maybe OpenACC as well) expects for worksharing loops, where we “privatize” the iteration variable with a delayed privatizer. Once we have locality specifiers for do concurrent, we can model these variables using init just like we use private in the OpenMP case. My main points are:
- To more easily bridge the gap between do concurrent and the target parallelization models we care about: OpenMP and OpenACC.
- In the case of sequentializing do concurrent loops, make code-gen easier to handle.
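To make the shape I have in mind concrete, here is a rough analogy in C with OpenMP (not Flang output, and not the actual FIR). The function name `double_all` and the variable names are made up for illustration. The iteration variable `i` is allocated once outside the loop, the loop body starts with a store of the current induction value into it, and the parallel version simply privatizes that variable:

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only. */
void double_all(int n, const int *in, int *out) {
  /* Iteration variable allocated once, outside the loop
     (the analogue of hoisting its allocation out of the loop body). */
  int i;

  /* private(i) gives every thread its own copy of i, which is roughly
     what a privatizer would do for the iteration variable.
     Without OpenMP enabled the pragma is ignored and the loop runs
     sequentially with the same body. */
  #pragma omp parallel for private(i)
  for (int idx = 1; idx <= n; ++idx) {
    i = idx;                       /* the store we keep in the loop body */
    out[i - 1] = 2 * in[i - 1];    /* the body only ever reads i */
  }
}
```

The same body works in both the parallel and the sequential case, which is the point of the two bullets above: only the privatization of `i` differs.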
Let me know if you disagree.
Thanks, I will take a look. For locality specifiers, I hacked what we have in OpenMP to be reused for do concurrent loops. See this PoC. This is just for experimentation; I will write an RFC for a shared “Data Management” dialect later (after implementing the proper fir.do_concurrent op).
Makes sense, thanks for pointing that out!