Thanks for pointing this out.
This question is better asked here
Namely, there seems to be one "fini" call for each "next" call. I assumed
initially only one "fini" before the barrier to indicate the end of all
chunks of work... but this does not appear to be the case.
The code is not quite correct. fini should be called on each iteration and only for ordered loops. It will be fixed.
Also, the stride passed back by "next" does not seem to be used.
Yes, it is not necessary; the stride is always 1.
And Clang appears to generate only normalized loops (i.e. lb = 0 and
increment = 1)...
Yes, the loops are normalized internally to support the collapse clause, if present.