Quick question to check whether I have missed anything: I would like to convert counting-down loops, i.e. loops with a constant step value of -1, to counting-up loops, because the vectoriser deals better with these loops (see e.g. D76838, which I was discussing today with Ayal). It looks like LoopSimplifyCFG and IndVarSimplify don’t do this, so I was curious whether I have missed something here, or in another pass I haven’t yet considered. I was perhaps also expecting this to be the canonical form of loops, but couldn’t find any evidence of that in  or in the source code.
The obvious follow-up question is whether there would be any objections to adding this to e.g. LoopSimplifyCFG, and adding LoopSimplifyCFG to the optimisation pipeline just before the vectoriser.
LLVM used to rewrite the induction variable in indvars, but we stopped doing it a long time ago. If I recall correctly, it made the generated code worse in some cases, without any clear benefit.
Thanks for commenting. From your reply I understand it would probably be better not to add more logic to indvars. I see that this has the advantage that the pass does not make changes where they are possibly undesired. For the same reason, LoopSimplify, possibly another candidate for this, would not be a good idea. Thus a more focused approach is a new pass run just before the vectoriser or, very similarly, a helper added to LoopUtils and invoked from the vectoriser, to bring loops into a canonical form.
The topic came up before, e.g. D60565 [LOOPINFO] Extend Loop object to add utilities to get the loop bounds, step, and loop induction variable.
Some canonicalization passes are designed for this. In particular, IndVarSimplify used to make canonical loops (i.e. start at zero, increment by one). r133502 introduced -disable-iv-rewrite to rely more on ScalarEvolution instead of "opcode/pattern matching" (quoting the commit message). -enable-iv-rewrite=false was made the default in r139579 after finding that it slows down many benchmarks. It was completely removed in r153260.
The general approach in LLVM is to rely on SCEV for analyzing loops
instead of custom handling. As a consequence, any loop structure that
is recognized by SCEV will (/should) not profit from rewriting.
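To make the point about SCEV a bit more concrete, here is a minimal sketch of the idea of an affine add recurrence {Start,+,Step}. The `AffineRec` type and its `at` member are hypothetical names invented for this illustration; this is not LLVM's SCEV implementation or API. It only shows that a counting-down and a counting-up induction variable have equally simple closed forms, so an analysis based on such recurrences can relate one to the other without the loop being rewritten.

```cpp
#include <cstdint>

// Hypothetical stand-in for an affine add recurrence {Start,+,Step}.
// This is an illustration only, not LLVM's SCEVAddRecExpr.
struct AffineRec {
  std::int64_t Start;
  std::int64_t Step;

  // Value of the recurrence on iteration K (K = 0 is the first iteration).
  constexpr std::int64_t at(std::int64_t K) const { return Start + Step * K; }
};

// Counting-down IV of `for (i = N; i > 0; i--)`: {N,+,-1}.
constexpr AffineRec countingDown(std::int64_t N) { return AffineRec{N, -1}; }

// Counting-up IV of `for (j = 0; j < N; j++)`: {0,+,1}.
constexpr AffineRec countingUp() { return AffineRec{0, 1}; }

// On every iteration K the two are related by i == N - j.
constexpr bool related(std::int64_t N, std::int64_t K) {
  return countingDown(N).at(K) == N - countingUp().at(K);
}
```

For example, with N = 10 the counting-down variable has the value 7 on iteration 3, which is 10 minus the counting-up value 3.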
Thanks for the info and the pointer Michael, that’s very useful!
Interesting, thanks for digging this up!
As a consequence, any loop structure that is recognized
by SCEV will (/should) not profit from rewriting.
As discussed in D68577 [LV] Apply sink-after & interleave-groups as VPlan transformations (NFC), and as PR40816 showed, there is still merit and profit in further simplifying loop induction variables, or at least the primary one; this is somewhat independent of continuing to rely on SCEV for analyzing loops.
-enable-iv-rewrite=false was made the default in r139579 after finding that it
slows down many benchmarks.
This was 8.5 years ago. Time to revisit and try to re-enable some of these iv-rewrites, with a better understanding of why current downstream passes pessimize canonical IVs, if they still do?
SCEV is the de-facto right approach for induction variable analysis.
Any pass not using it will be sensitive to irrelevant variations in
patterns. Indeed, this decision has been made years ago, like most
The alternative is to canonicalize induction variables as
IndVarSimplify used to do. If you would like to change it, feel free
to send an RFC to the llvm-dev list. Justifying this change is
proposer's obligation, including measurements of compile time and
test-suite performance changes.
When I looked at it last time, the reason was that canonicalizing the
induction variable introduces yet another induction variable. A typical
example is a loop iterating over a buffer:
for (char *p = start; p < end; ++p)
Canonicalization yields something like:
for (size_t i = 0; i < (end-start); ++i)
That is, a new register for i to determine the number of iterations
that otherwise could also be done using the pointer.
@reames was mentioning that LoopStrengthReduce is supposed to undo
this again, but seems to not always be successful. I could imagine one
reason is that p would overflow earlier than i.
Thanks Ayal and Michael for sharing these further thoughts.
Slightly moving the discussion from D76838 to here just to keep it in one place.
What I want to achieve in D76838 is to rewrite this:
for (int i=N; i>0; i--)
into this:
for (int i=0; i<N; i++)
because this enables more tricks in the vectoriser. These tricks could of course also be taught to work on counting-down loops, but we prefer not to do that and would rather have the more canonical counting-up form. The question now is where this rewrite should live.
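As a source-level illustration of what such a rewrite has to preserve (this is not the implementation in D76838, and the function names are invented for the example): the counting-up loop has the same trip count, and any use of the original index in the body is recovered as N - j.

```cpp
#include <vector>

// Counting-down form: the body sees i = N, N-1, ..., 1.
std::vector<int> visitDown(int N) {
  std::vector<int> Seen;
  for (int I = N; I > 0; I--)
    Seen.push_back(I);
  return Seen;
}

// Counting-up form with the same trip count: the new IV J runs 0 .. N-1
// and the value the body originally saw is recomputed as N - J.
std::vector<int> visitUp(int N) {
  std::vector<int> Seen;
  for (int J = 0; J < N; J++)
    Seen.push_back(N - J);
  return Seen;
}
```

Both functions return the same sequence for any N, and an empty one when N is zero or negative, since neither loop executes in that case.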
Please note that it is appreciated that SCEV is used for all the heavy lifting, and if you look at D76838 you’ll see that the implementation is very minimal and self-contained. I am reluctant to put this in IndVarSimplify because of the impact it may have, as was also raised here on the dev list. Since this rewrite is meant to further enable the vectoriser, I would think a helper in LoopInfo used by the vectoriser is a win-win.
In D76838 I have put the rewrite to a counting up loop as late as possible, but at that point vectorisation may still fail. Then the result might be that the loop has been rewritten, but no vectorisation has happened. That might be a surprise, but at the same time I don’t see the problem with that. And of course when vectorisation happens, you will never notice this rewrite.
Please let me know what you think.