GVN vs. LoopVectorizer

Hi -

I'm writing to start a discussion about which approach is better for
solving an undesirable interaction between GVN and the loop
vectorizer.

A while ago I posted this bug:


I ran into this bug when updating our front-end (for the Chapel
compiler) to emit !noalias metadata in certain situations. I was very
surprised when I discovered that the !noalias metadata actually
*reduced* performance for certain programs!

The reason is that GVN performs PRE (partial redundancy elimination)
on a load, and this prevents the loop vectorizer from vectorizing the
loop. The linked bug report includes a complete example.
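
To make the interaction concrete, here is a minimal C sketch. It is not the example from the bug report; the function names and the loop itself are hypothetical, and `restrict` merely stands in for the !noalias information a front-end might emit. The second function is a hand-written approximation of what load PRE effectively produces: the load of b[i] is replaced by a scalar carried over from the previous iteration, which introduces a loop-carried dependency.

```c
#include <stddef.h>

/* Hypothetical source loop: each iteration loads b[i] and b[i + 1].
 * With no aliasing between a and b, the iterations are independent. */
void sum_adjacent(float *restrict a, const float *restrict b, size_t n) {
    for (size_t i = 0; i + 1 < n; i++)
        a[i] = b[i] + b[i + 1];
}

/* Hand-written approximation of the same loop after load PRE:
 * b[i + 1] loaded in one iteration is reused as b[i] in the next,
 * so only one load remains per iteration, but the scalar 'prev'
 * is now carried from iteration to iteration. */
void sum_adjacent_pre(float *restrict a, const float *restrict b, size_t n) {
    if (n < 2)
        return;
    float prev = b[0];               /* load hoisted out of the loop */
    for (size_t i = 0; i + 1 < n; i++) {
        float next = b[i + 1];       /* the only load left in the loop */
        a[i] = prev + next;
        prev = next;                 /* loop-carried scalar dependency */
    }
}
```

Both functions compute the same result; the difference is only that the second form has a value flowing between iterations, which is the kind of pattern the vectorizer has to recognize or undo.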

I can imagine two strategies to resolve this problem:
1. Make GVN's load-PRE optimization avoid cases that interfere with
loop vectorization
2. Teach the LoopVectorizer to undo such a PRE so that the loop can
be vectorized

Of course there might be other good approaches. What do you think -
how should we solve this problem?



The pass pipeline could be reorganized so that LoadPRE is disabled
before LoopVectorize and enabled only after it (or LoadPRE could be
extracted into a separate pass). GVN itself cannot know whether
LoopVectorize will vectorize, since that depends on profitability
heuristics and there are other passes between GVN and LoopVectorize.
NewGVN does not do PRE at
the moment (as far as I recall).

We had the same problem with LoadPRE (and LICM) in Polly. Our solution
is DeLICM [1], which tries to undo LoadPRE on Polly's internal
representation. Its advantage is that such loop-carried dependencies
are not necessarily the result of LoadPRE, but can also be present in
the source, so it works on both.

Ayal's shuffle solution that converts this
scalar-loop-carried-dependency into vector-loop-carried-dependency
might be superior when applicable (it should be, since GVN would only
have done LoadPRE if there were no conflicting dependencies), since it has
fewer memory accesses in the loop.
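
As a rough illustration of the idea of turning a scalar loop-carried dependency into a vector loop-carried dependency, the sketch below simulates it in plain C with fixed-size blocks. This is not the LoopVectorizer's implementation; the function name, the block width VF, and the example loop (a[i] = b[i] + b[i + 1]) are all assumptions made for illustration. Each block does a single "vector load" and builds the shifted operand from the last lane of the previous block plus the current block, which is what a shuffle would do.

```c
#include <stddef.h>

enum { VF = 4 };  /* assumed vector width, for illustration only */

/* Computes a[i] = b[i] + b[i + 1] for i in [0, n - 1), processing
 * VF elements per step. Only one block of b is loaded per step;
 * the other operand is formed by shifting in the carried lane. */
void sum_adjacent_blocked(float *a, const float *b, size_t n) {
    if (n < 2)
        return;
    size_t m = n - 1;        /* number of outputs */
    float carry = b[0];      /* last lane of the previous block */
    size_t i = 0;
    for (; i + VF <= m; i += VF) {
        float cur[VF], shifted[VF];
        for (int l = 0; l < VF; l++)      /* stands in for one vector load */
            cur[l] = b[i + 1 + l];
        shifted[0] = carry;               /* stands in for a shuffle of */
        for (int l = 1; l < VF; l++)      /* the previous and current block */
            shifted[l] = cur[l - 1];
        for (int l = 0; l < VF; l++)
            a[i + l] = shifted[l] + cur[l];
        carry = cur[VF - 1];              /* vector loop-carried value */
    }
    for (; i < m; i++) {                  /* scalar remainder loop */
        float next = b[i + 1];
        a[i] = carry + next;
        carry = next;
    }
}
```

The point of the sketch is that each element of b is still loaded only once inside the loop, as in the PRE'd scalar form, while the work inside each block is independent across lanes.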


[1] https://doi.acm.org/10.1145/3168815