vector optimization


Is there a pass that optimizes vector operations?
For example, if I have a sequence of shufflevector instructions,
is there a pass that combines them?
(In OpenCL notation, e.g. a.xyzw.wzyx.xxxx -> a.wwww)
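
To make the example concrete, here is a minimal sketch of how a chain of single-source swizzles collapses into one mask. It is plain C++ and does not use any LLVM API; the names `Mask` and `composeSwizzles` are made up for illustration, and -1 is assumed to mark an undefined lane.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: one source-lane index per result lane,
// -1 for an undefined lane. Not an LLVM type.
using Mask = std::vector<int>;

// Compose two single-source swizzles so that v.inner.outer == v.result.
// Result lane i reads lane outer[i] of the intermediate vector, which in
// turn reads lane inner[outer[i]] of the original vector.
Mask composeSwizzles(const Mask &inner, const Mask &outer) {
    Mask result;
    result.reserve(outer.size());
    for (int lane : outer) {
        if (lane < 0) {
            result.push_back(-1);  // undefined stays undefined
            continue;
        }
        assert(static_cast<std::size_t>(lane) < inner.size() &&
               "outer mask refers to a lane the inner swizzle does not produce");
        result.push_back(inner[static_cast<std::size_t>(lane)]);
    }
    return result;
}
```

With x,y,z,w numbered 0..3, composing xyzw, then wzyx, then xxxx yields the mask for wwww. Two-source shuffles would need extra handling that this sketch leaves out.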


Instcombine does some of this; late codegen also does some of it.



I found a "problem" in InstCombiner::visitShuffleVectorInst:

The comment says that two shufflevectors are only combined when the
result can reuse one of the original shuffle masks.

At the line
if (NewMask == LHSMask || NewMask == Mask) {

I have
NewMask = (1,1,1,1)
LHSMask = (1,8,8,8)
Mask = (0,0,0)

Why is it important not to emit a new mask?
If it is, why not additionally accept splat masks (in my case (1,1,1,1))?

How does the instcombine process work? If, e.g., two swizzles get
merged, is the new one visited again immediately or in a new iteration
of instcombine?
What happens if the instruction that is merged has more than one use?
Is it allowed to combine chains of instructions at once (i.e. more than 2)?

I'm asking these questions because it should be possible to write a generic
combiner for shufflevector, insertelement and extractelement that "looks
through" all of these instructions to the values that are accessed.
For a vector with 4 elements this is always a list of 4 llvm::Values with
an extractelement index for each value. This list can then be analyzed
again to produce a simpler chain of instructions.

I have written such a thing for my virtual machine. The advantage is that
I only need one procedure to emit the right thing from the list of
value/index pairs. For example I detect if all values are the same and
then if the indices are in consecutive order (0,1,2,3) and so on.
Undefined values are NULL/-1 in the list.
But at this point I can't "look through" instructions with more than one
use since this can produce duplicate code.
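
A rough sketch of the analysis step on such a list, in plain C++ without LLVM: the `Lane` struct, the `Pattern` enum and `classify` are hypothetical names, and `const void *` merely stands in for a pointer to the source value. It assumes the source vector has as many elements as the list, otherwise the "identity" case would not be a plain reuse of the source.

```cpp
#include <cstddef>
#include <vector>

// One entry per result element: which value it comes from and which
// element of that value. source == nullptr (index -1) means undefined.
struct Lane {
    const void *source;
    int index;
};

enum class Pattern {
    Undefined,      // every lane is undefined
    Identity,       // one source, indices 0,1,2,...: reuse the source as is
    Splat,          // one source, one index repeated
    SingleShuffle,  // one source, arbitrary indices: one shuffle suffices
    General         // several sources: needs a longer chain
};

Pattern classify(const std::vector<Lane> &lanes) {
    const void *src = nullptr;
    int firstIndex = -1;
    bool identity = true;
    bool splat = true;
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        const Lane &l = lanes[i];
        if (!l.source)
            continue;  // an undefined lane is compatible with any pattern
        if (!src) {
            src = l.source;
            firstIndex = l.index;
        } else if (l.source != src) {
            return Pattern::General;
        }
        if (l.index != static_cast<int>(i))
            identity = false;
        if (l.index != firstIndex)
            splat = false;
    }
    if (!src)
        return Pattern::Undefined;
    if (identity)
        return Pattern::Identity;
    return splat ? Pattern::Splat : Pattern::SingleShuffle;
}
```

The emitter would then switch on the returned pattern. The multiple-use problem is not addressed here; it concerns how the list is built, not how it is classified.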