LoopVectorize module - some possible enhancements

Hello, Michael,
     I'd like to ask if we can enhance the LoopVectorize LLVM module (I am currently using a version from Jul 2016).

     More specifically:
     - do you plan to support the LLVM IR gather and scatter intrinsics in the near future (as described at http://llvm.org/docs/LangRef.html#llvm-masked-gather-intrinsics and in the corresponding scatter section)?
       I see you have already defined some methods that should use them, such as:
         - bool isLegalMaskedGather(Type *DataType);
         - void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) which defines a bool CreateGatherScatter, etc
       I compiled a simple vector-add C program with step/stride 2 using clang with the flag "-mavx2", but the resulting vector code uses neither gather nor scatter.

     - have you considered pathological loops such as:
       for (int i = 0; i < N; i += 2) {
           C[i/2] = A[i/2] + B[i/2];
       }
       This does NOT get vectorized with my version of LoopVectorize, although it is easy to see that it is trivial to vectorize.
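
For concreteness, the two loops mentioned in the points above can be sketched as self-contained C functions. The function names (add_stride2, add_half_index, add_unit_stride) are hypothetical and not taken from the original test program. add_unit_stride is included only to show that the i/2 loop touches the same elements as a plain unit-stride loop over (N + 1) / 2 elements, which is why it looks trivially vectorizable.

```c
#include <stddef.h>

/* Stride-2 vector add: only even-indexed elements are touched, so a
   vectorized version needs strided (gather/scatter-like) accesses. */
void add_stride2(int *C, const int *A, const int *B, int N) {
    for (int i = 0; i < N; i += 2)
        C[i] = A[i] + B[i];
}

/* The "pathological" loop: i advances by 2, but the index i/2 is
   consecutive (0, 1, 2, ...). */
void add_half_index(int *C, const int *A, const int *B, int N) {
    for (int i = 0; i < N; i += 2)
        C[i / 2] = A[i / 2] + B[i / 2];
}

/* Equivalent unit-stride form: j = i/2 runs over 0 .. (N + 1) / 2 - 1. */
void add_unit_stride(int *C, const int *A, const int *B, int N) {
    for (int j = 0; j < (N + 1) / 2; j++)
        C[j] = A[j] + B[j];
}
```

Calling add_half_index and add_unit_stride with the same inputs should leave identical contents in C, for even and odd N alike.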

     One more question: how can I obtain an expression for the bounds and the step of the original loop? For example, when I print the ScalarEvolution object in LoopVectorizationLegality::isConsecutivePtr(), I can see the value reported under "Exits" for the indvars.iv instruction, which is the upper bound of the loop being vectorized.

   Thank you,

Hi Alex,

Intel was doing the scatter/gather support for AVX, so I'm copying
Elena who should know more about this.


Hi Alex,

About the gather/scatter intrinsics mechanism: it is already supported (they are part of LLVM-IR), but each target has to decide whether to allow using them for auto-vectorization.
On X86, AVX2 gathers are not enabled for vectorization due to cost considerations. With AVX-512, gathers and scatters are enabled for vectorization (so if you use -march=skx or -mavx512bw, there's a chance you'll see them used).
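
To make the intrinsic semantics concrete, here is a scalar C model of what a masked gather and a masked scatter do, lane by lane. This is only a sketch of the behaviour described in the LangRef, not LLVM code: the function names, the fixed int element type, and the bool-array mask are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar model of a masked gather: each enabled lane loads through its
   own pointer; disabled lanes take the pass-through value instead. */
void masked_gather_i32(int *result, int *const *ptrs, const bool *mask,
                       const int *passthru, size_t lanes) {
    for (size_t l = 0; l < lanes; l++)
        result[l] = mask[l] ? *ptrs[l] : passthru[l];
}

/* Scalar model of a masked scatter: each enabled lane stores through its
   own pointer, in lane order; disabled lanes store nothing. */
void masked_scatter_i32(const int *value, int *const *ptrs,
                        const bool *mask, size_t lanes) {
    for (size_t l = 0; l < lanes; l++)
        if (mask[l])
            *ptrs[l] = value[l];
}
```

For a stride-2 loop, the pointer vector would hold the addresses of elements 0, 2, 4, ... of the array, with all mask lanes enabled except possibly in a tail iteration.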