There needs to be some actual motivating case to make it worth even writing the code for.
This goes back into “priority to implement” question. If there aren’t any customers, priority goes down, by a lot.
So under that paradigm - followed religiously - one would plug in any loop transformation (polyhedral or non-polyhedral), cost models, etc., to morph code into a vectorizable form.
I won’t comment on other transformations. A powerful vectorizer certainly helps other optimizers make a case,
and is sometimes fully appreciated only when more optimizations are in place.
This might be a good paradigm to follow from the peak performance angle, but not so from the compile-time or code size angle.
It seems best to pursue a paradigm like this with a peak performance library rather than mainstream llvm.
This should be evaluated feature-by-feature. I fully understand that LLVM is also used as JIT compiler.
I don’t think FP induction is adding significantly more compile-time and code-size than integer induction.
So I suggest y’all start from: "Here are the cases we care about making faster, and why we care about making them faster”.
+1
This was our thinking before the paradigm shift.
The following code vectorizes when TTT is int (it might need a bit of extension in SCEV) but not when TTT is float/double (unless FP induction
analysis is available). Adding two lines of code like this to a 1000-line loop suddenly stops the entire loop from vectorizing. These are the things
that greatly irritate programmers. Resolving programmer frustration is just as important as performance. In this case, a robust vectorizer should
either 1) vectorize the FP induction or 2) tell the programmer that the FP induction needs to be converted to an integer induction. Either way, FP
induction analysis is needed. Showing a backward dependence edge on “x” would certainly help, but not as much as 1) or 2). ICC vectorizer customers
appreciate improved “loop was not vectorized” messaging as much as functional and performance enhancements of the vectorizer.
In general, investing in making vectorizer “robust” pays off very well, through performance and/or programmer satisfaction.
void foo(TTT *a, int N, TTT x, TTT y) {
  int i;
  for (i = 0; i < N; i++) {
    a[i] = x;  /* was A[i]; a is the parameter */
    x += y;
  }
}
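To make option 2) above concrete, here is a minimal sketch (not from the original mail; `TTT` is assumed to be `double`, and `foo_int_iv` is a hypothetical name) of the manual rewrite a diagnostic could suggest: replacing the serial `x += y` chain with `x + i*y` removes the loop-carried dependence on `x`, so no FP induction analysis is needed to vectorize the loop. Note the rewrite is only equivalent up to FP rounding differences.

```c
typedef double TTT;  /* assumption: TTT instantiated as double */

/* Integer-induction rewrite of foo(): each iteration computes x + i*y
   directly, so there is no cross-iteration dependence on x.
   Results may differ from the serial x += y version in the last bits
   due to floating-point rounding. */
void foo_int_iv(TTT *a, int N, TTT x, TTT y) {
  for (int i = 0; i < N; i++)
    a[i] = x + (TTT)i * y;
}
```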
FYI, I have a customer asking for an extension of OpenMP linear for non-POD types (I won’t bother getting into that discussion in llvm-dev).
When the vectorizer becomes stronger, more feature requests will come. :)
Thanks,
Hideki