Hello, I'm working on hardware with very wide vectors, up to v2048. When I vectorize with LLVM's default loop vectorizer, as many as 2047 iterations can be left in the scalar remainder loop. LLVM does not vectorize these iterations, which increases the cost. Instead, they should be vectorized using the next available vector widths, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4…
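To make the request concrete, here is a minimal sketch in plain C++ (not LLVM code) of how the iterations could be split across descending widths. `Step` and `planLoops` are hypothetical names invented for this illustration, and the default minimum width of 4 is an assumption taken from the list of widths above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type, not part of LLVM: one loop in the cascaded plan.
struct Step {
  std::size_t width;  // vector width in elements (1 means scalar)
  std::size_t count;  // how many times the loop body runs at this width
};

// Splits tripCount iterations into a main loop at maxWidth, followed by
// remainder loops at each halved width down to minWidth, then a scalar tail.
std::vector<Step> planLoops(std::size_t tripCount,
                            std::size_t maxWidth = 2048,
                            std::size_t minWidth = 4) {
  // Assumption: both widths are powers of two and minWidth <= maxWidth.
  assert(minWidth >= 1 && maxWidth >= minWidth);
  assert((maxWidth & (maxWidth - 1)) == 0);
  assert((minWidth & (minWidth - 1)) == 0);

  std::vector<Step> plan;
  std::size_t remaining = tripCount;
  for (std::size_t w = maxWidth; w >= minWidth && w > 1; w /= 2) {
    std::size_t n = remaining / w;  // full vectors that fit at this width
    if (n > 0) plan.push_back({w, n});
    remaining %= w;                 // left over for the narrower widths
  }
  // Whatever is still left runs in the scalar loop.
  if (remaining > 0) plan.push_back({1, remaining});
  return plan;
}
```

With these defaults, a trip count of 4095 would run one v2048 iteration, then one iteration at each width from v1024 down to v4, and leave only 3 iterations for the scalar loop instead of 2047.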
The scalar remainder loop has long been an issue in LLVM, but with large vector widths it is amplified and can no longer be ignored. Solving it matters a great deal for targets like this one.
Please help. I'm reading through LoopVectorize.cpp but can't figure out which code I need to change.
It’s very important for me to solve this issue.