Vectorizing remainder loop

Hello, I am working on hardware with very large vector widths, up to v2048. When I vectorize with LLVM's default loop vectorizer, as many as 2047 iterations can end up in the scalar remainder loop. LLVM does not vectorize these iterations, which increases the cost. Ideally they would be vectorized using the next available vector widths, i.e. v1024, v512, v256, v128, v64, v32, v16, v8, v4…
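To make the arithmetic concrete, here is a small sketch of how a trip count could be split across descending power-of-two widths. The helper name splitTripCount and its parameters are hypothetical and only illustrate the idea; nothing here is LLVM code, and it assumes every power-of-two width between the maximum and the minimum is available on the target.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not part of LLVM): splits tripCount into
// (width, number of iterations at that width) pairs, trying the widest
// width first and halving down to minWidth. Whatever is left over is
// reported as a final pair with width 1, i.e. the scalar remainder.
std::vector<std::pair<std::uint64_t, std::uint64_t>>
splitTripCount(std::uint64_t tripCount, std::uint64_t maxWidth,
               std::uint64_t minWidth) {
    std::vector<std::pair<std::uint64_t, std::uint64_t>> plan;
    std::uint64_t remaining = tripCount;
    // Assumes maxWidth and minWidth are powers of two.
    for (std::uint64_t w = maxWidth; w >= minWidth && w > 1; w /= 2) {
        std::uint64_t chunks = remaining / w;
        if (chunks != 0)
            plan.emplace_back(w, chunks);
        remaining -= chunks * w;
    }
    if (remaining != 0)
        plan.emplace_back(1, remaining);  // scalar iterations
    return plan;
}
```

With a maximum width of 2048 and a minimum of 4, a trip count of 4095 leaves 2047 iterations after the single v2048 step. The cascade covers those with one step each at v1024 down to v4, so only 3 iterations remain scalar.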

The scalar remainder loop has long been an issue in LLVM, but with large vector widths its cost grows to the point where it can no longer be ignored. Solving it is important for targets like this one.

Please help. I have been reading LoopVectorize.cpp, but I cannot figure out which code needs to change.

Hi Hameeza,

At this point, the Loop Vectorizer does not have the capability to vectorize the epilog/remainder loop.
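For readers new to the terminology, the loop shapes involved can be sketched in plain C++. This is only an illustration under stated assumptions: addChunk and addArrays are hypothetical names, the fixed-size inner loop stands in for a single vector operation, and the widths 8 and 4 are chosen for brevity rather than taken from any target. It is not the code the Loop Vectorizer generates.

```cpp
#include <cstddef>

// Stand-in for one W-wide vector add: processes exactly W elements.
template <std::size_t W>
void addChunk(const float* a, const float* b, float* out) {
    for (std::size_t lane = 0; lane < W; ++lane)
        out[lane] = a[lane] + b[lane];
}

// Hypothetical illustration of a main vector loop, a narrower
// "vectorized epilog", and the final scalar remainder.
void addArrays(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    // Main loop at the widest width (8 in this sketch).
    for (; i + 8 <= n; i += 8)
        addChunk<8>(a + i, b + i, out + i);
    // Epilog handled at a narrower width (4) instead of falling
    // straight through to scalar code.
    for (; i + 4 <= n; i += 4)
        addChunk<4>(a + i, b + i, out + i);
    // Scalar remainder: at most 3 iterations here.
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Without the middle loop, every iteration left over from the main loop (up to 7 in this sketch, up to 2047 at v2048) would run in the scalar loop.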

Some time back there was an RFC on epilog loop vectorization, but it did not go through because of concerns.

The RFC has a patch as well; you could give it a try.

  • Ashutosh