KNL Vectorization with larger vector width

Thank you.

But I cannot find the function you mentioned, LoopVectorizationCostModel::computeFeasibleMaxVF(bool OptForSize, unsigned ConstTripCount). I am using LLVM 4. I have been trying to find the relevant code in LoopVectorize.cpp, but I am unable to debug it: each time I debug it in gdb, it just gives me the vectorized IR.

My goal is simple: when I pass my target name to opt, it should vectorize using the widest vector width supported by my target, which is 2048.
So, for
$ opt -O3 -mytarget 1.ll -o 1_opt.ll

1_opt.ll should contain <2048 x i32>, <1024 x i32> … <32 x i32>, etc.

How can I achieve this? Please help.

Thank you

“git log -ScomputeFeasibleMaxVF” says this was refactored in . -Eli

Thank you. I got it; it was a version issue.


How do I put my target machine's information into TTI?

Please help.

Each target has an implementation, e.g. X86TTIImpl::getRegisterBitWidth.
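As a rough illustration of that per-target arrangement, here is a standalone C++ model. It is not LLVM's class hierarchy: TTIModelBase, MyTargetTTIModel and DefaultTargetModel are hypothetical names, and the only thing borrowed from this thread is the shape of the getRegisterBitWidth(bool Vector) hook. Generic code asks the target-specific object for its vector width, so a target reports 65536-bit registers just by overriding that one hook. The CRTP-style layering is used here purely for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified model of per-target TTI hooks (not LLVM code).
template <typename Derived> struct TTIModelBase {
  // Fallback used when a target does not override the hook:
  // no vector registers, 32-bit scalar registers (illustrative values).
  unsigned getRegisterBitWidth(bool Vector) const { return Vector ? 0 : 32; }

  // Generic "vectorizer-side" query. The static_cast dispatches to the
  // derived target's hook if it declares one (CRTP), else to the fallback.
  unsigned widestVectorRegisterBits() const {
    return static_cast<const Derived *>(this)->getRegisterBitWidth(true);
  }
};

// Hypothetical target with 65536-bit vector registers (2048 x i32).
struct MyTargetTTIModel : TTIModelBase<MyTargetTTIModel> {
  unsigned getRegisterBitWidth(bool Vector) const {
    return Vector ? 65536 : 64;
  }
};

// A target that keeps the fallback behaviour.
struct DefaultTargetModel : TTIModelBase<DefaultTargetModel> {};
```

In real LLVM the equivalent change is made in the target's own TTI implementation class, as in the X86TTIImpl example above.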


Thank you.
For now, to see the effect, I made the following changes:

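unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) {
  if (Vector) {
    if (ST->hasAVX512())
      return 65536; // was 512; 65536 bits = 2048 x i32
    // ... remaining checks unchanged ...
  }
  // ... remaining code unchanged ...
}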

Here I changed 512 to 65536. Then, in LoopVectorize.cpp, I did the following:

assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements"
                                " into one vector!");

I changed 64 to 2048.

It runs fine. I can see <2048 x i32> or <1024 x i64> being emitted in the IR.
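Both types follow from the same division. The loop vectorizer derives its largest candidate VF by dividing the widest vector register width reported by TTI by the widest scalar type used in the loop (both in bits), so 65536 / 32 = 2048 lanes of i32 and 65536 / 64 = 1024 lanes of i64, which is also why the assert bound had to be raised from 64. A minimal sketch of that arithmetic, assuming a hypothetical helper named computeMaxVF (this is not the LoopVectorize.cpp code itself):

```cpp
#include <cassert>

// Hypothetical helper mirroring the max-VF arithmetic discussed above.
// WidestRegisterBits: what the target's getRegisterBitWidth(true) returns.
// WidestTypeBits: size in bits of the widest scalar type used in the loop.
// MaxAllowedVF: the bound checked by the assert (64 in the original code,
// 2048 after the edit above).
unsigned computeMaxVF(unsigned WidestRegisterBits, unsigned WidestTypeBits,
                      unsigned MaxAllowedVF) {
  assert(WidestTypeBits != 0 && "loop must use at least one sized type");
  unsigned MaxVectorSize = WidestRegisterBits / WidestTypeBits;
  assert(MaxVectorSize <= MaxAllowedVF &&
         "Did not expect to pack so many elements into one vector!");
  return MaxVectorSize;
}
```

With the stock 512-bit AVX-512 width and i32 elements, the same formula gives 16 lanes.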

But I cannot see the mix of vector widths that I get with the default KNL configuration: with 15 iterations, we see one <8 x i32> and the rest scalar. Here, when I set the iteration count to 2047, I get all scalar code. Why is that? Similarly, in Polly I can't see vector mixes like the ones for KNL, where it emits , ,…; here it should emit them recursively, like …
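Part of the answer is plain arithmetic. With a single vectorized loop and no interleaving, a constant trip count splits into TripCount / VF executions of the vector body and TripCount % VF scalar remainder iterations; vectorized IR typically computes this split in values named %n.mod.vf and %n.vec. If VF stays at 2048 and the loop runs 2047 times, the vector body would execute zero times and every iteration falls to the scalar loop, which is consistent with getting all-scalar code. A small sketch, with hypothetical names LoopSplit and splitTripCount and interleaving ignored:

```cpp
#include <cassert>

// Hypothetical helper: how a constant trip count divides between the
// vector body and the scalar remainder loop for one VF (no interleaving).
struct LoopSplit {
  unsigned VectorIterations; // times the vector body executes
  unsigned ScalarRemainder;  // iterations left to the scalar loop
};

LoopSplit splitTripCount(unsigned TripCount, unsigned VF) {
  assert(VF != 0 && "VF must be non-zero");
  return LoopSplit{TripCount / VF, TripCount % VF};
}
```

For example, splitTripCount(63, 16) gives 3 vector iterations and 15 remainder iterations, the same split reported later in this thread.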

How do I do this?

What am I missing here?
What further changes do I need to make?

Please help…

Do I need to change the following function?

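unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) {
  if (Vector && !ST->hasSSE1())
    return 0;

  // Proposed addition. is2048Bit() is not an existing X86Subtarget query
  // and would have to be added first. It is placed before the is64Bit()
  // check; after the final "return 8;" it would never be reached.
  if (ST->is2048Bit()) {
    if (Vector && ST->hasAVX512())
      return 1024;
    return 512;
  }

  if (ST->is64Bit()) {
    if (Vector && ST->hasAVX512())
      return 32;
    return 16;
  }
  return 8;
}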

Please help…


I need help here. I am able to adjust the vector width through the WidestRegister value. When the number of iterations is 31 and I set the vector width to 32, it gives <16 x i32> and <8 x i32> instructions.

However, if I replicate the same setup with 63 iterations and a vector width of 64, no vector instructions are emitted. It should behave as before and give <32 x i32> and <16 x i32> vector instructions.
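The pattern being asked for is a cascade of power-of-two widths: 31 iterations split as 16 + 8 + 4 + 2 + 1, and 63 as 32 + 16 + 8 + 4 + 2 + 1. The sketch below only computes such a plan; it does not make the loop vectorizer produce it, and, as noted later in the thread, there is no epilogue loop vectorization to do so. The function planChunks and its MaxVF/MinVF parameters are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: greedily split TripCount into power-of-two chunk
// widths, never wider than MaxVF. Iterations that do not fill a chunk of
// at least MinVF lanes are left to scalar code and reported as width 1.
std::vector<unsigned> planChunks(unsigned TripCount, unsigned MaxVF,
                                 unsigned MinVF) {
  assert(MaxVF != 0 && (MaxVF & (MaxVF - 1)) == 0 && "MaxVF: power of two");
  std::vector<unsigned> Widths;
  unsigned Remaining = TripCount;
  // Try each power-of-two width from MaxVF down to MinVF (stop at 0).
  for (unsigned VF = MaxVF; VF >= MinVF && VF >= 1; VF /= 2) {
    while (Remaining >= VF) {
      Widths.push_back(VF);
      Remaining -= VF;
    }
  }
  // Whatever is left (fewer than MinVF iterations) runs as scalar code.
  for (; Remaining > 0; --Remaining)
    Widths.push_back(1);
  return Widths;
}
```

With MaxVF = 16 and MinVF = 4, a trip count of 63 yields 16, 16, 16, 8, 4 followed by three scalar iterations.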

How can I do this?
What adjustments are needed?

Please help.

I'm trying this but am unable to solve it.

Thank you

There currently isn't any implementation of epilog loop vectorization (see, but it never got merged).

In some cases you might get lucky with loop unrolling plus SLP vectorization.
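To illustrate what that can look like: after full unrolling, a small remainder becomes straight-line code, and the SLP vectorizer looks for groups of isomorphic operations on adjacent memory, such as the four adds below, which it may pack into a single <4 x i32> operation if its cost model judges that profitable. The function name is hypothetical, and whether SLP actually fires depends on the target and the surrounding code; returning a fresh std::array keeps the output from aliasing the inputs.

```cpp
#include <array>
#include <cassert>

// Hypothetical example of straight-line code an SLP vectorizer can target:
// four isomorphic adds on adjacent elements, i.e. what a 4-iteration
// remainder looks like after full unrolling.
std::array<int, 4> addRemainder4(const std::array<int, 4> &B,
                                 const std::array<int, 4> &C) {
  std::array<int, 4> A{};
  A[0] = B[0] + C[0];
  A[1] = B[1] + C[1];
  A[2] = B[2] + C[2];
  A[3] = B[3] + C[3];
  return A;
}
```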


Thank you.

I am currently looking at how LLVM treats remainder loops. For example, with 63 loop iterations I get 3 v16i32 operations and 15 scalar iterations. I want to use v8 and v4 for the 15 remainder iterations. How can I do this?
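Since the vectorizer will not do this by itself, one possible workaround is to write the cascade by hand in the source: a main loop in steps of 16, then at most one step of 8 and one of 4, then a scalar tail, so that each fixed-width inner loop is a separate candidate for vectorization at that width. This is a sketch under that assumption; the function name and the 16/8/4 widths are illustrative, and whether each inner loop really becomes a single <16 x i32>, <8 x i32> or <4 x i32> operation is up to the optimizer (possible aliasing between the pointers may also get in the way).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical hand-written "vector epilogue" cascade for A[i] = B[i] + C[i].
// Each fixed-trip-count inner loop is meant to map onto one vector operation
// of that width; the final loop handles the last 0-3 elements as scalars.
void addWithCascade(int32_t *A, const int32_t *B, const int32_t *C,
                    std::size_t N) {
  std::size_t I = 0;
  for (; I + 16 <= N; I += 16) // main body: 16 lanes per step
    for (std::size_t J = 0; J < 16; ++J)
      A[I + J] = B[I + J] + C[I + J];
  if (I + 8 <= N) { // at most one 8-wide step (remainder < 16)
    for (std::size_t J = 0; J < 8; ++J)
      A[I + J] = B[I + J] + C[I + J];
    I += 8;
  }
  if (I + 4 <= N) { // at most one 4-wide step (remainder < 8)
    for (std::size_t J = 0; J < 4; ++J)
      A[I + J] = B[I + J] + C[I + J];
    I += 4;
  }
  for (; I < N; ++I) // scalar tail: at most 3 iterations
    A[I] = B[I] + C[I];
}
```

For N = 63 this runs three 16-wide steps, one 8-wide step, one 4-wide step and three scalar iterations.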

I am looking at LoopVectorize.cpp but am unable to find the code that handles the remainder (scalar) loop iterations.

Please help…

Please help.
I need to vectorize remainder loops because, with large iteration counts and large vector widths, the scalar remainder iterations are a big problem.