KNL Vectorization with larger vector width

Thank You.

But I cannot find the function you mentioned, LoopVectorizationCostModel::computeFeasibleMaxVF(bool OptForSize, unsigned ConstTripCount). I am using LLVM 4. I have been trying to locate the relevant code in LoopVectorize.cpp, but I am unable to debug it: each time I run it under gdb, it just returns the vectorized IR.

My goal is simple: when I specify my target in opt, it should vectorize using the highest vector width supported by my target, which is 2048.
So $ opt -O3 -mytarget 1.ll -o 1_opt.ll

1_opt.ll should emit <2048 x i32>, <1024 x i32>, …, <32 x i32>, etc.

How can I achieve this? Please help.

Thank You
Regards

“git log -ScomputeFeasibleMaxVF” says this was refactored in . -Eli

Thank You. I got it. Version issue.

TTI.getRegisterBitWidth(true)

How do I put my target machine's information into TTI?

Please help.

Each target has an implementation, e.g. X86TTIImpl::getRegisterBitWidth.

-Eli

Thank You.
Right now, to see the effect, I made the following changes:

unsigned X86TTIImpl::getRegisterBitWidth(bool Vector) {
  if (Vector) {
    if (ST->hasAVX512())
      return 65536;

Here I changed 512 to 65536. Then in LoopVectorize.cpp I made the following change:

assert(MaxVectorSize <= 2048 && "Did not expect to pack so many elements"
                                " into one vector!");

I changed 64 to 2048.

It runs fine. I can see <2048 x i32> or <1024 x i64> emitted in the IR.
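
The widths reported here are consistent with the vectorizer dividing the register width in bits by the element width in bits. The following is a minimal standalone sketch of that arithmetic only, not LLVM code. The name maxVectorElements is hypothetical, and treating the division as the only factor is an assumption that ignores the cost model and dependence checks.

```cpp
#include <cassert>

// Hypothetical helper, not an LLVM API: models "register bits / element bits"
// as the upper bound on the number of elements packed into one vector.
unsigned maxVectorElements(unsigned registerBits, unsigned elementBits) {
  // Assumption: the element width is non-zero and divides the register width.
  assert(elementBits != 0 && registerBits % elementBits == 0);
  return registerBits / elementBits;
}
```

With 65536 bits this gives 2048 for i32 and 1024 for i64, which matches the IR described above. With the stock 512 bits and 8-bit elements it gives 64, which lines up with the original bound in the assert.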

But I cannot see a mix of vector widths. With the default KNL settings, if iterations = 15 we see one <8 x i32> and the rest scalar. Here, when I set iterations = 2047, I get all scalar code. Why is that? Similarly, in Polly I can't see the vector mixes that are emitted for KNL; here it should emit progressively narrower vectors in the same way.

How can I do this?

What am I missing here?
What further changes do I need to make?

Please help…

Hello,
Do I need to change the following function?

unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) {
  if (Vector && !ST->hasSSE1())
    return 0;

  if (ST->is64Bit()) {
    if (Vector && ST->hasAVX512())
      return 32;
    return 16;
  }
  return 8;
}

to

  if (ST->is2048Bit()) {
    if (Vector && ST->hasAVX512())
      return 1024;
    return 512;
  }
  return 256;

please help…

Hello,

I need help here. I am able to adjust the vector width through the WidestRegister value. When the number of iterations is 31 and I set the vector width to 32, it emits <16 x i32> and <8 x i32> instructions.

However, if I replicate this with 63 iterations and a vector width of 64, no vector instructions are emitted. It should behave as before and emit <32 x i32> and <16 x i32> vector instructions.

How can I do this?
What adjustments are needed?

Please help.

I have been trying this but am unable to solve it.

Thank You

There currently isn't any implementation of epilog loop vectorization (see https://reviews.llvm.org/D30247, but it never got merged).

In some cases you might get lucky with loop unrolling plus SLP vectorization.

-Eli

Thank You.

I am currently looking at how LLVM treats remainder loops. For example, with 63 loop iterations I get 3 v16i32 and 15 scalars. I want to use v8 and v4 for the 15 remainder iterations. How can I do this?
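
To make the arithmetic concrete, here is a small standalone sketch of the split being asked for. It is not LLVM code and does not describe what the vectorizer does today; splitTripCount and Chunk are hypothetical names, and the greedy halving of the width is only an assumption about what a vectorized remainder could look like.

```cpp
#include <cassert>
#include <vector>

// One group of iterations executed at a given vector width (width 1 = scalar).
struct Chunk {
  unsigned width;
  unsigned count;
};

// Hypothetical helper: greedily covers tripCount iterations with chunks of
// width maxVF, maxVF/2, ..., 1, skipping widths that are not needed.
std::vector<Chunk> splitTripCount(unsigned tripCount, unsigned maxVF) {
  // Assumption: maxVF is a non-zero power of two.
  assert(maxVF != 0 && (maxVF & (maxVF - 1)) == 0);
  std::vector<Chunk> chunks;
  for (unsigned width = maxVF; width >= 1; width /= 2) {
    unsigned count = tripCount / width;  // full chunks at this width
    if (count != 0)
      chunks.push_back({width, count});
    tripCount %= width;                  // iterations still uncovered
  }
  return chunks;
}
```

For 63 iterations and a maximum width of 16, the main loop covers 3 x 16 = 48 iterations and leaves 15, which this split covers as one chunk each of width 8, 4, 2 and 1. For 63 iterations and a maximum width of 64, no full-width chunk fits, so the split starts at width 32.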

I am looking through LoopVectorize.cpp but am unable to find the code that deals with the scalar remainder iterations.

Please help…

Please help.
I need to vectorize remainder loops because, with large iteration counts and vector widths, the scalar remainder iterations are a big problem.