[VPlan] about vectorization factor selection

Shixiong_Xu · June 1, 2018, 9:14pm

Hi,

The current loop vectorizer considers a range of vectorization factors bounded by MaxVF. For each VF, it sets up uniform and scalar info before building the VPlan and selecting the final best VF. The best VF is also selected from within this VF range.

    for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
      // Collect Uniform and Scalar instructions after vectorization with VF.
      CM.collectUniformsAndScalars(VF);

      // Collect the instructions (and their associated costs) that will be more
      // profitable to scalarize.
      if (VF > 1)
        CM.collectInstsToScalarize(VF);
    }

It looks like, when vectorization is not forced, it is not necessary to set up uniform and scalar info for every VF. For a given VF, we could check, before calling collectUniformsAndScalars() and collectInstsToScalarize(), whether the types used in the code can actually yield any vector types (after type legalization). If not, there is no point in this VF participating in VPlan construction and VF selection. As the (scalar) types can be collected once for all VFs, I guess the check is cheap enough. As neither collectUniformsAndScalars() nor collectInstsToScalarize() looks cheap, such a check could speed up vectorization, in particular for large MaxVFs.
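
To make the idea concrete, here is a rough, self-contained sketch of such a pre-filter. Everything in it is an assumption for illustration: `ToyTarget`, `yieldsVectorType` and `filterCandidateVFs` are hypothetical names, not LLVM APIs, and the legality rule (an element width must be supported and the whole vector must fit in the widest register) is a toy stand-in for real type legalization, which can also split or widen vectors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical target description; not an LLVM data structure.
struct ToyTarget {
  unsigned WidestRegisterBits;             // e.g. 128 for a 128-bit SIMD unit
  std::vector<unsigned> VectorElementBits; // element widths with vector support
};

// Toy legality rule (an assumption): <VF x iN> stays a vector only if N is a
// supported element width and the whole vector fits in one register.
bool yieldsVectorType(const ToyTarget &T, unsigned ElemBits, unsigned VF) {
  assert(VF > 1 && "VF == 1 is the scalar loop");
  for (unsigned Supported : T.VectorElementBits)
    if (Supported == ElemBits)
      return VF * ElemBits <= T.WidestRegisterBits;
  return false;
}

// Returns the power-of-two VFs up to MaxVF worth analyzing: VF = 1 is always
// kept, and a wider VF is kept only if at least one scalar type used in the
// loop (collected once, as element bit widths) yields a vector type.
std::vector<unsigned> filterCandidateVFs(const ToyTarget &T,
                                         const std::vector<unsigned> &LoopElemBits,
                                         unsigned MaxVF) {
  std::vector<unsigned> Candidates{1};
  for (unsigned VF = 2; VF <= MaxVF; VF *= 2) {
    for (unsigned Bits : LoopElemBits) {
      if (yieldsVectorType(T, Bits, VF)) {
        Candidates.push_back(VF);
        break;
      }
    }
  }
  return Candidates;
}
```

With a 128-bit toy target and a loop that only uses 64-bit values, only VF=1 and VF=2 would survive under this rule, so the per-VF analyses would be skipped for VF=4 and above.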

Another minor thing: when force vectorization is enabled and MaxVF > 1, the expected cost of VF=2 is currently computed twice.

    bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;
    // Ignore scalar width, because the user explicitly wants vectorization.
    if (ForceVectorization && MaxVF > 1) {
      Width = 2;
      Cost = expectedCost(Width).first / (float)Width;
    }

    for (unsigned i = 2; i <= MaxVF; i *= 2) {
      // Notice that the vector loop needs to be executed less times, so
      // we need to divide the cost of the vector loops by the width of
      // the vector elements.
      VectorizationCostTy C = expectedCost(i);
      float VectorCost = C.first / (float)i;
      ...
    }

Cheers,

Shixiong (Jason) Xu

Caballero_Diego · June 5, 2018, 4:45pm

Hi Xu,

Thanks for pointing this out and sorry for the delayed response.

> It looks like, when vectorization is not forced, it is not necessary to set up uniform and scalar info for every VF. For a given VF, we could check, before calling collectUniformsAndScalars() and collectInstsToScalarize(), whether the types used in the code can actually yield any vector types (after type legalization). If not, there is no point in this VF participating in VPlan construction and VF selection. As the (scalar) types can be collected once for all VFs, I guess the check is cheap enough. As neither collectUniformsAndScalars() nor collectInstsToScalarize() looks cheap, such a check could speed up vectorization, in particular for large MaxVFs.

I’m not sure I understand your proposal. Please note that the set of uniform values may vary from VF to VF. For example, a branch condition can be uniform (and be kept scalar) for VF=4 but divergent for VF=8. For that reason, we have to compute this information for every candidate VF.
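
A small brute-force illustration of this point follows. It is only a sketch: `isUniformForVF` and `blockOfFourIsEven` are hypothetical helpers, and the check enumerates lanes at run time, whereas the vectorizer's analysis is static and may not classify this particular condition as uniform. It assumes the loop starts at 0 and the trip count is a multiple of VF.

```cpp
#include <cassert>
#include <functional>

// Returns true if Cond has the same value in every lane of every vector
// iteration when a loop over [0, TripCount) is vectorized by VF.
bool isUniformForVF(unsigned VF, unsigned TripCount,
                    const std::function<bool(unsigned)> &Cond) {
  assert(VF > 0 && TripCount % VF == 0 && "sketch assumes no remainder loop");
  for (unsigned Base = 0; Base < TripCount; Base += VF)
    for (unsigned Lane = 1; Lane < VF; ++Lane)
      if (Cond(Base + Lane) != Cond(Base))
        return false; // lanes disagree: the condition is divergent
  return true;
}

// Example branch condition whose value changes every 4 scalar iterations.
bool blockOfFourIsEven(unsigned I) { return (I / 4) % 2 == 0; }
```

For this condition, `isUniformForVF(4, 32, blockOfFourIsEven)` holds, while the same call with VF=8 does not, because each 8-wide vector iteration spans two blocks of four.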

> Another minor thing: when force vectorization is enabled and MaxVF > 1, the expected cost of VF=2 is currently computed twice.

Agreed. It would be great if you could submit a patch for that or open a bug. I guess we just need to introduce a MinVF that is set to 4 when vectorization is forced.
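
Roughly, the MinVF idea could look like the standalone sketch below. It is not the actual LoopVectorize code: `selectVF` and `CostFn` are hypothetical, the cost model is passed in as a callback so the number of evaluations can be observed, and seeding the search with the scalar cost at VF=1 is an assumption about the surrounding code.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for the cost model: cost of one loop iteration at VF.
using CostFn = std::function<float(unsigned)>;

// Picks the VF with the lowest per-lane cost. When vectorization is forced,
// VF = 2 seeds the search and the loop starts at MinVF = 4, so the cost of
// each VF is computed exactly once.
std::pair<unsigned, float> selectVF(unsigned MaxVF, bool Forced,
                                    const CostFn &ExpectedCost) {
  assert(MaxVF >= 1 && "MaxVF must allow at least the scalar loop");
  unsigned Width = 1;
  float Cost = ExpectedCost(1); // scalar loop cost (assumed starting point)
  unsigned MinVF = 2;
  if (Forced && MaxVF > 1) {
    // Ignore the scalar width, because the user explicitly wants vectorization.
    Width = 2;
    Cost = ExpectedCost(2) / 2.0f;
    MinVF = 4; // VF = 2 is already accounted for
  }
  for (unsigned VF = MinVF; VF <= MaxVF; VF *= 2) {
    // The vector loop runs VF times fewer iterations, so compare per-lane cost.
    float VectorCost = ExpectedCost(VF) / static_cast<float>(VF);
    if (VectorCost < Cost) {
      Cost = VectorCost;
      Width = VF;
    }
  }
  return {Width, Cost};
}
```

Passing a callback that counts its invocations shows that, in the forced case, VF=2 is costed once instead of twice, while the unforced path is unchanged.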

Thanks!

Shixiong_Xu · June 5, 2018, 5:15pm

Hi Diego,

Thanks for your reply.

Shixiong
