LoopVectorizer with ifconversion

Hi,

it seems to be generally a bad idea to enable vectorization of conditional stores on SystemZ, because it will cost extra instructions both to 1. extract compare result element 2. Do a test-under-mask instruction on that element 3. conditional branch past the store block.

Ideally, I would like to adjust the cost for the vector compare. I am not sure if this is feasable since I would need to know that the compare is becoming a vector compare for a block that will not go away after vectorization (which is the case if there is a store in it).

The LoopVectorizer seems to have this information available. Is it possible to easily use it in this case somehow?

Or is it more convenient to add a tti hook to complement the EnableCondStoresVectorization option, perhaps?

/Jonas

Hi,

it seems to be generally a bad idea to enable vectorization of conditional stores on SystemZ, because it will cost extra instructions both to 1. extract compare result element 2. Do a test-under-mask instruction on that element 3. conditional branch past the store block.

In general, this is true everywhere. In a large vectorized loop, this cost may well be worthwhile. The idea is that the cost model should account for all of these costs. If it doesn't properly, we should fix that.

  -Hal

Isn't this only worth when the SIMD instructions can be
conditionalised per lane? I believe AVX512 and SVE have such
abilities, but not NEON or SSE.

Does SystemZ support conditionalisation in SIMD?

cheers,
--renato

No, z13 has no masked vector operations. I think this is checked with the isLegalMasked...() hooks. Is this implicitly assumed in LoopVectorizers if-conversion algorithm somewhere? I see the scalar costs are adjusted with 'ScalarCost /= getReciprocalPredBlockProb()'. It seems that this should be done for all blocks that remain after vectorization?

If not doing the simple thing and disabling store with predication vectorization, I agree with Hal that this could be worth modeling as an extra cost to not reject some large loop with just one conditional store, or so.

I think this extra cost should be added to the compare, but I am not sure if it would be better to instead add it to the branch, because there are also cases of e.g. (AND (COMPARE, COMPARE)). Adding a cost to a vectorized branch instead could be done by assuming that a conditional branch would have to be set up for each branch after the vector compare.

Does the loop vectorizer know which blocks that need predication in the scalar loop will remain after vectorization? SystemZ could check such blocks by looking for stores, but that seems like extra work.

thanks,

Jonas

For loops with conditional stores, the EnableCondStoresVectorization feature allows the vectorizer to scalarize and predicate the stores in the vectorized loop. The same thing is done for conditional instructions that may divide by zero. In both cases the cost model does make an attempt to estimate the cost. It accounts for the cost of the extra inserts/extracts due to scalarization and the fact that the predicated blocks may not be executed the same number of times as the header. It currently assumes a block probably of 0.5 for predicated blocks and scales the costs of the predicated instructions accordingly.

You can take look at the AArch64/predication_costs.ll test to see some examples of how the cost is computed. This may give you some ideas as to what needs to be extended and/or modeled more accurately.

– Matt

In general, this is true everywhere. In a large vectorized loop, this cost
may well be worthwhile. The idea is that the cost model should account for
all of these costs. If it doesn't properly, we should fix that.

Isn't this only worth when the SIMD instructions can be
conditionalised per lane? I believe AVX512 and SVE have such
abilities, but not NEON or SSE.

Does SystemZ support conditionalisation in SIMD?

No, z13 has no masked vector operations. I think this is checked with the isLegalMasked...() hooks. Is this implicitly assumed in LoopVectorizers if-conversion algorithm somewhere? I see the scalar costs are adjusted with 'ScalarCost /= getReciprocalPredBlockProb()'. It seems that this should be done for all blocks that remain after vectorization?

If not doing the simple thing and disabling store with predication vectorization, I agree with Hal that this could be worth modeling as an extra cost to not reject some large loop with just one conditional store, or so.

I think this extra cost should be added to the compare, but I am not sure if it would be better to instead add it to the branch, because there are also cases of e.g. (AND (COMPARE, COMPARE)). Adding a cost to a vectorized branch instead could be done by assuming that a conditional branch would have to be set up for each branch after the vector compare.

Yes, I'd assume that you'd want to add some relative cost of a compare, extract, and a correctly-predicted branch (etc.).

Does the loop vectorizer know which blocks that need predication in the scalar loop will remain after vectorization? SystemZ could check such blocks by looking for stores, but that seems like extra work.

Yes. Legal->blockNeedsPredication (there's also Legal->isScalarWithPredication).

  -Hal

I think this extra cost should be added to the compare, but I am not sure if it would be better to instead add it to the branch, because there are also cases of e.g. (AND (COMPARE, COMPARE)). Adding a cost to a vectorized branch instead could be done by assuming that a conditional branch would have to be set up for each branch after the vector compare.

Yes, I'd assume that you'd want to add some relative cost of a compare, extract, and a correctly-predicted branch (etc.).

Does the loop vectorizer know which blocks that need predication in the scalar loop will remain after vectorization? SystemZ could check such blocks by looking for stores, but that seems like extra work.

Yes. Legal->blockNeedsPredication (there's also Legal->isScalarWithPredication).

-Hal

Thanks Hal and and Matt.

I put up my current patch based on this discussion at https://reviews.llvm.org/D31175.

/Jonas