Does loop vectorizer inquire about target's SIMD capabilities?

Nadav (or anyone who is familiar with the loop vectorizer),

Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop? I am asking this because I would like to have loop vectorization disabled for targets that don’t support SIMD instructions (for example, standard mips32). Loop vectorization bloats the code size and prolongs compilation time without any improvement to performance for such targets.

Hi Akira!

Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop?

Yes, it uses a cost model to determine the profitability of vectorization. At the moment only x86 provides the necessary hooks that are needed for calculating the costs. We may need to change the cost defaults to prevent vectorization on targets that don't implement the cost interface. If this is a problem for you then I can do it soon.

I am asking this because I would like to have loop vectorization disabled for targets that don't support SIMD instructions (for example, standard mips32).
Loop vectorization bloats the code size and prolongs compilation time without any improvement to performance for such targets.

Yes. Also, notice that the loop vectorizer tries to be more conservative when the 'optsize' attribute is used.

Thanks,
Nadav

Isn't the vectorizer disabled by default? Or are you requesting that, if the
user passes -vectorize, the compiler print a warning that the flag has no
effect (since the target has no SIMD)?

cheers,
--renato

The loop vectorizer is now enabled by default.

Hi Nadav,

Hi Akira!

Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop?

Yes, it uses a cost model to determine the profitability of vectorization. At the moment only x86 provides the necessary hooks that are needed for calculating the costs. We may need to change the cost defaults to prevent vectorization on targets that don’t implement the cost interface. If this is a problem for you then I can do it soon.

I guess I can just implement all the VectorTargetTransformInfo::get*OpCost functions, since I will need a cost model for mips-dsp later anyway.

Would the code in LoopVectorizationCostModel::expectedCost work correctly if those functions returned a large integer (max unsigned int)? I am concerned about overflow.

Hi Nadav,

Hi Akira!

Does the current loop vectorizer inquire about the SIMD capabilities of the target architecture when it decides whether it is profitable to vectorize a loop?

Yes, it uses a cost model to determine the profitability of vectorization. At the moment only x86 provides the necessary hooks that are needed for calculating the costs. We may need to change the cost defaults to prevent vectorization on targets that don’t implement the cost interface. If this is a problem for you then I can do it soon.

I guess I can just implement all the VectorTargetTransformInfo::get*OpCost functions, since I will need a cost model for mips-dsp later anyway.

Would the code in LoopVectorizationCostModel::expectedCost work correctly if those functions returned a large integer (max unsigned int)?

Yes, it should just work. But I am also going to change the default to something more costly.

I am concerned about overflow.

I never thought about it. I guess that it would be a good idea to change the LoopVectorizer cost accumulator to uint64.

Thanks,
Nadav

I thought that was just a temporary arrangement to get a feel for it, not
to actually have it on all the time (i.e. in the next release). Is it
enabled just for -O3, or at lower levels too?

This can cause problems. On ARMv7, for instance, the default assumes NEON
is present, but Tegra 2 doesn't have NEON, only VFP. That means an
optimized build that used to work there will now fail, unless you
explicitly disable NEON or vectorization, no?

I'd prefer it if vectorization had to be requested explicitly in production
releases of LLVM, via -vectorize or some other flag (-O4 or -OV, for example).

cheers,
--renato

The plan is to enable it by default for the 3.3 release. Until then we can enable and disable it depending on the situation. It is a good idea to enable it early in order to catch performance regressions as soon as possible. The vectorizer is currently enabled by default at -O2 and -O3, and it runs with reduced functionality at -Os.

At the moment ARM does not have a good cost model. It relies on the default implementation, which uses the TargetLowering information to collect facts about the target. In theory things should just work, because TLI should know about the vector situation.

I plan to start working on the ARM cost model early next week, and I hope that the ARM folks will help me with this.

Clang has the ‘-fvectorize’ and ‘-fno-vectorize’ flags.

This is great news! I hope to be able to help you with this. Let me know
when you start and we can divide the work.

cheers,
--renato