[Vectorization] Mismatch in generated code

Hi Arnold, James

Thanks for getting back on this. Arnold, I will look into the patches you
provided and get back to you. Thanks for them.

Note that they were just quickly put together to support building a horizontal tree for your case. They definitely need testing and bug fixing.

Adding a few more observations:

The expression tree looks like this:

sum = (a[0] + a[2]) + (a[1] + a[3]); → which is vectorizable with current code.

If the array a is of type float/double, then the formed tree is not
disturbed/canonicalized (precision issues?) and we get vectorized code.

You can’t reassociate floating point without enabling fast-math.
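To illustrate why reassociation is not allowed for floating point by default, here is a minimal sketch in C. The function names `sum_left` and `sum_right` are made up for this example; the point is only that the two groupings of the same three values can round differently.

```c
#include <stdio.h>

/* Hypothetical helper: adds three floats grouped as (x + y) + z. */
float sum_left(float x, float y, float z) {
    return (x + y) + z;
}

/* Hypothetical helper: adds the same floats grouped as x + (y + z). */
float sum_right(float x, float y, float z) {
    return x + (y + z);
}

/* Prints both groupings so the difference can be inspected. */
void show_grouping(float x, float y, float z) {
    printf("(x + y) + z = %g\n", sum_left(x, y, z));
    printf("x + (y + z) = %g\n", sum_right(x, y, z));
}
```

Calling `show_grouping(1e20f, -1e20f, 1.0f)` without fast-math gives 1 for the first grouping and 0 for the second, because the 1.0f is absorbed when it is added to -1e20f first.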

We get vectorized code but it is not ideal for float. We should be doing <4 x float> operations.

However, if the array a is of type int, then the reassociate pass (which runs at O1)
reorganizes this into

sum = (((a[0] + a[2]) + a[1]) + a[3]).

Right, the current code tries to build a vectorization tree at an add with the add’s operands as the root, which will not work well for a reduction like (+ (+ (+ a[0] a[1]) a[2]) a[3]).
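For reference, the two tree shapes discussed in this thread can be written out as C functions. This is only a sketch to make the shapes concrete; the names `sum_balanced` and `sum_chain` are hypothetical, and the inputs are assumed small enough that no signed overflow occurs.

```c
/* Balanced shape: the two inner adds are independent of each other,
   so they can be paired into one vector add. */
int sum_balanced(const int *a) {
    return (a[0] + a[2]) + (a[1] + a[3]);
}

/* Linear chain shape, as produced after reassociation: every add
   depends on the result of the previous one. */
int sum_chain(const int *a) {
    return ((a[0] + a[2]) + a[1]) + a[3];
}
```

Both functions compute the same value for the same input; only the dependency structure between the adds differs, and that structure is what the tree builder sees.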

Enabling recognition of horizontal reductions should recover this and improve performance if your vector length is > 2, once we have solved the sequential load issue.
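As a rough picture of what a horizontal reduction over four lanes does, here is a scalar emulation in C. It is a conceptual sketch only, not the code the vectorizer emits; the name `hsum4` and the local array `v` standing in for a vector register are assumptions made for this example.

```c
/* Scalar emulation of a 4-lane horizontal add reduction. */
int hsum4(const int *a) {
    /* Conceptually one vector load of four consecutive elements. */
    int v[4] = {a[0], a[1], a[2], a[3]};

    /* Step 1: add the upper half onto the lower half
       (lane 0 += lane 2, lane 1 += lane 3). */
    v[0] += v[2];
    v[1] += v[3];

    /* Step 2: add lane 1 onto lane 0; lane 0 now holds the total. */
    v[0] += v[1];

    return v[0];
}
```

With four lanes this takes two add steps instead of three dependent scalar adds, which is why the benefit only shows up for vector lengths above 2.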

This results in a lost vectorization opportunity, because the current code
does not vectorize this type of tree, as stated earlier in the thread.

Also, we do not build a vectorization tree on return in the current code (PR20035),
so code like this is not vectorized:

return a[0] + a[1] + a[2] + a[3];
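A self-contained test function for the return case might look like the sketch below. The function name `sum4` is hypothetical and is not taken from the bug report.

```c
/* The sum is consumed directly by the return, so there is no store
   from which a vectorization tree could be started. */
int sum4(const int *a) {
    return a[0] + a[1] + a[2] + a[3];
}
```

Compiling this with something like `clang -O2 -S -emit-llvm` and checking whether any vector adds appear in the output should show whether the return is used as a starting point.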

I will try to work on vectorization of the return statement. Input is always welcome.

I thought we had added ReturnInst and CallInst operands as starting points of vectorization trees, … it seems not.

Thanks,
Arnold