Yes. That was the first thing i tried, and it didn't do anything. I was looking the vectorizer, but then I saw some things that made me wonder if it was even supposed to do this
I suspect that in the IR you will see a sequence of inserts. At the moment the SLP-vectorizer does not look at “insert” sequences. But it should be really easy (and beneficial) to.
This code maintains a vector of float4 and it inserts and extracts values from this vector. The ’select’ operations are already vectorized. Maybe a sequence of inst-combines (or DAG-combines) can help. If you re-write this code using scalars then the slp-vectorizer, with some tweaks, will be able to catch it.
Nadav,
I think what matt was looking for is why the slp-vectorizer is not vectorizing the booleans? To me it seems like the vectorizer got the first step right(vectorizing the operands), but not the second step(vectorizing the comparison operation). I actually would expect a single icmp ne <4 x i32> %c, <4 x i32><i32 0, i32 0, i32 0, i32 0> instead of 4 icmp's.
I've tried manually scalarizing the arguments so the other select arguments are scalars, but the vectorizer still doesn't change it. Here is the scalarized IR.