Ok, we were missing this specific case because of some instcombine xforms that were only applying to scalars, not vectors. I tweaked them to cover vectors and we're getting "perfect" code for this now (one cmpordps).
However, not all is sunshine and roses; there are some sad puppydog faces left. Specifically, things like this still get scalarized:
#include <emmintrin.h>
__m128i a(__m128 a, __m128 b, __m128 c) { return a==b & c==b; }
The problem is that the IR going into Codegen has been (nicely) simplified to:
define <2 x i64> @a(<4 x float> %a, <4 x float> %b, <4 x float> %c) nounwind readnone {
entry:
  %cmp = fcmp oeq <4 x float> %a, %b ; <<4 x i1>> [#uses=1]
  %cmp4 = fcmp oeq <4 x float> %c, %b ; <<4 x i1>> [#uses=1]
  %and6 = and <4 x i1> %cmp, %cmp4 ; <<4 x i1>> [#uses=1]
  %and = sext <4 x i1> %and6 to <4 x i32> ; <<4 x i32>> [#uses=1]
  %conv = bitcast <4 x i32> %and to <2 x i64> ; <<2 x i64>> [#uses=1]
  ret <2 x i64> %conv
}
When legalize types sees the sext from <4 x i1> -> <4 x i32>, its only solution right now is to scalarize the whole mess feeding into it, giving us really atrocious code.
IMO, the solution to this is to have a legalize-types action for vectors that corresponds to "promote" on scalars. In this case, since X86 supports VSETCC, the <4 x i1> SETCC should "vector promote" to a VSETCC node with a <4 x i32> result, the and should vector promote to <4 x i32>, and the sext should vector promote to a vector sext_inreg.
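To make the intended result concrete, here is a rough lane-by-lane model of the promoted sequence in plain C. It is only a sketch of the semantics, not LLVM code: the v4f32 and v4i32 types and all of the function names are made up for illustration, and the only assumption it takes from the text is that a VSETCC-style compare produces all ones or all zeros in each 32-bit lane.

```c
#include <stdint.h>

/* Hypothetical 4-lane types standing in for <4 x float> and <4 x i32>. */
typedef struct { float lane[4]; } v4f32;
typedef struct { uint32_t lane[4]; } v4i32;

/* Models a VSETCC-style ordered-equal compare: each result lane is all
   ones when the inputs compare equal and all zeros otherwise. C's == is
   false when either operand is NaN, which matches fcmp oeq. */
static v4i32 vsetcc_oeq(v4f32 x, v4f32 y) {
    v4i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = (x.lane[i] == y.lane[i]) ? UINT32_MAX : 0u;
    return r;
}

/* The promoted 'and': an ordinary bitwise and on 32-bit lanes. */
static v4i32 v4i32_and(v4i32 x, v4i32 y) {
    v4i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = x.lane[i] & y.lane[i];
    return r;
}

/* sext_inreg from i1: replicate bit 0 of each lane across the lane. */
static v4i32 sext_inreg_i1(v4i32 x) {
    v4i32 r;
    for (int i = 0; i < 4; ++i)
        r.lane[i] = 0u - (x.lane[i] & 1u);
    return r;
}

/* The whole function from the test case, in promoted form. */
static v4i32 promoted_a(v4f32 a, v4f32 b, v4f32 c) {
    return sext_inreg_i1(v4i32_and(vsetcc_oeq(a, b), vsetcc_oeq(c, b)));
}
```

Note that in this model the sext_inreg is an identity on its input, because every lane coming out of the and is already all ones or all zeros. That suggests the promoted form should boil down to two vector compares and one vector and, with nothing scalarized.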
I don't think that implementing this is particularly hard, but I have plenty of other things I'm working on right now. Is anyone else interested in working on this?
-Chris