I had a few thoughts regarding our short discussion yesterday.
I am not sure how we can lower SEXT into the vpmovsx family of instructions. I propose the following strategy for the ZEXT and ANYEXT family of nodes. First, we let the Type Legalizer/VectorOpLegalizer scalarize the code. Next, we allow the dag-combiner to convert the resulting BUILD_VECTOR node into a shuffle. This is possible because all of the inputs of the build vector come from two values (src and either undef or zero). Finally, the shuffle lowering code lowers the new shuffle node into UNPCKLPS. This sequence should be optimal for all of the sane types.
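To make the shuffle idea concrete, here is a rough Python model of it (not LLVM code). It assumes little-endian lanes and a v8i16 source whose low four elements are extended to v4i32; the helper names (shuffle, unpack_low_mask, bitcast_pairs, zext_via_shuffle) are made up for this illustration.

```python
def shuffle(a, b, mask):
    """Two-input vector shuffle: indices 0..n-1 pick from a, n..2n-1 pick from b."""
    both = a + b
    return [both[i] for i in mask]

def unpack_low_mask(n):
    """Mask that interleaves the low halves of two n-lane vectors (UNPCKL-style)."""
    mask = []
    for i in range(n // 2):
        mask += [i, n + i]
    return mask

def bitcast_pairs(lanes, lane_bits):
    """Reinterpret adjacent lane pairs as one little-endian lane of twice the width."""
    return [lanes[i] | (lanes[i + 1] << lane_bits) for i in range(0, len(lanes), 2)]

def zext_via_shuffle(src, lane_bits):
    """Zero-extend the low half of src by interleaving it with a zero vector."""
    zero = [0] * len(src)
    interleaved = shuffle(src, zero, unpack_low_mask(len(src)))
    return bitcast_pairs(interleaved, lane_bits)

src = [0xFFFF, 1, 0x8000, 7, 9, 9, 9, 9]   # v8i16, only the low four lanes matter
widened = zext_via_shuffle(src, 16)         # modeled v4i32 result
```

For ANYEXT the second shuffle input would be undef rather than zero, so the upper half of each widened lane is unspecified.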
Once we implement ZEXT and ANYEXT we could issue a SIGN_EXTEND_INREG node to support SEXT. Unfortunately, v2i64 SRA is not supported by the hardware and the code will be scalarized ...
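As a sketch of why the in-register sign extension (SIGN_EXTEND_INREG in the SelectionDAG) needs SRA: when it is expanded, it becomes a left shift followed by an arithmetic right shift on the wide type. The Python model below works on one lane at a time; sext_inreg is a hypothetical helper name, and the bit widths are just examples.

```python
def sext_inreg(value, src_bits, dst_bits):
    """Sign-extend the low src_bits of a dst_bits-wide lane using SHL followed by SRA."""
    mask = (1 << dst_bits) - 1
    shift = dst_bits - src_bits
    # SHL: move the source sign bit to the top of the wide lane,
    # discarding whatever the any-extend left in the upper bits.
    shifted = (value << shift) & mask
    # SRA: model the arithmetic shift by reinterpreting the lane as signed.
    if shifted >> (dst_bits - 1):
        shifted -= 1 << dst_bits
    return (shifted >> shift) & mask

# i16 -> i32 lanes only need a 32-bit SRA.
lanes32 = [sext_inreg(v, 16, 32) for v in [0x8000, 0x7FFF, 0xABCD0001]]

# i32 -> i64 lanes need a 64-bit SRA, which is the v2i64 case that is missing.
lanes64 = [sext_inreg(v, 32, 64) for v in [0x80000000, 0x00000001]]
```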
Currently we promote vector elements to the widest possible type, until we hit the _first_ legal register type. For AVX, where YMM registers extend XMM registers, it is not clear to me why we stop at XMM-sized registers. In some cases, masks of type <4 x i1> are legalized to <4 x i32> in XMM registers even if they are the result of a vector-compare of <4 x i64> types.
I also had a second observation, which contradicts the first one. In many cases we 'over-promote'. Consider the <2 x i32> type. Promoting the elements to <2 x i64> forces us to use types which are not fully supported by the instruction set. For example, not all of the shift operations are implemented for vector i64 types. Maybe a different strategy would be to promote vector elements up to i32, which is the common element type for most processors, and widen the vector from this point onwards. I am not sure how we can implement vector compare/select with this approach.
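To compare the two strategies on a few types, here is a deliberately simplified Python model. It assumes that only 128-bit (XMM-sized) vectors are legal and ignores splitting of larger vectors; both function names and the LEGAL_BITS constant are invented for this sketch and do not correspond to anything in the legalizer.

```python
LEGAL_BITS = 128  # assumption: only XMM-sized vectors are legal

def promote_elements(num_elts, elt_bits):
    """Current behaviour as I understand it: grow the element type
    until the vector fills the first legal register size."""
    while num_elts * elt_bits < LEGAL_BITS:
        elt_bits *= 2
    return num_elts, elt_bits

def promote_then_widen(num_elts, elt_bits, max_elt_bits=32):
    """Alternative: promote elements only up to i32, then add lanes
    until the vector fills a legal register."""
    while elt_bits < max_elt_bits:
        elt_bits *= 2
    while num_elts * elt_bits < LEGAL_BITS:
        num_elts *= 2
    return num_elts, elt_bits

# (num_elts, elt_bits) pairs: <2 x i32>, <4 x i1>, <2 x i8>
for ty in [(2, 32), (4, 1), (2, 8)]:
    print(ty, "->", promote_elements(*ty), "vs", promote_then_widen(*ty))
```

In this model <2 x i32> becomes <2 x i64> under the current scheme but <4 x i32> under the alternative, while <4 x i1> ends up as <4 x i32> either way.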