Because 32-bit mode has no 64-bit GPRs, the sequence that converts v2i32 to
v2f32 is quite inefficient there. This patch adds custom lowering for that
conversion in 32-bit mode. In addition, it teaches DAG combine to
transform (build_vec (Xint2fp x) (Xint2fp y) ..) into (Xint2fp (build_vec
x y ..)), so that a single vector conversion replaces several scalar ones
and the load on the FP conversion unit is reduced.
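
To make the shape of the combine concrete, here is a minimal, self-contained sketch on a toy expression tree. It is not the code in this patch and does not use LLVM's SelectionDAG API: `Node`, `Op`, `combineBuildVector` and `print` are hypothetical names invented for illustration, and the sketch assumes that "Xint2fp" stands for either the signed or the unsigned int-to-fp conversion. A real combine would presumably also check value types and legality, which the toy omits.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node kinds; illustrative stand-ins, not LLVM's ISD opcodes.
enum class Op { Leaf, SIntToFP, UIntToFP, BuildVector };

struct Node {
  Op op;
  std::string name;                        // meaningful for Leaf only
  std::vector<std::shared_ptr<Node>> ops;  // operands
};
using NodeP = std::shared_ptr<Node>;

NodeP leaf(const std::string &n) {
  return std::make_shared<Node>(Node{Op::Leaf, n, {}});
}
NodeP unary(Op op, NodeP x) {
  return std::make_shared<Node>(Node{op, "", {std::move(x)}});
}
NodeP buildVec(std::vector<NodeP> elts) {
  return std::make_shared<Node>(Node{Op::BuildVector, "", std::move(elts)});
}

// Rewrite (build_vec (conv x) (conv y) ..) into (conv (build_vec x y ..))
// when every element is the same int-to-fp conversion. Any other input is
// returned unchanged.
NodeP combineBuildVector(const NodeP &n) {
  if (n->op != Op::BuildVector || n->ops.empty())
    return n;
  Op conv = n->ops.front()->op;
  if (conv != Op::SIntToFP && conv != Op::UIntToFP)
    return n;
  std::vector<NodeP> srcs;
  for (const NodeP &e : n->ops) {
    if (e->op != conv)
      return n;  // mixed or non-conversion elements: leave the node alone
    assert(e->ops.size() == 1 && "conversion nodes are unary");
    srcs.push_back(e->ops[0]);
  }
  return unary(conv, buildVec(std::move(srcs)));
}

// Render a tree in the s-expression style used in the description above.
std::string print(const NodeP &n) {
  switch (n->op) {
  case Op::Leaf:
    return n->name;
  case Op::SIntToFP:
    return "(sint2fp " + print(n->ops[0]) + ")";
  case Op::UIntToFP:
    return "(uint2fp " + print(n->ops[0]) + ")";
  case Op::BuildVector: {
    std::string s = "(build_vec";
    for (const NodeP &e : n->ops)
      s += " " + print(e);
    return s + ")";
  }
  }
  return "";
}
```

In this sketch, a build_vec of two signed conversions of `x` and `y` is rewritten so that the conversion wraps the build_vec, while a vector that mixes signed and unsigned conversions is returned untouched.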
Thanks for your review.