Vectors of length 3 as first-class types

I'm trying the following code on X86:

define <3 x i32> @retThree() {
  ret <3 x i32> <i32 1, i32 2, i32 3>
}

I expected the result to be returned in the first three lanes of %xmm0. (When returning a vector of four, %xmm0 is used.) However, the generated assembly appears to return the value through a hidden pointer instead. This is despite the fact that the constant has already been laid out with padding to a full 16 bytes, as if in preparation for exactly that:

.LCPI1_0: # constant pool <4 x i32>
  .long 1 # 0x1
  .long 2 # 0x2
  .long 3 # 0x3
  .zero 4

Shouldn't this data simply be loaded into %xmm0 as-is, as in the four-element case?

retFour: # @retFour
# BB#0:
  movaps .LCPI1_0, %xmm0

This would, of course, leave it to the caller to ignore the fourth lane.
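To make the layout argument concrete: three i32 lanes take 12 bytes, and the `.zero 4` padding rounds the constant up to the 16-byte width of one XMM register. The following Python sketch only models that byte layout with the standard `struct` module (the variable names are illustrative; this is not what LLVM emits or executes). It shows that a full 16-byte load yields the three values plus a zero padding lane, and that the caller only needs the first three.

```python
import struct

# Lay out <3 x i32> <1, 2, 3> the way the constant pool above does:
# three little-endian 32-bit lanes followed by 4 zero bytes of padding.
padded = struct.pack("<3i4x", 1, 2, 3)

# 12 bytes of data + 4 bytes of padding = one 128-bit XMM register.
size = len(padded)

# A movaps-style 16-byte load brings in all four lanes; the padding lane reads as 0.
all_lanes = struct.unpack("<4i", padded)

# The caller only looks at lanes 0..2 and ignores the fourth.
ret_value = struct.unpack_from("<3i", padded)

print(size)       # 16
print(all_lanes)  # (1, 2, 3, 0)
print(ret_value)  # (1, 2, 3)
```

The point is simply that the padded <3 x i32> already occupies exactly the same storage as a <4 x i32>, so returning it in %xmm0 would cost nothing extra.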

While debugging the code generation, I noticed that the v3i32 is widened to v4i32, but by the time X86TargetLowering::CanLowerReturn is called, the v3i32 seems to have been split into three MVT::i32s. When I try a function that returns a vector of 2 or 4 elements instead, CanLowerReturn receives MVT::v2i32 or MVT::v4i32, respectively, and return by pointer is not used.

Where does the v3i32, which is otherwise widened to v4i32, get split into (three?) separate i32s?

And BTW, I see similar behaviour on the SPU back end.


The problem is in TargetLowering::getVectorTypeBreakdown and related
code, which predates the implementation of vector widening and
isn't yet aware of it. Ideally, it should follow the same sequence
of steps that the regular Legalize code uses, including widening.
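As a rough illustration of the difference, here is a toy Python model contrasting a breakdown that only knows "legal vector or scalarize" with one that first tries widening, the way Legalize does. Everything in it is hypothetical: the function names are made up, it is not LLVM's getVectorTypeBreakdown, it assumes a target whose only legal i32 vector types are v2i32 and v4i32 (matching the observations above), and it ignores splitting of vectors wider than the widest legal type.

```python
# Hypothetical, simplified model; NOT LLVM's real TargetLowering code.
# Assumption: on this toy target, v2i32 and v4i32 are the legal vector types.
LEGAL_VECTOR_LANES = {2, 4}


def next_power_of_2(n):
    """Smallest power of two >= n (n >= 1)."""
    p = 1
    while p < n:
        p <<= 1
    return p


def breakdown_without_widening(n):
    """Pre-widening behaviour: a non-legal width falls back to scalars."""
    if n in LEGAL_VECTOR_LANES:
        return [f"v{n}i32"]
    return ["i32"] * n  # v3i32 -> i32, i32, i32 (forces return by pointer)


def breakdown_with_widening(n):
    """Widening-aware behaviour: try the next power-of-two width first."""
    widened = next_power_of_2(n)
    if widened in LEGAL_VECTOR_LANES:
        return [f"v{widened}i32"]  # v3i32 -> v4i32, fits in one register
    return ["i32"] * n  # splitting of wide vectors is not modelled here


print(breakdown_without_widening(3))  # ['i32', 'i32', 'i32']
print(breakdown_with_widening(3))     # ['v4i32']
print(breakdown_with_widening(2))     # ['v2i32']
```

In this model the only change needed for the v3i32 case is to consult the widened type before giving up and scalarizing, which is the ordering the regular legalizer already uses.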