> Hi Dale, I think Bob is right: the type legalizer shouldn't be turning v16i8
> into v16i32. What should happen is that the return type of the BUILD_VECTOR
> continues to be v16i8, but the type of the operands changes to i32, so you
> end up with a BUILD_VECTOR that takes 16 lots of i32 and produces a v16i8.

It does that.

> The target then has all the info it needs to produce the best code, but needs
> to be careful not to use the operand type (i32) when it really wants the vector
> element type (i8).
I don't think it's target-dependent. This is also broken on Neon: the breakage is introduced when a BUILD_VECTOR is lowered to a load from the ConstantPool, and the call that builds the ConstantPool entry does not currently pass enough information to do the right thing; it just passes a vector of i32s. Try the following with -march=arm -mattr=+neon. (It is possible, however, that there's no way to get the front end to generate this on Neon.)
; ModuleID = 'small.c'
target datalayout = "E-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64-f128:64:128-n32"
@baz = common global <16 x i8> zeroinitializer ; <<16 x i8>*> [#uses=1]
define void @foo(<16 x i8> %x) nounwind ssp {
entry:
  %x_addr = alloca <16 x i8>                      ; <<16 x i8>*> [#uses=2]
  %temp = alloca <16 x i8>                        ; <<16 x i8>*> [#uses=2]
  %"alloca point" = bitcast i32 0 to i32          ; <i32> [#uses=0]
  store <16 x i8> %x, <16 x i8>* %x_addr
  store <16 x i8> <i8 22, i8 21, i8 20, i8 3, i8 25, i8 24, i8 23, i8 3, i8 28, i8 27, i8 26, i8 3, i8 31, i8 30, i8 29, i8 3>, <16 x i8>* %temp, align 16
  %0 = load <16 x i8>* %x_addr, align 16          ; <<16 x i8>> [#uses=1]
  %1 = load <16 x i8>* %temp, align 16            ; <<16 x i8>> [#uses=1]
  %tmp = add <16 x i8> %0, %1                     ; <<16 x i8>> [#uses=1]
  store <16 x i8> %tmp, <16 x i8>* @baz, align 16
  br label %return

return:                                           ; preds = %entry
  ret void
}
To make things more concrete, here is the patch I was trying out:
Index: lib/CodeGen/SelectionDAG/LegalizeDAG.cpp