I found out from here how to finally do this correctly:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150921/301818.html
You can combine the ext_vector_type and aligned attributes to get a load
of the right vector type with the correct alignment. With this, the IR
for vload3 looks like:
define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
entry:
  %mul = mul i32 %offset, 3
  %arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
  %castToVec4 = bitcast i32* %arrayidx to <4 x i32>*
  %loadVec4 = load <4 x i32>, <4 x i32>* %castToVec4, align 4
  %extractVec = shufflevector <4 x i32> %loadVec4, <4 x i32> undef, <3 x i32> <i32 0, i32 1, i32 2>
  ret <3 x i32> %extractVec
}
The load of <4 x i32> instead of <3 x i32> is somewhat surprising to me,
but this is much better than the previous mess: a load of the first 2
components, a separate load of the 3rd, and a sequence of
extractelement/insertelement instructions to recombine them.
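A minimal C sketch of the idea, not necessarily what the attached patch does. The names int3_a4, vload3_sketch and lane are hypothetical, and the non-Clang branch is only a portability fallback that copies the three elements. Clang gives a 3-element ext_vector_type the size of a 4-element one, so the buffer being read should have at least one element of slack after the last triple.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__clang__)
/* Clang extension: a 3-element int vector (hypothetical local typedefs). */
typedef int int3 __attribute__((ext_vector_type(3)));
/* Same vector type, but with the alignment lowered to 4 bytes,
 * matching the "align 4" on the load in the IR above. */
typedef int int3_a4 __attribute__((ext_vector_type(3), aligned(4)));

/* Read the offset-th packed triple from x through the less-aligned type. */
static int3 vload3_sketch(size_t offset, const int *x) {
    assert(x != NULL);
    return *(const int3_a4 *)&x[3 * offset];
}

/* Helper to read one component of the vector. */
static int lane(int3 v, int i) { return v[i]; }
#else
/* Assumed fallback for compilers without ext_vector_type: element copy. */
typedef struct { int s[3]; } int3;

static int3 vload3_sketch(size_t offset, const int *x) {
    int3 v;
    assert(x != NULL);
    memcpy(v.s, &x[3 * offset], sizeof v.s);
    return v;
}

static int lane(int3 v, int i) { return v.s[i]; }
#endif
```

Calling vload3_sketch(1, buf) on a buffer holding {10, 11, 12, 20, 21, 22} plus padding should return the second triple.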
Old:
define <3 x i32> @81(i32 %offset, i32* nocapture readonly %x) #0 {
entry:
  %mul = mul i32 %offset, 3
  %arrayidx = getelementptr inbounds i32, i32* %x, i32 %mul
  %0 = bitcast i32* %arrayidx to <2 x i32>*
  %1 = load <2 x i32>, <2 x i32>* %0, align 4, !tbaa !1
  %2 = extractelement <2 x i32> %1, i32 0
  %3 = insertelement <3 x i32> undef, i32 %2, i32 0
  %4 = extractelement <2 x i32> %1, i32 1
  %5 = insertelement <3 x i32> %3, i32 %4, i32 1
  %add = add i32 %mul, 2
  %arrayidx3 = getelementptr inbounds i32, i32* %x, i32 %add
  %6 = load i32, i32* %arrayidx3, align 4, !tbaa !6
  %7 = insertelement <3 x i32> %5, i32 %6, i32 2
  ret <3 x i32> %7
}
0001-Fix-vload3-vstore3-to-emit-only-one-IR-load.patch (5.54 KB)