Hi all,
I am confused by the definition of a splat mask in vector shuffles. I searched the llvm-dev topics about splats, and the most useful one I found was: What is “splat” in BUILD_VECTOR? From that topic I understand that a splat means all elements are the same. However, when I read the code of isSplatMask in SelectionDAG.cpp, I noticed that it treats <i32 undef, i32 2, i32 2, i32 undef> as a splat mask: undef elements are ignored, and only the defined elements have to agree. With this implementation, such a vector_shuffle is combined into a BUILD_VECTOR or lowered to a target-specific instruction (for example, vdup on ARM).
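As I understand it, the check behaves roughly like the standalone sketch below. This is a hypothetical model, not the code in SelectionDAG.cpp. It assumes the mask is a std::vector<int> in which a negative entry (-1) means undef, and the name isSplatMaskSketch is made up. The sketch skips leading undef entries, takes the first defined index as the splat index, and accepts the mask if every remaining defined entry matches it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone model of the undef-tolerant splat check
// (not the LLVM source). A negative mask entry stands for undef.
bool isSplatMaskSketch(const std::vector<int>& mask) {
  assert(!mask.empty() && "a vector shuffle mask has at least one element");
  std::size_t i = 0;
  // Skip leading undef entries to find the first defined index.
  while (i < mask.size() && mask[i] < 0)
    ++i;
  // A mask with no defined entries is treated as a trivial splat.
  if (i == mask.size())
    return true;
  const int splatIdx = mask[i];
  // Every remaining entry must be undef or equal to the first defined index.
  for (; i < mask.size(); ++i)
    if (mask[i] >= 0 && mask[i] != splatIdx)
      return false;
  return true;
}
```

With this model, isSplatMaskSketch({-1, 2, 2, -1}) returns true, while isSplatMaskSketch({0, 1, 0, 0}) returns false.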
Here is a simple example:
test.cl
int8 test(int r) {
  int8 b;
  b.s762 = r;
  return b;
}
IR at -O1:
define <8 x i32> @test(i32 %r) {
entry:
  %splat.splatinsert = insertelement <3 x i32> undef, i32 %r, i32 0
  %0 = shufflevector <3 x i32> %splat.splatinsert, <3 x i32> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 undef, i32 undef, i32 undef, i32 0, i32 0>
  ret <8 x i32> %0
}
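To make explicit what this IR asks for, here is a small evaluator for the shuffle. It is a hypothetical model of shufflevector semantics, and evalShuffle and Lane are made-up names, not LLVM APIs. Each lane is either a known value or undef (std::nullopt). Mask indices below the input width select from the first operand, higher indices select from the second, and a negative index yields undef.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// A lane is either a known value or undef (std::nullopt).
using Lane = std::optional<int>;

// Hypothetical model of shufflevector semantics (not an LLVM API).
// Mask indices in [0, n) select from lhs, [n, 2n) select from rhs,
// and a negative index produces an undef lane.
std::vector<Lane> evalShuffle(const std::vector<Lane>& lhs,
                              const std::vector<Lane>& rhs,
                              const std::vector<int>& mask) {
  assert(lhs.size() == rhs.size() && "both operands have the same type");
  const int n = static_cast<int>(lhs.size());
  std::vector<Lane> result;
  result.reserve(mask.size());
  for (int idx : mask) {
    if (idx < 0) {
      result.push_back(std::nullopt);  // undef mask entry
    } else if (idx < n) {
      result.push_back(lhs[static_cast<std::size_t>(idx)]);  // first operand
    } else {
      assert(idx < 2 * n && "mask index out of range");
      result.push_back(rhs[static_cast<std::size_t>(idx - n)]);  // second operand
    }
  }
  return result;
}
```

Feeding it the first operand {r, undef, undef} (lane 0 of %splat.splatinsert holds %r), an all-undef second operand, and the mask <-1, -1, 0, -1, -1, -1, 0, 0> yields r in lanes 2, 6 and 7 only, matching b.s762. Every other lane stays undef.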
log
Combining: t12: v9i32 = vector_shuffle<u,u,0,u,u,u,0,0,u> t20, undef:v9i32
Creating new node: t21: v9i32 = BUILD_VECTOR t4, t4, t4, t4, t4, t4, t4, t4, t4
… into: t21: v9i32 = BUILD_VECTOR t4, t4, t4, t4, t4, t4, t4, t4, t4
ASM
dup.32 q8, r1
…
According to the code and the log, every element of b ends up as r. That is not what I expect, since only b.s2, b.s6 and b.s7 are assigned.
I tried modifying isSplatMask to simply test Mask[i] == Mask[0], and check-llvm then reported the following 39 failing tests:
LLVM :: CodeGen/AArch64/arm64-neon-copy.ll
LLVM :: CodeGen/AArch64/arm64-vmul.ll
LLVM :: CodeGen/AArch64/dag-combine-trunc-build-vec.ll
LLVM :: CodeGen/AArch64/expand-select.ll
LLVM :: CodeGen/AArch64/mul_by_elt.ll
LLVM :: CodeGen/AArch64/neon-scalar-copy.ll
LLVM :: CodeGen/AArch64/trunc-v1i64.ll
LLVM :: CodeGen/AArch64/vecreduce-fmax-legalization-nan.ll
LLVM :: CodeGen/ARM/2009-11-02-NegativeLane.ll
LLVM :: CodeGen/ARM/vdup.ll
LLVM :: CodeGen/ARM/vzip.ll
LLVM :: CodeGen/PowerPC/qpx-bv-sint.ll
LLVM :: CodeGen/Thumb2/LowOverheadLoops/fast-fp-loops.ll
LLVM :: CodeGen/Thumb2/mve-shufflemov.ll
LLVM :: CodeGen/Thumb2/mve-vecreduce-fminmax.ll
LLVM :: CodeGen/Thumb2/mve-vecreduce-loops.ll
LLVM :: CodeGen/Thumb2/mve-vld3.ll
LLVM :: CodeGen/Thumb2/mve-vld4.ll
LLVM :: CodeGen/Thumb2/mve-vst3.ll
LLVM :: CodeGen/X86/haddsub-shuf.ll
LLVM :: CodeGen/X86/insertelement-duplicates.ll
LLVM :: CodeGen/X86/pr42905.ll
LLVM :: CodeGen/X86/pr46189.ll
LLVM :: CodeGen/X86/shuffle-of-splat-multiuses.ll
LLVM :: CodeGen/X86/split-extend-vector-inreg.ll
LLVM :: CodeGen/X86/sse3.ll
LLVM :: CodeGen/X86/trunc-subvector.ll
LLVM :: CodeGen/X86/var-permute-512.ll
LLVM :: CodeGen/X86/vector-narrow-binop.ll
LLVM :: CodeGen/X86/vector-shift-ashr-sub128.ll
LLVM :: CodeGen/X86/vector-shift-lshr-sub128.ll
LLVM :: CodeGen/X86/vector-shift-shl-sub128.ll
LLVM :: CodeGen/X86/vector-shuffle-128-v16.ll
LLVM :: CodeGen/X86/vector-shuffle-128-v4.ll
LLVM :: CodeGen/X86/vector-shuffle-combining-avx2.ll
LLVM :: CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
LLVM :: CodeGen/X86/vector-zext.ll
LLVM :: CodeGen/X86/vshift-4.ll
LLVM :: CodeGen/X86/widen_shuffle-1.ll
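For reference, the stricter check I tried is roughly equivalent to the sketch below. It is a hypothetical standalone model, not the actual patch. The name isStrictSplatMask is made up, and -1 stands for undef. Because undef entries must also equal Mask[0], both <undef, 2, 2, undef> and the mask from the example above are rejected, so such shuffles would no longer be treated as splats.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the stricter check: every mask entry,
// undef (negative) entries included, must equal the first entry.
bool isStrictSplatMask(const std::vector<int>& mask) {
  assert(!mask.empty() && "a vector shuffle mask has at least one element");
  for (std::size_t i = 1; i < mask.size(); ++i)
    if (mask[i] != mask[0])
      return false;
  return true;
}
```

With this model, isStrictSplatMask({2, 2, 2, 2}) returns true, but isStrictSplatMask({-1, 2, 2, -1}) returns false.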
Clearly, this implementation has become the de facto definition, but I don’t think it is accurate. Fixing the back-end code and the test cases would be relatively easy, but I’m worried that existing applications that rely on this definition would be affected by such a change.
So, what’s your opinion?