Was the SSE2 _mm_slli_si128 / _mm_srli_si128 behavior change in r272246 intentional?

Craig et al -

We recently ran into a code bug that was revealed by the way that r272246 (“[X86] Handle AVX2 pslldqi and psrldqi intrinsics shufflevector creation directly in the header file instead of in CGBuiltin.cpp. Simplify the sse2 equivalents as well.”) changed the behavior of the SSE2 _mm_slli_si128 and _mm_srli_si128 intrinsics.

Previously, these intrinsics worked correctly with all values of shift operands, and would produce a result of zero for any shift value outside of the [0-15] range.

With this change, any value of the shift operand outside of the range [0-16] – note, that’s 16, not 15 – will cause a compilation error.

Now, I can see good arguments for why limiting the range to [0-16] would be a good idea; after all, it did reveal a code bug in our case. However, this behavior change wasn’t mentioned in the change description, and I can also see good arguments for why one wouldn’t want this limit – so I figure it’s a good idea to raise the question explicitly: Was this behavior change intentional? If not, is it nonetheless the desired behavior?

  • Brooks

That was not intentional at all. I foolishly thought that sending everything about 15 to the true leg of this ternary operator would prevent needing a mask on each of the immediate uses in the shufflevector. I’ll fix it right away. AVX2 and AVX512 are similarly broken.

((char)(imm)&0xF0) ? _mm_setzero_si128() :
(__m128i)__builtin_shufflevector(
(__v16qi)_mm_setzero_si128(),
(__v16qi)(__m128i)(a),
16 - (char)(imm),
17 - (char)(imm),
18 - (char)(imm),
19 - (char)(imm),
20 - (char)(imm),
21 - (char)(imm),
22 - (char)(imm),
23 - (char)(imm),
24 - (char)(imm),
25 - (char)(imm),
26 - (char)(imm),
27 - (char)(imm),
28 - (char)(imm),
29 - (char)(imm),
30 - (char)(imm),
31 - (char)(imm)); })