Vector trunc code generation difference between llvm-3.9 and 4.0

Correction in the C snippet:

typedef signed short v8i16_t attribute((ext_vector_type(8)));

v8i16_t foo (v8i16_t a, int n)
{
return a >> n;

}

I think this is caused by a front-end change (cc’ing clang-dev) because the IR with “-Xclang -disable-llvm-optzns” shows the difference.

But independently of that, there’s a missing IR canonicalization - instcombine doesn’t currently do anything with either version.

And the version where we trunc later survives through the backend and produces worse code even for x86 with AVX2:
before:
vmovd %edi, %xmm1
vpmovzxwq %xmm1, %xmm1
vpsraw %xmm1, %xmm0, %xmm0
retq

after:
vmovd %edi, %xmm1
vpbroadcastd %xmm1, %ymm1
vmovdqa LCPI1_0(%rip), %ymm2
vpshufb %ymm2, %ymm1, %ymm1
vpermq $232, %ymm1, %ymm1
vpmovzxwd %xmm1, %ymm1
vpmovsxwd %xmm0, %ymm0
vpsravd %ymm1, %ymm0, %ymm0
vpshufb %ymm2, %ymm0, %ymm0
vpermq $232, %ymm0, %ymm0
vzeroupper

So this example may have won the bug lottery by exposing all of front-, middle-, back-end bugs. :slight_smile:

Thanks Sanjay. Interestingly for me, disable-llvm-optmzns did not make a difference in the way the shift was handled. Does the initial IR generated for you show this difference when the option is passed?

Best regards
Saurabh

Yes, there is an IR difference between clang 3.9.1 and clang trunk before any IR transforms are done:
https://godbolt.org/g/FuBqIb

We can’t solve this problem (moving a trunc ahead of other vector ops) in general in IR because we take a conservative approach to vector transforms in IR. That means the burden for solving the general problem falls on the front-end or the back-end. If you can bisect to find the clang commit where this changed, that would be very helpful.

However, I think we can handle a very specific case (a too fat splat) in IR in instcombine, and it will resolve this exact example. This will take a couple of patches to restore your example. Here’s a proposal for the first one:
https://reviews.llvm.org/D30123

The regression for the reported case should be avoided after:
https://reviews.llvm.org/rL297232
https://reviews.llvm.org/rL297242
https://reviews.llvm.org/rL297280

It would still be good to understand if the clang change was intentional or if that was a side effect that can be limited.

There were several patches (r278501 was the first) that fixed vector shift bugs. I don’t think the IR changes were intentional.

I’m not sure if it’s the right solution, but inserting an integral cast before the CK_VectorSplat cast in checkVectorShift makes IRGen emit the trunc before the splat.

Thanks, Akira.

I don’t know enough about vectors in the front-end to be much use here. cc’ing authors/reviewers of some of the patches that might be related:
https://reviews.llvm.org/rL284579
https://reviews.llvm.org/rL281669
https://reviews.llvm.org/rL278501