Question on BlendSplat Code - LLVM Commit 72753f87f2b80d66cfd7ca7c7b6c0db6737d4b24

Hey Chandler,

I’m working on a modification of the Power LLVM backend and I have some questions about the ‘BlendSplat’ code in SelectionDAG::GetVectorShuffle(). Basically, I’m wondering if you can give a little more detail about the goal of this function? It seems like your code is increasing the chances of the mask matching the subsequent checks for an identity shuffle or all LHS/RHS, which is clearly beneficial. Are you also claiming the altered mask is easier to match, even if it’s not caught by those special cases?

I attached a tarball with .cl & .ll source for one case where the altered mask seems much more difficult to match; the shufflevector instruction in the IR is a fairly straightforward interleave of two variables, but your blend code eliminates this pattern when building the DAG. Like I said, I’m targeting PowerPC here, so I want the shufflevector instructions to match vmrghb & vmrglb. I’m assuming x86 has similar instructions? Is the altered mask in the .ps file really easier to match on x86? I attached the PowerPC assembly generated for this function with & without the blend-splat code, and I think it’s clear that, at least on PowerPC, the altered mask is not preferable. Agreed? I’d like to understand the intent of your code better so I can either (a) figure out how to properly avoid modification of the mask in this case, or (b) invert this modification in the PowerPC backend so we can match this to vmrg* instructions and avoid the use of vperm.
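For readers unfamiliar with the PowerPC instructions mentioned above, here is a rough sketch (plain Python, not compiler code; big-endian element numbering, and the helper names are mine) of the byte interleave that vmrghb and vmrglb perform on two 16-byte vectors:

```python
# Sketch of the PowerPC AltiVec merge instructions (big-endian lane order):
# vmrghb interleaves the high (first) 8 bytes of two 16-byte vectors,
# vmrglb interleaves the low (last) 8 bytes.

def vmrghb(a, b):
    # result[2i] = a[i], result[2i+1] = b[i] for i in 0..7
    return [x for pair in zip(a[:8], b[:8]) for x in pair]

def vmrglb(a, b):
    # same interleave, but over bytes 8..15 of each operand
    return [x for pair in zip(a[8:], b[8:]) for x in pair]

a = list(range(16))       # bytes 0..15
b = list(range(16, 32))   # bytes 16..31
print(vmrghb(a, b))  # [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]
print(vmrglb(a, b))  # [8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31]
```

A shuffle mask that alternates lanes of its two operands in this pattern can be matched directly to one of these instructions; anything else generally falls back to vperm with a loaded control vector.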

Thanks,
Tyler

blend-splat-test.tar.gz (11.4 KB)

Hi Tyler,

First, as a procedural note: we always refer to commits by their SVN revision number. What is the SVN revision corresponding to 72753f87f2b80d66cfd7ca7c7b6c0db6737d4b24 in the git mirror?

Second, as I read your message, you sound skeptical about the utility of the transformation even for x86. That skepticism seems unjustified: running your test case through llc -mtriple=x86_64 -mcpu=corei7-avx generates vpunpckhbw and vpmovzxbw, which appear to be the two corresponding shuffle instructions.

That having been said, Chandler, could you please explain the general strategy that x86 uses here? We obviously might wish to emulate it in the PowerPC backend.

-Hal

Hal,

Sorry about that; the SVN revision is r229308.

I really can’t comment on the utility of the transformation when targeting x86, as I’m not familiar with that instruction set. I am, however, skeptical of its utility in this specific test case for general compilation. Since this code lives in the target-independent code generator, the real question here is whether the transformation is generally useful.

To be clear, the transformation in this case is:

<0,32,2,33,4,34,6,35,8,36,10,37,12,38,14,39,16,40,18,41,20,42,22,43,24,44,26,45,28,46,30,47>

to:
<0,33,2,35,4,37,6,39,8,41,10,43,12,45,14,47,16,40,18,41,20,42,22,43,24,44,26,45,28,46,30,47>

In my test case, this instruction is preceded by a shufflevector with the following mask:

<0,undef,1,undef,2,undef,3,undef,4,undef,5,undef,6,undef,7,undef,8,undef,9,undef,10,undef,11,undef,12,undef,13,undef,14,undef,15,undef>

When the blend-splat code is ifdef’d out, the DAG combiner can combine these two masks into a straightforward interleave.
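To make the combine concrete, here is a hedged sketch in plain Python (not the DAG combiner itself; the index bookkeeping is simplified so that RHS lanes keep their outer numbering) of folding the first mask into the second:

```python
# Composing two shufflevector masks: lane i of the outer shuffle with mask
# index m < N (N = lane count of the first operand) reads lane m of that
# operand; when that operand is itself a shuffle, m resolves to inner[m].
UNDEF = -1  # stands in for 'undef' lanes

def compose(outer, inner, n):
    """Fold the inner shuffle's mask into the outer mask, assuming the
    outer shuffle's first operand is the inner shuffle's result."""
    return [inner[m] if 0 <= m < n else m for m in outer]

# Inner mask: widen 16 elements to 32 lanes with undef odd lanes,
# i.e. <0,undef,1,undef,...,15,undef>
inner = [x for i in range(16) for x in (i, UNDEF)]

# Outer mask before the blend-splat rewrite: <0,32,2,33,4,34,...>
outer = [x for i in range(16) for x in (2 * i, 32 + i)]

combined = compose(outer, inner, 32)
print(combined[:8])  # [0, 32, 1, 33, 2, 34, 3, 35] -- a plain interleave
```

The combined mask alternates original LHS elements with RHS elements lane for lane, which is exactly the vmrghb/vmrglb pattern; once the mask has been rewritten by the blend-splat code, this simple composition no longer falls out.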

Like you said, the x86 backend does still seem to do a good job of recognizing the merge/interleave operation here, so I can take a look at the x86 lowering code to see how it handles this case. However, Chandler, if you could still offer some insight into why the transformed mask is preferred in this case, I’d appreciate it.

Tyler
