I’m working on a modification of the Power LLVM backend and have some questions about the ‘BlendSplat’ code in SelectionDAG::getVectorShuffle(). Could you give a little more detail about the goal of this code? It seems to increase the chances that the mask matches the subsequent checks for an identity shuffle or an all-LHS/all-RHS shuffle, which is clearly beneficial. Are you also claiming that the altered mask is easier to match even when it is not caught by those special cases?
I attached a tarball with .cl & .ll source for one case where the altered mask seems much more difficult to match: the shufflevector instruction in the IR is a fairly straightforward interleave of two variables, but your blend code eliminates this pattern when building the DAG. As I said, I’m targeting Power here, so I want the shufflevector instructions to match vmrghb & vmrglb. I assume x86 has similar instructions? Is the altered mask in the .ps file really easier to match on x86? I also attached the Power assembly generated for this function with and without the BlendSplat code, and I think it’s clear that, at least in the case of Power, the altered mask is not preferable. Agreed? I’d like to understand the intent of your code better so I can either (a) figure out how to properly avoid modifying the mask in this case or (b) invert this modification in the Power backend so we can match it to vmrg* instructions and avoid the use of vperm.
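To make the concern concrete, here is a minimal standalone sketch of what I believe is happening. It is a simplified model, not the actual LLVM code: it assumes the second shuffle operand is a splat, ignores undef lanes, and uses big-endian element numbering for the merge pattern. The names blendSplat, isMergeHigh and makeMergeHigh are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <vector>

// A shuffle mask indexes into the concatenation of LHS and RHS:
// 0..N-1 select from LHS, N..2N-1 select from RHS.
using Mask = std::vector<int>;

// Simplified model of the blend-of-splat rewrite: every lane that reads the
// splatted operand is redirected to the same-numbered lane of that operand.
// Offset is 0 if the LHS is the splat and N if the RHS is the splat.
// (Assumption: no undef elements, which the real code also has to handle.)
Mask blendSplat(Mask M, int Offset) {
  const int N = static_cast<int>(M.size());
  for (int i = 0; i < N; ++i)
    if (M[i] >= Offset && M[i] < Offset + N)
      M[i] = i + Offset;
  return M;
}

// Builds the interleave mask <0, N, 1, N+1, ...> over the first halves of
// LHS and RHS. For N == 16 this is the shape I want to match to vmrghb.
Mask makeMergeHigh(int N) {
  assert(N % 2 == 0 && "expected an even number of lanes");
  Mask M(N);
  for (int i = 0; i < N / 2; ++i) {
    M[2 * i] = i;
    M[2 * i + 1] = i + N;
  }
  return M;
}

// Returns true only for the exact interleave mask <0, N, 1, N+1, ...>.
bool isMergeHigh(const Mask &M) {
  const int N = static_cast<int>(M.size());
  if (N % 2 != 0)
    return false;
  for (int i = 0; i < N / 2; ++i)
    if (M[2 * i] != i || M[2 * i + 1] != i + N)
      return false;
  return true;
}
```

In this model, makeMergeHigh(16) gives <0, 16, 1, 17, ..., 7, 23>, and blendSplat(mask, 16) turns it into <0, 17, 1, 19, ..., 7, 31>, which an exact-match check like isMergeHigh rejects. Since every lane of a splat holds the same value, one way to do option (b) might be to treat any index into the splatted operand as equivalent when matching, but I’m not sure that is the intended approach.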
blend-splat-test.tar.gz (11.4 KB)