Instruction selection problems due to SelectionDAGBuilder

Hello.
     I'm having trouble at instruction selection in my back end with the following basic block. The problem comes from a vector add with an immediate constant vector (the IR was obtained by vectorizing a simple C program that does a vector sum map):
     vector.ph: ; preds = %vector.memcheck50
       %.splatinsert = insertelement <8 x i64> undef, i64 %i.07.unr, i32 0
       %.splat = shufflevector <8 x i64> %.splatinsert, <8 x i64> undef, <8 x i32> zeroinitializer
       %induction = add <8 x i64> %.splat, <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>
       %.splatinsert56 = insertelement <8 x i64> undef, i64 %xtraiter, i32 0
       %.splat57 = shufflevector <8 x i64> %.splatinsert56, <8 x i64> undef, <8 x i32> zeroinitializer
       %induction58 = add <8 x i64> %.splat57, <i64 0, i64 -1, i64 -2, i64 -3, i64 -4, i64 -5, i64 -6, i64 -7>
       br label %vector.body25

     The exact problem reported is:
         Selecting: t51: v8i64,ch = load<LD64[ConstantPool]> t0, ConstantPool:i64<<8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>> 0, undef:i64
         ISEL: Starting pattern match on root node: t51: v8i64,ch = load<LD64[ConstantPool]> t0, ConstantPool:i64<<8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>> 0, undef:i64
           Initial Opcode index to 268
           Match failed at index 277
           Continuing at 396
           Match failed at index 398
           Continuing at 422
         LLVM ERROR: Cannot select: t51: v8i64,ch = load<LD64[ConstantPool]> t0, ConstantPool:i64<<8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>> 0, undef:i64
           t50: i64 = ConstantPool<<8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>> 0
           t48: i64 = undef
         In function: foo

     The reason is that for the basic-block my back end generates the following Selection DAG:
         (From 201_LoopVectorize/25_GOOD_map/NEW/6/1/NEW/STDerr3_wo_getSetCCResultType)
         Initial selection DAG: BB#15 'foo:vector.ph'
         SelectionDAG has 41 nodes:
           t0: ch = EntryToken
           t4: i32 = Constant<0>
                       t3: i64,ch = CopyFromReg t0, Register:i64 %vreg12
                     t6: v8i64 = insert_vector_elt undef:v8i64, t3, Constant:i64<0>
                   t7: v8i64 = vector_shuffle<0,0,0,0,0,0,0,0> t6, undef:v8i64
                   t15: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<1>, Constant:i64<2>, Constant:i64<3>, Constant:i64<4>, Constant:i64<5>, Constant:i64<6>, Constant:i64<7>
                 t16: v8i64 = add t7, t15
               t18: ch = CopyToReg t0, Register:v8i64 %vreg16, t16
                         t20: i64,ch = CopyFromReg t0, Register:i64 %vreg5
                       t22: i64 = AssertSext t20, ValueType:ch:i8
                     t23: v8i64 = insert_vector_elt undef:v8i64, t22, Constant:i64<0>
                   t24: v8i64 = vector_shuffle<0,0,0,0,0,0,0,0> t23, undef:v8i64
                   t32: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<-1>, Constant:i64<-2>, Constant:i64<-3>, Constant:i64<-4>, Constant:i64<-5>, Constant:i64<-6>, Constant:i64<-7>
                 t33: v8i64 = add t24, t32
               t35: ch = CopyToReg t0, Register:v8i64 %vreg17, t33
               t37: ch = CopyToReg t0, Register:i64 %vreg117, Constant:i64<0>
             t39: ch = TokenFactor t18, t35, t37
           t40: ch = br t39, BasicBlock:ch<vector.body25 0x1d07660>

     However, when using the mips64 back end (subtarget) we get this correct selection DAG:
         (From 201_LoopVectorize/25_GOOD_map/NEW/6/1/NEW/Mips64/STDerr_llc_mips64)
         Initial selection DAG: BB#15 'foo:vector.ph'
         SelectionDAG has 87 nodes:
           t0: ch = EntryToken
           t4: i32 = Constant<0>
                 t3: i64,ch = CopyFromReg t0, Register:i64 %vreg12
               t6: v8i64 = insert_vector_elt undef:v8i64, t3, Constant:i64<0>
             t7: v8i64 = vector_shuffle<0,0,0,0,0,0,0,0> t6, undef:v8i64
             t15: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<1>, Constant:i64<2>, Constant:i64<3>, Constant:i64<4>, Constant:i64<5>, Constant:i64<6>, Constant:i64<7>
           t16: v8i64 = add t7, t15
                   t43: i64,ch = CopyFromReg t0, Register:i64 %vreg5
                 t45: i64 = AssertSext t43, ValueType:ch:i8
               t46: v8i64 = insert_vector_elt undef:v8i64, t45, Constant:i64<0>
             t47: v8i64 = vector_shuffle<0,0,0,0,0,0,0,0> t46, undef:v8i64
             t55: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<-1>, Constant:i64<-2>, Constant:i64<-3>, Constant:i64<-4>, Constant:i64<-5>, Constant:i64<-6>, Constant:i64<-7>
           t56: v8i64 = add t47, t55
                   t17: i64 = extract_vector_elt t16, Constant:i64<0>
                 t26: ch = CopyToReg t0, Register:i64 %vreg16, t17
                   t18: i64 = extract_vector_elt t16, Constant:i64<1>
                 t28: ch = CopyToReg t0, Register:i64 %vreg17, t18
                   t19: i64 = extract_vector_elt t16, Constant:i64<2>
                 t30: ch = CopyToReg t0, Register:i64 %vreg18, t19
                   t20: i64 = extract_vector_elt t16, Constant:i64<3>
                 t32: ch = CopyToReg t0, Register:i64 %vreg19, t20
                   t21: i64 = extract_vector_elt t16, Constant:i64<4>
                 t34: ch = CopyToReg t0, Register:i64 %vreg20, t21
                   t22: i64 = extract_vector_elt t16, Constant:i64<5>
                 t36: ch = CopyToReg t0, Register:i64 %vreg21, t22
                   t23: i64 = extract_vector_elt t16, Constant:i64<6>
                 t38: ch = CopyToReg t0, Register:i64 %vreg22, t23
                   t24: i64 = extract_vector_elt t16, Constant:i64<7>
                 t40: ch = CopyToReg t0, Register:i64 %vreg23, t24
               t41: ch = TokenFactor t26, t28, t30, t32, t34, t36, t38, t40
                   t57: i64 = extract_vector_elt t56, Constant:i64<0>
                 t66: ch = CopyToReg t0, Register:i64 %vreg24, t57
                   t58: i64 = extract_vector_elt t56, Constant:i64<1>
                 t68: ch = CopyToReg t0, Register:i64 %vreg25, t58
                   t59: i64 = extract_vector_elt t56, Constant:i64<2>
                 t70: ch = CopyToReg t0, Register:i64 %vreg26, t59
                   t60: i64 = extract_vector_elt t56, Constant:i64<3>
                 t72: ch = CopyToReg t0, Register:i64 %vreg27, t60
                   t61: i64 = extract_vector_elt t56, Constant:i64<4>
                 t74: ch = CopyToReg t0, Register:i64 %vreg28, t61
                   t62: i64 = extract_vector_elt t56, Constant:i64<5>
                 t76: ch = CopyToReg t0, Register:i64 %vreg29, t62
                   t63: i64 = extract_vector_elt t56, Constant:i64<6>
                 t78: ch = CopyToReg t0, Register:i64 %vreg30, t63
                   t64: i64 = extract_vector_elt t56, Constant:i64<7>
                 t80: ch = CopyToReg t0, Register:i64 %vreg31, t64
               t81: ch = TokenFactor t66, t68, t70, t72, t74, t76, t78, t80
               t83: ch = CopyToReg t0, Register:i64 %vreg209, Constant:i64<0>
             t85: ch = TokenFactor t41, t81, t83
           t86: ch = br t85, BasicBlock:ch<vector.body25 0x1bd35f0>

     I am curious what is wrong. I have tried to match the Mips back end: I have added most of the vector splat instructions, the vextract instruction, INSERT_D_DESC, etc.
     I also don't get enough DEBUG information to understand exactly where the problem comes from (I have probably missed some TableGen record).

     Please let me know if you have any idea.

   Thank you very much,
     Alex

I’m not an expert on this at all, and there isn’t enough information shown to see how the “Good” back end performs the BUILD_VECTOR operation with a constant vector, but it is clear that your back end does it with a constant pool load. Furthermore, your back end probably does not define a pattern in its target description files that matches that load.

As far as debugging is concerned - you can find exactly where the matching fails by opening $LLVM_BUILD/lib//GenDAGISel.inc and finding the indices listed above (268, 277, etc.).

So I think that if you don’t want BUILD_VECTOR for MVT::v8i64 with constant elements to be legalized as a constant pool load, you should not have the following line in your TargetLowering instance:
setOperationAction(ISD::BUILD_VECTOR, MVT::v8i64, Expand)
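
To make the effect of that line a bit more concrete, here is a toy model in plain C++. It is not LLVM code: ToyLowering, nodeSeenByISel and the string keys are names I made up for illustration, and I am assuming (as I understand LLVM to behave) that an operation with no registered action is treated as Legal.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model only: these names mimic LLVM's vocabulary but are not LLVM's API.
enum class Action { Legal, Custom, Expand };

class ToyLowering {
  std::map<std::pair<std::string, std::string>, Action> Actions;

public:
  // Analogue of setOperationAction(Opcode, Type, Action).
  void setOperationAction(const std::string &Op, const std::string &Ty,
                          Action A) {
    Actions[std::make_pair(Op, Ty)] = A;
  }

  // Assumption: operations with no registered action count as Legal.
  Action getOperationAction(const std::string &Op,
                            const std::string &Ty) const {
    auto It = Actions.find(std::make_pair(Op, Ty));
    return It == Actions.end() ? Action::Legal : It->second;
  }
};

// What the instruction selector would be asked to match for a constant
// BUILD_VECTOR of type Ty under this toy model.
std::string nodeSeenByISel(const ToyLowering &TL, const std::string &Ty) {
  switch (TL.getOperationAction("BUILD_VECTOR", Ty)) {
  case Action::Expand:
    return "load<ConstantPool>"; // needs a matching vector-load pattern
  case Action::Custom:
    return "target-specific lowering";
  case Action::Legal:
    return "BUILD_VECTOR"; // needs a BUILD_VECTOR pattern instead
  }
  assert(false && "unhandled action");
  return "";
}
```

With an empty ToyLowering, nodeSeenByISel(TL, "v8i64") gives "BUILD_VECTOR"; after registering Expand for BUILD_VECTOR on "v8i64" it gives "load<ConstantPool>", which is the kind of node your selector fails on. Either way the back end needs a pattern for whatever node is left.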

At least I think that is a rough description of some of the issues causing this.

N

Hi Alex,

> However, when using the mips64 back end (subtarget) we get this correct selection DAG:
>
> t55: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<-1>, Constant:i64<-2>, Constant:i64<-3>, Constant:i64<-4>, Constant:i64<-5>, Constant:i64<-6>, Constant:i64<-7>
> t56: v8i64 = add t47, t55

v8i64 isn’t a legal type on MIPS64 with MSA so I think you must be looking at the SelectionDAG before type legalization. This can be very different from the SelectionDAG used for instruction selection which may explain the confusion. You can see the DAG that the instruction selector sees using -view-isel-dags.

Eli Bendersky has a good high-level overview of the code generator at http://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm but the relevant bit can be roughly summarized as:

· LLVM-IR is converted to an equivalent SelectionDAG which will almost certainly contain types and operations the target won’t be able to handle

· The legalizer hacks away at the SelectionDAG until it fits the target

· The ‘legal’ SelectionDAG nodes are replaced with instructions for the target.

So for your example on Mips, we start with the LLVM-IR:

%0 = add <8 x i64> %a, <i64 0, i64 -1, i64 -2, i64 -3, i64 -4, i64 -5, i64 -6, i64 -7>

This is converted to a SelectionDAG that looks something like:

t1: v8i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<-1>, Constant:i64<-2>, Constant:i64<-3>, Constant:i64<-4>, Constant:i64<-5>, Constant:i64<-6>, Constant:i64<-7>

t2: v8i64 = add t0, t1

This SelectionDAG contains illegal vector types (they have too many elements for our target) so the vectors are split:

t11: v4i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<-1>, Constant:i64<-2>, Constant:i64<-3>

t20: v4i64 = add t10, t11

t12: v4i64 = BUILD_VECTOR Constant:i64<-4>, Constant:i64<-5>, Constant:i64<-6>, Constant:i64<-7>

t21: v4i64 = add t10, t12

which still has illegal vector types so it splits them again to get something like:

t31: v2i64 = BUILD_VECTOR Constant:i64<0>, Constant:i64<-1>

t53: v2i64 = add t44, t31

t32: v2i64 = BUILD_VECTOR Constant:i64<-2>, Constant:i64<-3>

t54: v2i64 = add t45, t32

t33: v2i64 = BUILD_VECTOR Constant:i64<-4>, Constant:i64<-5>

t55: v2i64 = add t45, t33

t34: v2i64 = BUILD_VECTOR Constant:i64<-6>, Constant:i64<-7>

t56: v2i64 = add t46, t34
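
The repeated halving above can be sketched in a few lines of plain C++. This is only a toy model of the idea, not how the LLVM type legalizer is implemented: ToyVector, splitUntilLegal and the maxLegalElts parameter are hypothetical names, and a real target decides legality per type rather than by element count alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a vector constant such as <8 x i64>; not an LLVM type.
using ToyVector = std::vector<std::int64_t>;

// Recursively halve `v` until each piece has at most `maxLegalElts` elements,
// appending the pieces to `out` in element order.
void splitUntilLegal(const ToyVector &v, std::size_t maxLegalElts,
                     std::vector<ToyVector> &out) {
  assert(maxLegalElts >= 1 && "need at least one element per legal vector");
  if (v.size() <= maxLegalElts) {
    out.push_back(v);
    return;
  }
  const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(v.size() / 2);
  splitUntilLegal(ToyVector(v.begin(), v.begin() + half), maxLegalElts, out);
  splitUntilLegal(ToyVector(v.begin() + half, v.end()), maxLegalElts, out);
}
```

Calling splitUntilLegal on {0, -1, -2, -3, -4, -5, -6, -7} with maxLegalElts set to 2 goes through the v4i64 step and ends with four two-element pieces, matching the four v2i64 BUILD_VECTOR nodes listed above.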

At this point we have legal types but some illegal operations, so the operation legalizer steps in. The ‘add’ operations are fine since we’ll be able to select the addv.d instruction for them, but we can’t pick instructions for the ‘BUILD_VECTOR’ nodes. If the constants were different then these nodes might be legal: see MipsSETargetLowering::lowerBUILD_VECTOR() for the code that decides which nodes are ok and which aren’t, and also the ‘setOperationAction(ISD::BUILD_VECTOR, Ty, Custom)’ call that tells SelectionDAG the rules are non-trivial. As it is, we have to replace the BUILD_VECTORs we have with something we can handle. The operation legalizer therefore changes them to something like:

t61: v2i64,ch = load<LD64[ConstantPool]> t65, ConstantPool:i64<<2 x i64> <i64 0, i64 -1>> 0, undef:i64

t83: v2i64 = add t74, t61

t62: v2i64,ch = load<LD64[ConstantPool]> t65, ConstantPool:i64<<2 x i64> <i64 -2, i64 -3>> 0, undef:i64

t84: v2i64 = add t75, t62

t63: v2i64,ch = load<LD64[ConstantPool]> t65, ConstantPool:i64<<2 x i64> <i64 -4, i64 -5>> 0, undef:i64

t85: v2i64 = add t75, t63

t64: v2i64,ch = load<LD64[ConstantPool]> t65, ConstantPool:i64<<2 x i64> <i64 -6, i64 -7>> 0, undef:i64

t86: v2i64 = add t76, t64
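
As a rough illustration of a per-node decision of this kind, here is a toy sketch in plain C++. The criterion it uses (keep the node only when all elements are equal, i.e. a splat) is an assumption made for the example, not a description of what MipsSETargetLowering::lowerBUILD_VECTOR() actually checks, and isConstantSplat and lowerConstantBuildVector are hypothetical helpers, not LLVM functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy check: true when every element equals the first one.
bool isConstantSplat(const std::vector<std::int64_t> &Elts) {
  assert(!Elts.empty() && "a vector constant has at least one element");
  return std::all_of(Elts.begin(), Elts.end(),
                     [&](std::int64_t E) { return E == Elts.front(); });
}

// Hypothetical per-node decision, loosely modelled on a Custom lowering hook:
// keep splats as BUILD_VECTOR, send everything else to the constant pool.
std::string lowerConstantBuildVector(const std::vector<std::int64_t> &Elts) {
  return isConstantSplat(Elts) ? "keep BUILD_VECTOR (splat)"
                               : "load from constant pool";
}
```

Under this assumed rule, {0, -1} and the other three pairs above all take the constant pool path, while a pair such as {5, 5} would be kept as a BUILD_VECTOR.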

At this point, the SelectionDAG is suitable for Mips64 with MSA, so the instruction selector runs all the rules defined in TableGen to convert the DAG to a DAG of target instructions (MachineSDNodes).

Hope this helps.