Odd Vector Widen/Split supoort during creating SelectionDAG

Hi, For a odd vector crossing basic blocks, it would be extracted and producing many scalar elements,
this would cause too many register spillings during register allocation, finally spilling failed in my backend;
I have found the root cause is odd vector widen/split is not supported in function TargetLowering::getVectorTypeBreakdown;
This issue was discussed in the github issue but without a conclution;
How to workaround with this?

define <40 x i32> @test_0(i1 %a) nounwind {
entry:
  %.pre12 = load <40 x i32>, <40 x i32> addrspace(100)* inttoptr (i32 1024 to <40 x i32> addrspace(100)*), align 512
  br i1 %a, label %label1, label %label0

label0:
  %.pre22 = load <40 x i32>, <40 x i32> addrspace(100)* inttoptr (i32 2048 to <40 x i32> addrspace(100)*), align 512
  br label %label1

label1:
  %0 = phi <40 x i32> [ %.pre12, %entry ], [ %.pre22, %label0 ]
  %1 = ashr <40 x i32> %0, %.pre12
  ret <40 x i32> %1
}
Creating new node: t2: i1,ch = CopyFromReg t0, Register:i1 %120
Creating constant: t3: i32 = Constant<1024>
Creating constant: t4: i32 = Constant<0>
Creating new node: t5: i32 = undef
Creating new node: t6: v40i32,ch = load<(load (s1280) from `<40 x i32> addrspace(100)* inttoptr (i32 1024 to <40 x i32> addrspace(100)*)`, align 512, addrspace 100)> t0, Constant:i32<1024>, undef:i32
Creating new node: t7: i32 = extract_vector_elt t6, Constant:i32<0>
Creating constant: t8: i32 = Constant<1>
Creating new node: t9: i32 = extract_vector_elt t6, Constant:i32<1>
Creating constant: t10: i32 = Constant<2>
Creating new node: t11: i32 = extract_vector_elt t6, Constant:i32<2>
......
Creating constant: t82: i32 = Constant<38>
Creating new node: t83: i32 = extract_vector_elt t6, Constant:i32<38>
Creating constant: t84: i32 = Constant<39>
Creating new node: t85: i32 = extract_vector_elt t6, Constant:i32<39>
Creating new node: t87: ch = CopyToReg t0, Register:i32 %0, t7
Creating new node: t89: ch = CopyToReg t0, Register:i32 %1, t9
......
Creating new node: t163: ch = CopyToReg t0, Register:i32 %38, t83
Creating new node: t165: ch = CopyToReg t0, Register:i32 %39, t85
Creating new node: t166: ch = TokenFactor t87, t89, t91, t93, t95, t97, t99, t101, t103, t105, t107, t109, t111, t113, t115, t117, t119, t121, t123, t125, t127, t129, t131, t133, t135, t137, t139, t141, t143, t145, t147, t149, t151, t153, t155, t157, t159, t161, t163, t165
Creating new node: t168: ch = brcond t166, t2, BasicBlock:ch<label1 0x140c748>
Creating new node: t170: ch = br t168, BasicBlock:ch<label0 0x140c660>
nitial selection DAG: %bb.0 'test_0:entry'
SelectionDAG has 171 nodes:
  t0: ch = EntryToken
  t6: v40i32,ch = load<(load (s1280) from `<40 x i32> addrspace(100)* inttoptr (i32 1024 to <40 x i32> addrspace(100)*)`, align 512, addrspace 100)> t0, Constant:i32<1024>, undef:i32
          t7: i32 = extract_vector_elt t6, Constant:i32<0>
        t87: ch = CopyToReg t0, Register:i32 %0, t7
          t9: i32 = extract_vector_elt t6, Constant:i32<1>
        t89: ch = CopyToReg t0, Register:i32 %1, t9
          t11: i32 = extract_vector_elt t6, Constant:i32<2>
        ......
        t161: ch = CopyToReg t0, Register:i32 %37, t81
          t83: i32 = extract_vector_elt t6, Constant:i32<38>
        t163: ch = CopyToReg t0, Register:i32 %38, t83
          t85: i32 = extract_vector_elt t6, Constant:i32<39>
        t165: ch = CopyToReg t0, Register:i32 %39, t85
      t166: ch = TokenFactor t87, t89, t91, t93, t95, t97, t99, t101, t103, t105, t107, t109, t111, t113, t115, t117, t119, t121, t123, t125, t127, t129, t131, t133, t135, t137, t139, t141, t143, t145, t147, t149, t151, t153, t155, t157, t159, t161, t163, t165
      t2: i1,ch = CopyFromReg t0, Register:i1 %120
    t168: ch = brcond t166, t2, BasicBlock:ch<label1 0x140c748>
  t170: ch = br t168, BasicBlock:ch<label0 0x140c660>
LLVM ERROR: Error while trying to spill R5 from class GPR: Cannot scavenge register without an emergency spill slot!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: ./bin/llc -march=nnp -debug -filetype=asm -o - test.ll
1.	Running pass 'Function Pass Manager' on module 'test.ll'.
2.	Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on function '@test_0'
 ...
 #8 0x00007f4d1115c031 llvm::RegScavenger::spill(llvm::Register, llvm::TargetRegisterClass const&, int, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&) ../lib/CodeGen/RegisterScavenging.cpp:0:7
 #9 0x00007f4d1115d04d llvm::RegScavenger::scavengeRegisterBackwards(llvm::TargetRegisterClass const&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, bool, int, bool) ../lib/CodeGen/RegisterScavenging.cpp:619:18
#10 0x00007f4d1115e5d6 scavengeVReg(llvm::MachineRegisterInfo&, llvm::RegScavenger&, llvm::Register, bool) ../lib/CodeGen/RegisterScavenging.cpp:675:22
#11 0x00007f4d1115dba4 scavengeFrameVirtualRegsInBlock(llvm::MachineRegisterInfo&, llvm::RegScavenger&, llvm::MachineBasicBlock&) ../lib/CodeGen/RegisterScavenging.cpp:715:25
#12 0x00007f4d1115d8df llvm::scavengeFrameVirtualRegs(llvm::MachineFunction&, llvm::RegScavenger&) ../lib/CodeGen/RegisterScavenging.cpp:775:10

I fix this issue by promte the function: getVectorTypeBreakdown;
This change makes the odd vector widen to a larger power-of-2-sized vector, then split it to serval legal vector size;
Dose this work for every case?

unsigned TargetLoweringBase::getVectorTypeBreakdown(LLVMContext &Context,
                                                    EVT VT, EVT &IntermediateVT,
                                                    unsigned &NumIntermediates,
                                                    MVT &RegisterVT) const {
  ElementCount EltCnt = VT.getVectorElementCount();

  // If there is a wider vector type with the same element type as this one,
  // or a promoted vector type that has the same number of elements which
  // are wider, then we should convert to that legal vector type.
  // This handles things like <2 x float> -> <4 x float> and
  // <4 x i1> -> <4 x i32>.
  LegalizeTypeAction TA = getTypeAction(Context, VT);
  if (!EltCnt.isScalar() &&
      (TA == TypeWidenVector || TA == TypePromoteInteger)) {
    EVT RegisterEVT = getTypeToTransformTo(Context, VT);
    if (isTypeLegal(RegisterEVT)) {
      IntermediateVT = RegisterEVT;
      RegisterVT = RegisterEVT.getSimpleVT();
      NumIntermediates = 1;
      return 1;
    }
    /// {@ Nafeng: Support odd vector widen/split;
    EltCnt = RegisterEVT.getVectorElementCount(); 
    /// @}
  }
...
Creating new node: t2: i1,ch = CopyFromReg t0, Register:i1 %12
Creating constant: t3: i32 = Constant<1024>
Creating constant: t4: i32 = Constant<0>
Creating new node: t5: i32 = undef
Creating new node: t6: v40i32,ch = load<(load (s1280) from `<40 x i32> addrspace(100)* inttoptr (i32 1024 to <40 x i32> addrspace(100)*)`, align 512, addrspace 100)> t0, Constant:i32<1024>, undef:i32
Creating new node: t7: i32 = extract_vector_elt t6, Constant:i32<0>
Creating constant: t8: i32 = Constant<1>
Creating new node: t9: i32 = extract_vector_elt t6, Constant:i32<1>
Creating constant: t10: i32 = Constant<2>
Creating new node: t11: i32 = extract_vector_elt t6, Constant:i32<2>
...
Creating constant: t82: i32 = Constant<38>
Creating new node: t83: i32 = extract_vector_elt t6, Constant:i32<38>
Creating constant: t84: i32 = Constant<39>
Creating new node: t85: i32 = extract_vector_elt t6, Constant:i32<39>
Creating new node: t86: v64i32 = BUILD_VECTOR t7, t9, t11, t13, t15, t17, t19, t21, t23, t25, t27, t29, t31, t33, t35, t37, t39, t41, t43, t45, t47, t49, t51, t53, t55, t57, t59, t61, t63, t65, t67, t69, t71, t73, t75, t77, t79, t81, t83, t85, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32, undef:i32
Creating new node: t87: v16i32 = extract_subvector t86, Constant:i32<0> // Split to legal vector type
Creating new node: t88: v16i32 = extract_subvector t86, Constant:i32<16>
Creating new node: t89: v16i32 = extract_subvector t86, Constant:i32<32>
Creating constant: t90: i32 = Constant<48>
Creating new node: t91: v16i32 = extract_subvector t86, Constant:i32<48>
Creating new node: t93: ch = CopyToReg t0, Register:v16i32 %0, t87
Creating new node: t95: ch = CopyToReg t0, Register:v16i32 %1, t88
Creating new node: t97: ch = CopyToReg t0, Register:v16i32 %2, t89
Creating new node: t99: ch = CopyToReg t0, Register:v16i32 %3, t91
Creating new node: t100: ch = TokenFactor t93, t95, t97, t99
Creating new node: t102: ch = brcond t100, t2, BasicBlock:ch<label1 0x1d158e8>
Creating new node: t104: ch = br t102, BasicBlock:ch<label0 0x1d15800>

I don’t know much about this code, but I do know it hasn’t had much attention over the years. Its also used by the vector calling convention code so any changes could have nasty side effects there as well.

It looks like you’re just avoiding this code slightly further down:

  // FIXME: We don't support non-power-of-2-sized vectors for now.  Ideally
  // we could break down into LHS/RHS like LegalizeDAG does.
  if (!isPowerOf2_32(EltCnt.getKnownMinValue())) {
    NumVectorRegs = EltCnt.getKnownMinValue();
    EltCnt = ElementCount::getFixed(1);
  }

We probably want to avoid changing the default implementation of getVectorTypeBreakdownForCallingConv, yes, since that would impact the calling convention of an unknown number of targets, and I don’t really want to go there.

For the non-calling-convention case, there isn’t any ABI significance to how we break down vectors; it just needs to be self-consistent. I think nobody has really looked at it because vectors with a non-power-of-two number of elements are usually small (like 3 element vectors in OpenCL).

Yes,this change can make vector type breakdown works just like the the vector legalize done;