Specifying conditional blocks for the back end

Hello.
     For my back end for the Connex SIMD research processor I want to implement conditional blocks (I guess the better term is predicated blocks). Predicated blocks are bordered by two instructions WHEREEQ (or WHERELT, etc) and ENDWHERE.
     For example, the following code executes the instructions inside the WHERE block only for the lanes where R0 == R1:
         EQ R0, R1;
         WHEREEQ
           vector_asm_instr1;
           ...
           vector_asm_instrk;
         ENDWHERE
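
     To make the intended semantics concrete, here is a minimal, self-contained C++ model of a WHERE block. It is only a sketch of the behaviour described above, not code for the real processor or for LLVM: the lane count, the type names and the helper functions compareEq and predicatedAdd are all hypothetical, and an add stands in for an arbitrary vector_asm_instr.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // hypothetical lane count, for illustration only
using Vec = std::array<std::int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Models "EQ R0, R1": record, per lane, whether the two operands are equal.
Mask compareEq(const Vec &a, const Vec &b) {
  Mask m{};
  for (std::size_t i = 0; i < kLanes; ++i)
    m[i] = (a[i] == b[i]);
  return m;
}

// Models one vector instruction (an add, chosen arbitrarily) placed between
// WHEREEQ and ENDWHERE: only lanes whose flag is set are written, all other
// lanes keep the old value of the destination register.
void predicatedAdd(Vec &dst, const Vec &a, const Vec &b, const Mask &active) {
  for (std::size_t i = 0; i < kLanes; ++i) {
    assert(i < dst.size());
    if (active[i])
      dst[i] = a[i] + b[i];
  }
}
```

     The point of the model is that the destination is both read and written by the block: lanes with a clear flag must keep their previous contents, which is exactly what is lost if the WHEREEQ/ENDWHERE pair is removed.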

     I was able to generate such a block at instruction selection by writing custom C++ selection code, but I don't know how I can inform the back end that the instructions inside the WHERE block get executed conditionally, not always.
     This seems to matter only for the llc optimization levels -O1/2/3, not for -O0. At -O1/2/3, I experienced cases where the WHEREEQ and ENDWHERE instructions were simply removed and vector_asm_instr1..k were executed unconditionally, which is NOT good.

     Could you please tell me how I can inform the back end that the instructions inside my WHERE blocks get executed conditionally, not always?

   Thank you very much,
     Alex

There's some existing infrastructure in the backend for predication; see lib/CodeGen/IfConversion.cpp (and the target hooks PredicateInstruction etc.). For forming blocks, you might want to follow what the ARM backend does for Thumb2; see Thumb2ITBlockPass.cpp .

-Eli

Hello.
     I experience optimizations (DCE, OoO scheduling) which break the semantics of the list of instructions lowered in ISelLowering from the LLVM VSELECT instruction, and these bad transformations happen even before scheduling, in the later instruction-selection subpasses. To fix this problem, I am trying to lower VSELECT to only one pseudo-instruction and LATER translate it to a list of instructions, using bundles and maybe also PredicateInstruction(), which is also employed in IfConversion.cpp.
     More precisely, I'm trying to use a pseudo-instruction that will get translated to a sequence of 4 MachineInstrs, namely:
         // These 4 instructions replace the pseudo-instruction I use for LLVM's VSELECT
         R31 = OR srcVselectFalse, srcVselectFalse
         WHEREEQ
            R31 = OR srcVselectTrue, srcVselectTrue
         ENDWHERE
     I plan to do this as early as possible, normally in a pass registered in addInstSelector(), which gets executed immediately after the first scheduling phase.
     If anybody sees a problem with this, please let me know.

     I think it is OK to specify empty semantics (an empty DAG pattern in TableGen) for the WHEREEQ/ENDWHERE instructions delimiting the predicated (conditional) block.

     Eli, thank you for the pointers. The "it" ARM Thumb2 instruction is very interesting, maybe even unique among mainstream processors, handling predicated execution of 2 contiguous blocks of instructions; I found some specs for it at https://community.arm.com/processors/b/blog/posts/condition-codes-3-conditional-execution-in-thumb-2. This instruction is quite similar to my conditional-block instructions WHERExy/ENDWHERE (xy can be EQ, LT, CRY).

   Thank you,
     Alex

Hello.
     I wanted to tell you that I managed to codegen correctly the LLVM VSELECT instruction by doing the steps described below.
     Can somebody help me with the problems with the PredicateInstruction() method I describe below at point 3? Although I managed to avoid using PredicateInstruction(), I am curious why it doesn't work.

     To codegen the LLVM VSELECT instruction correctly (I will be very explicit, so bear with me if you have similar issues):
       - 1. I declare in TableGen an instruction WHERE_EQ (I assume without loss of generality that VSELECT has a seteq predicate), which will implement the VSELECT in terms of my processor's WHERE blocks.
       - 2. in ISelLowering::Lower() I replace the VSELECT with WHERE_EQ. (note that before I was generating the entire list of MachineSDNode instructions equivalent to VSELECT in ISelLowering::Lower(), but the scheduler and the DCE (Dead Code Elimination) pass were messing up the order of instructions resulting in incorrect semantics). Note that I give to WHERE_EQ as inputs the SDNode operands of VSELECT, in order to be able to access them later in the PassCreateWhereBlocks pass mentioned below;

       - 3. I registered a pass PassCreateWhereBlocks in addInstSelector() in [Target]TargetMachine.cpp, which gets executed immediately after instruction selection followed by a first scheduling phase.
         Even if I predicate the instructions inside the WHERE block in PassCreateWhereBlocks, the method PredicateInstruction() fails by returning false, which means the method did not add a predicated flag to the instructions I wanted it to. This results, as I said before, in incorrect program optimizations such as useful instructions being removed, because the compiler does not understand that the code in my WHERE blocks is predicated (conditional), so it assumes it is always executed. As a side note, I see the ARM and SystemZ back ends override the PredicateInstruction() method, but their code is a bit complex and I did not try hard to understand how they manage to predicate their instructions, e.g. for the ARM Thumb2 "it" instruction. Are there some links documenting their work?
         Therefore I started using bundles instead of making predicated instructions. As far as I can see, DCE cannot be performed inside bundled instructions: http://llvm.org/docs/doxygen/html/DeadMachineInstructionElim_8cpp_source.html does NOT treat bundles, which implies it does not look at the instructions inside a bundle and can only see the "header" instruction of a bundle. Therefore, I believe it is safe to bundle instructions to avoid DCE, as long as we can infer that the "header" instruction of the bundle is never going to be DCE-ed. Using bundles also prevents the scheduler from changing the order of the bundled instructions. To create the bundle I use MIBundleBuilder, since calling the finalizeBundle() method directly in this pass (PassCreateWhereBlocks) results in an error like "llc: /llvm/lib/CodeGen/MachineInstrBundle.cpp:149: void llvm::finalizeBundle(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::instr_iterator, llvm::MachineBasicBlock::instr_iterator): Assertion `TargetRegisterInfo::isPhysicalRegister(Reg)' failed."
         So I create for VSELECT pred, Vreg_true, Vreg_false an equivalent sequence of MachineInstr:
           // pred is computed before
           R31 = OR Rfalse, Rfalse // copy Rfalse to R31
           WHERE_EQ
             R31 = OR Rtrue, Rtrue // copy Rtrue to R31
           ENDWHERE
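
         As a sanity check on the sequence above, here is a small stand-alone C++ model of it. It is a sketch under stated assumptions, not back-end code: the lane type, the function names vselect and loweredSelect, and the keepWhereBlock switch are hypothetical and exist only to compare the lowered sequence against a lane-wise select, and to show what happens if the WHERE_EQ/ENDWHERE pair is dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<std::uint32_t>;

// Reference semantics of "VSELECT pred, t, f": lane-wise pred ? t : f.
Vec vselect(const std::vector<bool> &pred, const Vec &t, const Vec &f) {
  assert(pred.size() == t.size() && t.size() == f.size());
  Vec out(t.size());
  for (std::size_t i = 0; i < t.size(); ++i)
    out[i] = pred[i] ? t[i] : f[i];
  return out;
}

// Model of the lowered sequence. keepWhereBlock == false models the
// miscompile in which WHERE_EQ/ENDWHERE are removed, so the second copy
// runs on every lane.
Vec loweredSelect(const std::vector<bool> &pred, const Vec &t, const Vec &f,
                  bool keepWhereBlock = true) {
  assert(pred.size() == t.size() && t.size() == f.size());
  Vec r31(f.size());
  for (std::size_t i = 0; i < f.size(); ++i)
    r31[i] = f[i] | f[i];    // R31 = OR Rfalse, Rfalse (all lanes)
  for (std::size_t i = 0; i < t.size(); ++i)
    if (!keepWhereBlock || pred[i])
      r31[i] = t[i] | t[i];  // R31 = OR Rtrue, Rtrue (active lanes only)
  return r31;
}
```

         With the block intact the model agrees with vselect on every lane; with the block removed the result is simply Rtrue, which matches the wrong behaviour I saw at -O1/2/3.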

         Note that I create a physical register (R31, a vector register; I also reserve this register in [Target]RegisterInfo::getReservedRegs(), to avoid an error which sometimes came from MachineVerifier.cpp, like "Bad machine code: Using an undefined physical register"). I cannot use a virtual register instead of R31 in PassCreateWhereBlocks (and ISelLowering::Lower()) since I need to assign to it twice (for both the then and else branches of the VSELECT instruction) and virtual registers follow the SSA rule of single assignment (so I get the following error if I assign twice to a virtual register: <<MachineRegisterInfo.cpp:339 [...] "getVRegDef assumes a single definition or no definition"' failed.>>). I also tried, without success, to use MachineRegisterInfo::leaveSSA() to avoid this problem with single assignment, but then other passes like MachineLICM give an error in llc like <<MachineLICM.cpp:409: [...] Assertion `TargetRegisterInfo::isPhysicalRegister(Reg) && "Not expecting virtual register!"' failed.>>, because MachineRegisterInfo::isSSA() returns false, which makes the pass assume that register allocation has finished and we have only physical registers, which unfortunately is NOT the case.

       - 4. I also register a pass PassFinalizeBundles in the addPreSched2() method of [Target]TargetMachine.cpp and use finalizeBundle() on the instruction bundle I created earlier in PassCreateWhereBlocks, because I want to avoid later errors like <</llvm/lib/CodeGen/PostRASchedulerList.cpp:357: virtual bool {anonymous}::PostRAScheduler::runOnMachineFunction(llvm::MachineFunction&): Assertion `Count == 0 && "Instruction count mismatch!"' failed.>> (IIRC)

   Best regards,
     Alex

  Hello.
    I wanted to tell you that I managed to codegen correctly the LLVM VSELECT instruction by doing the steps described below.
    Can somebody help me with the problems with the PredicateInstruction() method I describe below at point 3? Although I managed to avoid using PredicateInstruction(), I am curious why it doesn't work.

    To codegen the LLVM VSELECT instruction correctly (I will be very explicit, so bear with me if you have similar issues):
      - 1. I declare in TableGen an instruction WHERE_EQ (I assume without loss of generality that VSELECT has a seteq predicate), which will implement the VSELECT in terms of my processor's WHERE blocks.
      - 2. in ISelLowering::Lower() I replace the VSELECT with WHERE_EQ. (note that before I was generating the entire list of MachineSDNode instructions equivalent to VSELECT in ISelLowering::Lower(), but the scheduler and the DCE (Dead Code Elimination) pass were messing up the order of instructions resulting in incorrect semantics). Note that I give to WHERE_EQ as inputs the SDNode operands of VSELECT, in order to be able to access them later in the PassCreateWhereBlocks pass mentioned below;

      - 3. I registered a pass PassCreateWhereBlocks in addInstSelector() in [Target]TargetMachine.cpp, which gets executed immediately after instruction selection followed by a first scheduling phase.
        Even if I predicate the instructions inside the WHERE block in PassCreateWhereBlocks, the method PredicateInstruction() fails by returning false, which means the method did not add a predicated flag to the instructions I wanted it to.

PredicateInstruction is a virtual method, and the default implementation always returns false; your target is supposed to override it.
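
The pattern can be illustrated with a toy C++ model. To be clear, nothing below is LLVM's real API: ToyInstr, ToyInstrInfoBase and ToyConnexInstrInfo are hypothetical stand-ins, and representing the predicate as an extra trailing operand is an assumption made only for illustration. The sketch only shows why a call through the base-class hook reports failure until the target supplies its own override.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; NOT LLVM's MachineInstr.
struct ToyInstr {
  std::string opcode;
  std::vector<std::string> operands;
  bool predicable;  // whether this opcode accepts a predicate at all
};

// Toy stand-in for the target-independent base class.
struct ToyInstrInfoBase {
  virtual ~ToyInstrInfoBase() = default;
  // Default hook: the generic code knows nothing about the target, so it
  // changes nothing and reports that predication did not happen.
  virtual bool predicateInstruction(ToyInstr &, const std::string &) const {
    return false;
  }
};

// Hypothetical target override: attach the predicate as an extra operand.
struct ToyConnexInstrInfo : ToyInstrInfoBase {
  bool predicateInstruction(ToyInstr &mi,
                            const std::string &pred) const override {
    assert(!pred.empty() && "a predicate must be supplied");
    if (!mi.predicable)
      return false;  // leave instructions that cannot be predicated alone
    mi.operands.push_back(pred);
    return true;
  }
};
```

A caller that only ever sees the base-class behaviour gets false back and an unmodified instruction, which matches the symptom described above.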

This results, as I said before, in incorrect program optimizations such as useful instructions being removed, because the compiler does not understand that the code in my WHERE blocks is predicated (conditional), so it assumes it is always executed. As a side note, I see the ARM and SystemZ back ends override the PredicateInstruction() method, but their code is a bit complex and I did not try hard to understand how they manage to predicate their instructions, e.g. for the ARM Thumb2 "it" instruction. Are there some links documenting their work?

Thumb2 models its predicated instructions the same way as non-Thumb ARM does until very late in the backend. Basically, the predicate is just an operand of the MachineInstr. But it's a bit simpler because we don't predicate instructions until after register allocation.

        Therefore I started using bundles instead of making predicated instructions. As far as I can see, DCE cannot be performed inside bundled instructions: http://llvm.org/docs/doxygen/html/DeadMachineInstructionElim_8cpp_source.html does NOT treat bundles, which implies it does not look at the instructions inside a bundle and can only see the "header" instruction of a bundle. Therefore, I believe it is safe to bundle instructions to avoid DCE, as long as we can infer that the "header" instruction of the bundle is never going to be DCE-ed. Using bundles also prevents the scheduler from changing the order of the bundled instructions. To create the bundle I use MIBundleBuilder, since calling the finalizeBundle() method directly in this pass (PassCreateWhereBlocks) results in an error like "llc: /llvm/lib/CodeGen/MachineInstrBundle.cpp:149: void llvm::finalizeBundle(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::instr_iterator, llvm::MachineBasicBlock::instr_iterator): Assertion `TargetRegisterInfo::isPhysicalRegister(Reg)' failed."
        So I create for VSELECT pred, Vreg_true, Vreg_false an equivalent sequence of MachineInstr:
          // pred is computed before
          R31 = OR Rfalse, Rfalse // copy Rfalse to R31
          WHERE_EQ
            R31 = OR Rtrue, Rtrue // copy Rtrue to R31
          ENDWHERE

        Note that I create a physical register (R31, a vector register; I also reserve this register in [Target]RegisterInfo::getReservedRegs(), to avoid an error which sometimes came from MachineVerifier.cpp, like "Bad machine code: Using an undefined physical register"). I cannot use a virtual register instead of R31 in PassCreateWhereBlocks (and ISelLowering::Lower()) since I need to assign to it twice (for both the then and else branches of the VSELECT instruction) and virtual registers follow the SSA rule of single assignment (so I get the following error if I assign twice to a virtual register: <<MachineRegisterInfo.cpp:339 [...] "getVRegDef assumes a single definition or no definition"' failed.>>). I also tried, without success, to use MachineRegisterInfo::leaveSSA() to avoid this problem with single assignment, but then other passes like MachineLICM give an error in llc like <<MachineLICM.cpp:409: [...] Assertion `TargetRegisterInfo::isPhysicalRegister(Reg) && "Not expecting virtual register!"' failed.>>, because MachineRegisterInfo::isSSA() returns false, which makes the pass assume that register allocation has finished and we have only physical registers, which unfortunately is NOT the case.

The right way to model this in SSA form would be something like this:

Rresult1 = OR Rfalse, Rfalse
Rresult2 = WHERE_EQ_OR flags, Rresult1, Rtrue, Rtrue

You then tie the two virtual registers together so the register allocator knows they have to be allocated to same physical register (something like `let Constraints = "$Rresult1 = $Rresult2"` in TableGen).
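
A rough way to see why this form is SSA-friendly is to model the predicated instruction as a pure function that takes the old destination value as an explicit pass-through input. The C++ below is only a sketch: vecOr, whereEqOr and the type names are hypothetical, and the model says nothing about how the register allocator actually enforces the tie.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Vec = std::vector<std::uint32_t>;
using Flags = std::vector<bool>;

// Models "Rresult1 = OR Rfalse, Rfalse": an unpredicated lane-wise OR.
Vec vecOr(const Vec &a, const Vec &b) {
  assert(a.size() == b.size());
  Vec out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = a[i] | b[i];
  return out;
}

// Models "Rresult2 = WHERE_EQ_OR flags, Rresult1, Rtrue, Rtrue".
// The old value is an explicit input (passthru), so every value is defined
// exactly once: inactive lanes copy passthru, active lanes compute the OR.
Vec whereEqOr(const Flags &flags, const Vec &passthru, const Vec &a,
              const Vec &b) {
  assert(flags.size() == passthru.size() && a.size() == b.size() &&
         a.size() == passthru.size());
  Vec out(passthru.size());
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = flags[i] ? (a[i] | b[i]) : passthru[i];
  return out;
}
```

Because Rresult1 is a real use of the predicated instruction, the first OR is no longer dead from the optimizer's point of view, and the ordering between the two instructions follows from the data dependence rather than from a bundle.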

-Eli