Hello.
I wanted to let you know that I managed to generate correct code for the LLVM VSELECT instruction by following the steps described below.
Can somebody help me with the PredicateInstruction() problem I describe in point 3 below? Although I managed to avoid using PredicateInstruction(), I am curious why it doesn't work.
To codegen the LLVM VSELECT instruction correctly (I will be very explicit, so bear with me if you have similar issues):
- 1. I declare in TableGen an instruction WHERE_EQ (I assume, without loss of generality, that the VSELECT has a seteq predicate), which implements the VSELECT in terms of my processor's WHERE blocks.
- 2. In ISelLowering::Lower() I replace the VSELECT with WHERE_EQ. (Previously I generated the entire list of MachineSDNode instructions equivalent to VSELECT in ISelLowering::Lower(), but the scheduler and the DCE (Dead Code Elimination) pass were messing up the order of the instructions, resulting in incorrect semantics.) Note that I give WHERE_EQ the SDNode operands of the VSELECT as inputs, so that I can access them later in the PassCreateWhereBlocks pass mentioned below.
- 3. I registered a pass, PassCreateWhereBlocks, in addInstSelector() in [Target]TargetMachine.cpp; it gets executed immediately after instruction selection and the first scheduling phase that follows it.
Even when I try to predicate the instructions inside the WHERE block in PassCreateWhereBlocks, PredicateInstruction() fails by returning false, which means it did not add a predicate flag to the instructions I wanted. As I said before, this leads to incorrect optimizations, such as useful instructions being removed, because the compiler does not understand that the code in my WHERE blocks is predicated (conditional) and assumes it is always executed. As a side note, I see that the ARM and SystemZ back ends override PredicateInstruction(), but their code is a bit complex and I did not try hard to understand how they manage to predicate their instructions, e.g. for the ARM Thumb2 "it" instruction. Are there any links documenting their work?
Therefore I started using bundles instead of predicated instructions. As far as I can see, DCE cannot be performed inside bundled instructions (see also http://llvm.org/docs/doxygen/html/DeadMachineInstructionElim_8cpp_source.html, which does NOT treat bundles; this implies it does not look at the instructions inside a bundle and can only see the "header" instruction of the bundle. I therefore believe it is safe to bundle instructions to avoid DCE, as long as we can at least infer that the "header" instruction of the bundle is never going to be DCE-ed). Using bundles also prevents the scheduler from changing the order of the bundled instructions. To create the bundle I use MIBundleBuilder, since calling finalizeBundle() directly in this pass (PassCreateWhereBlocks) results in an error like "llc: /llvm/lib/CodeGen/MachineInstrBundle.cpp:149: void llvm::finalizeBundle(llvm::MachineBasicBlock&, llvm::MachineBasicBlock::instr_iterator, llvm::MachineBasicBlock::instr_iterator): Assertion `TargetRegisterInfo::isPhysicalRegister(Reg)' failed."
So, for VSELECT pred, Vreg_true, Vreg_false, I create an equivalent sequence of MachineInstrs:
// pred is computed before
R31 = OR Rfalse, Rfalse // copy Rfalse to R31
WHERE_EQ
R31 = OR Rtrue, Rtrue // copy Rtrue to R31
ENDWHERE
Note that I use a physical register (R31, a vector register); I also reserve this register in [Target]RegisterInfo::getReservedRegs(), to avoid an error that MachineVerifier.cpp sometimes reported, like "Bad machine code: Using an undefined physical register". I cannot use a virtual register instead of R31 in PassCreateWhereBlocks (and ISelLowering::Lower()), since I need to assign to it twice (once for each of the then and else branches of the VSELECT instruction) and virtual registers follow the SSA single-assignment rule (if I assign twice to a virtual register I get the following error: <<MachineRegisterInfo.cpp:339 [...] "getVRegDef assumes a single definition or no definition"' failed.>>). I also tried, without success, to use MachineRegisterInfo::leaveSSA() to get around the single-assignment problem, but then other passes like MachineLICM give an error in llc like <<MachineLICM.cpp:409: [...] Assertion `TargetRegisterInfo::isPhysicalRegister(Reg) && "Not expecting virtual register!"' failed.>>, because MachineRegisterInfo::isSSA() returns false, which makes the pass assume that register allocation has finished and that only physical registers remain, which unfortunately is NOT the case.
- 4. I also register a pass, PassFinalizeBundles, in the addPreSched2() method in [Target]TargetMachine.cpp, and call finalizeBundle() on the instruction bundle I created earlier in PassCreateWhereBlocks, because I want to avoid later errors like <</llvm/lib/CodeGen/PostRASchedulerList.cpp:357: virtual bool {anonymous}::PostRAScheduler::runOnMachineFunction(llvm::MachineFunction&): Assertion `Count == 0 && "Instruction count mismatch!"' failed.>> (IIRC).
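To spell out the semantics I expect from the sequence in point 3, here is a minimal sketch in plain C++. It is only a model, not LLVM code: the names Vec, Mask, kLanes, vselectReference and whereEqSequence are hypothetical, the lane count of 4 is an arbitrary assumption, and I assume that WHERE_EQ enables exactly the lanes in which pred holds.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count; the real vector width is target-specific.
constexpr std::size_t kLanes = 4;
using Vec = std::array<std::int32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Reference semantics of VSELECT: per lane, take vTrue where pred holds,
// otherwise vFalse.
Vec vselectReference(const Mask &pred, const Vec &vTrue, const Vec &vFalse) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out[i] = pred[i] ? vTrue[i] : vFalse[i];
  return out;
}

// Model of the emitted sequence (assumption: inside the WHERE block a write
// only takes effect in the lanes enabled by pred).
Vec whereEqSequence(const Mask &pred, const Vec &vTrue, const Vec &vFalse) {
  Vec r31{};
  // R31 = OR Rfalse, Rfalse  (unconditional copy, since x | x == x)
  for (std::size_t i = 0; i < kLanes; ++i)
    r31[i] = vFalse[i] | vFalse[i];
  // WHERE_EQ ... ENDWHERE: the second copy is applied per enabled lane only.
  for (std::size_t i = 0; i < kLanes; ++i)
    if (pred[i])
      r31[i] = vTrue[i] | vTrue[i];
  return r31;
}
```

In this model the first copy to R31 is still live in every lane where pred is false. A pass that treats the second write as unconditional would consider the first copy dead, which is the kind of incorrect removal I described in point 3.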
Best regards,
Alex