Keeping (two) MachineInstrs together during post-RA scheduling

Hello.
     I am using the post-RA (post-Register Allocation) scheduler to avoid data hazards by filling the gaps with other USEFUL instructions from the program (rather than only NOPs), but it breaks apart some sequences of instructions that should remain "glued" together.
     More precisely, in my [Target]ISelDAGToDAG.cpp I may replace, for example, a BUILD_VECTOR with a machine SDNode called VLOAD_D_WO_IMM followed by an INLINEASM, where the INLINEASM has a simple dataflow dependence on the result of the VLOAD_D_WO_IMM (a solid black edge in the .DOT output of the DAG after instruction selection). (I can share the .DOT after instruction selection obtained with llc -view-sched-dags.)
     When I run the default pre-RA scheduler (which seems to be a "List Scheduling" algorithm), the generated ASM code always has the INLINEASM string immediately after the asm instruction for the VLOAD_D_WO_IMM. But when I also enable the post-RA scheduler (llc -post-RA-scheduler ...), other instructions get inserted between the VLOAD_D_WO_IMM and the INLINEASM, which is semantically incorrect.

     How can I avoid these 2 instructions being separated by the post-RA scheduler? Can I customize the behavior of the post-RA scheduler (I found some documentation at http://llvm.org/docs/doxygen/html/PostRASchedulerList_8cpp.html)?

     The first natural idea was to use SelectionDAG glue edges, but they do not seem very reliable (sometimes I even have difficulty creating them, for example in the classes [Target]ISelDAGToDAG and [Target]ISelLowering). I also understood that the scheduler can disregard glue edges between SelectionDAG nodes anyway. For example:
         - from http://lists.llvm.org/pipermail/llvm-dev/2014-June/074046.html
             <<You can't Glue the two nodes together forever. All Glue really does is
             keep them together long enough for LLVM to put together a data
             dependency through "Uses" and "Defs" implicit operands. Once the
             MachineInstrs have been created, the two instructions are at the whim
             of the scheduler as much as any others.
             If you really need them to remain together, you have to either create
             a pseudo-instruction and expand it extremely late, or create a bundle
             (depending on what's natural for your target).>>
         - from http://lists.llvm.org/pipermail/llvm-dev/2016-June/100885.html:
             <<If you want to have these nodes stick together, using glue may not be
             sufficient. After the machine instructions are generated, the scheduler
             may place instructions between the interrupt disable/restore and the
             atomic load itself. Also, the register allocator may insert some spills
             there---there are ways that this sequence may get separated.
             For this, the best approach may be to define a pseudo-instruction, which
             will be expanded into real instruction in the post-RA expansion pass.>>

     Also, I don't want to use MachineInstr bundles or pseudo-instructions. MachineInstr bundles seem too difficult to use and come too late in code generation (I prefer working at the level of instruction selection). As for pseudo-instructions, I found little information: there is some API support, namely expandPostRAPseudo(), described at http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html, and some documentation at http://llvm.org/devmtg/2014-04/PDFs/Talks/Building%20an%20LLVM%20backend.pdf, slide 55 (and 53, 54).

    Please let me know whether I can customize the post-RA scheduler so that it does not place my two SDNodes, which are created "together", in non-consecutive cycles, or whether you recommend a different approach.

   Thank you very much,
     Alex

Well, if it is two instructions, then there is always a chance that some pass moves them around or inserts new instructions in between (especially regalloc, which may insert spills/reloads/copies). The only guaranteed solution is indeed to use a pseudo-instruction or an instruction bundle, so that the instructions look like a single unit to codegen.
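Why a single unit is robust can be illustrated with a small self-contained toy model in C++ (not LLVM code; ToyUnit, insertBetweenUnits and expandPseudos are hypothetical names). Passes such as spill insertion only ever see whole units, so they can add code between units but never inside the pseudo; the pair only becomes two real instructions at the final expansion step. Where that step happens matters: as far as I know, the standard pipeline runs the post-RA pseudo expansion pass before the second (post-RA) scheduling pass, so expanding into two independent instructions there would expose them to that scheduler again.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: a "unit" is what the scheduler and register allocator
// treat as one instruction. A pseudo-instruction is a single unit that
// carries the real instructions it will later expand into.
struct ToyUnit {
  std::vector<std::string> realInsts;
};

// Simulates a pass (e.g. spill/reload insertion) adding code before unit
// `pos`. It can only land between units, never inside one.
void insertBetweenUnits(std::vector<ToyUnit>& units, std::size_t pos,
                        const std::string& inst) {
  units.insert(units.begin() + static_cast<std::ptrdiff_t>(pos),
               ToyUnit{{inst}});
}

// Final expansion: flatten every unit into its real instructions.
std::vector<std::string> expandPseudos(const std::vector<ToyUnit>& units) {
  std::vector<std::string> out;
  for (const ToyUnit& u : units)
    out.insert(out.end(), u.realInsts.begin(), u.realInsts.end());
  return out;
}
```

Wherever the toy "pass" inserts its code, VLOAD_D_WO_IMM and INLINEASM come out adjacent as long as they travel as one unit.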

That said, if you use the PostMachineScheduler, you can insert a schedule DAG mutation in createPostMachineScheduler() that adds a cluster edge between the two nodes, so the scheduler tries hard to keep them together. Unfortunately, this doesn't always work today, because the scheduling model is always checked for stalls first (Pending vs. Available lists in the MachineScheduler) before the scheduler even evaluates its usual cost function with the cluster heuristic.
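The limitation described above can be sketched with a self-contained toy list scheduler in C++ (this is not the MachineScheduler; ToyNode and toySchedule are hypothetical, and the real scheduler has many more heuristics). It issues one node per cycle, checks readiness (the stall check) first, and applies the cluster preference only among nodes that are already ready. With a 1-cycle latency the clustered INLINEASM follows its predecessor immediately; with a 2-cycle latency it is still pending when the next slot opens, so an unrelated ADD is issued in between.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model, not LLVM code: each node has at most one data predecessor.
struct ToyNode {
  std::string name;
  int pred;        // index of the data predecessor, or -1 if none
  int latency;     // cycles after `pred` issues before this node is ready
  bool clustered;  // cluster edge: try to issue right after `pred`
};

// Issues at most one node per cycle. Readiness is evaluated first; the
// cluster preference only chooses among nodes that are already ready.
std::vector<std::string> toySchedule(const std::vector<ToyNode>& nodes) {
  const std::size_t n = nodes.size();
  std::vector<bool> done(n, false);
  std::vector<int> issueCycle(n, -1);
  std::vector<std::string> order;
  int cycle = 0;
  int last = -1;  // index of the most recently issued node
  auto ready = [&](std::size_t i) {
    if (done[i]) return false;
    int p = nodes[i].pred;
    return p < 0 || (done[p] && cycle >= issueCycle[p] + nodes[i].latency);
  };
  while (order.size() < n) {
    int pick = -1;
    // Cluster heuristic: prefer the clustered successor of `last`,
    // but only if it would not stall.
    for (std::size_t i = 0; i < n && pick < 0; ++i)
      if (last >= 0 && nodes[i].clustered && nodes[i].pred == last && ready(i))
        pick = static_cast<int>(i);
    // Fallback: first ready node in original program order.
    for (std::size_t i = 0; i < n && pick < 0; ++i)
      if (ready(i)) pick = static_cast<int>(i);
    if (pick < 0) {  // nothing ready: stall for one cycle
      ++cycle;
      continue;
    }
    done[pick] = true;
    issueCycle[pick] = cycle;
    order.push_back(nodes[pick].name);
    last = pick;
    ++cycle;
  }
  return order;
}
```

For example, toySchedule({{"VLOAD", -1, 0, false}, {"ADD", -1, 0, false}, {"INLINEASM", 0, 2, true}}) yields VLOAD, ADD, INLINEASM: the cluster edge is present, but the stall check wins.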

- Matthias

You can do that with the regular post-RA scheduler as well via "TargetSubtargetInfo::getPostRAMutations".
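As a rough sketch of what such a mutation does, here is a self-contained C++ toy (not LLVM code). In LLVM the real interface is ScheduleDAGMutation, whose apply() method receives the scheduling DAG, and getPostRAMutations() fills a std::vector<std::unique_ptr<ScheduleDAGMutation>>; ToySU, ToyMutation, ClusterInlineAsmWithPred and getToyPostRAMutations below are hypothetical stand-ins that only mimic that shape. The mutation adds a cluster edge from every INLINEASM to its data predecessors before scheduling starts.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy scheduling unit: an opcode name plus predecessor edges by kind.
struct ToySU {
  std::string opcode;
  std::vector<int> dataPreds;     // indices of data predecessors
  std::vector<int> clusterPreds;  // cluster edges added by a mutation
};

using ToyDAG = std::vector<ToySU>;

// Hypothetical analogue of a DAG mutation: a hook run on the freshly
// built DAG before the scheduler starts picking instructions.
struct ToyMutation {
  virtual ~ToyMutation() = default;
  virtual void apply(ToyDAG& dag) = 0;
};

// Hypothetical mutation: cluster every INLINEASM with its data predecessors.
struct ClusterInlineAsmWithPred : ToyMutation {
  void apply(ToyDAG& dag) override {
    for (ToySU& su : dag)
      if (su.opcode == "INLINEASM")
        su.clusterPreds.insert(su.clusterPreds.end(), su.dataPreds.begin(),
                               su.dataPreds.end());
  }
};

// Analogous to a subtarget filling its list of post-RA mutations.
void getToyPostRAMutations(std::vector<std::unique_ptr<ToyMutation>>& muts) {
  muts.push_back(std::make_unique<ClusterInlineAsmWithPred>());
}

// Runs all registered mutations on the DAG, in registration order.
void runMutations(ToyDAG& dag) {
  std::vector<std::unique_ptr<ToyMutation>> muts;
  getToyPostRAMutations(muts);
  for (auto& m : muts) m->apply(dag);
}
```

Keeping the policy in a separate mutation object leaves the scheduler itself untouched; the same object can be registered for either post-RA scheduler.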

-Krzysztof

Hello.
     After looking at the debug output from llc, it actually seems that the pre-RA scheduler (NOT the post-RA scheduler) is the one separating my INLINEASM SDNodes from their "associated" instructions in my program (there is a simple dataflow edge between each INLINEASM and its associated node).

     Is it possible to generate instruction bundles (or pseudo-instructions) in the pre-RA scheduler pass? At http://llvm.org/docs/CodeGenerator.html#machineinstr-bundles it is written that: "Packing / bundling of MachineInstr’s should be done as part of the register allocation super-pass.", etc.

     Matthias, thank you for pointing out that the register allocator, at least, can move my 2 instructions around. Note, however, that a MachineSDNode with one destination register and an immediate value, followed by an INLINEASM (which has no register operands), should NOT be separated by the register allocator. Which other passes in llc (llc -O3) do you think could separate my 2 instructions?

     I will read about mutations in the documentation (for example, http://llvm.org/docs/doxygen/html/classllvm_1_1ScheduleDAGMI.html and http://llvm.org/docs/doxygen/html/MachineScheduler_8h_source.html).

   Thank you,
     Alex

With best regards,
     Alex Susu

Hello.
     I would like to report how I managed to solve my problem of bundling groups of two MachineInstrs together after the pre-RA scheduler pass. In detail, I do the following:
       - I override the [Target]InstrInfo::expandPostRAPseudo(MachineInstr &MI) method. Note that simply calling MI.bundleWithPred(), for example, does NOT seem to work. More precisely:
           bool [Target]InstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
               ...
               // At this point MI is the INLINEASM instruction that must be bundled
               // with the instruction preceding it, predMI.
               // I iterate through the MBB (MachineBasicBlock) to obtain the
               // predecessor and the successor of MI.
               /*
                  We do NOT use MIBundleBuilder, just the predMI and succMI iterators.
                  finalizeBundle() bundles the half-open range [predMI, succMI),
                  so passing succMI = succ(MI) as the end makes the bundle
                  cover predMI..MI inclusive.
               */
               llvm::finalizeBundle(MBB,
                                    (MachineBasicBlock::instr_iterator)predMI,
                                    (MachineBasicBlock::instr_iterator)succMI);
               ...
           }
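The half-open range can be illustrated with a self-contained toy model in C++ (ToyMI and toyFinalizeBundle are hypothetical; the real finalizeBundle() also copies the externally visible defs and uses onto the BUNDLE instruction, which is omitted here). Instructions in [first, last) are flagged as inside the bundle and a BUNDLE header is placed in front of them, so passing succ(MI) as the end is what pulls MI itself into the bundle. By contrast, MI.bundleWithPred() only sets the bundling flags and creates no BUNDLE header, which may be one reason it appeared not to work: code that looks for the header with isBundle() never finds one.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a MachineInstr inside a basic block.
struct ToyMI {
  std::string name;
  bool insideBundle;  // mirrors MachineInstr::isInsideBundle()
};

// Toy analogue of finalizeBundle(MBB, FirstMI, LastMI): bundles the
// half-open range [first, last) and places a BUNDLE header in front of it.
// Vector indices stand in for instr_iterators.
void toyFinalizeBundle(std::vector<ToyMI>& mbb, std::size_t first,
                       std::size_t last) {
  for (std::size_t i = first; i < last; ++i)
    mbb[i].insideBundle = true;  // `last` itself is NOT included
  mbb.insert(mbb.begin() + static_cast<std::ptrdiff_t>(first),
             ToyMI{"BUNDLE", false});
}
```

With a block VLOAD_D_WO_IMM, INLINEASM, ADD, calling toyFinalizeBundle(mbb, 0, 2) (index 2 playing the role of succMI) yields BUNDLE, VLOAD_D_WO_IMM, INLINEASM, ADD, with only the middle two flagged as inside the bundle.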

       - We then need to unpack these bundles by adding the following code to this method:
           void [Target]AsmPrinter::EmitInstruction(const MachineInstr *MI) {
               // Inspired by lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
               if (MI->isBundle()) {
                   // MI is the BUNDLE header: emit each instruction inside the bundle.
                   const MachineBasicBlock *MBB = MI->getParent();
                   MachineBasicBlock::const_instr_iterator I = MI->getIterator();
                   ++I;
                   while (I != MBB->instr_end() && I->isInsideBundle()) {
                       EmitInstruction(&*I);
                       ++I;
                   }
                   return;
               }
               ...
           }
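The unpacking logic can be modelled the same way in a self-contained C++ toy (ToyMI and toyEmitAll are hypothetical). It relies on the fact that the AsmPrinter walks each basic block with the default bundle iterator, which visits only bundle headers and unbundled instructions; the instructions inside a bundle are reached only through the loop in EmitInstruction().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a MachineInstr inside a basic block.
struct ToyMI {
  std::string name;
  bool insideBundle;  // mirrors MachineInstr::isInsideBundle()
};

// Mirrors the EmitInstruction() loop: a BUNDLE header emits nothing itself;
// the instructions flagged as inside the bundle that follow it are emitted
// one by one. The outer loop skips them, like the bundle iterator does.
std::vector<std::string> toyEmitAll(const std::vector<ToyMI>& mbb) {
  std::vector<std::string> out;
  for (std::size_t i = 0; i < mbb.size(); ++i) {
    if (mbb[i].insideBundle) continue;  // reached only through its header
    if (mbb[i].name == "BUNDLE") {
      std::size_t j = i + 1;
      while (j < mbb.size() && mbb[j].insideBundle)
        out.push_back(mbb[j++].name);
      continue;
    }
    out.push_back(mbb[i].name);  // ordinary, unbundled instruction
  }
  return out;
}
```

Each bundled instruction is emitted exactly once, in its original order, so the INLINEASM text still follows the VLOAD_D_WO_IMM directly in the output.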

       - Then, in InstPrinter/[Target]InstPrinter.cpp, we adjust the following method to handle the INLINEASMs that I bundle and later unpack:
         void [Target]InstPrinter::printInst(const MCInst *MI, raw_ostream &O,
                                             StringRef Annot, const MCSubtargetInfo &STI) {
           /* For some reason, [Target]GenAsmWriter.inc cannot print the INLINEASMs from the
              MachineInstr bundles I create in [Target]InstrInfo.cpp, expandPostRAPseudo(),
              and then unpack in [Target]AsmPrinter::EmitInstruction().
              So I handle these INLINEASMs myself here.
           */
           // Use the named opcode TargetOpcode::INLINEASM rather than the magic number 1.
           if (MI->getOpcode() == TargetOpcode::INLINEASM) {
             // Operand 0 carries the inline asm string.
             printOperand(MI, 0, O);
           }
           else {
             printInstruction(MI, O);
           }

           printAnnotation(O, Annot);
         }

     These back-end changes let me keep certain INLINEASMs attached to their preceding instructions, so that each pair stays adjacent after the post-RA scheduler, which I run to avoid data hazards in the generated ASM program.

   Best regards,
     Alex