Need advice on writing a scheduling pass

Hello LLVM developers,

I have a few questions regarding the passes that are run after instruction selection and before register allocation. I am writing a scheduling pass (modulo scheduling). Before I ask my questions, I will first try to explain the approach I am taking.

  • Currently, I am running the passes in the following order.

(-debug-pass=Structure output)
Remove unreachable machine basic blocks
Live Variable Analysis
Eliminate PHI nodes for register allocation
Two-Address instruction pass
Process Implicit Definitions.
MachineDominator Tree Construction
Machine Natural Loop Construction
Modulo scheduling <== modulo scheduling pass inserted here
Slot index numbering
Live Interval Analysis
MachineDominator Tree Construction
Machine Natural Loop Construction
Simple Register Coalescing
Calculate spill weights
Live Stack Slot Analysis
Virtual Register Map
Linear Scan Register Allocator

  • The scheduling pass can schedule only single-basic-block loops. It looks only for loops that have 1 or 2 basic blocks (the number of BBs depends on whether the loop header and the latch are the same MBB). Basic blocks outside the loop remain unchanged, except for the ones that precede and succeed the loop. In addition, basic blocks for the prologue and epilogue are added to the CFG.

  • Prior to scheduling, redundant moves that were generated by the phi-elimination pass and two-address instruction pass are removed and the basic block in the loop is simplified as much as possible. For example, the header BB of a loop is transformed as follows (note that information in LiveVariables is not updated, so there may exist inconsistencies):

BB2: preheader, BB3: header & latch, BB4: exit

(before transformation)
BB#2: derived from LLVM BB %entry.bb_crit_edge
Predecessors according to CFG: BB#0
%reg1025 = MOVr %reg1034, pred:14, pred:%reg0, opt:%reg0
%reg1024 = MOVr %reg1033, pred:14, pred:%reg0, opt:%reg0
%reg1036 = MOVi 0, pred:14, pred:%reg0, opt:%reg0
%reg1038 = MOVr %reg1024, pred:14, pred:%reg0, opt:%reg0
%reg1039 = MOVr %reg1025, pred:14, pred:%reg0, opt:%reg0
%reg1040 = MOVr %reg1036, pred:14, pred:%reg0, opt:%reg0
Successors according to CFG: BB#3

BB#3: derived from LLVM BB %bb
Predecessors according to CFG: BB#2 BB#3
%reg1026 = MOVr %reg1038, pred:14, pred:%reg0, opt:%reg0
%reg1027 = MOVr %reg1039, pred:14, pred:%reg0, opt:%reg0
%reg1028 = MOVr %reg1040, pred:14, pred:%reg0, opt:%reg0
%reg1030 = MOVr %reg1027, pred:14, pred:%reg0, opt:%reg0
%reg1037, %reg1030 = LDR_POST %reg1030, %reg0, 4, pred:14, pred:%reg0
%reg1029 = ADDrr %reg1037, %reg1028, pred:14, pred:%reg0, opt:%reg0
%reg1031 = SUBri %reg1026, 1, pred:14, pred:%reg0, opt:%reg0
CMPzri %reg1031, 0, pred:14, pred:%reg0, %CPSR
%reg1038 = MOVr %reg1031, pred:14, pred:%reg0, opt:%reg0
%reg1039 = MOVr %reg1030, pred:14, pred:%reg0, opt:%reg0
%reg1040 = MOVr %reg1029, pred:14, pred:%reg0, opt:%reg0
Bcc <BB#3>, pred:1, pred:%CPSR
Successors according to CFG: BB#4 BB#3

BB#4: derived from LLVM BB %bb.bb2_crit_edge
Predecessors according to CFG: BB#3
%reg1041 = MOVr %reg1028, pred:14, pred:%reg0, opt:%reg0
Successors according to CFG: BB#5

(after transformation)
BB#3:
%reg1028 = MOVr %reg1040, pred:14, pred:%reg0, opt:%reg0
%reg1037, %reg1039 = LDR_POST %reg1039, %reg0, 4, pred:14, pred:%reg0
%reg1040 = ADDrr %reg1037, %reg1028, pred:14, pred:%reg0, opt:%reg0
%reg1038 = SUBri %reg1038, 1, pred:14, pred:%reg0, opt:%reg0
CMPzri %reg1038, 0, pred:14, pred:%reg0, %CPSR
Bcc <BB#3>, pred:1, pred:%CPSR
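
One way to picture the kind of cleanup described above is local copy propagation followed by removal of copies that become dead. This is a standard technique and only a rough sketch; it is not necessarily what my pass does, and it will not reproduce the exact "after" listing, which also renames registers across the loop back-edge. `Inst`, `propagateCopies` and `removeDeadCopies` are hypothetical toy names, not LLVM classes, and predicate operands are ignored.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy machine instruction: opcode, defined registers, used registers.
struct Inst {
  std::string op;
  std::vector<std::string> defs;
  std::vector<std::string> uses;
};

// In this toy model a copy is a MOVr with exactly one def and one use.
bool isCopy(const Inst &inst) {
  if (inst.op != "MOVr") return false;
  assert(inst.defs.size() == 1 && inst.uses.size() == 1);
  return true;
}

// Forward pass: rewrite each use of a copy's destination to the copy's source,
// as long as neither register has been redefined in between.
void propagateCopies(std::vector<Inst> &block) {
  std::map<std::string, std::string> copyOf; // destination -> source
  for (Inst &inst : block) {
    for (std::string &use : inst.uses) {
      auto it = copyOf.find(use);
      if (it != copyOf.end()) use = it->second;
    }
    // A new definition invalidates every copy fact that mentions the register.
    for (const std::string &def : inst.defs) {
      for (auto it = copyOf.begin(); it != copyOf.end();) {
        if (it->first == def || it->second == def) it = copyOf.erase(it);
        else ++it;
      }
    }
    if (isCopy(inst) && inst.defs[0] != inst.uses[0])
      copyOf[inst.defs[0]] = inst.uses[0];
  }
}

// Backward pass: drop self-copies and copies whose result is not read later.
// `live` starts as the set of registers that are live out of the block.
std::vector<Inst> removeDeadCopies(const std::vector<Inst> &block,
                                   std::set<std::string> live) {
  std::vector<Inst> kept;
  for (auto it = block.rbegin(); it != block.rend(); ++it) {
    if (isCopy(*it) &&
        (it->defs[0] == it->uses[0] || live.count(it->defs[0]) == 0))
      continue;
    for (const std::string &def : it->defs) live.erase(def);
    for (const std::string &use : it->uses) live.insert(use);
    kept.push_back(*it);
  }
  return std::vector<Inst>(kept.rbegin(), kept.rend());
}
```

The backward pass needs the live-out set of the block, which is exactly the kind of information that becomes stale when LiveVariables is not updated.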

Here are my questions:

  1. Which passes after the scheduling pass can be run without modification? I suspect LiveIntervalAnalysis will not be able to handle the transformed BB judging from the way it handles two-address code and phijoins. Will the other passes need to be changed as well?

  2. Is the scheduling pass inserted in the right position? Currently the scheduling pass is run right before Slot index numbering and LiveInterval analysis, since I thought it would require a lot of work to fix the indexes and intervals if the scheduling pass were run after these two passes.

  3. If the scheduling pass does local register allocation too, is there a way to tell the register allocation pass that is run later not to touch it?

Any advice, comments and suggestions are appreciated.

Thank you.

    Remove unreachable machine basic blocks
    Live Variable Analysis
    Eliminate PHI nodes for register allocation
    Two-Address instruction pass
    Process Implicit Definitions.
    MachineDominator Tree Construction
    Machine Natural Loop Construction
    Modulo scheduling <== modulo scheduling pass inserted here
    Slot index numbering
    Live Interval Analysis
    MachineDominator Tree Construction
    Machine Natural Loop Construction
    Simple Register Coalescing
    Calculate spill weights
    Live Stack Slot Analysis
    Virtual Register Map
    Linear Scan Register Allocator

[...]

Here are my questions:
1. Which passes after the scheduling pass can be run without modification? I suspect LiveIntervalAnalysis will not be able to handle the transformed BB judging from the way it handles two-address code and phijoins. Will the other passes need to be changed as well?

"Simple Register Coalescing" can handle any code, but the live intervals must be correct.

2. Is the scheduling pass inserted in the right position? Currently the scheduling pass is run right before Slot index numbering and LiveInterval analysis, since I thought it would require a lot of work to fix the indexes and intervals if the scheduling pass were run after these two passes.

I recommend that you do not edit machine code between "Live Variable Analysis" and "Live Interval Analysis". LiveIntervals cannot handle general code; it requires code that is in SSA form except for the specific edits made by phi-elim and 2-addr. It also requires the kill flags and the live variable analysis information to be correct.

If you insert your pass before LiveVariables, you must preserve SSA form.
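
To make "preserve SSA form" concrete: in SSA form every virtual register has exactly one definition. The toy check below is only a sketch; `multiplyDefined` is a hypothetical helper working on plain register names, not an LLVM API. It reports registers with more than one definition.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// `defs` lists the destination register of every defining instruction in a
// function. Returns the registers defined more than once, in sorted order.
// An empty result means the single-definition property of SSA form holds.
std::vector<std::string> multiplyDefined(const std::vector<std::string> &defs) {
  std::map<std::string, int> count;
  for (const std::string &reg : defs) {
    assert(!reg.empty() && "register names must be non-empty");
    ++count[reg];
  }
  std::vector<std::string> result;
  for (const auto &entry : count)
    if (entry.second > 1) result.push_back(entry.first);
  return result;
}
```

Applied to the "before transformation" listing in your mail, such a check would flag %reg1030 (defined by both MOVr and LDR_POST in BB#3) and %reg1038, %reg1039 and %reg1040 (defined in both BB#2 and BB#3). Those are the phi-elim and 2-addr edits, which is why that code is no longer SSA.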

If you insert your pass after LiveIntervals, you must update the intervals manually and correctly. If you don't, everything breaks. It's a pain, sorry!

3. If the scheduling pass does local register allocation too, is there a way to tell the register allocation pass that is run later not to touch it?

Yes, simply replace the virtual registers with the allocated physical registers. Then the register allocator won't touch them. Remember to create live intervals for the physical registers. That is how the register allocator detects interference.
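
As a rough illustration of why those intervals matter: conceptually, two registers interfere when their live ranges overlap. The sketch below models an interval as a sorted list of half-open segments over integer slot indexes. `Segment`, `Interval` and `overlaps` are hypothetical toy names, not LLVM's LiveInterval class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [start, end) of slot indexes where a register is live.
struct Segment {
  unsigned start;
  unsigned end;
};

// A toy live interval: segments sorted by start and not overlapping each other.
using Interval = std::vector<Segment>;

// Returns true if the two intervals share at least one slot index.
bool overlaps(const Interval &a, const Interval &b) {
  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    assert(a[i].start < a[i].end && b[j].start < b[j].end);
    if (a[i].end <= b[j].start)
      ++i; // a's segment ends before b's segment begins
    else if (b[j].end <= a[i].start)
      ++j; // b's segment ends before a's segment begins
    else
      return true; // the two segments intersect
  }
  return false;
}
```

If a physical register you assigned has no interval, a check like this has nothing to compare against, so the allocator could hand the same register to an overlapping virtual register.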

Any advice, comments and suggestions are appreciated.

It is much easier to edit machine code while it is in SSA form, that is, before LiveVariables runs.

/jakob