Need advice on writing scheduling pass

Hi (Jakob),

in reference to the prior message below, I have the following follow-up questions, as I also need a scheduling pass 
prior to regalloc. I need to do this in order to set VLIW-flags, so that the RA is aware of several MI's
per cycle with a redefined LiveRange::overlap-function. On a multiple-issue cycle, a register that gets killed
can be reused by another MI - these live ranges do not then overlap.

Well, I would like to schedule the VLIW code after SimpleRegisterCoalescer, so that I get more or less the 
final code to work with. As the instructions are rearrange, I suppose I must run the SlotIndexes and 
LiveIntervals again. LiveVariables should also be refreshed as a register might get killed with a different
MI if two users change place, etc, I suppose.

I would like to just rerun these passes, but you said below that LiveIntervals do not work after SSA form is
abandoned.

I wonder how you mean to update the live intervals after scheduling -- the slot indexes must surely be useless
with a changed order of MI's?

Do you know of a scheme to rewrite these passes so that they can be rerun as needed? Is all but LiveIntervals
ok with this as of now?

Thanks

Jonas

> *Remove unreachable machine basic blocks*
> *Live Variable Analysis*
> *Eliminate PHI nodes for register allocation*
> *Two-Address instruction pass*
> *Process Implicit Definitions.*
> *MachineDominator Tree Construction*
> *Machine Natural Loop Construction*
> *Modulo scheduing  <== modulo scheduling pass inserted here*
> *Slot index numbering*
> *Live Interval Analysis*
> *MachineDominator Tree Construction*
> *Machine Natural Loop Construction*
> *Simple Register Coalescing*
> *Calculate spill weights*
> *Live Stack Slot Analysis*
> *Virtual Register Map*
> *Linear Scan Register Allocator*

[...]

> *Here are my questions:*
> *1. Which passes after the scheduling pass can be run without modification? I suspect LiveIntervalAnalysis will not be able to handle the transformed BB judging from the way it handles two-address code and phijoins. Will the other passes need to be changed as well?*

"Simple Register Coalescing" can handle any code, but the live intervals must be correct.

> *2. Is the scheduling pass inserted in the right position? Currently the scheduling pass is run right before Slot index numbering and LiveInterval analysis, since I thought it would required a lot of work to fix the indexes and intervals if the scheduling pass were run after these two passes.* 

I recommend that you do not edit machine code between "Live Variable Analysis" and "Live Interval Analysis". LiveIntervals cannot handle general code, it requires something that is SSA form except for the specific edits from phi-elim and 2-addr. It also requires kill flags and the live variable analysis information to be correct.

If you insert your pass before LiveVariables, you must preserve SSA form.

If you insert your pass after LiveIntervals, you must update the intervals manually and correctly. If you don't, everything breaks. It's a pain, sorry!

> *3. If the scheduling pass does local register allocation too, is there a way to tell the register allocation pass that is run later not to touch it?* 

Yes, simply replace the virtual registers with the allocated physical registers. Then the register allocator won't touch them. Remember to create live intervals for the physical registers. That is how the register allocator detects interference.

> *Any advice, comments and suggestions are appreciated.*

It is much easier to edit machine code while it is in SSA form. That is before LiveVariables.

/jakob

Hi (Jakob),

in reference to the prior message below, I have the following follow-up questions, as I also need a scheduling pass
prior to regalloc. I need to do this in order to set VLIW-flags, so that the RA is aware of several MI's
per cycle with a redefined LiveRange::overlap-function. On a multiple-issue cycle, a register that gets killed
can be reused by another MI - these live ranges do not then overlap.

Redefining overlap() won't work for that. There is other code assuming that overlap means overlap, for example the LiveIntervalUnion used by the new register allocators.

For VLIW, you probably want to number your packets instead of individual instructions. We don't have any VLIW support, so nobody has thought about how best to do it.

Well, I would like to schedule the VLIW code after SimpleRegisterCoalescer, so that I get more or less the
final code to work with. As the instructions are rearrange, I suppose I must run the SlotIndexes and
LiveIntervals again. LiveVariables should also be refreshed as a register might get killed with a different
MI if two users change place, etc, I suppose.

I would like to just rerun these passes, but you said below that LiveIntervals do not work after SSA form is
abandoned.

I wonder how you mean to update the live intervals after scheduling -- the slot indexes must surely be useless
with a changed order of MI's?

Do you know of a scheme to rewrite these passes so that they can be rerun as needed? Is all but LiveIntervals
ok with this as of now?

So the good news is that we are slowly moving towards a similar design. The bad news is that we are *slowly* moving...

Currently, the register allocator super-pass contains these passes:

- LiveVars
- PhiElim
- TwoAddr
- LiveIntervals
- Coalescing
- RegAlloc

Currently, LiveVars requires SSA form, and LiveIntervals only works with simple multi-defs as produced by PhiElim and TwoAddr. That means the pass order is fixed.

The plan is to teach PhiELim and TwoAddr how to update LiveIntervals so it can run earlier:

- LiveVars
- LiveIntervals
- PhiElim
- TwoAddr
- Coalescing
- RegAlloc

Then we can eliminate LiveVars completely:

- LiveIntervals
- PhiElim
- TwoAddr
- Coalescing
- RegAlloc

This version of LiveIntervals will compute liveness autonomously, but it is very likely to also require SSA form because of the possible speedups.

The next step is to merge PhiElim and TwoAddr into a SuperCoalescer pass:

- LiveIntervals
- SuperCoalescing
- RegAlloc

Finally, Andy also wants to schedule after coalescing:

- LiveIntervals
- SuperCoalescing
- Schedule
- RegAlloc

We don't have any concrete plans for how that scheduler would look. It would likely benefit from the existing LiveIntervals, at least the embedded SSA form is essential.

It is not clear if such a scheduler should preserve live intervals or if everything should be recomputed. There are also phase ordering issues; the scheduler would be severely constrained by our very aggressive coalescing, and it would expose further coalescing opportunities that RegAlloc probably wouldn't be able to exploit fully.

But if you want to add a VLIW scheduler tomorrow, I don't know what to tell you. It's a big project.

/jakob

hi Jonas,

Hi (Jakob),

in reference to the prior message below, I have the following follow-up questions, as I also need a scheduling pass
prior to regalloc. I need to do this in order to set VLIW-flags, so that the RA is aware of several MI's
per cycle with a redefined LiveRange::overlap-function. On a multiple-issue cycle, a register that gets killed
can be reused by another MI - these live ranges do not then overlap.

Redefining overlap() won't work for that. There is other code assuming that overlap means overlap, for example the LiveIntervalUnion used by the new register allocators.

For VLIW, you probably want to number your packets instead of individual instructions. We don't have any VLIW support, so nobody has thought about how best to do it.

People had discussed VLIW support before, you may have a look at this:
http://old.nabble.com/VLIW-Scheduling-td857833.html

I implemented the VLIW scheduling/register allocate in llvm backend
like the way described in the above thread, and it work without any
problem.

best regards
ether

hi Jonas,

For VLIW, you probably want to number your packets instead of individual instructions. We don't have any VLIW support, so nobody has thought about how best to do it.

People had discussed VLIW support before, you may have a look at this:
http://old.nabble.com/VLIW-Scheduling-td857833.html

Wow, that's old, but it is still a good way of solving the problem. It is the easiest way of smuggling packets through regalloc.

I implemented the VLIW scheduling/register allocate in llvm backend
like the way described in the above thread, and it work without any
problem.

I assume you put your VLIW scheduler before the register allocator super-pass?

/jakob

I assume you put your VLIW scheduler before the register allocator super-pass?

right

What about the coalescing - How could you schedule if you did not know which COPY’s would remain/eliminated? This is why I would like to do scheduling
after SimpleRegCoalesc.

Jonas

hi Jonas

What about the coalescing - How could you schedule if you did not know
which COPY's would remain/eliminated? This is why I would like to do

oops! I had never thought about this before, i simply remain them all.

scheduling
after SimpleRegCoalesc.

I will try this later :slight_smile:

best regards
ether

This is only worth doing if these assumptions are valid:

- The SSA form embedded in liveintervals permits efficient DAG construction

- liveintervals are easy and cheap to update within an extended basic block (contiguously laid out blocks with no merges)

- Inserting copies to break interferences is easy to do prior to physical regalloc, and the corresponding liveinterval update is cheap. This is very different from creating physreg copies, when you may not have any free physregs!

Presenting the scheduler with coalesced virtual registers doesn't need to constrain the schedule. It really provides better information to the schedule. post-RA-sched is only left to cleanup split/spill code--a localized problem.

-Andy

Hi,

thank you for your explanations.

In order to get a pre-RA scheduling, I would need something like:

- LiveVars
- PhiElim
- TwoAddr
- LiveIntervals
- Coalescing
- Scheduler       (new)
- SlotIndexing
- LiveIntervals2  (new)
- RegAlloc

My qeustion then is, is it really so difficult to create the live intervals information, with modifications to the original algorithm, or even from scratch? Normally, it should not have to be difficult to do a non-SSA live analysis pass. The format of the LiveInterval ought to serve well for any Live analysis, or? What are the pitfalls relating to the LLVM classes?

One thing that is somewhat unclear to me is why phys-regs are only considered live to end of block? Is this because the RA later ignores these registers? How come
this information is not needed?

Jonas

In order to get a pre-RA scheduling, I would need something like:
- LiveVars
- PhiElim
- TwoAddr
- LiveIntervals
- Coalescing
- Scheduler (new)
- SlotIndexing
- LiveIntervals2 (new)
- RegAlloc
My qeustion then is, is it really so difficult to create the live intervals information, with modifications to the original algorithm, or even from scratch?

I've done something similar to this in the past, though without the
coalescing part. As I was interested in spilling, coalescing didn't matter
to me. I approached this in a simple but possibly inelegant way, by
integrating it into the LiveIntervals pass itself.

In LiveIntervals::runOnMachineFunction, after the computeIntervals() call
that does the actual work of the live interval analysis, I call my
scheduler. After scheduling, I simply do the following:

  // Fix up kill information for live intervals. Rescheduling may often have
  // changed which instruction is a value's last use, and we must update
  // kill flags and the kill information in the VarInfo objects.
  fixKillInformation();

  // Now, release the stuff we computed, and recompute it!
  releaseMemory();

  // Then, recompute the slot indexes. There is a function renumberIndexes,
  // but it doesn't respect our new ordering of instructions, so do this by
  // completely clearing the results of the slot index analysis and simply
  // calling it again.
  indexes_->releaseMemory();
  indexes_->runOnMachineFunction(*mf_);

  // Now compute live intervals again. That's it!
  computeIntervals();

The fixKillInformation() function is also mine; it updates the isKill flags
of MachineOperands and the lists of killing instructions maintained by
LiveVariables.

This setup isn't quite in line with LLVM's pass architecture, but it war
easy to do as a quick-and-dirty thing. Contact me off-list if you are
interested in more information of my work, my (ugly) source and my results.
I would also be interested in more details about what you are planning to
do!

One thing that is somewhat unclear to me is why phys-regs are only considered live to end of block? Is this because the RA later ignores these registers? How come
this information is not needed?

My understanding is that physregs are only ever used for very brief, local
operations; for instance, an argument register is loaded before a function
call but is dead afterwards. Return registers are immediately copied to
virtual registers and can then be considered dead.

This way, you don't need to track liveness and reaching definitions for such
registers across blocks. But probably someone with a better understanding of
this design should weigh in.

Gergo

That’s right.

It it possible to have physregs live across blocks, but they must be explicitly added to the live-in list for each block.

/jakob