Reserved/Unallocatable Registers

Lately I have had a few discussions of what it means for a register to be unallocatable or reserved. As this comes up every now and again and I often struggled answering such questions I decided to write down some definite rules and codify the current usage and assumptions. I plan to put the rules below into the doxygen comments of MachineRegisterInfo etc. And I also hope that people will correct me if I am wrong or miss something here!

= Reserved Registers =

== Rules ==
1) The value read from a reserved register cannot be predicted. Reading a reserved register twice may each time produce a different result.
2) Writing to a reserved register may affect more than just the register.
3) Nonetheless reading/writing reserved registers imposes no constraints on reordering instructions.

== Motivation ==
Generic backend code, especially the register allocators, makes assumptions about how registers behave. These include things like "the value in a register only changes in an instruction with a def or regmask/clobber operand for that register" or "writing to the register changes its value but has no further effects". There are often cases where we need exceptions to these rules; typical examples are:
   - zero registers (e.g. SPARC g0): They always stay zero even if we write other values.
   - program counters (e.g. ARM PC): Their value changes with every instruction and writing into them causes control flow to change.
   - Stack pointer, Frame pointer: They mostly behave like normal registers but we do not want to impose scheduling constraints; even if the stack pointer's value changes because we reordered an instruction, we can usually fix this up by adjusting offsets in load/store operations.

So obviously we exclude these registers from register allocation and cannot make too many assumptions about them. However, despite their unusual semantics, we still want to model them as registers because that is how they are modeled in most instruction encodings.

== Implications ==
- Register allocators will never assign reserved registers to a virtual register. A reserved register is always unallocatable, but an unallocatable register is not necessarily a reserved one!
- Liveness analysis makes no sense for reserved registers.
- The rules above do not free us from modeling the instruction effects properly! Instructions writing to PC must be marked as terminators, we need to add barrier flags if we want to restrict the reordering of time stamp counters, ...

== Examples ==
Assume r0 is a normal register and r1 is a reserved register:

- We cannot remove the 2nd COPY here because we may read a different value from r1:
   r0 = COPY r1
       ... use r0
   r0 = COPY r1
       ... use r0

- We can remove this COPY because r0 is unused:
    r0 = COPY r1
    return

- We cannot remove this COPY even if r1 appears unused afterwards. We also cannot replace r1 with a different register.
    r1 = COPY r0

- We can reorder these instructions in any way:

   STORE r0 to r1+12
   STORE r0 to r1+8
   ... = LOAD from r1 + 20
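
To make the first example concrete, here is a minimal sketch of a redundant-COPY cleanup that follows the draft rules above. It is a toy model, not LLVM code: the tuple format `(opcode, dst, src)` and the function name are assumptions made up for this illustration. The only point is that a COPY from a reserved register is never treated as redundant, because rule 1 says a second read may produce a different value.

```python
# Toy model only: instructions are (opcode, dst, src) tuples, all names hypothetical.

def remove_redundant_copies(instrs, reserved):
    """Drop a COPY when its destination already holds the copied value."""
    mirrors = {}  # dst -> src for copies whose value is still known to be valid
    kept = []
    for op, dst, src in instrs:
        if op == "COPY" and src not in reserved and mirrors.get(dst) == src:
            continue  # dst already holds this value, so the COPY is redundant
        if dst is not None:
            # dst is overwritten: forget every fact that mentions it
            mirrors = {d: s for d, s in mirrors.items() if dst not in (d, s)}
            # A read of a reserved register is never remembered (rule 1)
            if op == "COPY" and src not in reserved:
                mirrors[dst] = src
        kept.append((op, dst, src))
    return kept


prog = [
    ("COPY", "r0", "r1"),
    ("USE", None, "r0"),
    ("COPY", "r0", "r1"),
    ("USE", None, "r0"),
]
with_reserved_r1 = remove_redundant_copies(prog, reserved={"r1"})
with_normal_r1 = remove_redundant_copies(prog, reserved=set())
```

With r1 reserved both COPYs survive; with r1 treated as a normal register the second COPY is dropped.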

= Unallocatable Registers =

A reserved register is not allocatable. However, there are also registers which are unallocatable just because they are explicitly excluded from allocation orders; they are not reserved registers. This can be confusing, so I added this section talking about those!

== Rules ==
They behave like normal registers, the only difference is that:
1) The register allocator will never assign an unallocatable register to a virtual register.

== Motivation ==
Typical examples of unallocatable but not reserved registers are:
- A CPU's flags register: The scheduler has to respect the order! We are interested in liveness, but we do not necessarily want to spill/reload it or perform register allocation on a single register.
- X87 floating point stack registers need special handling on top of the generic register allocation

== Implications ==
Except for the register allocator not using them they behave like normal registers:
- We track the liveness of unallocatable registers
- The scheduler respects data/output dependencies for unallocatable registers

== Examples ==
Assume r0 is a normal register, r1 is an unallocatable register (but not a reserved one):

- We can remove the 2nd COPY here:
   r0 = COPY r1
       ... use r0
   r0 = COPY r1
       ... use r0

- We can remove the following two COPYs because r0 and r1 are not used afterwards:
   r0 = COPY r1
   r1 = COPY r0
   return

- We can replace r1 with a different (normal) register here (provided we replace all following uses):
  r1 = ...
      // ...
      = use r1
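
The second example (both COPYs are dead) falls out of ordinary liveness. As a rough sketch, assuming a toy instruction format `(opcode, dst, srcs)` and side-effect-free instructions, a backward scan like the one below removes them. The names and the `live_out` parameter are invented for this illustration and are not an LLVM API. Note that the scan never asks whether a register is allocatable, which is exactly the point: unallocatable registers get the same liveness treatment as normal ones.

```python
# Toy model only: instructions are (opcode, dst, srcs) tuples, all names hypothetical.

def remove_dead_defs(instrs, live_out=frozenset()):
    """Backward scan that drops side-effect-free defs whose result is never read."""
    live = set(live_out)
    kept = []
    for op, dst, srcs in reversed(instrs):
        if dst is not None and dst not in live:
            continue  # nobody reads the result, so the instruction is dead
        if dst is not None:
            live.discard(dst)  # the def ends the live range (walking backwards)
        live.update(srcs)      # every source is live before this instruction
        kept.append((op, dst, srcs))
    kept.reverse()
    return kept


prog = [
    ("COPY", "r0", ("r1",)),
    ("COPY", "r1", ("r0",)),
    ("RET", None, ()),
]
nothing_live_out = remove_dead_defs(prog)
r1_live_out = remove_dead_defs(prog, live_out={"r1"})
```

If nothing is live out, only the return remains. If r1 is declared live out of the function, both COPYs have to stay.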

Lately I have had a few discussions of what it means for a register to be unallocatable or reserved. As this comes up every now and again and I often struggled answering such questions I decided to write down some definite rules and codify the current usage and assumptions. I plan to put the rules below into the doxygen comments of MachineRegisterInfo etc. And I also hope that people will correct me if I am wrong or miss something here!

Thanks Matthias, much appreciated.

= Reserved Registers =

== Rules ==
1) The value read from a reserved register cannot be predicted. Reading a reserved register twice may each time produce a different result.
2) Writing to a reserved register may affect more than just the register.
3) Nonetheless reading/writing reserved registers imposes no constraints on reordering instructions.

I never thought of it this way. I think of reserved registers as having unbounded, continual liveness. Consequently, the register allocator can never “reuse” them. More precisely, they are live out of the current frame, so reserved register writes can still be reordered with nearby instructions that *don’t* access that register.

Dependencies still need to be tracked!

I think copies could be optimized, but reserved registers are implicitly live out of the frame. Reserved register writes should not move across calls. See http://reviews.llvm.org/D15667.

== Motivation ==
Generic backend code, especially the register allocators, makes assumptions about how registers behave. These include things like "the value in a register only changes in an instruction with a def or regmask/clobber operand for that register" or "writing to the register changes its value but has no further effects". There are often cases where we need exceptions to these rules; typical examples are:
  - zero registers (e.g. SPARC g0): They always stay zero even if we write other values.
  - program counters (e.g. ARM PC): Their value changes with every instruction and writing into them causes control flow to change.
  - Stack pointer, Frame pointer: They mostly behave like normal registers but we do not want to impose scheduling constraints; even if the stack pointer's value changes because we reordered an instruction, we can usually fix this up by adjusting offsets in load/store operations.

Reserved registers could be for out-of-band argument passing that doesn’t follow platform ABI rules.

-Andy

Hi Matthias,

This pretty much matches my memory. I think that the rules are a bit ad hoc and not followed to the letter everywhere. It would be good to codify something concrete.

I thought that I added some way of distinguishing between constant registers and other reserved registers, but I can’t find it now. We do some register coalescing that is not consistent with your rules here: If a virtual register is defined as a copy of a constant register, we will replace the virtual register with the constant register. See RegisterCoalescer::canJoinPhys(). This can mean that the register is read multiple times. This optimization was added for the ARM64 zero register.
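
To illustrate the effect (not the actual logic of RegisterCoalescer::canJoinPhys()), here is a small sketch under simplifying assumptions: instructions are toy `(opcode, dst, srcs)` tuples, virtual registers are spelled with a leading `%`, and each virtual register has a single definition. The function name and the `constant_regs` parameter are made up for this example.

```python
# Toy model only, not the RegisterCoalescer implementation; all names hypothetical.

def fold_constant_reg_copies(instrs, constant_regs):
    """Replace vregs that are plain copies of a constant physreg by that physreg."""
    alias = {}  # vreg -> constant physical register it was copied from
    out = []
    for op, dst, srcs in instrs:
        # Rewrite uses of already-folded vregs to read the physreg directly
        srcs = tuple(alias.get(s, s) for s in srcs)
        if op == "COPY" and dst.startswith("%") and srcs[0] in constant_regs:
            alias[dst] = srcs[0]
            continue  # the COPY itself disappears
        out.append((op, dst, srcs))
    return out


prog = [
    ("COPY", "%0", ("xzr",)),
    ("STORE", None, ("%0", "%1")),
    ("ADD", "%2", ("%0", "%1")),
]
folded = fold_constant_reg_copies(prog, constant_regs={"xzr"})
reads_before = sum(srcs.count("xzr") for _, _, srcs in prog)
reads_after = sum(srcs.count("xzr") for _, _, srcs in folded)
```

The single read of the zero register turns into two reads, which is harmless for a constant register but would conflict with a rule saying that every read may return a different value.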

Thanks,
/jakob

Lately I have had a few discussions of what it means for a register to be unallocatable or reserved. As this comes up every now and again and I often struggled answering such questions I decided to write down some definite rules and codify the current usage and assumptions. I plan to put the rules below into the doxygen comments of MachineRegisterInfo etc. And I also hope that people will correct me if I am wrong or miss something here!

Thanks Matthias, much appreciated.

= Reserved Registers =

== Rules ==
1) The value read from a reserved register cannot be predicted. Reading a reserved register twice may each time produce a different result.
2) Writing to a reserved register may affect more than just the register.
3) Nonetheless reading/writing reserved registers imposes no constraints on reordering instructions.

I never thought of it this way. I think of reserved registers as having unbounded, continual liveness. Consequently, the register allocator can never “reuse” them. More precisely, they are live out of the current frame, so reserved register writes can still be reordered with nearby instructions that *don’t* access that register.

Dependencies still need to be tracked!

I think copies could be optimized, but reserved registers are implicitly live out of the frame. Reserved register writes should not move across calls. See http://reviews.llvm.org/D15667.

Indeed the current code respects ordering dependencies for reserved registers, so rule 3) above can just be removed and we treat reserved registers like any other for reordering.
We can handle them like any other register for calls as well then, I guess? Moving through calls should be legal if the call's regmask shows they are preserved, or do you have an example where this would be bad?

- Matthias

Hi Matthias,

Thanks for doing this. Each time we talk about it, it takes us 10 min to rebuild those rules from our recollection, so it is definitely useful to write them down.

I am in agreement with what you wrote down.

I just think we need additional rules for the constant registers like Jakob mentioned:
- Their value is constant (i.e., copy propagation is fine, unlike regular reserved registers).
- In particular, writing to them does not change their value.

Cheers,
-Quentin

There is MachineRegisterInfo::isConstantPhysReg(); in the current implementation this just returns true if it cannot find any def operand for the register (or one of its aliases). I think we also write to zero registers at times, and then this function would return false... For this to work reliably, targets would need to provide the constant information explicitly.
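
A rough sketch of the behaviour described here, as a toy model rather than the real MachineRegisterInfo code: the `(opcode, dst, srcs)` tuples, the `aliases` table and the function name are assumptions for illustration only.

```python
# Toy model of the "no def operand found" heuristic; all names hypothetical.

def is_constant_phys_reg(reg, instrs, aliases):
    """True if no instruction defines `reg` or a register aliasing it."""
    overlapping = {reg} | set(aliases.get(reg, ()))
    return not any(dst in overlapping for _op, dst, _srcs in instrs)


aliases = {"xzr": ["wzr"], "wzr": ["xzr"]}

only_reads = [("ADD", "x0", ("xzr", "x1"))]
# A compare-like instruction that discards its result into the zero register
with_write = only_reads + [("SUBS", "wzr", ("w0", "w1"))]

constant_when_only_read = is_constant_phys_reg("xzr", only_reads, aliases)
constant_when_written = is_constant_phys_reg("xzr", with_write, aliases)
```

As soon as some instruction names the zero register (or an alias) as its destination, the heuristic stops reporting it as constant, even though the hardware value never changes. That is why explicit target-provided information would be more reliable.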

For the "writing to them does not change their value": As long as we do not make any assumptions about the values in the register anyway (rule 1 below) knowing this fact doesn't help... Though knowing that we have a zero register would indeed allow us to do some copy propagation, coalescing and removing unnecessary data dependencies.

- Matthias

There is MachineRegisterInfo::isConstantPhysReg(); in the current implementation this just returns true if it cannot find any def operand for the register (or one of its aliases). I think we also write to zero registers at times, and then this function would return false... For this to work reliably, targets would need to provide the constant information explicitly.

For the "writing to them does not change their value": As long as we do not make any assumptions about the values in the register anyway (rule 1 below) knowing this fact doesn't help…

That’s the thing: with a really constant register, rule one does not apply, right?
I.e., we could do dead code and such.

That’s funny, I thought like Jakob that we added something different from the isConstantPhysReg thing, hmm…

I’m not sure if they should be marked call clobbered or not. I think that depends on how the register is used.

But either way, reserved register writes should never move across a call because the callee or runtime may read that register. It’s an out-of-band call argument.

Andy

It seems broken to me that a second copy should be assumed to produce a different result. This seems like it should be optimized, with a special volatile_copy instruction for the special cases where the reserved register may randomly change.

-Matt

Let's try this again after some longer offline discussions:

= Reserved Registers =
The primary use of reserved registers is to hold values required by runtime conventions. Typical examples are the stack pointer, the frame pointer, maybe a TLS base address, a GOT address ...
Zero registers and program counters are an odd special case for which we may be able to provide looser rules.

== Rules ==
1) Reserved registers are always live: They are live-in and live-out of functions. There are no dead-defs; they are still alive after being clobbered by a regmask.
2) It is not legal to add or remove definitions of a reserved register; This implies that we cannot replace it with a different register or temporarily spill/reload it.
3) Calls are considered uses of reserved registers. That means you cannot reorder a write to a reserved register over a call, even if there is no explicit use operand on the call
4) The value of the reserved register can only change for instructions with a Def operand or regmask clobbering the register. This rule is just for clarification, all registers behave like this. See [1] for a note on program counter/time stamp registers.

== Implications ==
- We skip Liveness analysis because we know a reserved register is live anyway.
- Register allocators cannot use a reserved register: It is never free and therefore considered unallocatable.
- Scheduling has to consider the implicit use on calls
- No special considerations necessary for copy propagation
- Writes to a reserved register are not dead code, because the value is always live out!

== Examples ==
Assume r0 is a normal register and r1 is a reserved register:

- We can remove the 2nd COPY here:
  r0 = COPY r1
      ... use r0
  r0 = COPY r1
      ... use r0
- We can remove this COPY because r0 is unused:
    r0 = COPY r1
    return
- We cannot remove this COPY because r1 is live-out:
    r1 = COPY r0
    return
- We cannot reorder the add before the call. The call reads r1 so it has an anti dependency on the add.
    call foobar
    r1 = add r1, 10
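
The last example can be sketched as a reordering check. This is a toy model under stated assumptions, not scheduler code: instructions are `(opcode, dst, srcs)` tuples, the toy CALL carries no regmask or explicit operands, and the function name is invented. The only rule-specific part is that a call is treated as reading every reserved register (rule 3).

```python
# Toy model only: instructions are (opcode, dst, srcs) tuples, all names hypothetical.

def may_swap(first, second, reserved):
    """Can two adjacent instructions trade places without breaking a dependency?"""
    def defs_uses(instr):
        op, dst, srcs = instr
        defs = {dst} if dst is not None else set()
        uses = set(srcs)
        if op == "CALL":
            uses |= set(reserved)  # rule 3: calls implicitly read reserved registers
        return defs, uses

    d1, u1 = defs_uses(first)
    d2, u2 = defs_uses(second)
    # Any data (RAW), anti (WAR) or output (WAW) dependency forbids the swap
    return not (d1 & u2 or u1 & d2 or d1 & d2)


call = ("CALL", None, ())
add_reserved = ("ADD", "r1", ("r1",))
add_normal = ("ADD", "r0", ("r0",))

swap_reserved_write = may_swap(call, add_reserved, reserved={"r1"})
swap_normal_write = may_swap(call, add_normal, reserved={"r1"})
```

The write to r1 is pinned behind the call by the anti dependency, while the add on r0 is free to move in this simplified model (a real call would also clobber registers through its regmask).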

== [1] Constant Registers ==
The rules above are designed for the case of normal registers which are reserved for runtime conventions. We also have the case of zero registers. For them we have the concept of a constant register, which allows us to ignore any reordering constraints and assume all uses read the same value.

We should even be able to fit the program counter into the class of constant registers: The only practical use of reading the program counter is to find relative positions in position independent code (PIC), it is always used in combination with a relocation, which is adjusted to the actual position of the instruction. The value after adding this relocation is constant in the function!

= Unallocatable Registers =

A reserved register is not allocatable. However, there are also registers which are unallocatable just because they are explicitly excluded from allocation orders; they are not reserved registers. This can be confusing, so I added this section talking about those!

== Rules ==
They behave like normal registers, the only difference is that:
1) The register allocator will never assign an unallocatable register to a virtual register.

== Motivation ==
Typical examples of unallocatable but not reserved registers are:
- A CPU's flags register: The scheduler has to respect the order! We are interested in liveness, but we do not necessarily want to spill/reload it or perform register allocation on a single register.
- X87 floating point stack registers need special handling on top of the generic register allocation

== Implications ==
Except for the register allocator not using them they behave like normal registers:
- We track the liveness of unallocatable registers
- The scheduler respects data/output dependencies for unallocatable registers

== Examples ==
Assume r0 is a normal register, r1 is an unallocatable register (but not a reserved one):

- We can remove the 2nd COPY here:
  r0 = COPY r1
      ... use r0
  r0 = COPY r1
      ... use r0

- We can remove the following two COPYs because r0 and r1 are not used afterwards:
  r0 = COPY r1
  r1 = COPY r0
  return

- We can replace r1 with a different (normal) register here (provided we replace all following uses):
  r1 = ...
      // ...
      = use r1

Let’s try this again after some longer offline discussions:

= Reserved Registers =
The primary use of reserved registers is to hold values required by runtime conventions. Typical examples are the stack pointer, the frame pointer, maybe a TLS base address, a GOT address …
Zero registers and program counters are an odd special case for which we may be able to provide looser rules.

== Rules ==

  1. Reserved registers are always live: They are live-in and live-out of functions. There are no dead-defs; they are still alive after being clobbered by a regmask.
  2. It is not legal to add or remove definitions of a reserved register; This implies that we cannot replace it with a different register or temporarily spill/reload it.

Zero registers don’t follow that rule. AArch64 has a pass that sets XZR for unused results and this is fine.
I.e., we need to add a note like you did for #4.

  3. Calls are considered uses of reserved registers. That means you cannot reorder a write to a reserved register over a call, even if there is no explicit use operand on the call
  4. The value of the reserved register can only change for instructions with a Def operand or regmask clobbering the register. This rule is just for clarification, all registers behave like this. See [1] for a note on program counter/time stamp registers.

Hmm, I don’t see how pc can fit this rule.

== Implications ==

  • We skip Liveness analysis because we know a reserved register is live anyway.
  • Register allocators cannot use a reserved register: It is never free and therefore considered unallocatable.
  • Scheduling has to consider the implicit use on calls
  • No special considerations necessary for copy propagation

Ditto for pc.

  • Writes to a reserved register are not dead code, because the value is always live out!

== Examples ==
Assume r0 is a normal register and r1 is a reserved register:

  • We can remove the 2nd COPY here:
    r0 = COPY r1
    … use r0
    r0 = COPY r1
    … use r0

If r1 is pc, r0 is a different value now.

  • We can remove this COPY because r0 is unused:
    r0 = COPY r1
    return
  • We cannot remove this COPY because r1 is live-out:
    r1 = COPY r0
    return
  • We cannot reorder the add before the call. The call reads r1 so it has an anti dependency on the add.
    call foobar
    r1 = add r1, 10

== [1] Constant Registers ==
The rules above are designed for the case of normal registers which are reserved for runtime conventions. We also have the case of zero registers. For them we have the concept of a constant register, which allows us to ignore any reordering constraints and assume all uses read the same value.

We should even be able to fit the program counter into the class of constant registers: The only practical use of reading the program counter is to find relative positions in position independent code (PIC), it is always used in combination with a relocation, which is adjusted to the actual position of the instruction. The value after adding this relocation is constant in the function!

That sounds like a stretch to me. How can pc be considered constant?
To summarize my thoughts, I believe reserved registers were introduced to fill the gap of what we don’t model. For pc, for instance, each instruction should implicitly define it; then the actual uses would be predictable. Since we don’t do that, we need to conservatively assume that the value of a reserved register is unknown and that rule #4 is not true.

Cheers,
-Quentin

Maybe we need to extend the model then? From the point of view of register allocation, none of these registers are allocatable, so nothing would change, but from the point of view of scheduling, for example, certain moves involving reserved registers are legal, while others are not.

For example:
1. Status (flags) register, such as the one in x86: it's a special register, but it cannot be modified except by an instruction that is known to alter it. (I know there was no direct copy from flags to a register on x86, but I can't think of a better example.) Dependencies on this register need to be respected.
   a. Special cases: Hexagon has USR (user status register), where one of the bits indicates "overflow". This bit is sticky and can only go from 0 to 1, except where it's cleared explicitly. Stores to this bit of the register can be reordered (except the "clear bit" case).[1]

2. Timer/cycle count registers, PC, etc.: these registers are "volatile" in the sense that they are modified in a way that does not depend on the semantics of the executed code in a predictable or controllable manner. Dependencies on such registers can be ignored, and two subsequent reads of their value are not guaranteed to be equal.
   a. Special case: one could imagine a special register, the reading of which in itself can have side-effects. In such a case the reads should not be eliminated or duplicated.

[1] In our local repository we have a hook in the subtarget info, which is called with the DAG as its argument after its construction, and where we remove the write-write dependencies on the overflow bit (which we model as a subregister of USR). This is really important for us for performance reasons, and it would be great to have an "official" way of handling such cases.
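
For illustration only, the kind of post-construction DAG fixup described in [1] could look roughly like the sketch below. This is not our actual hook: the edge tuples, the `USR.OVF` label for the overflow bit, and the function name are placeholders invented for this example. The idea is that writes which can only set the sticky bit commute with each other (the result is a logical OR either way), so the output dependencies between them can be dropped, while anything involving an explicit clear keeps its ordering.

```python
# Toy model only: a dependency DAG as (pred, succ, kind, reg) tuples.
# "USR.OVF" is a placeholder label for the sticky overflow bit.

def drop_sticky_output_deps(edges, sets_only, sticky_reg="USR.OVF"):
    """Remove write-write edges on the sticky bit between 'set-only' writers."""
    return [
        (pred, succ, kind, reg)
        for pred, succ, kind, reg in edges
        if not (
            kind == "output"
            and reg == sticky_reg
            and pred in sets_only
            and succ in sets_only
        )
    ]


edges = [
    ("sat_add_1", "sat_add_2", "output", "USR.OVF"),  # both may only set the bit
    ("sat_add_2", "clear_ovf", "output", "USR.OVF"),  # explicit clear: keep order
    ("sat_add_1", "sat_add_2", "data", "r0"),         # ordinary data dependency
]
pruned = drop_sticky_output_deps(edges, sets_only={"sat_add_1", "sat_add_2"})
```

Only the first edge is removed; the dependency on the clearing instruction and the ordinary data dependency stay in place.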

-Krzysztof

Hi Krzysztof,

To summarize my thoughts, I believe reserved registers were introduced
to fill the gap of what we don’t model. For pc, for instance, each
instruction should implicitly define it; then the actual uses would be
predictable. Since we don’t do that, we need to conservatively assume
that the value of a reserved register is unknown and that rule #4 is not
true.

Maybe we need to extend the model then?

This was our conclusion with Matthias from our latest offline discussion.
I believe he will send an updated version to reflect that.

Cheers,
-Quentin