anti-dependency breaking and mask/shift dependencies

On PowerPC (at least versions which predate the mfocrf instruction),
while there are multiple independent condition registers, the only way
to move those registers into a general-purpose register is to use mfcr,
which transfers all of the (concatenated) condition registers into one
general purpose register, followed by a mask/shift operation to extract
the desired pieces.
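
As a rough illustration of the mask/shift step described above, here is a minimal C++ sketch that models the 32-bit value mfcr produces and pulls out one 4-bit condition-register field. It assumes the usual PowerPC layout, in which CR0 occupies the most significant four bits of that word and CR7 the least significant four. The function name extractCRField is hypothetical and is not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Model of the "mask/shift" step: given the 32-bit image that mfcr places in
// a GPR, return the 4-bit field belonging to condition register `crIndex`.
// Assumed layout: CR0 is the top nibble, CR7 is the bottom nibble.
inline std::uint32_t extractCRField(std::uint32_t crImage, unsigned crIndex) {
  assert(crIndex < 8 && "there are eight 4-bit CR fields");
  unsigned shift = 4 * (7 - crIndex); // distance of the field from bit 0 (LSB)
  return (crImage >> shift) & 0xFu;   // shift down, then mask to four bits
}
```

The point of the sketch is that the shift amount depends on which condition register holds the result, which is why the choice of cr matters to the code that follows the mfcr.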

I would like to know if there is any way to model this which is
compatible with using anti-dependency breaking on the associated
condition-register class. I might be able to define a late-expanded
pseudo-instruction which represents both the mfcr and the mask/shift
operations, but then those two things would not be independently
schedulable.

Thanks in advance,
Hal

It's not exactly what you ask for, but you can set the hasExtraSrcRegAllocReq flag on the mfcr instruction. That will tell the anti-dep breaker to leave that instruction alone while still being able to break other anti-dependencies on condition code registers.

If that is not enough, you will very soon be able to inject a target-dependent pass between register allocation and virtual register rewriting. You can use the new LiveRegMatrix interface to change the virtual register assignments, for example to loosen anti-dependencies. I expect to have this ready within the next week.

/jakob

> On PowerPC (at least versions which predate the mfocrf instruction),
> while there are multiple independent condition registers, the only
> way to move those registers into a general-purpose register is to
> use mfcr, which transfers all of the (concatenated) condition
> registers into one general purpose register, followed by a
> mask/shift operation to extract the desired pieces.
>
> I would like to know if there is any way to model this which is
> compatible with using anti-dependency breaking on the associated
> condition-register class. I might be able to define a late-expanded
> pseudo-instruction which represents both the mfcr and the mask/shift
> operations, but then those two things would not be independently
> schedulable.

> It's not exactly what you ask for, but you can set the
> hasExtraSrcRegAllocReq flag on the mfcr instruction. That will tell
> the anti-dep breaker to leave that instruction alone while still
> being able to break other anti-dependencies on condition code
> registers.

Interesting; I could set this attribute on a special form of the
comparison instruction, and that would be a work-around for now.

> If that is not enough, you will very soon be able to inject a
> target-dependent pass between register allocation and virtual
> register rewriting. You can use the new LiveRegMatrix interface to
> change the virtual register assignments, for example to loosen
> anti-dependencies. I expect to have this ready within the next week.

Sounds good; let's talk about this when it is ready.

Also, I think the following might work well: we could add a special kind
of register dependency called a 'remembered' register. This is not a
real dependency, meaning that the instruction does not actually
read or write the register, but it means that if the register
allocator (or anything else) swaps the referenced register for another
one (or a new virtual register), then the 'remembered' register needs to
be swapped as well. Using this I can create a late-expanded pseudo
which represents the necessary mask/shift operation. This operation has
real read/write dependencies on the GPRs being used, but also needs to
'remember' from which cr the input originally came. On the other
hand, this might create a bunch of dead-register-dependency
special cases in CodeGen which would not be worth the effort. What do
you think?

Thanks again,
Hal

I am not sure I follow completely, but it sounds like it would be quite fragile?

/jakob

> Also, I think the following might work well: we could add a special
> kind of register dependency called a 'remembered' register. This is
> not a real dependency, meaning that the instruction does not
> actually read or write the register, but it means that if the
> register allocator (or anything else) swaps the referenced register
> for another one (or a new virtual register), then the 'remembered'
> register needs to be swapped as well. Using this I can create a
> late-expanded pseudo which represents the necessary mask/shift
> operation. This operation has real read/write dependencies on the
> GPRs being used, but also needs to 'remember' from which cr the
> input originally came. On the other hand, this might create a bunch
> of dead-register-dependency special cases in CodeGen which would
> not be worth the effort. What do you think?

> I am not sure I follow completely, but it sounds like it would be
> quite fragile?

I think you're right, I retract my suggestion.

Thanks again,
Hal

> > Also, I think the following might work well: we could add a special
> > kind of register dependency called a 'remembered' register. This
> > is not a real dependency, meaning that the instruction does
> > not actually read or write the register, but it means that if
> > the register allocator (or anything else) swaps the referenced
> > register for another one (or a new virtual register), then the
> > 'remembered' register needs to be swapped as well. Using this I
> > can create a late-expanded pseudo which represents the necessary
> > mask/shift operation. This operation has real read/write
> > dependencies on the GPRs being used, but also needs to 'remember'
> > from which cr the input originally came. On the other hand, this
> > might create a bunch of dead-register-dependency special cases in
> > CodeGen which would not be worth the effort. What do you think?
>
> I am not sure I follow completely, but it sounds like it would be
> quite fragile?

I think you're right, I retract my suggestion.

I think that I changed my mind again ;) -- Let me explain more
concretely:

The problem is a code sequence such as:
crX = compare gprA, gprB
gprC = move_all_crs_to_gpr
gprD = shift_and_mask_crX gprC

In the current implementation, we can form the shift_and_mask_crX (with
all of its immediate constants) at lowering time because we fix crX (to
cr7 specifically). It would be better to be able to let the register
allocator assign crX to any available condition register. The problem
is just that in order to correctly form shift_and_mask_crX, we need to
know which physical cr was chosen.
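
To make that dependence concrete, here is a hedged C++ sketch of how the immediate for a rotate-and-mask instruction (rlwinm is the usual candidate on PowerPC) could be derived from the condition register that was picked. It assumes the standard layout of the mfcr result (CR0 in the most significant nibble, with the bits of each field ordered LT, GT, EQ, SO from most to least significant). The names CRBit, rotateAmountFor, rotl32 and extractCRBit are invented for this illustration and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Bit order inside one CR field, most significant first (assumed layout).
enum CRBit : unsigned { CR_LT = 0, CR_GT = 1, CR_EQ = 2, CR_SO = 3 };

// Rotate-left amount that brings the requested bit of field `crIndex`
// down to the least significant bit of the 32-bit word.
inline unsigned rotateAmountFor(unsigned crIndex, CRBit bit) {
  assert(crIndex < 8 && "there are eight 4-bit CR fields");
  unsigned pos = 4 * crIndex + bit; // big-endian position, 0 = MSB
  return (pos + 1) % 32;
}

// Portable 32-bit rotate left.
inline std::uint32_t rotl32(std::uint32_t v, unsigned n) {
  n %= 32;
  return n == 0 ? v : (v << n) | (v >> (32 - n));
}

// Emulates "rotate by the computed amount, then keep only the low bit".
inline std::uint32_t extractCRBit(std::uint32_t crImage, unsigned crIndex,
                                  CRBit bit) {
  return rotl32(crImage, rotateAmountFor(crIndex, bit)) & 1u;
}
```

With cr7 hard-coded, rotateAmountFor(7, CR_EQ) is a compile-time constant; once the allocator is free to choose the register, the same immediate can only be computed after allocation.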

One possible "solution" is to make shift_and_mask_crX a pseudo
instruction:
gprD = shift_and_mask_pseudo crX, gprC

But this is less than optimal because in this sequence:
crX = compare gprA, gprB
gprC = move_all_crs_to_gpr
(*)
gprD = shift_and_mask_pseudo crX, gprC

at (*), which could be arbitrarily long, the physical register assigned
to crX would be held live by the register allocator (because the
shift_and_mask_pseudo depends on it). But during (*) we don't need the
value of the cr (the value has already been copied into gprC), we just
need to know the identity of the physical register assigned to crX.

What I had proposed above was to allow tagging the pseudo's dependence
on crX as only some kind of "remembered" dependence so that the live
interval of crX would actually end at the move_all_crs_to_gpr
instruction even though the shift_and_mask_pseudo would get to see the
correct physical register assignment.
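
The effect on liveness can be shown with a small toy model in C++. This is only a sketch of the idea, not how LLVM computes live intervals: every type and function here (Use, Instr, exampleSequence, lastLiveUse) is hypothetical, and the "remembered" flag is the proposed feature, which does not exist in CodeGen. The model assumes the mfcr-like move is treated as a read of crX, as in the sequence above.

```cpp
#include <string>
#include <vector>

// One register operand; `remembered` marks the proposed identity-only use.
struct Use { std::string reg; bool remembered; };
// One instruction: a single defined register plus its operands.
struct Instr { std::string def; std::vector<Use> uses; };

// The sequence from the message, with one filler instruction standing in
// for the arbitrarily long region marked (*).
inline std::vector<Instr> exampleSequence() {
  return {
      {"crX", {{"gprA", false}, {"gprB", false}}}, // 0: compare
      {"gprC", {{"crX", false}}},                  // 1: move_all_crs_to_gpr
      {"gprE", {{"gprA", false}}},                 // 2: (*) unrelated work
      {"gprD", {{"crX", true}, {"gprC", false}}},  // 3: shift_and_mask_pseudo
  };
}

// Index of the last instruction that keeps `reg` live, or -1 if none.
// When honorRemembered is true, remembered operands do not extend liveness.
inline int lastLiveUse(const std::vector<Instr> &seq, const std::string &reg,
                       bool honorRemembered) {
  int last = -1;
  for (int i = 0; i < static_cast<int>(seq.size()); ++i)
    for (const Use &u : seq[i].uses)
      if (u.reg == reg && !(honorRemembered && u.remembered))
        last = i;
  return last;
}
```

In this model crX stays live through instruction 3 when the pseudo's operand is an ordinary use, but only through instruction 1 when the operand is treated as "remembered", which frees the physical condition register during (*).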

Thanks again,
Hal