TwoAddressInstructionPass::isProfitableToConv3Addr()

Hi,

I have pairs of instructions where one is a cheaper 2-address form and the other a 3-address form. I would like to select the 2-addr instruction during isel, but use the 3-addr instruction where that avoids a copy. However, TwoAddressInstructionPass::isProfitableToConv3Addr() only checks for the case of a physreg copy, and so leaves the majority of cases as they are (2-address).

I would like to say "If 3-addr version would avoid a copy, use it!". Does anyone else have a similar situation?

To do this, one would need to check the kill flag on the tied use operand. If it is not killed, one can assume that the live ranges of the use and dst registers overlap, and therefore a copy is needed for the two-address form. The kill flags would, however, need to be recomputed by the TwoAddr pass, since LiveVariables clears them.
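
As a rough illustration of the kill-flag check described above, here is a minimal sketch. It is not LLVM code: `Instr`, `tied_use_is_killed` and `prefer_three_address` are hypothetical names, a block is modelled as a straight-line list of instructions, and `srcs[0]` is assumed to be the operand tied to `dst` in the 2-address form.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dst: str
    srcs: tuple  # assumption: srcs[0] is the operand tied to dst


def tied_use_is_killed(block, idx, live_out=frozenset()):
    """Recompute the kill flag of the tied use of block[idx] (toy model)."""
    tied = block[idx].srcs[0]
    for later in block[idx + 1:]:
        if tied in later.srcs:
            return False  # the value is read again, so this use is not a kill
        if later.dst == tied:
            return True   # redefined before any further read
    return tied not in live_out


def prefer_three_address(block, idx, live_out=frozenset()):
    # If the tied source stays live, the 2-address form needs a copy first,
    # so the 3-address form would avoid that copy.
    return not tied_use_is_killed(block, idx, live_out)


block = [Instr("x", ("a", "b")), Instr("y", ("a", "x"))]
print(prefer_three_address(block, 0))  # 'a' is read again by the second instruction
print(prefer_three_address(block, 1))  # 'a' dies here unless it is live-out
```

Whether the 3-address form is actually as cheap as the 2-address one is a separate, target-specific question that this sketch ignores.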

An alternative approach might be to have something like TII->handleMachineFunctionPostCoalescer() at the end of RegisterCoalescer.cpp::runOnMachineFunction(). There, one could look for instructions and query live intervals for overlap. This hook might also be useful for other things, since this is the point just before mi-sched/regalloc, where one could do things like estimate register pressure.
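
The overlap query could look roughly like the sketch below. This is only a toy model: live intervals are plain lists of half-open `(start, end)` segments rather than LLVM's own live-interval objects, the function names are hypothetical, and `handleMachineFunctionPostCoalescer()` is the hook proposed above, not an existing API.

```python
def intervals_overlap(a, b):
    """True if any half-open segment [start, end) of a intersects one of b."""
    return any(s1 < e2 and s2 < e1 for (s1, e1) in a for (s2, e2) in b)


def two_address_needs_copy(tied_src_interval, dst_interval):
    # If the tied source is still live while dst is live, the two registers
    # cannot share one location, so the 2-address form needs a copy.
    return intervals_overlap(tied_src_interval, dst_interval)


# The source dies exactly where dst is defined: no copy needed.
print(two_address_needs_copy([(0, 4)], [(4, 10)]))
# The source outlives the definition of dst: a copy would be needed.
print(two_address_needs_copy([(0, 8)], [(4, 10)]))
```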

Any comments on this anyone?

/Jonas Paulsson

> I have cases of instruction pairs, where one is cheaper 2-address, and the
> other 3-address. I would like to select the 2-addr instruction during isel,
> but use the 3-addr instruction to avoid a copy if possible.
> I would like to say "If 3-addr version would avoid a copy, use it!". Does
> anyone else have a similar situation?

Hi Jonas - Not what you asked for, but you could stick with 3-addr
instructions and then convert opportunistically to two-addr in a late
pass. This approach reduces complexity, since you no longer need to
worry about surrounding instructions to make the 3->2 conversion. In
other words, convert FOO A,B,B ---> FOO A,B where you find them.
Worked great in my target.
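
A minimal sketch of such a late, opportunistic rewrite, under stated assumptions: instructions are modelled as `(opcode, dst, src1, src2)` tuples after register allocation, the `_2addr` opcode suffix and the `commutable` set are hypothetical, and the operand order is this sketch's choice rather than anything taken from the FOO notation above.

```python
def shrink_to_two_address(instrs, commutable=frozenset()):
    """Rewrite 3-address tuples to 2-address form where dst matches a source."""
    out = []
    for op, dst, src1, src2 in instrs:
        if dst == src1:
            # dst already shares a register with the first source.
            out.append((op + "_2addr", dst, src2))
        elif dst == src2 and op in commutable:
            # Swapping the sources is only legal for commutable opcodes.
            out.append((op + "_2addr", dst, src1))
        else:
            out.append((op, dst, src1, src2))  # leave the 3-address form alone
    return out


code = [
    ("FOO", "r1", "r1", "r2"),
    ("FOO", "r3", "r4", "r3"),
    ("FOO", "r5", "r6", "r7"),
]
for instr in shrink_to_two_address(code, commutable={"FOO"}):
    print(instr)
```

Each instruction is examined on its own, which is the simplification being described: nothing needs to be known about the surrounding instructions.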

> To do this, one would need to check the kill-flag on the tied use operand.
> If it is not killed, one can assume that the use and dst registers overlap,
> and therefore the copy is needed for the two-address form. The kill flags
> would however need to be recomputed by TwoAddr pass, since
> LiveVariables clear them.

All this is the complexity you can avoid.

HTH,
-steve

From: "Jonas Paulsson via llvm-dev" <llvm-dev@lists.llvm.org>
To: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Tuesday, September 29, 2015 4:00:51 AM
Subject: [llvm-dev] TwoAddressInstructionPass::isProfitableToConv3Addr()

> Hi,
>
> I have cases of instruction pairs, where one is cheaper 2-address, and
> the other 3-address. I would like to select the 2-addr instruction
> during isel, but use the 3-addr instruction to avoid a copy if possible.
> I find that TwoAddressInstructionPass::isProfitableToConv3Addr() is only
> checking for the case of a physreg copy, and so leaves the majority of
> cases as they are (2-address).
>
> I would like to say "If 3-addr version would avoid a copy, use it!".
> Does anyone else have a similar situation?

I'm not sure how similar this is, but we have lib/Target/PowerPC/PPCVSXFMAMutate.cpp, which changes the form of FMA instructions with tied operands in order to avoid copies. It might be sufficiently similar to what you need to be useful.

-Hal

Hi Jonas,

> Hi,
>
> I have cases of instruction pairs, where one is cheaper 2-address, and the other 3-address. I would like to select the 2-addr instruction during isel, but use the 3-addr instruction to avoid a copy if possible. I find that TwoAddressInstructionPass::isProfitableToConv3Addr() is only checking for the case of a physreg copy, and so leaves the majority of cases as they are (2-address).
>
> I would like to say "If 3-addr version would avoid a copy, use it!". Does anyone else have a similar situation?

I think this is what it is supposed to do right now :). Though I reckon the test is probably overly conservative, in the sense that it returns true only if it can prove this is going to save a copy.

> To do this, one would need to check the kill-flag on the tied use operand. If it is not killed, one can assume that the use and dst registers overlap, and therefore the copy is needed for the two-address form. The kill flags would however need to be recomputed by TwoAddr pass, since LiveVariables clear them.
>
> An alternative approach might be to have something like TII->handleMachineFunctionPostCoalescer() at the end of RegisterCoalescer.cpp::runOnMachineFunction(). There, one could look for instructions and query live intervals for overlap. This hook might also be useful for other things, since this is the point just before mi-sched/regalloc, where one could do things like estimate register pressure.
>
> Any comments on this anyone?

We could try to fix the check in two-address pass first. I believe a hook like you describe might be useful but this is yet another thing to teach the coalescer, which is already complex enough IMO. Moreover, I like the separation of concerns that 2- and 3-addr conversions are made within a dedicated pass.
That being said, if getting the best code involves teaching the coalescer about this transformation, sure!

Cheers,
Q.

A similar setting occurs with ARM Thumb code, which for many instructions has a short 2-address encoding and a longer 3-address form. As far as I know, this is handled by selecting the 3-address form and rewriting it to 2-address after register allocation where possible. See lib/Target/ARM/Thumb2SizeReduction.cpp.

- Matthias

/ Jonas

From: "Jonas Paulsson" <paulsson@linux.vnet.ibm.com>
To: "Matthias Braun" <mbraun@apple.com>, "Quentin Colombet" <qcolombet@apple.com>, "Steve King"
<steve@metrokings.com>, hfinkel@anl.gov
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Wednesday, September 30, 2015 4:15:25 AM
Subject: Re: [llvm-dev] TwoAddressInstructionPass::isProfitableToConv3Addr()

> A similar setting occurs with ARM Thumb code which for many
> instructions has a short 2-address encoding and a longer 3 address
> form. As far as I know this is done by selecting the 3 address form
> and rewriting them to 2-address after register allocation where
> possible. See lib/Target/ARM/Thumb2SizeReduction.cpp.
>
> - Matthias

The late opportunistic conversion is simple, but it can only work in
cases where regalloc happens to put the new definition in the same
register as one of the source operands. If regalloc tries to do this
as much as possible, this might work most of the time, but I have no
idea whether that is the case. Some targets may want round-robin
allocation, while others would prefer reuse of registers. Steve says
he gets most of it handled; how about ARM Thumb? Is this with
RAGreedy?

> Hi Jonas,
>
> > Hi,
> >
> > I have cases of instruction pairs, where one is cheaper 2-address,
> > and the other 3-address. I would like to select the 2-addr
> > instruction during isel, but use the 3-addr instruction to avoid a
> > copy if possible. I find that
> > TwoAddressInstructionPass::isProfitableToConv3Addr() is only
> > checking for the case of a physreg copy, and so leaves the majority
> > of cases as they are (2-address).
> >
> > I would like to say "If 3-addr version would avoid a copy, use it!".
> > Does anyone else have a similar situation?
>
> I think this is what it is supposed to do right now :). Though I reckon
> the test is probably over conservative in the sense that it returns
> true only if it can prove this is going to save a copy.

Yes, it is very much overconservative, because it only checks the
cases of phys-reg copies around calls and returns, meaning that all
other cases are never transformed. I believe it should ask the target
to convert to 3-address in all cases where no source register is
killed.

PPCVSXFMAMutate is indeed doing something towards the same goal. I
wonder if this pass could be removed/simplified if TwoAddress were
aware of kill flags and eliminated more copies?

PPCVSXFMAMutate runs in between MI scheduling and register allocation. In this way, it only eliminates copies that actually remain after scheduling. TwoAddress runs prior to MI scheduling.

If I have:

c1 = c
x = a*b + c <tied>
y = d*e + c1 <tied>

I might mutate the first instruction so that I have:

x = a*(b <tied>) + c
y = d*e + c <tied>

to eliminate the copy. However, in order to hide latency, the scheduler might prefer to flip the two FMA instructions, which it can no longer do once I've mutated them, at least not without re-introducing the copy I was trying to avoid.
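
The constraint can be checked with a small toy model (a sketch only; the tuple layout and the function name are hypothetical). Each instruction is `(dst, reads, tied)`, where `tied` is the source whose register `dst` overwrites, or `None` for a plain copy. An ordering is fine if no instruction reads a value after a tied definition has overwritten it.

```python
def reads_of_clobbered_values(seq):
    """Return (dst, value) pairs where an instruction reads an overwritten value."""
    clobbered = set()
    bad = []
    for dst, reads, tied in seq:
        bad += [(dst, r) for r in reads if r in clobbered]
        if tied is not None:
            clobbered.add(tied)  # dst takes over the tied source's register
    return bad


# Original form: the copy keeps both orders of the FMAs legal.
copy = ("c1", ("c",), None)
fma_x = ("x", ("a", "b", "c"), "c")
fma_y = ("y", ("d", "e", "c1"), "c1")
print(reads_of_clobbered_values([copy, fma_x, fma_y]))
print(reads_of_clobbered_values([copy, fma_y, fma_x]))

# Mutated form: no copy, but only one order of the FMAs is legal.
mut_x = ("x", ("a", "b", "c"), "b")
mut_y = ("y", ("d", "e", "c"), "c")
print(reads_of_clobbered_values([mut_x, mut_y]))
print(reads_of_clobbered_values([mut_y, mut_x]))
```

The model assumes that `a`, `b`, `d` and `e` are not needed after these instructions. Only the last ordering reports a problem: `x` would read `c` after `y` has overwritten it.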

-Hal