From: "Jonas Paulsson" <firstname.lastname@example.org>
To: "Matthias Braun" <email@example.com>, "Quentin Colombet" <firstname.lastname@example.org>, "Steve King"
Cc: "llvm-dev" <email@example.com>
Sent: Wednesday, September 30, 2015 4:15:25 AM
Subject: Re: [llvm-dev] TwoAddressInstructionPass::isProfitableToConv3Addr()
A similar situation occurs with ARM Thumb code, which for many
instructions has a short 2-address encoding and a longer 3-address
form. As far as I know this is handled by selecting the 3-address form
and rewriting it to 2-address after register allocation where
possible. See lib/Target/ARM/Thumb2SizeReduction.cpp.
The late opportunistic conversion is simple, but can only work in
cases where regalloc happens to put the new definition in the same
register as one of the source operands. If regalloc tried to do this
as much as possible, this might work most of the time; however, I
have no idea if this is the case. Some targets may want
round-robin allocation, while others would prefer reuse of
registers. Steve says he gets most of it handled; how about ARM
Thumb? Is this with RAGreedy?
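
To make the late rewrite concrete, here is a minimal sketch of the
check it depends on. This is a toy model, not the code in
Thumb2SizeReduction.cpp: the Inst3 struct and tryShrinkTo2Addr() are
hypothetical names invented for illustration, and registers are plain
integers.

```cpp
#include <cassert>
#include <utility>

// Hypothetical toy model of a 3-address instruction after register
// allocation: dst = op(src1, src2), all operands are physical registers.
struct Inst3 {
  int dst;
  int src1;
  int src2;
  bool commutable; // true if src1 and src2 may be swapped
};

// Returns true if the instruction can use the short 2-address encoding,
// i.e. dst ends up in the same register as the first source operand.
// Swaps the sources when that is legal and makes the encoding possible.
bool tryShrinkTo2Addr(Inst3 &I) {
  if (I.dst == I.src1)
    return true;
  if (I.commutable && I.dst == I.src2) {
    std::swap(I.src1, I.src2);
    return true;
  }
  return false; // regalloc picked an unrelated register for dst
}
```

The last case is the one in question: whether it is rare depends
entirely on how the allocator assigns the destination register.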
I have cases of instruction pairs, where one is cheaper 2-address,
and the other 3-address. I would like to select the 2-addr
instruction during isel, but use the 3-addr instruction to avoid a
copy if possible. I find that
TwoAddressInstructionPass::isProfitableToConv3Addr() only
covers the case of a physreg copy, and so leaves the majority of cases
as they are (2-address).
I would like to say "If 3-addr version would avoid a copy, use it!".
Does anyone else have a similar situation? I think this is what it
is supposed to do right now :). Though I reckon the test is probably
overly conservative in the sense that it returns true only if it can
prove this is going to save a copy.
Yes, it is very much overconservative because it only checks the
cases of phys-reg copies around calls and returns, meaning that all
other cases are never transformed. I believe it should ask the target
to convert to 3-address in all cases where no source register is
killed.
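
The rule suggested here could be sketched roughly as follows. This
describes the proposed behaviour, not what TwoAddressInstructionPass
does today, and every name in it (Operand, TwoAddrInst, Action,
chooseAction) is hypothetical.

```cpp
#include <cassert>

// Hypothetical toy model: a source operand and whether this use is the
// last use of its value (the "kill" flag).
struct Operand {
  int vreg;
  bool isKill;
};

// Hypothetical 2-address instruction: dst is tied to tiedSrc.
struct TwoAddrInst {
  Operand tiedSrc;
  Operand otherSrc;
  bool commutable;   // sources may be swapped
  bool has3AddrForm; // target offers an equivalent 3-address instruction
};

enum class Action { KeepTwoAddr, Commute, ConvertTo3Addr, InsertCopy };

// Sketch of the proposed rule: only when no source is killed would the
// 2-address form need a copy, so that is when the 3-address form pays off.
Action chooseAction(const TwoAddrInst &I) {
  if (I.tiedSrc.isKill)
    return Action::KeepTwoAddr; // dst can reuse the tied register
  if (I.commutable && I.otherSrc.isKill)
    return Action::Commute; // tie the killed operand instead
  if (I.has3AddrForm)
    return Action::ConvertTo3Addr; // avoids the copy
  return Action::InsertCopy; // no alternative left
}
```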
PPCVSXFMAMutate is indeed doing something towards the same goal. I
wonder if this pass could be removed/simplified if TwoAddress were
aware of kill flags and eliminated more copies?
PPCVSXFMAMutate runs in between MI scheduling and register allocation. In this way, it only eliminates copies that actually remain after scheduling. TwoAddress runs prior to MI scheduling.
If I have:
c1 = c
x = a*b + c <tied>
y = d*e + c1 <tied>
I might mutate the first instruction so that I have:
x = a*(b <tied>) + c
y = d*e + c <tied>
to eliminate the copy. However, in order to hide latency, the scheduler might prefer to flip the two FMA instructions, which it now cannot do, if I've mutated first, without re-introducing the copy I was trying to avoid.
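
The ordering dependence can be checked with a small toy model. It is a
sketch under simplifying assumptions: the Fma struct and copiesNeeded()
are hypothetical, values are just names, and any value not read by a
later instruction in the sequence is assumed to be dead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy FMA: result = a*b + c, where the result overwrites
// the register of one source, named by `tied`.
struct Fma {
  std::string a, b, c; // source value names
  std::string tied;    // the source clobbered by the result
};

// Counts the copies a given order needs: an instruction needs a copy of
// its tied source whenever a later instruction still reads that value.
int copiesNeeded(const std::vector<Fma> &seq) {
  int copies = 0;
  for (std::size_t i = 0; i < seq.size(); ++i) {
    for (std::size_t j = i + 1; j < seq.size(); ++j) {
      const Fma &later = seq[j];
      if (later.a == seq[i].tied || later.b == seq[i].tied ||
          later.c == seq[i].tied) {
        ++copies;
        break; // one copy per clobbering instruction is enough
      }
    }
  }
  return copies;
}
```

In this model, tying c in both FMAs needs one copy, the mutated order
(b tied in the first FMA, c tied in the second) needs none, and
flipping the mutated pair needs one copy again.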