I’m looking at RegBankSelect’s partially implemented support for deciding to split a value between multiple registers and I’m wondering if it’s actually intended to solve the problem I’m trying to use it for. RegisterBankInfo.h has this example mapping table:
/// E.g.,
/// Let say we have a 32-bit add and a <2 x 32-bit> vadd. We
/// can expand the
/// <2 x 32-bit> add into 2 x 32-bit add.
///
/// Currently the TableGen-like file would look like:
/// \code
/// PartialMapping = {
/// /*32-bit add*/ {0, 32, GPR},
/// /*2x32-bit add*/ {0, 32, GPR}, {0, 32, GPR}, // <-- Same entry 3x
/// /*<2x32-bit> vadd {0, 64, VPR}
/// }; // PartialMapping duplicated.
///
/// ValueMapping {
/// /*plain 32-bit add*/ {&PartialMapping[0], 1},
/// /*expanded vadd on 2xadd*/ {&PartialMapping[1], 2},
/// /*plain <2x32-bit> vadd*/ {&PartialMapping[3], 1}
/// };
This looks almost like the problem I want to solve for AMDGPU. There are 2 main register banks. On the SALU, some 64-bit operation are available which can only be 32-bit on the VALU. For example, if all of the input operands aren’t in the scalar bank, a 64-bit and needs to be split into 2 32-bit ands. It’s illegal to copy from the vector to the scalar bank, since these don’t mean what vector and scalar mean on other targets.
The current code seems very operand centric and computes costs only based on copies. Decomposing the operation into 2 pieces requires rewriting the entire instruction, not just copying from one offending operand. Is this intended to handle this kind of case, or do I need to introduce a separate register bank aware legalizer pass?
Your use case falls definitely in what RegBankSelect meant to solve.
That said, the support you need is not implemented because we didn't
have use cases to test the code against.
Regarding the cost, if the mapping produces more than 1 partial value,
right now RegBankSelect::getRepairCost will say this is too expensive
and this is actually where you need to patch the pass to add a target
hook to compute something that would use instruction to decompose the
value.
Hi,
I’m looking at RegBankSelect’s partially implemented support for deciding to split a value between multiple registers and I’m wondering if it’s actually intended to solve the problem I’m trying to use it for. RegisterBankInfo.h has this example mapping table:
/// E.g.,
/// Let say we have a 32-bit add and a <2 x 32-bit> vadd. We
/// can expand the
/// <2 x 32-bit> add into 2 x 32-bit add.
///
/// Currently the TableGen-like file would look like:
/// \code
/// PartialMapping = {
/// /*32-bit add*/ {0, 32, GPR},
/// /*2x32-bit add*/ {0, 32, GPR}, {0, 32, GPR}, // <-- Same entry 3x
/// /*<2x32-bit> vadd {0, 64, VPR}
/// }; // PartialMapping duplicated.
///
/// ValueMapping {
/// /*plain 32-bit add*/ {&PartialMapping[0], 1},
/// /*expanded vadd on 2xadd*/ {&PartialMapping[1], 2},
/// /*plain <2x32-bit> vadd*/ {&PartialMapping[3], 1}
/// };
This looks almost like the problem I want to solve for AMDGPU. There are 2 main register banks. On the SALU, some 64-bit operation are available which can only be 32-bit on the VALU. For example, if all of the input operands aren’t in the scalar bank, a 64-bit and needs to be split into 2 32-bit ands. It’s illegal to copy from the vector to the scalar bank, since these don’t mean what vector and scalar mean on other targets.
The current code seems very operand centric and computes costs only based on copies. Decomposing the operation into 2 pieces requires rewriting the entire instruction,
So the copy part cost I covered it. For the cost of rewriting the
instruction completely, this is captured by
InstructionMapping::getCost.
The idea of InstructionMapping::getCost is to reflect the cost for
transforming the current instruction into the instruction after we
apply this mapping. Then the RepairCost is here to account for the
cost of "bringing" every operand to the right place for this mapping
using copy or some target specific sequence.
Like the cost computation, the target specific sequences are not
implemented, but should happen in RegBankSelect::repairReg. Right now,
this will assert that the number of break downs should be == 1 but the
code to decompose the operand should happen there.
Finally, the rewriting of the current instruction is supposed to
happen in RegisterBankInfo::applyMapping.
If you have an example (.mir) that you can share, we can work together
to make this happen.
Your use case falls definitely in what RegBankSelect meant to solve.
That said, the support you need is not implemented because we didn’t
have use cases to test the code against.
Regarding the cost, if the mapping produces more than 1 partial value,
right now RegBankSelect::getRepairCost will say this is too expensive
and this is actually where you need to patch the pass to add a target
hook to compute something that would use instruction to decompose the
value.
Yes, this is what happens with greedy. With fast I get a little further.
So the copy part cost I covered it. For the cost of rewriting the
instruction completely, this is captured by
InstructionMapping::getCost.
The idea of InstructionMapping::getCost is to reflect the cost for
transforming the current instruction into the instruction after we
apply this mapping. Then the RepairCost is here to account for the
cost of “bringing” every operand to the right place for this mapping
using copy or some target specific sequence.
Like the cost computation, the target specific sequences are not
implemented, but should happen in RegBankSelect::repairReg.
This seems to contradict the comment on repairReg?
/// \note The caller is supposed to do the rewriting of op if need be.
/// I.e., Reg = op … => = NewOp …
Right now,
this will assert that the number of break downs should be == 1 but the
code to decompose the operand should happen there.
Finally, the rewriting of the current instruction is supposed to
happen in RegisterBankInfo::applyMapping.
If you have an example (.mir) that you can share, we can work together
to make this happen.
Cheers,
-Quentin
The simplest case is this, where there’s only one register bank involved. The cost of the unmerge and merge should be 0, there’s only a real cost from the fact that it is now 2 operations.
Your use case falls definitely in what RegBankSelect meant to solve.
That said, the support you need is not implemented because we didn’t
have use cases to test the code against.
Regarding the cost, if the mapping produces more than 1 partial value,
right now RegBankSelect::getRepairCost will say this is too expensive
and this is actually where you need to patch the pass to add a target
hook to compute something that would use instruction to decompose the
value.
Yes, this is what happens with greedy. With fast I get a little further.
So the copy part cost I covered it. For the cost of rewriting the
instruction completely, this is captured by
InstructionMapping::getCost.
The idea of InstructionMapping::getCost is to reflect the cost for
transforming the current instruction into the instruction after we
apply this mapping. Then the RepairCost is here to account for the
cost of “bringing” every operand to the right place for this mapping
using copy or some target specific sequence.
Like the cost computation, the target specific sequences are not
implemented, but should happen in RegBankSelect::repairReg.
This seems to contradict the comment on repairReg?
/// \note The caller is supposed to do the rewriting of op if need be.
/// I.e., Reg = op … => = NewOp …
I see the confusion. No this is not contradicting. What this means is:
repairReg is responsible for materializing as in:
Reg = copy/decompose/etc
Or = copy/decompose/etc Reg
the caller is responsible for rewriting Reg into in the operation that is being rewritten.
Does it make sense?
Right now,
this will assert that the number of break downs should be == 1 but the
code to decompose the operand should happen there.
Finally, the rewriting of the current instruction is supposed to
happen in RegisterBankInfo::applyMapping.
If you have an example (.mir) that you can share, we can work together
to make this happen.
Cheers,
-Quentin
The simplest case is this, where there’s only one register bank involved. The cost of the unmerge and merge should be 0, there’s only a real cost from the fact that it is now 2 operations.
name: and_i64_vv
legalized: true
body: |
bb.0:
; Should turn into something like this, although the merge_values and unmerge_values can be optimized out
; %0:vgpr(s64) = COPY $vgpr0_vgpr1
; %1:vgpr(s64) = COPY $vgpr2_vgpr3
; %2:vgpr(s32), %3:vgpr(s32) = G_UNMERGE_VALUES %0
; %4:vgpr(s32), %5:vgpr(s32) = G_UNMERGE_VALUES %1
; %6:vgpr(s32) = G_AND %2, %3
; %7:vgpr(s32) = G_AND %4, %5
; %8:vgpr(s64) = G_MERGE_VALUES %6, %7
Part of my confusion about the operand focus is the use of RepairPts. In this case the inputs %0 and %1 have been trivially assigned already, but I kind of expected those to be present as something to handle here if that makes sense.
The repair points identifies the insertion points where repairing code will be inserted. Given it is operand based, you don’t need any for the ones that are trivially assigned already.