Hi all! I posted some PRs that implement TableGen-driven MacroFusion framework, please feel free to send feedbacks to me!
- [MacroFusion] Support multiple predicators by wangpc-pp · Pull Request #72219 · llvm/llvm-project · GitHub
- [TableGen] Add a backend to generate MacroFusion predicators by wangpc-pp · Pull Request #72222 · llvm/llvm-project · GitHub
- [TableGen] Add MacroFusions to SchedMachineModel by wangpc-pp · Pull Request #72223 · llvm/llvm-project · GitHub
- [RISCV] Use TableGen-based macro fusion by wangpc-pp · Pull Request #72224 · llvm/llvm-project · GitHub
- [Sched] Auto add MacroFusion mutation if enabled by wangpc-pp · Pull Request #72227 · llvm/llvm-project · GitHub
What’s MacroFusion
Informally, MacroFusion
is a technique that multiple instructions (or uops) are fused into one in decoding phase before issuing to backend units. For example, lui+addi
can be fused to one uop load i32 immediate
for some RISC-V implementations. This technique is normal to increase backend thoughput for modern CPU designs.
This requires these instructions are glued together.
LLVM’s MacroFusion implmentation
Simply speaking, MacroFusion implmentation in LLVM is a DAG mutation that adds Artificial
dependency to fused instruction pairs so that they will not be separated by scheduler.
For targets that support MacroFusion, they need to implement a predicator with type:
using ShouldSchedulePredTy = std::function<bool(const TargetInstrInfo &TII,
const TargetSubtargetInfo &TSI,
const MachineInstr *FirstMI,
const MachineInstr &SecondMI)>;
And then create a MacroFusion mutation and add it to scheduler:
std::unique_ptr<ScheduleDAGMutation> llvm::createRISCVMacroFusionDAGMutation() {
return createMacroFusionDAGMutation(shouldScheduleAdjacent);
}
ScheduleDAGInstrs *
createMachineScheduler(MachineSchedContext *C) const override {
const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
if (ST.hasMacroFusion()) {
ScheduleDAGMILive *DAG = createGenericSchedLive(C);
DAG->addMutation(createRISCVMacroFusionDAGMutation());
return DAG;
}
return nullptr;
}
TableGen-driven MacroFusion predicators generator
For current implementation, we need to write C++ code to test if the instruction pair can be fused. These code are usually tedious and not easy to maintain.
Another problem is that macro fusion is related to specific processor, but now we abstract macro fusions to different subtarget features for all targets. As far as I know, there may be up to 30+ instruction pairs that can be fused for some macroarchitecture (for example, , Xiangshan - Decode Unit, Chinese). If we define subtarget features for all of them, that would be a disaster. And there will be a lot of branches that are evalated to false
in predicator like below:
static bool shouldScheduleAdjacent(const TargetInstrInfo &TII,
const TargetSubtargetInfo &TSI,
const MachineInstr *FirstMI,
const MachineInstr &SecondMI) {
const RISCVSubtarget &ST = static_cast<const RISCVSubtarget &>(TSI);
// false if not supported.
if (ST.hasLUIADDIFusion() && isLUIADDI(FirstMI, SecondMI))
return true;
return false;
}
So I am trying to introduce a TableGen-based MacroFusion framework to simplify the code and I will explain how I implement it.
TableGen Part
The TableGen part is a MacroFusion
class which contains four predicates:
class MacroFusion<MCInstPredicate first, MCInstPredicate second,
list<MacroFusionPredicateBase> prolog = [],
list<MacroFusionPredicateBase> epilog = []> {
MCInstPredicate First = first;
MCInstPredicate Second = second;
list<MacroFusionPredicateBase> Prolog = prolog;
list<MacroFusionPredicateBase> Epilog = epilog;
}
first
and second
are predicates for FirstMI
and SecondMI
. They can be any predicate classes defined in TargetInstrPredicate.td
.
And prolog
and epilog
are predicates that will be added before/after predicates for first
and second
. They can be any MacroFusionPredicateBase
class. For example:
// Base class of MacroFusionPredicate, etc. The avaliable variables are:
// * const TargetInstrInfo &TII
// * const TargetSubtargetInfo &STI
// * const MachineRegisterInfo &MRI
// * const MachineInstr *FirstMI
// * const MachineInstr &SecondMI
class MacroFusionPredicateBase;
// MacroFusionPredicate with raw code predicate.
class MacroFusionPredicate<code pred> : MacroFusionPredicateBase {
code Predicate = pred;
}
// Binds firstOpIdx and secondOpIdx. The operand of `FirstMI` at position
// `firstOpIdx` should be the same as the operand of `SenondMI` at position
// `secondOpIdx`.
class BindReg<int firstOpIdx, int secondOpIdx> : MacroFusionPredicateBase {
int FirstOpIdx = firstOpIdx;
int SecondOpIdx = secondOpIdx;
}
// A predicate for wildcard. The generated code will be like:
// ```
// if (!FirstMI)
// return ReturnValue;
// ```
class WildcardPred<bit ret> : MacroFusionPredicateBase {
bit ReturnValue = ret;
}
def WildcardFalse : WildcardPred<0>;
def WildcardTrue : WildcardPred<1>;
// Indicates that the destination register of `FirstMI` should be have one
// use if it is an virtual register.
class OneUsePred : MacroFusionPredicateBase;
def OneUse : OneUsePred;
The generated predicator will be like:
bool isNAME(const TargetInstrInfo &TII,
const TargetSubtargetInfo &STI,
const MachineInstr *FirstMI,
const MachineInstr &SecondMI) {
auto &MRI = SecondMI.getMF()->getRegInfo();
/* Prolog */
/* Predicate for `FirstMI` */
/* Predicate for `SecondMI` */
/* Epilog */
return true;
}
A minimal example for RISC-V’s lui+addi
fusion can be described like:
def LUIADDI : MacroFusion<CheckOpcode<[LUI]>, CheckOpcode<[ADDI]>;
The generated predicator will be like:
bool isLUIADDI(
const TargetInstrInfo &TII,
const TargetSubtargetInfo &STI,
const MachineInstr *FirstMI,
const MachineInstr &SecondMI) {
auto &MRI = SecondMI.getMF()->getRegInfo();
auto matchFirst = [&](const MachineInstr* MI) {
return ( MI->getOpcode() == RISCV::LUI );
};
auto matchSecond = [&](const MachineInstr* MI) {
return ( MI->getOpcode() == RISCV::ADDI );
};
if(!matchFirst(FirstMI))
return false;
if(!matchSecond(&SecondMI))
return false;
return true;
}
(Please see below for the whole example)
SchedMachineModel Part
A list of MacroFusion
is added to SchedMachineModel
to indicate supported MacroFusions:
class SchedMachineModel {
...
// List of MacroFusion.
list<MacroFusion> MacroFusions = [];
}
And a new function which returns all supported MacroFusion predicators is added to TargetSubtargetInfo
:
/// Get the list of MacroFusion predicates.
virtual std::vector<MacroFusionPredTy> getMacroFusions() const { return {}; }
This virtual function will be overrided by TableGen-generated XXXGenSubtargetInfo
and returns predicator list by SchedModel’s ID:
std::vector<MacroFusionPredTy> RISCVGenSubtargetInfo::getMacroFusions() const {
switch(getSchedModel().getProcessorID()) {
case 2: // SiFive7Model
return {isLUIADDI};
}
return {};
}
So now we can create MacroFusion mutation like below (note that I have changed the createMacroFusionDAGMutation
to accept a list of predicator pointer):
ScheduleDAGMILive *DAG = createGenericSchedLive(C);
DAG->addMutation(createMacroFusionDAGMutation(ST.getMacroFusions()));
SubtargetInfo Part
For visibility of supported MacroFusion
s, C++ enums of all MacroFusion
s are generated and new hasMacroFusion
function is added to MCSubtargetInfo
to test if a MacroFusion
is supported. So, we don’t need to add subtarget features for macro fusions now.
#ifdef GET_MACRO_FUSION_ENUM
namespace llvm {
namespace RISCV {
enum {
LUIADDI = 0,
};
} // end namespace RISCV
} // end namespace llvm
// Test if we support `lui+addi` fusion.
ST.hasMacroFusion(RISCV::LUIADDI);
This is implemented by adding a Bitset
to MCSchedModel
:
const unsigned MaxMacroFusions = 256;
using MacroFusionBitset = Bitset<MaxMacroFusions>;
struct MCSchedModel {
...
const MacroFusionBitset *MacroFusionBits;
}
The hasMacroFusion
is simply implemented as querying the Bitset
:
bool hasMacroFusion(unsigned MacroFusion) const {
const MacroFusionBitset *MacroFusionBits =
CPUSchedModel->getMacroFusionBits();
return MacroFusionBits && MacroFusionBits->test(MacroFusion);
}
And of cource, the Bitset
is generated by TableGen (note that we can get supported MacroFusions from SchedMachineModel.MacroFusions
):
static const MacroFusionBitset XXXModelMacroFusionBits = {
RISCV::LUIADDI,
...
};
static const llvm::MCSchedModel XXXModel = {
...
&XXXModelMacroFusionBits,
};
Example
Let take RISC-V’s lui+addi
fusion as an example:
def LUIADDI: MacroFusion<CheckOpcode<[LUI]>,
CheckAll<[
CheckOpcode<[ADDI, ADDIW]>,
CheckAny<[
CheckIsVRegOperand<0>,
CheckSameRegOperand<0, 1>
]>
]>,
[WildcardTrue],
[OneUse, BindReg<0, 1>]>;
It generates code like:
bool isLUIADDI(
const TargetInstrInfo &TII,
const TargetSubtargetInfo &STI,
const MachineInstr *FirstMI,
const MachineInstr &SecondMI) {
auto &MRI = SecondMI.getMF()->getRegInfo();
auto matchFirst = [&](const MachineInstr* MI) {
return ( MI->getOpcode() == RISCV::LUI );
};
auto matchSecond = [&](const MachineInstr* MI) {
return (
(
MI->getOpcode() == RISCV::ADDI
|| MI->getOpcode() == RISCV::ADDIW
)
&& (
MI->getOperand(0).getReg().isVirtual()
|| MI->getOperand(0).getReg() == MI->getOperand(1).getReg()
)
);
};
if (!FirstMI)
return true;
if(!matchFirst(FirstMI))
return false;
if(!matchSecond(&SecondMI))
return false;
Register FirstDest = FirstMI->getOperand(0).getReg();
if (FirstDest.isVirtual() && !MRI.hasOneNonDBGUse(FirstDest))
return false;
if (!(FirstMI->getOperand(0).isReg() &&
SecondMI.getOperand(1).isReg() &&
FirstMI->getOperand(0).getReg() == SecondMI.getOperand(1).isReg()))
return false;
return true;
}
And if we add MacroFusions
to SchedMachineModel
:
def TestModel : SchedMachineModel {
let MicroOpBufferSize = 0;
let IssueWidth = 2;
let LoadLatency = 3;
let MispredictPenalty = 3;
let CompleteModel = 0;
let MacroFusions = [LUIADDI]; // Supported macro fusions
}
It generates:
static const MacroFusionBitset TestModelMacroFusionBits = {
RISCV::LUIADDI
};
static const llvm::MCSchedModel TestModel = {
2, // IssueWidth
0, // MicroOpBufferSize
MCSchedModel::DefaultLoopMicroOpBufferSize,
3, // LoadLatency
MCSchedModel::DefaultHighLatency,
3, // MispredictPenalty
false, // PostRAScheduler
false, // CompleteModel
false, // EnableIntervals
0, // Processor ID
TestModelProcResources,
TestModelSchedClasses,
10,
4733,
nullptr, // No Itinerary
nullptr, // No extra processor descriptor
&TestModelMacroFusionBits,
};
I have written this RFC in a hurry, so please feel free to discuss with me if there are any questions!