What are the uses of `SDTypeProfile`, `SDNode`, `Operand`, `PatLeaf`, and `ComplexPattern` in adding new target for LLVM backend?

Hi I’m learning LLVM backend by following this case. But when I read the .td files, there are lots of new objects that I do not know what they do.

I search through the internet and cannot find any illustration of them.

Thus I would like to ask here, please help!!!

In the file of chapters/Chapter2/Cpu0/Cpu0InstrInfo.td, it defines lots of instance using the class of SDTypeProfile, SDNode, Operand, PatLeaf, and ComplexPattern as follow:

//===----------------------------------------------------------------------===//
// Cpu0 profiles and nodes
//===----------------------------------------------------------------------===//

def SDT_Cpu0Ret          : SDTypeProfile<0, 1, [SDTCisInt<0>]>;

// Return
def Cpu0Ret : SDNode<"Cpu0ISD::Ret", SDTNone,
                     [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;

//===----------------------------------------------------------------------===//
// Instruction format superclass
//===----------------------------------------------------------------------===//

include "Cpu0InstrFormats.td"

//===----------------------------------------------------------------------===//
// Cpu0 Operand, Complex Patterns and Transformations Definitions.
//===----------------------------------------------------------------------===//
// Instruction operand types

// Signed Operand
def simm16      : Operand<i32> {
  let DecoderMethod= "DecodeSimm16";
}

// Address operand
def mem : Operand<iPTR> {
  let PrintMethod = "printMemOperand";
  let MIOperandInfo = (ops GPROut, simm16);
  let EncoderMethod = "getMemEncoding";
}

// Node immediate fits as 16-bit sign extended on target immediate.
// e.g. addi, andi
def immSExt16  : PatLeaf<(imm), [{ return isInt<16>(N->getSExtValue()); }]>;

// Cpu0 Address Mode! SDNode frameindex could possibily be a match
// since load and store instructions from stack used it.
def addr : 
  ComplexPattern<iPTR, 2, "SelectAddr", [frameindex], [SDNPWantParent]>;

//===----------------------------------------------------------------------===//
// Pattern fragment for load/store
//===----------------------------------------------------------------------===//

class AlignedLoad<PatFrag Node> :
  PatFrag<(ops node:$ptr), (Node node:$ptr), [{
  LoadSDNode *LD = cast<LoadSDNode>(N);
  return LD->getMemoryVT().getSizeInBits()/8 <= LD->getAlignment();
}]>;

class AlignedStore<PatFrag Node> :
  PatFrag<(ops node:$val, node:$ptr), (Node node:$val, node:$ptr), [{
  StoreSDNode *SD = cast<StoreSDNode>(N);
  return SD->getMemoryVT().getSizeInBits()/8 <= SD->getAlignment();
}]>;

// Load/Store PatFrags.
def load_a          : AlignedLoad<load>;
def store_a         : AlignedStore<store>;

My question is simple and stright:

  1. Why define them? Are these definitions necessary?
  2. Where they will be used relate to the instruction??

Thanks a lot!!!

1 Like

First of all, TableGen is used to generate many different things, so you can see many definitions used in .td files that are not necessarily closely related to one another. In this case most of the things you mentioned are a part of the instruction selection process though, partially except Operand.

There are two principal methods for instruction selection: GlobalISel, and Selection DAG. GlobalISel is newer, and I don’t have a lot of experience with it, but the types you listed apply to the DAG method.

The way that the “SDAG” instruction selection process works is that each basic block in a function is translated into a graph, where each node corresponds to an operation. Edges in the graph represent connections between outputs of some nodes and inputs of other nodes. Each node can have multiple inputs and multiple outputs. Identical nodes are represented by the same node, so in “add x, x”, both operands of the “add” will be the same node (instead of two identical copies).

The SDNode type corresponds to a node in the DAG. In general there is no 1-1 correspondence between LLVM IR instructions and the selection DAG nodes. All that is guaranteed is that the basic block in LLVM and the corresponding DAG are functionally equivalent. There is a predefined set of SDNodes for common operations, but each target can implement their own SDNodes, this is what the SDNode definitions in .td files are for.

In order to define a new DAG node you need to specify types of the input and output values. You do that using SDTypeProfile: it basically lists the types for each input and output.

The DAG undergoes a series of transformations, and in the end it is translated into machine instructions. The way that works is that you define which patterns in the DAG should be replaced with what instructions. A lot of the transformations of the DAG I mentioned are done with the goal of ending up with a DAG which only has patterns that can be replaced with machine instructions. A selection pattern is roughly something like this (add (mul fp:$x, fp:$y), fp:$z), which then can be associated with fma fp:$x, fp:$y, fp:$z. There are two ways this association takes place: either by supplying the DAG pattern in instruction definition, or by creating a Pat class which contains input DAG pattern, and the corresponding output that uses machine instruction(s).

Since some (sub)patterns can occur more frequently (for example with different value types), you can create something like a macro that represents that (sub)pattern via PatFrag (pattern fragment). PatLeaf is just a fragment that does not have any inputs, i.e. is a leaf.

The idea is that you can translate the entire DAG into machine instructions doing only pattern matching and replacing the matched patterns with their corresponding machine instructions. In some cases this may not be possible though, and this is where ComplexPattern can be used—it simply delegates the matching to a custom C++ function. If you want to use complex patterns, you have to write the C++ function that will do the matching.

Look in the existing backends to see how these things are used, it will give you a better idea of how it all connects together. Also look in llvm/include/llvm/Target/Target.td, llvm/include/llvm/Target/TargetSelectionDAG.td, plus llvm/lib/Target/SomeTarget/SomeTargetInstrInfo.td, etc.

2 Likes

Thank you so much for answering!!!

I’ve read some more doc, and based on my own guess I may summarize the following points. I don’t know if they are right or not so please correct me if you found any mistakes.

  1. To set an instruction, we need 3 set of DAGs, one forFor InOperandList, OutOperandList and another for pattern

  2. InOperandList and OutOperandList usually has the format of (ins, entry1, entry2,...) and (outs entry1, entry2,...)

  3. pattern: usually has the format of (operator operand1, operand2,...).

  4. operator: any subclass record of SDPatternOperator, Note that include SDNODE and PatFrags. The difference between them are SDNODE need a SDTypeProfile, it only rules out the operand and results number, while PatFrags allows to provide a list of DAGs for it to match.

  5. An operand of DAG usually have the format of value:$name

  6. A value can be:

    1. Defined record of DAGOperand’s subclass. in this case, the name will give value to the fields of record without a concrete value. As illustrated in " Instruction Operand Mapping" section of the document.
    2. Another DAG, for example: (ins (MEMrr $rs1, $rs2):$addr)
    3. Defined record of ComplexPattern’s subclass, in this case will require a handwritten function to match the instruction selection.

With the above in mind, I’ll try to explain what is happening in the file:

def SDT_Cpu0Ret          : SDTypeProfile<0, 1, [SDTCisInt<0>]>;

// Return
def Cpu0Ret : SDNode<"Cpu0ISD::Ret", SDTNone,
                     [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;

The purpose is to define a DAG operator, which is Cpu0Ret, in order to define a matching pattern for return instruction.

The operand type SDT_Cpu0Ret I guess is supposed to be the type of Cpu0Ret. But for some reason it is not used.

// Signed Operand
def simm16      : Operand<i32> {
  let DecoderMethod= "DecodeSimm16";
}

// Address operand
def mem : Operand<iPTR> {
  let PrintMethod = "printMemOperand";
  let MIOperandInfo = (ops GPROut, simm16);
  let EncoderMethod = "getMemEncoding";
}

// Node immediate fits as 16-bit sign extended on target immediate.
// e.g. addi, andi
def immSExt16  : PatLeaf<(imm), [{ return isInt<16>(N->getSExtValue()); }]>;

// Cpu0 Address Mode! SDNode frameindex could possibily be a match
// since load and store instructions from stack used it.
def addr : 
  ComplexPattern<iPTR, 2, "SelectAddr", [frameindex], [SDNPWantParent]>;

simm16 is defined to use as operand of InOperandList, it is defined as Operand I guess is because it would value a field of record without a concrete value (illustrated here)

immSExt16 is a DAG without operand, used in pattern of instruction select as a DAG operand. It shares the same name with simm16 and the reason of not using simm16 is that it added a constraint that the node has to be a 16bit int.

Same as mem and addr, mem is defined as Operand for valuing unvalued field. addr is actually a DAG without any operand, the reason of not using mem is the need of adding complex matching method, which is named SelectAddr. They also share the same name.

class AlignedLoad<PatFrag Node> :
  PatFrag<(ops node:$ptr), (Node node:$ptr), [{
  LoadSDNode *LD = cast<LoadSDNode>(N);
  return LD->getMemoryVT().getSizeInBits()/8 <= LD->getAlignment();
}]>;

class AlignedStore<PatFrag Node> :
  PatFrag<(ops node:$val, node:$ptr), (Node node:$val, node:$ptr), [{
  StoreSDNode *SD = cast<StoreSDNode>(N);
  return SD->getMemoryVT().getSizeInBits()/8 <= SD->getAlignment();
}]>;

// Load/Store PatFrags.
def load_a          : AlignedLoad<load>;
def store_a         : AlignedStore<store>;

Here load_a and store_a is defined as a special case of SDNODE record load and store (defined in llvm/include/llvm/Target/TargetSelectionDAG.td). It simply added alignment check on the original load and store.

To add to this—a type of an operand is like a type of a variable, it states that this operand has this type. If you want to create that instruction, the operand has to have the right type. PatFrag/PatLeaf are somewhat different in the sense that they are used to check if the condition is true or false. The purpose of PatFrag/PatLeaf is to indicate whether given DAG fragment matches the condition from the definition of the Pat*, so both answers, yes/no, are legitimate.

Great job on the analysis! The selection DAG is not conceptually difficult, but there are a lot of quirks in the instruction selection process that you learn as you go. I usually just look at the sources if I’m not sure about something.

1 Like

Thanks so much for your reply!!!

The source code is kind of hard for me since I’m really not familiar with the source code of llvm, it will cost me lots of time to find where it goes.

But I do keep learning and try to debug with the source code.

One more question from me is that I found there is a ValueType element in class ComplexPattern. This makes me wonder that if class ComplexPattern also have a value (same as class Operand)?? But I cannot find any process of valuing the complex pattern node…

Complex pattern is a selection pattern where the matching and selection is done by a custom function. When you define a complex pattern, one of the arguments is the string with the name of a member function in your “ISelDAGToDAG.cpp”. You need to write that function, it will not be automatically generated.

The “value type” is the value type of the root of the matched/replaced part of the DAG.

1 Like