Instruction Enumeration Explosion and Operation To Instruction Mapping

I tried to write a comprehensive text about this topic, but it got too long and cryptic. So I'm keeping it short and giving an example, which is a little bit ARM-ish.

Simply put, I want to generate instructions for the operation ISD::ADDC: add two values and produce a carry bit. For my target I can use the standard ADD instruction and set a bit in it so that it updates the flags register (including the carry flag); this is similar to ARM.
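For reference, the value semantics of ISD::ADDC can be modeled in a few lines of plain C++. This is a standalone sketch on 32-bit operands, not LLVM code; `AddcResult` and `addc` are made-up names:

```cpp
#include <cassert>
#include <cstdint>

// Result of an ADDC-style addition: the wrapped sum plus the carry-out bit.
struct AddcResult {
  uint32_t sum;
  bool carry;
};

// Models the value semantics of ISD::ADDC on 32-bit operands: the sum wraps
// modulo 2^32 and the carry is set when the true sum does not fit in 32 bits.
AddcResult addc(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;  // unsigned arithmetic wraps, no undefined behavior
  bool carry = sum < a;  // a wrap-around happened iff the result got smaller
  // Cross-check against a 64-bit reference computation.
  assert(carry == (((static_cast<uint64_t>(a) + b) >> 32) != 0));
  return {sum, carry};
}
```

On the target, both results come out of one ADD: the sum goes to the destination register and the carry lands in the flags register.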

Currently, the instructions for the ISD::ADDC operation are defined in the TableGen file just like those for ISD::ADD. But these instructions are basically copies of the real ADD instructions: they take up space in the matcher table and in the instruction name buffer, and I have to handle them in later transformations.

In the ARM back-end, the hasPostISelHook hook is used to replace the non-existent ADDS instruction (which matches the ISD::ADDC operation) with a real ADD instruction that has the S bit set. But all ADDS instructions have to be enumerated. That is no problem on RISC architectures, because there are only two such instructions, depending on whether the last operand is a register or an immediate.
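The post-ISel approach boils down to a per-pseudo lookup. Here is a standalone sketch of the idea; the opcode names and `MachineInstrModel` are hypothetical, loosely inspired by ARM, and this is not the actual ARM back-end code:

```cpp
#include <cassert>
#include <map>

// Hypothetical opcodes: ADDSri/ADDSrr stand in for flag-setting pseudos,
// ADDri/ADDrr for the real instructions.
enum Opcode { ADDri, ADDrr, ADDSri, ADDSrr };

// Simplified machine instruction: opcode plus an "S" bit (update flags).
struct MachineInstrModel {
  Opcode opc;
  bool setsFlags;
};

// One entry per pseudo variant: this is the table that has to enumerate
// every operand form of the flag-setting add.
const std::map<Opcode, Opcode> kPseudoToReal = {
    {ADDSri, ADDri},
    {ADDSrr, ADDrr},
};

// Post-ISel fixup: rewrite a pseudo into the real ADD with the S bit set.
// Returns false (and leaves the instruction unchanged) for non-pseudos.
bool postISelFixup(MachineInstrModel &mi) {
  auto it = kPseudoToReal.find(mi.opc);
  if (it == kPseudoToReal.end())
    return false;
  // Pseudos leave isel without the S bit; setting it is the hook's job.
  assert(!mi.setsFlags);
  mi.opc = it->second;
  mi.setsFlags = true;
  return true;
}
```

Every flag-setting form needs its own table entry (and its own pseudo definition), which is fine with two forms but not with hundreds.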

Well, I have a CISC architecture, so I end up with around 100 scalar ADDC and around 300 vector ADDC instructions, due to the addressing-mode capabilities of the operands. Enumerating them all? Nope.
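The growth is multiplicative: if every combination of operand addressing modes needs its own opcode, the count is the product of the per-operand mode counts. A back-of-the-envelope sketch (the function name and the mode counts used below are made up for illustration, not this target's real numbers):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of distinct instruction definitions needed when every combination
// of operand addressing modes gets its own opcode: the product of the
// per-operand mode counts.
std::size_t countVariants(const std::vector<std::size_t> &modesPerOperand) {
  std::size_t total = 1;
  for (std::size_t modes : modesPerOperand) {
    assert(modes > 0 && "every operand needs at least one addressing mode");
    total *= modes;
  }
  return total;
}
```

For example, three operands with (hypothetically) four modes each already give 4 × 4 × 4 = 64 definitions for a single flag-setting operation, and each further one (ADDC, UADDSAT, SADDSAT, ...) needs its own full copy.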

This instruction enumeration explosion is a recurring problem, which I have solved in the past with attributes on instructions. But that does not work in this case.

After a while, I got the impression that I don't want to select an instruction for the ADDC operation (and similarly for [U|S]ADDSAT), but rather select the instruction for ADD and set some registers after a successful match. For ARM this could roughly look like:

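```cpp
void TargetDAGToDAGISel::Select(SDNode *N) {
  switch (N->getOpcode()) {
  case ISD::ADDC: {
    // Hypothetical: SDNode has no setOpcode() method.
    N->setOpcode(ISD::ADD);
    // Hypothetical: SelectCode() has no return value today.
    MachineSDNode *MN = SelectCode(N);
    if (MN) {
      // Hypothetical: let the carry/flags output (ccOUTIdx) define CPSR.
      MN->getOperand(ccOUTIdx)->setReg(ARM::CPSR);
      return;
    }
    // No ADD pattern matched: restore the original opcode.
    N->setOpcode(ISD::ADDC);
    break;
  }
  }
  // Standard instruction selection.
  SelectCode(N);
}
```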

Obviously this is not correct and would require changes to the code generation interface.
But I can see the following benefits:

  1. save memory by not storing non-existent instructions in the matcher table, the instruction name buffer, etc.
  2. save compile time by avoiding the later transformations
  3. stay closer to the hardware
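To make the proposed flow a bit more concrete, here is a standalone model of it in plain C++. Everything in it (`NodeModel`, `selectCode`, `FlagsDef`, the opcodes) is a hypothetical stand-in, not the SelectionDAG API; it only shows that the matcher needs nothing but ADD patterns, with the flags definition attached after a successful match:

```cpp
#include <cassert>
#include <optional>

// Hypothetical, heavily simplified DAG node and selected instruction.
enum class ISDOpc { ADD, ADDC };
enum class MachineOpc { ADDrr, ADDri };
enum class FlagsDef { None, CPSR };

struct NodeModel {
  ISDOpc opc;
  bool rhsIsImm;  // stands in for the operand form / addressing mode
};

struct SelectedInstr {
  MachineOpc opc;
  FlagsDef flagsOut;
};

// Stand-in for the generated matcher: it only knows plain ADD patterns.
std::optional<SelectedInstr> selectCode(const NodeModel &n) {
  if (n.opc != ISDOpc::ADD)
    return std::nullopt;
  return SelectedInstr{n.rhsIsImm ? MachineOpc::ADDri : MachineOpc::ADDrr,
                       FlagsDef::None};
}

// Proposed flow: match ADDC with the ADD patterns, then mark the selected
// instruction as defining the flags register. No ADDS copies are needed.
std::optional<SelectedInstr> select(NodeModel n) {
  if (n.opc == ISDOpc::ADDC) {
    n.opc = ISDOpc::ADD;  // reuse the ADD patterns
    if (auto mi = selectCode(n)) {
      assert(mi->flagsOut == FlagsDef::None && "ADD patterns define no flags");
      mi->flagsOut = FlagsDef::CPSR;  // carry/flags output goes to CPSR
      return mi;
    }
    n.opc = ISDOpc::ADDC;  // no ADD pattern matched: restore and fall back
  }
  return selectCode(n);  // standard instruction selection
}
```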

Is it possible to achieve this right now?