Incompatible type assertion from llvm-tblgen

I’m getting this incompatible type assertion when I run tblgen on my .td files:

llvm/include/llvm/Support/Casting.h:237: typename llvm::cast_retty<X, Y*>::ret_type llvm::cast(Y*) [with X = llvm::DefInit; Y = llvm::Init; typename llvm::cast_retty<X, Y*>::ret_type = llvm::DefInit*]: Assertion `isa(Val) && “cast() argument of incompatible type!”’ failed.

Looks like the incompatible types are DefInit and Init. The offending line is in this definition:

class LoadOpIdx< bits<7> op,
string instr_asm,
OperandInfo info,
InstrItinClass itin=II_LOAD1_RR >
//
// load: r1 = mem[r2 + (r3 << sizeof(operand) ]
//
: FR3< op,
(outs info.regClass:$r1),
(ins ADDR_SHLI:$addr), //<<-this line causes assert
instr_asm # "\t\t$r1, $addr, " # info.sizeStr,
[(set info.regClass:$r1, (load ADDR_SHLI:$addr))],
itin > {
}

The other related definitions are:

// This class provides load/store address format selection support
//
class Addr< int numArgs, string funcName, dag opInfo >
: Operand,
ComplexPattern< i64, numArgs, funcName, , [SDNPWantParent] > {
let MIOperandInfo = opInfo;
}

let PrintMethod = “printMemOperand” in {
def ADDR_RR : Addr< 2, “SelectAddrRegReg”,
(ops GPRC:$base, GPRC:$offsetreg) >;
def ADDR_RI : Addr< 2, “SelectAddrRegImm”,
(ops GPRC:$base, i64imm:$offsetimm) >;
def ADDR_SHLI : Addr< 2, “SelectAddrShlImm”,
(ops GPRC:$base, ( shl GPRC:$offsetreg, (i64 3))) >;
}

If I change the LoadOpIdx definition to:

class LoadOpIdx< bits<7> op,
string instr_asm,
OperandInfo info,
InstrItinClass itin=II_LOAD1_RR >
//
// load: r1 = mem[r2 + (r3 << sizeof(operand) ]
//
: FR3< op,
(outs info.regClass:$r1),
(ins ADDR_RR:$addr), //<<-this is fine
instr_asm # "\t\t$r1, $addr, " # info.sizeStr,
[(set info.regClass:$r1, (load ADDR_SHLI:$addr))],
itin > {
}

I’m not sure what the difference is between the ADDR_RR and ADDR_SHLI defs that’s causing this - any ideas? Is the DAG portion for ADDR_SHLI ok? ((ops GPRC:$base, ( shl GPRC:$offsetreg, (i64 3))))

Phil

You have a dag in the list of operands. That won't work.

-Krzysztof

But don’t the defs for ADDR_RR and ADDR_RI also contain dags?

def ADDR_RR : Addr< 2, “SelectAddrRegReg”,
(ops GPRC:$base, GPRC:$offsetreg) >;
def ADDR_RI : Addr< 2, “SelectAddrRegImm”,
(ops GPRC:$base, i64imm:$offsetimm) >;

Do I need to create some other intermediate node type for a shifted address?

Phil

Technically yes, but the list of allowed types is limited. "RegisterClass" (e.g GPRC) is allowed, as is "Operand" (e.g. i64imm).

You can create a subclass of "Operand" and provide your EncoderMethod for it.

-Krzysztof

But don't the defs for ADDR_RR and ADDR_RI also contain dags?

  def ADDR_RR : Addr< 2, "SelectAddrRegReg",
                      (ops GPRC:$base, GPRC:$offsetreg) >;
  def ADDR_RI : Addr< 2, "SelectAddrRegImm",
                      (ops GPRC:$base, i64imm:$offsetimm) >;

Do I need to create some other intermediate node type for a shifted
address?

Technically yes, but the list of allowed types is limited. "RegisterClass"
(e.g GPRC) is allowed, as is "Operand" (e.g. i64imm).

You can create a subclass of "Operand" and provide your EncoderMethod for
it.

Followup question:

I was thinking that in order to match this DAG:

         0x30d29c0: i64 = Constant<3>

        0x30d2e00: i64 = shl 0x30d2be0, 0x30d29c0 [ORD=3]

      0x30d2f10: i64 = add 0x30d2cf0, 0x30d2e00 [ORD=3]

      0x30d28b0: <multiple use>
    0x30d3570: i32,ch = load 0x30a3ec0, 0x30d2f10,
0x30d28b0<LD4[%arrayidx16(addrspace=4)](align=8)(tbaa=<0x3082188>)> [ORD=4]

And map it to a load.idx instruction with the following semantics:
load.idx r1,r2,r3,SIZE r1 <- mem[r2 + (r3 << sizeof(operand))]

That somehow the pattern matching dag fragment would need to be something
like I had in ADDR_SHLI definition:
  def ADDR_SHLI : Addr< 2, "SelectAddrShlImm",
                      (ops GPRC:$base, ( shl GPRC:$offsetreg, (i64 3))) >;

Now If I have to create a subclass of Operand and define it's EncoderMethod
in C++, does that mean the pattern matching (matching the shift left and
add) now happens on the C++ side as well?

Phil

Actually the EncoderMethod is probably not needed for this case. I thought you had a scaled immediate that needs to be shifted during encoding.

Regarding pattern matching in the C++ code---yes, that's what ComplexPattern implies. The function whose name you provide will be used to match the DAG and generate the output.

The instruction definition would have 3 inputs: register, register and immediate:

def ADDR_SHLI: Addr<3, "SelectAddrShlImm",
     (ops GPRC:$base, GPRC:$offsetreg, i64imm:$shift)>;

The matching function would then need to match DAG and generate these 3 values if the match succeeded.

-Krzysztof

And map it to a load.idx instruction with the following semantics:
load.idx r1,r2,r3,SIZE r1 <- mem[r2 + (r3 << sizeof(operand))]

That somehow the pattern matching dag fragment would need to be
something like I had in ADDR_SHLI definition:
  def ADDR_SHLI : Addr< 2, "SelectAddrShlImm",
                      (ops GPRC:$base, ( shl GPRC:$offsetreg, (i64 3))) >;

Now If I have to create a subclass of Operand and define it's
EncoderMethod in C++, does that mean the pattern matching (matching the
shift left and add) now happens on the C++ side as well?

Actually the EncoderMethod is probably not needed for this case. I thought
you had a scaled immediate that needs to be shifted during encoding.

Regarding pattern matching in the C++ code---yes, that's what
ComplexPattern implies. The function whose name you provide will be used to
match the DAG and generate the output.

The instruction definition would have 3 inputs: register, register and
immediate:

def ADDR_SHLI: Addr<3, "SelectAddrShlImm",
    (ops GPRC:$base, GPRC:$offsetreg, i64imm:$shift)>;

The matching function would then need to match DAG and generate these 3
values if the match succeeded.

What I'm finding is that the matching function for 3 arguments (the
SelectAddrShlImm) is never tried for this particular case. This one is
tried:

   def ADDR_RR : Addr< 2, "SelectAddrRegReg",
                      (ops GPRC:$base, GPRC:$offsetreg) >;

I've added some debugging statements to show some of the output from llc
with -debug below:

          0x2385bb0: i64,ch = load 0x2356e90, 0x2388b60,
0x2385880<LD8[getelementptr ([0 x i64] addrspace(2)* @SpM_use_row, i64 0,
i64 undef)(addrspace=2)](tbaa=<0x2335188>)> [ORD=2] [ID=12]

          0x2385990: i64 = Constant<3> [ID=2]

        0x2385dd0: i64 = shl 0x2385bb0, 0x2385990 [ORD=3] [ID=13]

      0x2385ee0: i64 = add 0x2386650, 0x2385dd0 [ORD=3] [ID=14]

      0x2385880: <multiple use>
    *0x2386540*: i32,ch = load 0x2356e90, 0x2385ee0,
0x2385880<LD4[%arrayidx16(addrspace=4)](align=8)(tbaa=<0x2335188>)> [ORD=4]
[ID=15]

  0x2386320: ch,glue = CopyToReg 0x2356e90, 0x2386210, 0x2386540 [ORD=6]

    0x2386210: <multiple use>
    0x2386320: <multiple use>
    0x2386320: <multiple use>
  0x2386430: ch = RET 0x2386210, 0x2386320, 0x2386320:1 [ORD=6]

SelectAddrRegReg called: *0x2386540:* i32,ch = load 0x2356e90, 0x2385ee0,
0x2385880<LD4[%arrayidx16(addrspace=4)](align=8)(tbaa=<0x2335188>)> [ORD=4]
[ID=15]

SelectAddrRegRegCommon called:
  Morphed node: 0x2386540: i32,ch = LOADI32_RR 0x2386650, 0x2385dd0,
0x2356e90<Mem:LD4[%arrayidx16(addrspace=4)](align=8)(tbaa=<0x2335188>)>
[ORD=4]

ISEL: Match complete!

If I set a breakpoint in SelectAddrRegRegCommon I can see that this is true:

p Addr.getOperand(1).Node->getOpcode() == ISD::SHL)

true

So I've added that as a case to check for there, here's the function:

bool
XSTGDAGToDAGISel::SelectAddrRegRegCommon( SDValue Addr,
                                          SDValue &Base,
                                          SDValue &Offset ) {
  // If add operation, we can optimize.
  DEBUG(errs() << "SelectAddrRegRegCommon called: \n");
  if (Addr.getOpcode() == ISD::ADD) {
    if (Addr.getOperand(0).getOpcode() == ISD::TargetJumpTable) {`
      Base = Addr.getOperand(0);
      Offset = Addr.getOperand(1);

      return true;
    }

    if (Addr.getOperand(1).getOpcode() == ISD::TargetJumpTable) {
      Base = Addr.getOperand(1);
      Offset = Addr.getOperand(0);

      return true;
    }

    // operand 1 can't be a constant.
    if (dyn_cast<ConstantSDNode>(Addr.getOperand(1))) {
      return false;
    }

    //possibly a load.idx
    if(Addr.getOperand(1).Node->getOpcode() == ISD::SHL) {
       // Not sure what should go here?
    }

    // set address and offset.
    Base = Addr.getOperand(0);
    Offset = Addr.getOperand(1);

    return true;
  }
  return false;
} //end SelectAddrRegRegCommon

As the comment says, I'm not sure what code should go into the case where
it's possibly a load.idx - Base and Offset are SDValues not SDNodes so I
don't see how I could modify the DAG here to eliminate the shl - any ideas?

Phil

The shift-handling code would go into a different complex pattern that
only ever applied to the instruction variant that did the shift
(SelectAddrShlImm in your case by the looks of it).

You don't need to actually change the DAG here, just set the Base and
Offset parameters correctly (after checking the shift amount matches
too!). The generic code will use those operands instead of the SHL in
the selected instruction, and if the SHL ends up unused (often the
case, but not always since a value can be used multiple times) it'll
just be dropped for the final output.

Tim.

> As the comment says, I'm not sure what code should go into the case where
> it's possibly a load.idx - Base and Offset are SDValues not SDNodes so I
> don't see how I could modify the DAG here to eliminate the shl - any
ideas?

The shift-handling code would go into a different complex pattern that
only ever applied to the instruction variant that did the shift
(SelectAddrShlImm in your case by the looks of it).

Yeah, that's what I tried initially. But that pattern never even gets
tested for that particular load. Some of that code is posted below in this
thread, but here i'll post the CheckComplexPattern from the
TargetGenDAGISel.inc:

bool CheckComplexPattern(SDNode *Root, SDNode *Parent,
                         SDValue N, unsigned PatternNo,
         SmallVectorImpl<std::pair<SDValue, SDNode*> > &Result) override {
  unsigned NextRes = Result.size();
  switch (PatternNo) {
  default: llvm_unreachable("Invalid pattern # in table?");
  case 0:
    Result.resize(NextRes+3);
    return SelectAddrShlImm(Parent, N, Result[NextRes+0].first,
Result[NextRes+1].first, Result[NextRes+2].first);
  case 1:
    Result.resize(NextRes+2);
    return SelectAddrRegReg(Parent, N, Result[NextRes+0].first,
Result[NextRes+1].first);
  case 2:
    Result.resize(NextRes+2);
    return SelectAddrRegImm(Parent, N, Result[NextRes+0].first,
Result[NextRes+1].first);
  case 3:
    Result.resize(NextRes+1);
    return SelectAddrImmLocal(Parent, N, Result[NextRes+0].first);
  case 4:
    Result.resize(NextRes+1);
    return SelectAddrRegAny(Parent, N, Result[NextRes+0].first);
  }
}

Case 0 isn't even tried for this part of the DAG (PatternNo is 1 so it goes
directly to case 1). Meaning that SelectAdrShlImm isn't called for it.

This is the relevant part of the DAG:

          0x2385990: i64 = Constant<3> [ID=2]

        0x2385dd0: i64 = shl 0x2385bb0, 0x2385990 [ORD=3] [ID=13]

      0x2385ee0: i64 = add 0x2386650, 0x2385dd0 [ORD=3] [ID=14]

      0x2385880: <multiple use>
    0x2386540: i32,ch = load 0x2356e90, 0x2385ee0,
0x2385880<LD4[%arrayidx16(addrspace=4)](align=8)(tbaa=<0x2335188>)> [ORD=4]
[ID=15]

  0x2386320: ch,glue = CopyToReg 0x2356e90, 0x2386210, 0x2386540 [ORD=6]

    0x2386210: <multiple use>
    0x2386320: <multiple use>
    0x2386320: <multiple use>
  0x2386430: ch = RET 0x2386210, 0x2386320, 0x2386320:1 [ORD=6]

SelectAddrRegReg called: 0x2386540: i32,ch = load 0x2356e90, 0x2385ee0,
0x2385880<LD4[%arrayidx16(addrspace=4)](align=8)(tbaa=<0x2335188>)> [ORD=4]
[ID=15]

SelectAddrRegRegCommon called:
  Morphed node: 0x2386540: i32,ch = LOADI32_RR 0x2386650, 0x2385dd0,
0x2356e90<Mem:LD4[%arrayidx16(addrspace=4)](align=8)(tbaa=<0x2335188>)>
[ORD=4]

So it morphs it into a LOADI32_RR which isn't what I'd like it to do... I'm
not sure what I'm missing - do I need to try to match a bigger piece of the
DAG, or something? (which I tried to do earlier (see early part of the
thread), but that didnt' work out for other reasons)

Phil

This is likely to be a priority issue (your shifted pattern would be
tried, but the LOADI32_RR one claims the fragment first). If so, there
are two possible solutions:

  + Put some "AddedComplexity" boost into the pattern you want to
match so that LLVM knows it's better and tries it first.
  + Make the LOADI32_RR ComplexPattern also check for the shifted case
and refuse to handle it ("return false") if found.

Cheers.

Tim.