Printing PC-relative offsets - how to get the instruction length?

Hi

In my MC6809 backend, in llvm/lib/Target/MC6809/InstPrinter/MC6809InstPrinter.cpp, I have the routine

void MC6809InstPrinter::printPCRelImmOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O) {
  const MCOperand &Op = MI->getOperand(OpNo);
  if (Op.isImm()) {
    int64_t Imm = Op.getImm() + 2; <<<========================
    O << "$";
    if (Imm >= 0)
      O << '+';
    O << Imm;
  } else {
    assert(Op.isExpr() && "unknown pcrel immediate operand");
    Op.getExpr()->print(O, &MAI);
  }
}

Which works well enough except for the constant 2 that I've arrowed - it needs to be the length of the binary instruction in bytes. The MC6809 has a *LOT* of variability here, so a case statement would be a right pain to maintain.

An answer is tantalisingly close:

$ bin/llvm-mc -triple mc6809 -show-inst-operands -show-inst -show-encoding <<< "lda 0,pc"
  .text
<stdin>:1:1: note: parsed instruction: ['lda', 0, <register 13>]
lda 0,pc
^
  lda $+2,pc ; encoding: [0xa6,0x8c,0x00] <<===========
                                        ; <MCInst #1849 LDAi8oPC
                                        ; <MCOperand Imm:0>
                                        ; <MCOperand Imm:0>>

The "encoding:" knows that I have a three-byte instruction, but that is generated by another chunk of code miles away. I suppose I could replicate that, but it seems wasteful. Is there a better way, not involving nasty layering violations, to get the length of an instruction in bytes in the context of llvm/lib/Target/*/InstPrinter/*InstPrinter.cpp?

Also, both 8-bit and 16-bit variants are possible. The instruction picked is LDAi8oPC, which is the 8-bit offset version. If I supply a bigger offset:

$ bin/llvm-mc -triple mc6809 -show-inst-operands -show-inst -show-encoding <<< "lda 1000,pc"
  .text
<stdin>:1:1: note: parsed instruction: ['lda', 1000, <register 13>]
lda 1000,pc
^
  lda $+1002,pc ; encoding: [0xa6,0x8c,0xe8]
                                        ; <MCInst #1849 LDAi8oPC
                                        ; <MCOperand Imm:0>
                                        ; <MCOperand Imm:1000>>

I still get the 8-bit variant instead of LDAi16oPC, and the operand is truncated.

The TableGen-generated .inc file has

{ 444 /* lda */, MC6809::LDAi8oPC, Convert__imm_95_0__Imm81_0, AMFBS_None, { MCK_Imm8, MCK_PC }, },
{ 444 /* lda */, MC6809::LDAi16oPC, Convert__imm_95_0__Imm161_0, AMFBS_None, { MCK_Imm16, MCK_PC }, },

... so how do I get the 16-bit variant with MCK_Imm16 selected instead?

The instructions are defined as

def LDAi8oPC : MC6809LoadIndexed_i8oPC_P1<
                (outs GR8:$dst8),
                (ins pcoffset8:$offset),
                !strconcat("lda", "\t", "${offset}", ",", "pc"),
                0x00,
                0xA6,
                []>

{ let Inst{23-16} = offset{7-0}; let Inst{15} = 0b1; let Inst{14-13} = 0b00; let Inst{12-8} = 0b01100; let Inst{7-0} = opcode; }

def LDAi16oPC : MC6809LoadIndexed_i16oPC_P1<
                (outs GR8:$dst8),
                (ins pcoffset16:$offset),
                !strconcat("lda", "\t", "${offset}", ",", "pc"),
                0x00,
                0xA6,
                []>

{ let Inst{31-24} = offset{7-0}; let Inst{23-16} = offset{15-8}; let Inst{15} = 0b1; let Inst{14-13} = 0b00; let Inst{12-8} = 0b01101; let Inst{7-0} = opcode; }

and I have

def pcoffset8 : Operand<i8>, ImmLeaf<i8, [{ return Immediate >= -128 && Immediate <= 127; }]> {
  let PrintMethod = "printPCRelImmOperand";
  let MIOperandInfo = (ops i8imm);
  let ParserMatchClass = ImmediateAsmOperand<"Imm8">;
  let EncoderMethod = "getMemOpValue";
  let DecoderMethod = "DecodeMemOperand";
}

def pcoffset16 : Operand<i16>, ImmLeaf<i16, [{ return Immediate >= -32768 && Immediate <= 32767; }]> {
  let PrintMethod = "printPCRelImmOperand";
  let MIOperandInfo = (ops i16imm);
  let ParserMatchClass = ImmediateAsmOperand<"Imm16">;
  let EncoderMethod = "getMemOpValue";
  let DecoderMethod = "DecodeMemOperand";
}

M

Hi Mark,

For your first question, the MCInstPrinter has a reference to the MCInstrInfo
object for your target, so something like this should give you the instruction
encoding size in bytes:

  MII.get(MI->getOpcode()).getSize()
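
As a rough illustration of that idea, here is a standalone sketch that does not use the LLVM headers. InstrDesc, descTable, instrSize and formatPCRel are hypothetical stand-ins for the descriptor table behind MCInstrInfo and for the printer routine; the sizes (3 and 4 bytes) are read off the encodings quoted in the question. In the real printer the lookup would be the getSize() call above, and it presumably only returns something useful if the instruction definitions set their Size in TableGen.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for a per-opcode instruction descriptor.
struct InstrDesc {
  unsigned Size; // encoded length in bytes
};

// Hypothetical opcode numbers; the real ones are generated by TableGen.
enum Opcode : unsigned { LDAi8oPC, LDAi16oPC };

// Stand-in for the descriptor table that the instruction info object owns.
const std::map<unsigned, InstrDesc> &descTable() {
  static const std::map<unsigned, InstrDesc> Table = {
      {LDAi8oPC, InstrDesc{3}},   // opcode + postbyte + 8-bit offset
      {LDAi16oPC, InstrDesc{4}}}; // opcode + postbyte + 16-bit offset
  return Table;
}

// Look the size up by opcode instead of hard-coding it in the printer.
unsigned instrSize(unsigned Opc) {
  auto It = descTable().find(Opc);
  assert(It != descTable().end() && "unknown opcode");
  return It->second.Size;
}

// Format a PC-relative immediate as "$+N" or "$-N", where N is the raw
// offset adjusted by the length of the instruction that carries it.
std::string formatPCRel(unsigned Opc, int64_t Imm) {
  int64_t Target = Imm + instrSize(Opc);
  std::ostringstream OS;
  OS << '$';
  if (Target >= 0)
    OS << '+';
  OS << Target;
  return OS.str();
}
```

With this table, formatPCRel(LDAi8oPC, 0) yields "$+3", so the adjustment tracks the instruction length without a case statement in the printer.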

For your second question, it looks like the MCK_Imm8 operand class is matching
the immediate even when it is out of range. This should be checked by a
function in your assembly parser. The ImmediateAsmOperand<"Imm8"> record (which
you didn't show the definition of, so I'm guessing a bit here) should have a
PredicateMethod value giving the name of that function. If that's not
specified, the default function name is based on the tablegen class name, which
won't be correct for both Imm8 and Imm16. Note that the ImmLeaf in the code
snippet you posted is only used for code generation from IR, not by the
assembler.
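
To make the range check concrete, here is a small standalone sketch of what such predicate functions might look like. ImmOperand, fitsSigned, isImm8, isImm16, encodeImm8 and selectLDAVariant are all hypothetical names for illustration; the real predicates would live on the target's parsed-operand class, and the generated matcher (not a hand-written function) does the selection.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// True if V is representable as a signed integer of the given bit width.
template <unsigned Bits> bool fitsSigned(int64_t V) {
  static_assert(Bits > 0 && Bits < 64, "unsupported width");
  const int64_t Lo = -(int64_t(1) << (Bits - 1));
  const int64_t Hi = (int64_t(1) << (Bits - 1)) - 1;
  return V >= Lo && V <= Hi;
}

// Hypothetical parsed immediate operand with one predicate per operand class.
struct ImmOperand {
  int64_t Value;
  bool isImm8() const { return fitsSigned<8>(Value); }
  bool isImm16() const { return fitsSigned<16>(Value); }
};

// Encoding the 8-bit form is only valid once the predicate has accepted it.
uint8_t encodeImm8(const ImmOperand &Op) {
  assert(Op.isImm8() && "offset does not fit in 8 bits");
  return static_cast<uint8_t>(Op.Value);
}

// Mimics the matcher trying table entries in order: the 8-bit variant is
// taken only when its predicate accepts the value, otherwise the 16-bit one.
std::string selectLDAVariant(const ImmOperand &Op) {
  if (Op.isImm8())
    return "LDAi8oPC";
  if (Op.isImm16())
    return "LDAi16oPC";
  return ""; // no variant can hold this offset
}
```

With distinct predicates, an offset of 1000 fails the 8-bit check and falls through to the 16-bit variant instead of being truncated.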

Oliver

Hi Oliver,

Thanks! Both your answers got me on the right track!

Regarding the second, I'm now correctly parsing an immediate as an MCExpr when it is not a literal number. When does the MCExpr get resolved to an actual number? At assembly time? Or is it a link/fixup thing?

If I have a snippet of code like (e.g.):

foo equ 12
  lda foo,x

... that is, a constant offset from the X index register, when and by what will the foo get resolved to 12 for the LDA instruction?

M

Hi Mark,

I'd expect that to happen in two steps:

- A function in <target>MCCodeEmitter will convert the MCExpr operand into
  either the immediate which will be encoded in the instruction (for simple
  immediates), or add an MCFixup to the instruction (when the operand is a
  symbol or more complex expression). The function to be called is listed in
  the tablegen description of the operand, in your case it is "getMemOpValue".

- The MCAssembler tries to resolve the fixup, either resolving it entirely
  within the assembler, or emitting a relocation for it. The main loop for this
  is at the end of MCAssembler::layout, and it calls target-specific code in
  <target>AsmBackend to modify the encoded instructions, and
  <target><objectformat>ObjectWriter to emit relocations.

The reason for the two phases is that we won't know whether a fixup needs a
relocation or not until we have parsed the whole file. For example, it might
reference a symbol defined after the instruction in the source.
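
The two phases can be sketched in a few lines of standalone C++. This is a toy model only: Section, Fixup, Relocation, emitByteOperand and resolveFixups are hypothetical names, and it handles just a one-byte operand, whereas the real code emitter, assembler backend and object writer deal with fixup kinds, sections and layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct Fixup {
  std::size_t Offset; // position of the placeholder byte
  std::string Symbol; // symbol whose value belongs there
};

struct Relocation {
  std::size_t Offset;
  std::string Symbol;
};

struct Section {
  std::vector<uint8_t> Bytes;
  std::vector<Fixup> Fixups;
  std::vector<Relocation> Relocs;
};

// Phase 1 (code emission): encode the operand if its value is already
// known, otherwise emit a zero placeholder and record a fixup.
void emitByteOperand(Section &S, const std::string &Sym,
                     const std::map<std::string, int64_t> &KnownSoFar) {
  auto It = KnownSoFar.find(Sym);
  if (It != KnownSoFar.end()) {
    S.Bytes.push_back(static_cast<uint8_t>(It->second));
    return;
  }
  S.Fixups.push_back({S.Bytes.size(), Sym});
  S.Bytes.push_back(0);
}

// Phase 2 (after the whole file is parsed): patch every fixup whose symbol
// is now defined; anything still unknown becomes a relocation.
void resolveFixups(Section &S,
                   const std::map<std::string, int64_t> &AllSymbols) {
  for (const Fixup &F : S.Fixups) {
    assert(F.Offset < S.Bytes.size() && "fixup outside section");
    auto It = AllSymbols.find(F.Symbol);
    if (It != AllSymbols.end())
      S.Bytes[F.Offset] = static_cast<uint8_t>(It->second);
    else
      S.Relocs.push_back({F.Offset, F.Symbol});
  }
  S.Fixups.clear();
}
```

In this model a forward reference to foo is emitted as a zero byte with a fixup and patched to 12 once the whole file has been seen, while a symbol that is never defined in the file ends up as a relocation.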

Oliver

Thanks! That makes sense.

M