Encoding an X86 format with long operands

Hi all.

tl;dr: I would like to add a long x86 instruction which doesn’t conform to any existing format that I know of; I’m not sure where to start.

I am attempting to add an instruction into X86, to be simulated in gem5. I’ve already added a simple, opcode-only instruction which I can successfully decode and run in gem5, so I am roughly familiar with .td files and how backends are built out of them.

My goal now is to make a more complex instruction – specifically, I need to add large operands. The format would look something like this:

  • 1 byte opcode (0x06, which I hijacked from PUSHES, which isn’t implemented in gem5)
  • n byte destination (memory location)
  • n byte source (memory location)
  • n byte source (memory location or immediate)
    If n=4, then the total instruction length is 13 bytes, which is under the 15-byte x86 limit.
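For anyone checking the arithmetic, here is a minimal sketch of the length calculation for the proposed format (one opcode byte plus three n-byte fields). The function names are made up for illustration, and the n=8 case is only there to show what happens with 8-byte addresses:

```python
# Architectural limit on the total length of a single x86 instruction.
X86_MAX_INSTRUCTION_BYTES = 15


def proposed_length(n, opcode_bytes=1, operand_count=3):
    """Length in bytes of the proposed format: opcode plus three n-byte fields."""
    return opcode_bytes + operand_count * n


def fits(n):
    """True if the proposed format stays within the x86 length limit."""
    return proposed_length(n) <= X86_MAX_INSTRUCTION_BYTES


for n in (4, 8):
    print(f"n={n}: {proposed_length(n)} bytes, fits={fits(n)}")
```

With n=4 the format comes to 13 bytes and fits; with n=8 it comes to 25 bytes and does not.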

As far as I know, this doesn’t conform to any existing x86 format. Because that’s the case, I’m not sure how to go about encoding an instruction like this; presumably, I can’t use the existing I<…> class, which is what I’d used previously.

Can anyone point me in the general direction of what I will need to do to encode this rather arbitrary instruction format? Should I look into implementing a new Instruction class? Is there an easier way?

Thanks,
Gus Smith, PSU

Hi Gus,

When you say “n byte destination”, do you mean you want to encode an n-byte address as a constant within the instruction? That would mean you couldn’t encode an address that comes from a register.

Whoops - sorry for the confusion. n would be set in stone beforehand. I basically meant to indicate that we’d be looking at either a 32-bit or a 64-bit system, i.e. 4-byte or 8-byte addresses.

That wasn’t the part that confused me. What confused me was what you expected to be encoded into the instruction. Your math indicated that you multiplied n by 3 and added 1 to get your 13 bytes. That means you intend to use 4 bytes to store each address on a 32-bit system, which implied to me that you intended to have a fixed address encoded. But what if the address is in a register, as would often be the case in real code? For example, when a pointer is passed as an operand to a function. Most X86 instructions support a format of (base + scale * index + displacement) where any of those pieces are optional, but it takes up to 6 bytes to encode them.
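To make the "up to 6 bytes" figure concrete, here is a rough sketch of how a (base + scale * index + displacement) operand is packed into a ModRM byte, an optional SIB byte, and an optional displacement. It assumes 32-bit addressing with no prefixes; `encode_mem_operand` and its parameters are hypothetical names for illustration, not anything from gem5 or LLVM, and `reg_field` stands in for whatever the instruction puts in the ModRM reg bits:

```python
import struct

# 3-bit register numbers used in the ModRM and SIB bytes (32-bit addressing).
REGS = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
        "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALES = {1: 0, 2: 1, 4: 2, 8: 3}


def encode_mem_operand(reg_field, base=None, index=None, scale=1, disp=0):
    """Return the ModRM [+ SIB] [+ displacement] bytes for a memory operand."""
    if index == "esp":
        raise ValueError("esp cannot be used as an index register")

    # Pick the mod bits and the displacement size.
    if base is None:
        mod, disp_bytes = 0b00, struct.pack("<i", disp)  # disp32 is mandatory
    elif disp == 0 and base != "ebp":
        mod, disp_bytes = 0b00, b""                      # no displacement
    elif -128 <= disp <= 127:
        mod, disp_bytes = 0b01, struct.pack("<b", disp)  # disp8
    else:
        mod, disp_bytes = 0b10, struct.pack("<i", disp)  # disp32

    # A SIB byte is needed for any index, and whenever the base is esp.
    if index is None and base != "esp":
        rm = REGS[base] if base is not None else 0b101   # 101 = disp32 only
        return bytes([mod << 6 | reg_field << 3 | rm]) + disp_bytes

    sib_index = REGS[index] if index is not None else 0b100  # 100 = no index
    sib_base = REGS[base] if base is not None else 0b101     # 101 = no base
    modrm = mod << 6 | reg_field << 3 | 0b100                # rm=100: SIB follows
    sib = SCALES[scale] << 6 | sib_index << 3 | sib_base
    return bytes([modrm, sib]) + disp_bytes


# [ebx + esi*4 + 0x12345678]: ModRM + SIB + disp32, the 6-byte worst case.
print(encode_mem_operand(0, base="ebx", index="esi", scale=4, disp=0x12345678).hex())
# [ebp + 8]: ModRM + disp8, 2 bytes.
print(encode_mem_operand(0, base="ebp", disp=8).hex())
```

The point of the sketch is that the operand's size depends on which pieces are present: a bare register base takes one byte, while the full base + scaled index + 32-bit displacement form takes six.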

Yes, you’re completely right… I’m thinking about this 100% the wrong way. I was not thinking at all about how the addresses I need won’t be available at compile time – that would be silly if it were the case. Thanks for pointing that out :)