How to Find Instruction Encoding for a MachineInstr

Dear All,

I'm enhancing a MachineFunctionPass that enforces control-flow integrity. One of the things I want to do is to change the alignment of an instruction (by adding NOPs before it in the MachineBasicBlock or by emitting an alignment directive to the assembler) if it causes a specific sequence of bytes to be generated at a specific alignment. The goal is to ensure that the byte sequences used to label valid targets of an indirect branch (e.g., a return instruction) do not appear at a given alignment anywhere in a program other than where I inserted them explicitly.

It looks like MachineInstr has a method for finding the length of the instruction's binary encoding, but I didn't see a method for finding the exact bytes that would be emitted from the MachineInstr. Is there a way to do this in the MachineFunctionPass/MachineInstr infrastructure, or do I need to use something like the MC classes?

Thanks in advance for any help provided.

-- John T.
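
To make the goal above concrete, here is a minimal sketch of the check, assuming the final encoded bytes were somehow available as a flat buffer (which is exactly what the question asks how to obtain). The function name findAlignedLabels, the label bytes, and the alignment are all hypothetical and are not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return every offset that is a multiple of `align`
// at which the byte sequence `label` occurs in `code`.
std::vector<std::size_t>
findAlignedLabels(const std::vector<std::uint8_t> &code,
                  const std::vector<std::uint8_t> &label,
                  std::size_t align) {
  assert(align != 0 && "alignment must be non-zero");
  std::vector<std::size_t> hits;
  if (label.empty() || label.size() > code.size())
    return hits;
  // Only offsets at the requested alignment are candidates.
  for (std::size_t off = 0; off + label.size() <= code.size(); off += align) {
    bool match = true;
    for (std::size_t i = 0; i < label.size(); ++i) {
      if (code[off + i] != label[i]) {
        match = false;
        break;
      }
    }
    if (match)
      hits.push_back(off);
  }
  return hits;
}
```

Any hit that does not correspond to an explicitly inserted label would be a place where padding or an alignment change is needed.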

What function provides the encoding length? X86 in particular is so difficult to encode that only the old style JIT and the MC Code Emitter could possibly know how many bytes something takes.

The getSize() method of MCInstrDesc, which can be fetched from a MachineInstr using the getDesc() method. Does this method not work as advertised in Doxygen? – John T.

After a quick glance at the generated description files, it looks like ARM has correct size information, but X86 always returns 0. ARM is easy because it's fixed length.

As I recall (I haven't played this deep with MachineInstrs for close to a year), it's not necessarily knowable what the length is or what exact bytes would be emitted, since some of them depend on information not known until the final assembly emission pass. An example here is the x86 jmp instruction: the choice between short and near jumps (and hence 2 bytes or 5 bytes on x86-64) is not made until the actual conversion to MCInst and after applying all of the fixups, which only happens deep within the bowels of the AsmPrinter pass.

Right. See X86AsmBackend::mayNeedRelaxation() and friends for the gory details.

-jim
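
The jmp example discussed above can be sketched as follows. This is only an illustration of why the size depends on final layout, not the logic LLVM actually uses; the helper name jmpSize is hypothetical, and it assumes an unconditional jmp whose target is within rel32 range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: encoded size in bytes of an unconditional x86 jmp
// located at address `addr` that branches to address `target`.
unsigned jmpSize(std::int64_t addr, std::int64_t target) {
  const unsigned ShortSize = 2; // opcode EB + 8-bit displacement
  const unsigned NearSize = 5;  // opcode E9 + 32-bit displacement
  assert(addr >= 0 && target >= 0 && "addresses are assumed non-negative");
  // The displacement is relative to the end of the jmp itself, so try
  // the short form first.
  std::int64_t disp = target - (addr + ShortSize);
  if (disp >= INT8_MIN && disp <= INT8_MAX)
    return ShortSize;
  return NearSize;
}
```

Since both addr and target depend on the sizes of all instructions in between, neither is known inside a MachineFunctionPass, which is why the size cannot be read off the MachineInstr.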

Jim and everyone else,

   I have a somewhat related question then. I have a mechanism (similar to
the one on x86-64) to handle long jumps on Hexagon.
This means that on some occasions my jump instruction is 4 bytes long, on
others 8 bytes. It is important for reasons other than trampoline insertion
to know which version I am dealing with. Bundling is a prime example:
an extended address affects the number of instructions in a bundle. I can
compute an estimated jump distance every time I need it based on the available
CFG layout, but it is rather expensive, and the test comes up rather
often.

  Here is the question: given the current infrastructure, how can I
tag/mark a given branch instruction as having "extended" address mode
without introducing a new opcode for it?

  There are several unintuitive side effects, so if someone has already
solved this - I would love to know how.
Thanks.

Sergei
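
For illustration only, one way to picture the kind of tagging asked about above is a side table that caches the expensive distance test per instruction. Nobody in this thread proposed this; the types Instr and ExtendedBranchCache are hypothetical stand-ins, not LLVM classes, and the cache would have to be invalidated whenever the layout changes.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a real instruction object.
struct Instr {
  unsigned Opcode;
};

// Hypothetical side table: remembers whether a branch needs the
// "extended" address form, without changing its opcode.
class ExtendedBranchCache {
  std::unordered_map<const Instr *, bool> Extended;

public:
  void set(const Instr *I, bool IsExtended) {
    assert(I && "cannot tag a null instruction");
    Extended[I] = IsExtended;
  }

  // Returns true and fills Out if a cached answer exists for I.
  bool lookup(const Instr *I, bool &Out) const {
    auto It = Extended.find(I);
    if (It == Extended.end())
      return false;
    Out = It->second;
    return true;
  }

  // Must be called whenever code layout changes, since jump distances
  // (and so the cached answers) may no longer hold.
  void invalidate() { Extended.clear(); }
};
```

The trade-off is that the tag lives outside the instruction, so it is lost if the instruction is cloned or deleted, which may be one of the unintuitive side effects mentioned.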