encoded instruction sizes

In code review http://reviews.llvm.org/D4167 for my ForwardControlFlowIntegrity pass, the question has come up about how to get instruction encoding length information in a principled manner. I am aware of MCInstrDesc::getSize(), but when I try this with the relevant instruction on X86 (JMP_4 to a symbol@PLT), I get 0 back; IIUC, that means MCInstrDesc can’t determine the size.

I need the size information to generate the correct mask for the FCFI code: it needs to know the size of a jump-instruction table given the number of jump instructions in the table so that it can create a mask to make sure a given function pointer is pointing into a table. It would be sufficient for me to be able to get a reasonable upper bound on the length of the instruction, too, though a bound that was too loose would mean I would need to expand the jumptable entry size to match.
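To make the dependence on instruction size concrete, here is a minimal sketch of the kind of mask computation described above. It is not the code from D4167; the names `JumpTableLayout`, `computeLayout`, and `clampToTable` are hypothetical, and it assumes that entries are padded to a power-of-two size and that the table base is aligned to the padded table size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= V.
constexpr uint64_t nextPowerOfTwo(uint64_t V) {
  uint64_t P = 1;
  while (P < V)
    P <<= 1;
  return P;
}

// Hypothetical description of a jump-instruction table.
struct JumpTableLayout {
  uint64_t EntrySize; // bytes per entry, padded up to a power of two
  uint64_t TableSize; // total bytes, padded up to a power of two
  uint64_t Mask;      // keeps an offset inside the table and entry-aligned
};

// MaxInstrBytes is an upper bound on the encoded jump length. A looser
// bound still works, but it inflates EntrySize and therefore the table.
JumpTableLayout computeLayout(uint64_t NumEntries, uint64_t MaxInstrBytes) {
  assert(NumEntries > 0 && MaxInstrBytes > 0);
  JumpTableLayout L;
  L.EntrySize = nextPowerOfTwo(MaxInstrBytes);
  L.TableSize = nextPowerOfTwo(NumEntries) * L.EntrySize;
  // Clear the bits above the table size and the bits below the entry size.
  L.Mask = (L.TableSize - 1) & ~(L.EntrySize - 1);
  return L;
}

// Assumes Base is aligned to L.TableSize. Any pointer is forced onto the
// start of some entry inside the table.
uint64_t clampToTable(uint64_t Base, uint64_t Ptr, const JumpTableLayout &L) {
  return Base + ((Ptr - Base) & L.Mask);
}
```

With a 5-byte upper bound and 5 entries, this sketch pads each entry to 8 bytes and the table to 64 bytes; a bound of 9 bytes would double both.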

I don’t know enough about how TableGen works to know if it’s possible to get this information right now or if I’d need to add something else to enable that. Obviously, the solution of encoding it directly in the backend as a parameter seems like a non-starter, since it would be duplicating information in the X86 instruction tables and would be in danger of getting out of date.

Is there currently any way to get this information from the Targets?

Thanks,

Tom

The information is not available in TableGen, or even in the target lowering code, in a general-purpose, target-independent manner. X86 in particular is problematic, as the encoded length of an X86 instruction can change very late (during relaxation at MC time).

For your specific task, however, you should be able to do something reasonable, since you want the size of one specific instruction rather than of an arbitrary X86 instruction. I believe that a jmp via a PLT entry will use a 5-byte jmp instruction: one byte for the opcode and 4 bytes of PC-relative offset with an R_X86_64_PLT32 relocation on it. It’s possible something more clever (both for the instruction and the size information) might be needed in the large memory model. I don’t know enough about the Linux models to know for sure.
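As an illustration of that 5-byte form, here is a small sketch that lays out the bytes of a `jmp rel32` (opcode 0xE9 followed by a little-endian 32-bit displacement measured from the end of the instruction). The function name `encodeJmpRel32` and the constant `JmpRel32Size` are made up for this example. In a real object file the displacement bytes would be filled in by the linker from the relocation rather than computed like this, and the sketch assumes the target is within rel32 range.

```cpp
#include <array>
#include <cstdint>

// Size of the near-jump form discussed above: 1 opcode byte + 4 offset bytes.
constexpr unsigned JmpRel32Size = 5;

// Hypothetical helper: encode "jmp rel32" located at InstrAddr so that it
// transfers control to TargetAddr. Assumes the distance fits in 32 bits.
std::array<uint8_t, JmpRel32Size> encodeJmpRel32(uint64_t InstrAddr,
                                                 uint64_t TargetAddr) {
  // The displacement is relative to the address of the *next* instruction.
  uint32_t Rel = static_cast<uint32_t>(TargetAddr - (InstrAddr + JmpRel32Size));
  std::array<uint8_t, JmpRel32Size> Bytes{};
  Bytes[0] = 0xE9; // JMP rel32 opcode
  for (unsigned I = 0; I < 4; ++I)
    Bytes[1 + I] = static_cast<uint8_t>(Rel >> (8 * I)); // little-endian
  return Bytes;
}
```

For example, a jump at 0x1000 targeting 0x2000 has a displacement of 0xFFB, so the sketch produces the bytes E9 FB 0F 00 00.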

-Jim