incorrect x86 instruction size calculation

Hello,

I'm trying to write some LLVM backends that generate code complying with SFI validation rules, by re-implementing SFI in the LLVM x86 backend based on Google's NaCl project.

However, while implementing 32-byte code alignment, I found that X86InstrInfo::GetInstSizeInBytes() returns incorrect sizes for certain instructions. The ones I have seen so far are MOV32mi, LEA32r, MOV32mr, and MOV32rm.

MOV32mi is always sized incorrectly, while the other three are sized incorrectly only some of the time. To illustrate:

    Encoding                   Instruction  Actual  Calculated  Result
    8d 9c 24 30 0a 00 00       LEA32r       7       7           ok
    8d 6c 24 28                LEA32r       4       7           incorrect
    8b 86 24 0a 39 00          MOV32rm      6       6           ok
    8b 44 24 10                MOV32rm      4       7           incorrect
    89 84 24 34 14 00 00       MOV32mr      7       7           ok
    89 2c 24                   MOV32mr      3       7           incorrect
    c7 44 24 08 08 0a 00 00    MOV32mi      8       11          incorrect
    c7 04 24 20 00 38 00       MOV32mi      7       11          incorrect
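For reference, the real length of each encoding above can be read off its ModRM and SIB bytes. The sketch below is a minimal decoder for just these four opcodes, assuming 32-bit mode, no prefixes, and a complete byte sequence. `encodedLength` and `checkReportedEncodings` are hypothetical helpers for illustration, not part of LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: length of a 32-bit-mode instruction starting at
// bytes[0]. Only the four opcodes from the report are handled, and the
// sketch assumes there are no prefix bytes.
std::optional<std::size_t> encodedLength(const std::vector<std::uint8_t>& bytes) {
  if (bytes.size() < 2) return std::nullopt;  // need opcode + ModRM

  std::size_t immSize = 0;
  switch (bytes[0]) {
    case 0x89:  // MOV r/m32, r32   (MOV32mr)
    case 0x8B:  // MOV r32, r/m32   (MOV32rm)
    case 0x8D:  // LEA r32, m       (LEA32r)
      break;
    case 0xC7:  // MOV r/m32, imm32 (MOV32mi)
      immSize = 4;
      break;
    default:
      return std::nullopt;  // opcode not covered by this sketch
  }

  const unsigned mod = bytes[1] >> 6;
  const unsigned rm = bytes[1] & 7;
  std::size_t len = 2;  // opcode + ModRM
  std::size_t dispSize = 0;

  if (mod == 1) dispSize = 1;       // disp8
  else if (mod == 2) dispSize = 4;  // disp32

  if (mod != 3 && rm == 4) {
    // rm == 100 with a memory operand means a SIB byte follows ModRM.
    if (bytes.size() < 3) return std::nullopt;
    len += 1;
    const unsigned base = bytes[2] & 7;
    if (mod == 0 && base == 5) dispSize = 4;  // no base register: disp32
  } else if (mod == 0 && rm == 5) {
    dispSize = 4;  // absolute [disp32]
  }

  len += dispSize + immSize;
  if (len > bytes.size()) return std::nullopt;  // truncated input
  return len;
}

// Checks the decoder against the encodings listed in the report.
void checkReportedEncodings() {
  assert(encodedLength({0x8d, 0x9c, 0x24, 0x30, 0x0a, 0x00, 0x00}) == 7u);
  assert(encodedLength({0x8d, 0x6c, 0x24, 0x28}) == 4u);
  assert(encodedLength({0x8b, 0x86, 0x24, 0x0a, 0x39, 0x00}) == 6u);
  assert(encodedLength({0x8b, 0x44, 0x24, 0x10}) == 4u);
  assert(encodedLength({0x89, 0x84, 0x24, 0x34, 0x14, 0x00, 0x00}) == 7u);
  assert(encodedLength({0x89, 0x2c, 0x24}) == 3u);
  assert(encodedLength({0xc7, 0x44, 0x24, 0x08, 0x08, 0x0a, 0x00, 0x00}) == 8u);
  assert(encodedLength({0xc7, 0x04, 0x24, 0x20, 0x00, 0x38, 0x00}) == 7u);
}
```

Every sample decodes to its byte count. The mis-sized cases all carry a SIB byte with an 8-bit or no displacement, and the reported 7 and 11 are exactly what those encodings would occupy with a 32-bit displacement. One plausible reading, inferred from these samples only, is that the size calculation assumes a disp32 whenever a SIB byte is present.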

Has anyone else encountered this? If this turns out to be a bug rather than a misuse or misinterpretation of the function on my part, I can resubmit it as a bug report. If you need more information, let me know and I will provide it.

Regards,

–John

> Hello,
>
> I'm trying to write some backends for LLVM that generate code that complies with SFI validation rules by re-implementing SFI for the LLVM x86 backend based on the Google NaCl project.
>
> However, in trying to implement 32-byte code alignment, X86InstrInfo::GetInstSizeInBytes() is returning incorrect instruction sizes for certain instructions (that I have seen so far): MOV32mi, LEA32r, MOV32mr, and MOV32rm.

This piece of code is rather unfortunate. It is cloned from the JIT and reimplements some of its logic, apparently incorrectly. I believe the current clients in the tree work fine with over-approximations of the length; they don't need exact answers.
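To see why an over-approximation that is acceptable elsewhere is not enough for bundle alignment, here is a simplified model. It assumes the NaCl x86-32 rule that no instruction may straddle a 32-byte bundle boundary, and assumes NOP padding is planned from estimated sizes before the real bytes are emitted. `planPadding` and `firstStraddle` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

constexpr unsigned kBundleSize = 32;  // NaCl x86-32 bundle size

// Plans NOP padding (in bytes) to emit before each instruction so that,
// according to `sizes`, no instruction straddles a bundle boundary.
std::vector<unsigned> planPadding(const std::vector<unsigned>& sizes) {
  std::vector<unsigned> padding;
  unsigned offset = 0;
  for (unsigned size : sizes) {
    assert(size <= kBundleSize && "instruction larger than a bundle");
    const unsigned used = offset % kBundleSize;
    // If the instruction would cross the boundary, start the next bundle.
    const unsigned pad = (used + size > kBundleSize) ? kBundleSize - used : 0;
    padding.push_back(pad);
    offset += pad + size;
  }
  return padding;
}

// Lays out the real instruction sizes with the planned padding and returns
// the index of the first instruction that straddles a bundle boundary,
// or std::nullopt if the layout is valid.
std::optional<std::size_t> firstStraddle(const std::vector<unsigned>& realSizes,
                                         const std::vector<unsigned>& padding) {
  assert(padding.size() == realSizes.size());
  unsigned offset = 0;
  for (std::size_t i = 0; i < realSizes.size(); ++i) {
    offset += padding[i];
    if (offset % kBundleSize + realSizes[i] > kBundleSize) return i;
    offset += realSizes[i];
  }
  return std::nullopt;
}
```

With real sizes {4, 4, 4, 4, 4, 4, 7} and every instruction over-estimated at 7 bytes, the plan pads 4 bytes before the fifth instruction. In the real layout that padding lands mid-bundle, so the final 7-byte instruction starts at offset 28 and crosses the boundary at 32. Planning from the exact sizes needs no padding at all.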

> Has anyone else encountered this? If this turns out to be a bug, rather than some misuse/misinterpretation of the function on my part then I can resubmit it via that channel. Also if I need to submit more information let me know and I will do so.

This definitely sounds like a bug, and improvements are certainly welcome.

Long term, I'm hoping that the MC framework will provide a more principled and robust way to do this sort of thing, but it will be several months before it is robust enough to switch this code over to it.

-Chris