X86 disassembler 0x66 prefix

There is a problem on X86 disassembler for instructions beginning with x86 prefix :

$ echo “0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00” | llvm-mc --disassemble
movdqa (%edi), %xmm1

$ echo “0x53 0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00” | llvm-mc --disassemble
pushl %ebx
:1:6: warning: invalid instruction encoding
0x53 0x66 0x0f 0x6f 0x8f 0x00 0x00 0x00 0x00
^

I attach a patch, but I’m not really familiar with this code, so if someone can review/correct, it would be great.

The problem is between readPrefixes and getId : prefixLocation depends on buffer position, but necessaryPrefixLocation is never updated to match buffer position.

Cheers,
Olivier.

x86_disassembler_66_prefix.patch (688 Bytes)

Olivier,

thanks for identifying that issue! Your fix corrects the specific testcase you provide, but it could produce incorrect results in other cases.

Here’s a bit of background on necessary_prefix_location.

necessary_prefix_location is supposed to keep track of where prefixes like 66, f3, and f2 – which affect the class of instruction being decoded fundamentally – should be expected to be. If you look at the “Instruction Format” section of the Intel manuals (specifically, Volume 2A of the Intel 64 and IA-32 Architectures Software Developer’s Manual), it specifies that


Some instructions may use F2H,F3H as a mandatory prefix to express distinct functionality. A mandatory prefix generally should be placed after other optional prefixes (exception to this is discussed in Section 2.2.1, “REX Prefixes”).

Also,


Some SSE2/SSE3/SSSE3/SSE4 instructions and instructions using a three-byte sequence of primary opcode bytes may use 66H as a mandatory prefix to express distinct functionality. A mandatory prefix generally should be placed after other optional prefixes (exception to this is discussed in Section 2.2.1, “REX Prefixes”).

This is why the decoder maintains a special location – necessary_prefix_location – at which f2, f3, or 66 must reside. Perhaps I ought to call it “mandatory_prefix_location”…

In the 64-bit case (under “REX Prefixes”), the manual states:


Only one REX prefix is allowed per instruction. If used, the REX prefix byte must immediately precede the opcode byte or the escape opcode byte (0FH). When a REX prefix is used in conjunction with an instruction containing a mandatory prefix, the mandatory prefix must come before the REX so the REX prefix can be immediately preceding the opcode or the escape byte.

I took a quick look in the sources and saw that necessaryPrefixLocation was being set at two locations: line 382, and line 387. Those are both only enabled if insn->mode == MODE_64BIT, which in X86 (of course) it isn’t. They are handling the specific 64-bit case; the 32-bit case is unhandled (as you spotted).

Your patch would set necessaryPrefixLocation to wherever the 66 prefix is found, which works most of the time but would break cases in which 66 is not used as a mandatory prefix, but merely as an optional prefix (the operand-size prefix). Although obscure, such a use is possible. I have attached a patch that solves the problem more generally.

Sean

llvm.necessaary-prefix-location.diff (445 Bytes)