X86 disassembler fails to handle 0x66 prefix?

Hello,

I have seen this bug for quite a while, and it is still there even in
the latest code: the X86 disassembler does not handle the 0x66 prefix
properly, depending on where it sits relative to the 0xF3 prefix.

The commands below should produce the same output, but they do not.

$ echo "0xf3 0x66 0xa5" | ./build/bin/llvm-mc --disassemble \
    -triple=x86_64 -output-asm-variant=1
.text
rep
movsw word ptr es:[rdi], word ptr [rsi]

$ echo "0x66 0xf3 0xa6" | ./build/bin/llvm-mc --disassemble \
    -triple=x86_64 -output-asm-variant=1
.text
cmpsb byte ptr [rsi], byte ptr es:[rdi]

As you can see, just by exchanging the order of 0xf3 and 0x66, we get
a different result. F3 in this case is not really a prefix for REP, I
think.

Is there any solution to fix this?

Thanks.
Jun

Just to clarify, the correct output for both cases would be: "rep
movsw word ptr es:[rdi], word ptr [rsi]"

Thanks.

F3 in this case is not really a prefix for REP I think.

http://x86.renejeschke.de/html/file_module_x86_id_279.html

Intel arch. manual shows REP uses F3 and F2 as starting prefixes.

Kevin

Are you sure? I could not find anything in the above link stating that.

Anyway, on real hardware it does not matter where F2/F3 is placed,
which makes all of this even more confusing.

Thanks.

Intel manual states that:
Instruction prefixes are divided into four groups, each with a set of allowable prefix codes. For each instruction, it
is only useful to include up to one prefix code from each of the four groups (Groups 1, 2, 3, 4). Groups 1 through 4
may be placed in any order relative to each other.

Group 1
- REP or REPE/REPZ is encoded using F3H
Group 3
- Operand-size override prefix is encoded using 66H

So according to the Intel manual the ordering is arbitrary.
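
To illustrate what order-independent prefix handling could look like, here is a minimal Python sketch. It is not LLVM's decoder: the names PREFIX_GROUPS, split_prefixes and decode_movs are made up for this example, it only knows opcode A5, and it assumes 64-bit mode with no REX prefix (so the default operand size is 32 bits). The idea is simply to collect all legacy prefixes first, in whatever order they appear, and only then decide on REP and the operand size.

```python
# Hypothetical sketch, not LLVM code: gather legacy prefixes in any order,
# then decode the opcode using the collected set.

# Legacy prefix bytes mapped to their Intel prefix group.
PREFIX_GROUPS = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,                              # LOCK, REPNE, REP/REPE
    0x2E: 2, 0x36: 2, 0x3E: 2, 0x26: 2, 0x64: 2, 0x65: 2,   # segment overrides
    0x66: 3,                                                # operand-size override
    0x67: 4,                                                # address-size override
}


def split_prefixes(code):
    """Return (set of prefix bytes seen, remaining bytes starting at the opcode)."""
    prefixes = set()
    i = 0
    while i < len(code) and code[i] in PREFIX_GROUPS:
        prefixes.add(code[i])
        i += 1
    return prefixes, bytes(code[i:])


def decode_movs(code):
    """Decode only opcode A5 (MOVS), assuming 64-bit mode and no REX prefix."""
    prefixes, rest = split_prefixes(code)
    if rest[:1] != b"\xa5":
        raise ValueError("this sketch only handles opcode A5")
    # 0x66 selects the 16-bit form; otherwise the 32-bit default applies.
    mnemonic = "movsw" if 0x66 in prefixes else "movsd"
    rep = "rep " if 0xF3 in prefixes else ""
    return rep + mnemonic


print(decode_movs(bytes([0xF3, 0x66, 0xA5])))
print(decode_movs(bytes([0x66, 0xF3, 0xA5])))
```

Because the prefixes are kept in a set, both byte orders give the same answer in this sketch. A real decoder also has to deal with REX, mandatory prefixes on SSE opcodes, and repeated prefixes from the same group, none of which is modelled here.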