REX prefix is not handled properly for X86_64?


Intel’s XED decodes the bytes “43 40 04 75” as “add al, 0x75”, but LLVM’s X86 disassembler rejects them as invalid. My guess is that LLVM fails to recognize the REX prefix in this case.

Is this correct?



Hi Jun,

FWIW, I think LLVM is right to reject this. Per SDM 2.2.1, "Only one
REX prefix is allowed per instruction."
Here, 0x43 and 0x40 are both REX prefixes, so the encoding violates the
manual.

However, trunk llvm-mc is still able to disassemble the add, I guess
because it ignores invalid bytes:

<stdin>:1:1: warning: invalid instruction encoding
0x43 0x40 0x04 0x75
addb $117, %al ## encoding: [0x04,0x75]
                                        ## <MCInst #107 ADD8i8
                                        ## <MCOperand Imm:117>>

It would be trivial to change the disassembler to accept redundant REX
prefixes (see the attached patch; turning it into a loop would accept
more than two, but that would be even worse). You would then have to
decide which prefix takes effect: the first or the last. Currently,
only the last REX prefix applies to the instruction that follows; all
the earlier ones are discarded as invalid encodings.
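
To make the "last one wins" behaviour concrete, here is a minimal Python
sketch. It is not LLVM's code: is_rex and split_rex are hypothetical
helpers, and the sketch assumes 64-bit mode and only looks at a leading
run of REX bytes (it ignores legacy prefixes entirely).

```python
from typing import List, Optional, Tuple


def is_rex(byte: int) -> bool:
    # In 64-bit mode, bytes 0x40..0x4F (high nibble 0100) are REX prefixes.
    return (byte & 0xF0) == 0x40


def split_rex(code: bytes) -> Tuple[List[int], Optional[int], bytes]:
    """Split a leading run of REX bytes off `code`.

    Returns (discarded, effective, rest): the earlier REX bytes that are
    dropped, the last REX byte (the one that takes effect, or None if
    there is no REX prefix), and the remaining bytes.
    """
    i = 0
    while i < len(code) and is_rex(code[i]):
        i += 1
    if i == 0:
        return [], None, code
    return list(code[:i - 1]), code[i - 1], code[i:]
```

For "43 40 04 75" this discards 0x43, keeps 0x40 as the effective
prefix, and leaves "04 75" (the add) as the instruction body.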

Now, if LLVM rejected a single REX prefix that has no effect (e.g.
"40 04 75"), that would be a problem, but that case works fine without
any change.

So, to recap: to avoid the problem, I think you should change the way
you use the LLVM Disassembler API. When it's unable to disassemble a
byte, ignore it and try again at the next one. That's what most
linear disassemblers do, and would correctly ignore the first REX
prefix here.
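
A rough sketch of that skip-and-retry loop, in Python rather than
against the real LLVM API. Everything here is an assumption for
illustration: toy_decode is a made-up stand-in for the real decoder
that only knows "add al, imm8" with at most one REX prefix, and
linear_sweep is a hypothetical driver around it.

```python
from typing import Callable, List, Optional, Tuple

Decoded = Tuple[int, str]  # (instruction length in bytes, text)


def toy_decode(code: bytes, offset: int) -> Optional[Decoded]:
    """Hypothetical stand-in for a real decoder; knows only ADD AL, imm8."""
    pos = offset
    # Accept at most one REX prefix (0x40..0x4F), as the SDM requires.
    if pos < len(code) and (code[pos] & 0xF0) == 0x40:
        pos += 1
    # Opcode 0x04 is ADD AL, imm8 and needs one immediate byte after it.
    if pos + 1 < len(code) and code[pos] == 0x04:
        return pos + 2 - offset, f"add al, {code[pos + 1]:#x}"
    return None


def linear_sweep(
    code: bytes,
    decode: Callable[[bytes, int], Optional[Decoded]] = toy_decode,
) -> List[Tuple[int, str]]:
    """Decode front to back; on failure skip one byte and try again."""
    out: List[Tuple[int, str]] = []
    offset = 0
    while offset < len(code):
        result = decode(code, offset)
        if result is None:
            # Undecodable here: record the byte and retry at the next one.
            out.append((offset, f"<skipped {code[offset]:#04x}>"))
            offset += 1
        else:
            length, text = result
            out.append((offset, text))
            offset += length
    return out
```

On "43 40 04 75" the sweep fails at offset 0 (two REX prefixes in a
row), skips 0x43, and then decodes "40 04 75" as the add at offset 1.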

- Ahmed

x86_rex_redundant.patch (595 Bytes)

got it, thanks a lot!!!