X86 disassembler is quite broken on handling REX

hi,

i think the current X86 disassembler is quite broken and fails badly on handling REX for x86_64 code.

below are some examples:

$ echo "0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
por %mm3, %mm0

$ echo "0x40,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
por %mm3, %mm0

$ echo "0x41,0x0f,0xeb,0xc3"|./Release+Asserts/bin/llvm-mc -disassemble -triple=x86_64
.text
:1:1: warning: invalid instruction encoding
0x41,0x0f,0xeb,0xc3
^

the last example should also return “por %mm3, %mm0”, but it fails to understand the input.

the reason lies in this line in X86DisassemblerDecoder.cpp:

rm |= bFromREX(insn->rexPrefix) << 3;

we can see that we take into account REX.B, but for “por” (0F EB), this should be ignored.
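to make the failure concrete, here is a minimal sketch of the arithmetic. the helper names (rex_b, rex_r, extended_rm, extended_reg) are made up for illustration and are not the functions in X86DisassemblerDecoder.cpp; only the bit layout of REX and ModRM is taken from the manual.

```c
#include <stdint.h>

/* Hypothetical helpers, not LLVM code: they only model the REX and
 * ModRM bit layout (REX = 0100WRXB, ModRM = mod:2 reg:3 rm:3). */
static unsigned rex_b(uint8_t rex) { return rex & 0x1u; }
static unsigned rex_r(uint8_t rex) { return (rex >> 2) & 0x1u; }

/* ModRM.rm extended by REX.B, as the quoted decoder line does. */
static unsigned extended_rm(uint8_t rex, uint8_t modrm) {
    return (modrm & 0x7u) | (rex_b(rex) << 3);
}

/* ModRM.reg extended by REX.R, the analogous step for the reg field. */
static unsigned extended_reg(uint8_t rex, uint8_t modrm) {
    return ((modrm >> 3) & 0x7u) | (rex_r(rex) << 3);
}
```

with ModRM = 0xc3 (mod = 3, reg = 0, rm = 3), extended_rm(0x40, 0xc3) is 3, which maps to %mm3, while extended_rm(0x41, 0xc3) is 11. there is no MMX register 11, so the decoder reports an invalid encoding.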

quite a lot of other instructions are decoded with REX taken into account like this, although according to the manual REX should be ignored for them.

i don't see any clean solution for this issue without significant changes to the way we decode ModRM and without providing more information in the .td files.

any idea?

thanks.

Jun

I believe this particular error is caused by this. That seems easy enough
to just drop the bit. Do you have other non-mmx examples?

    case TYPE_MM: \
      if (index > 7) \
        *valid = 0; \
      return prefix##_MM0 + index;

yes, exactly this place. but the question is: how do we know when to drop
the REX.B?

i don't know of any non-MMX examples. it seems only MMX-related instructions
have this issue.

thanks,
Jun

Wouldn't changing

    case TYPE_MM: \
      if (index > 7) \
        *valid = 0; \
      return prefix##_MM0 + index;

to

    case TYPE_MM: \
      return prefix##_MM0 + (index & 0x7);

fix the issue for both REX.B and REX.R?
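As a rough comparison of the two behaviours, here is a standalone sketch. The function names mm_index_strict and mm_index_masked are hypothetical and only model the index handling of the TYPE_MM case; the real code is a macro that returns prefix##_MM0 + index.

```c
#include <stdbool.h>

/* Hypothetical model of the current TYPE_MM case: an index above 7
 * marks the operand invalid, and the raw index is still returned. */
static unsigned mm_index_strict(unsigned index, bool *valid) {
    if (index > 7)
        *valid = false;
    return index;
}

/* Hypothetical model of the proposed change: keep the low three bits,
 * so the bit contributed by REX.B or REX.R is dropped. */
static unsigned mm_index_masked(unsigned index) {
    return index & 0x7u;
}
```

For the failing input, the extended index is 11: the strict version clears the valid flag, while the masked version yields 3, which corresponds to %mm3. The masked version never rejects an operand, so any caller that relied on the index > 7 check to catch bad encodings would lose that check.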

this sounds OK. but then the (index > 7) check is gone. is there any case
where dropping it can be an issue?

thanks,
Jun