Some bugs in x86 disasm (llvm-mc)

Hi,

With objdump, i have this (Intel syntax)

64 a1 00 00 00 00 mov eax,fs:0x0

However, if I pass above string to llvm-mc, I would have:

$ echo “0x64 0xa1 0x00 0x00 0x00 0x00”|./Release+Asserts/bin/llvm-mc -disassemble -arch=x86 --output-asm-variant=1
.text
mov eax, dword ptr [0]

You can see a big difference. This is on the latest code. Any idea how to fix this bug?

Thank you,

Jun

Hi Jun,

I’m not sure how to fix this yet, but this looks incorrectly defined in lib/Target/X86/X86InstrInfo.td:

def MOV32o32a : Ii32 <0xA1, RawFrm, (outs), (ins offset32:$src),
“mov{l}\t{$src, %eax|eax, $src}”, [], IIC_MOV_MEM>,
Requires<[In32BitMode]>;

This instruction can be REX-prefixed for a 64-bit move, and that also doesn’t appear to be defined anywhere.

I would file a bugzilla in the x86 component and cc Craig Topper, the x86 disasm/codegen expert.

I would file a bugzilla in the x86 component and cc Craig Topper, the x86
disasm/codegen expert.

If you chase down the revision history, there are already a couple of
bugs filed: PR16961 and PR16962.

Cheers.

Tim.

Thanks, Tim!

As Craig noted:
http://llvm.org/bugs/show_bug.cgi?id=16962#c1

"There are many things wrong with these instructions." 

:)

As Craig noted:
http://llvm.org/bugs/show_bug.cgi?id=16962#c1

"There are many things wrong with these instructions."

I think he fixed quite a few of them shortly afterwards (r189201),
though obviously not all, as you discovered.

Cheers.

Tim.

Craig, any plan to fix this "mov eax,fs:0x0" soon? hopefully before 3.4 is
out :slight_smile:

Thanks.

Any plan to fix this bug?

Thanks.

I started trying to fix this tonight but it’s pretty nasty to fix. I’ll try to make another go at it later this week.

Much of it seems fixed already; what's left to fix? The segment prefix
override? Does that mean we get to fix disassembly of '0x65 0xa4' while
we're at it? (Although we can't even *assemble* that one, I note.)

Any comments on the patches I posted on Monday to fix various 16-bit
disasm issues?

To fix it we need to change offset8/offset16/etc to have two suboperands and update the printer to understand that. Also update the disassembler to add the segment to the MCInst when its creating it. When I did these two things the MCCodeEmitter broke because it tried to interpret the extra operand as an immediate. I think I probably need to change the form from RawFrm to something new that I can teach the MCCodeEmitter to handle correctly.

I haven’t looked at the patches yet. I was out of town most of last week and trying to get caught back up.

I believe I have now fixed the 0x64 0xa1 0x00 0x00 0x00 0x00 bug in r199364.

I believe I have now fixed the 0x64 0xa1 0x00 0x00 0x00 0x00 bug in
r199364.

Still fails to correctly disassemble 0x66 0x67 0xa1… (mea culpa).

This should fix it. It applies on top of the patches I sent on Monday,
but probably only because it adds the test cases for 16-bit too.

I'll add it to my pile.

From a98a9ab044454bdb93be25fa8dbcd72d217f5651 Mon Sep 17 00:00:00 2001

Did we have IC_OPSIZE_ADSIZE defined but not used before this?

I added it in r198759, to allow the OpSize and/or AdSize forms of all
these instructions to coexist.

Actually, now I grok the disassembler a bit better I think I can kill
IC_OPSIZE_ADSIZE. I just need to special-case it to stop the decode
conflict warning, and let them be handled like the IC_ADSIZE case. The
disassembler handles the OpSize bit for the register all by itself,
without really referring to the operands in the table.

Will play later...

Did we have IC_OPSIZE_ADSIZE defined but not used before this?

This removes it and fixes the same disasm bugs...

From 01e472ac675363a71d27013bcf9d7f846093177d Mon Sep 17 00:00:00 2001