[PATCH] fix outs/ins of MOV16mr instruction (X86)

Hi,

This patch fixes outs/ins of MOV16mr instruction of X86.

Thanks.

diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index e9a0431…f5b2064 100644
— a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -1412,7 +1412,7 @@ let SchedRW = [WriteStore] in {
def MOV8mr : I<0x88, MRMDestMem, (outs), (ins i8mem :$dst, GR8 :$src),
“mov{b}\t{$src, $dst|$dst, $src}”,
[(store GR8:$src, addr:$dst)], IIC_MOV_MEM>;
-def MOV16mr : I<0x89, MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src),
+def MOV16mr : I<0x89, MRMDestMem, (outs i16mem:$dst), (ins GR16:$src),
“mov{w}\t{$src, $dst|$dst, $src}”,
[(store GR16:$src, addr:$dst)], IIC_MOV_MEM>, OpSize16;
def MOV32mr : I<0x89, MRMDestMem, (outs), (ins i32mem:$dst, GR32:$src),

Hi,

This patch fixes outs/ins of MOV16mr instruction of X86.

Thanks.

diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index e9a0431..f5b2064 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -1412,7 +1412,7 @@ let SchedRW = [WriteStore] in {
def MOV8mr : I<0x88, MRMDestMem, (outs), (ins i8mem :$dst, GR8 :$src),
                 "mov{b}\t{$src, $dst|$dst, $src}",
                 [(store GR8:$src, addr:$dst)], IIC_MOV_MEM>;
-def MOV16mr : I<0x89, MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src),
+def MOV16mr : I<0x89, MRMDestMem, (outs i16mem:$dst), (ins GR16:$src),
                 "mov{w}\t{$src, $dst|$dst, $src}",
                 [(store GR16:$src, addr:$dst)], IIC_MOV_MEM>, OpSize16;

Why? i16mem here stands for the pointer, not the actual memory. A
store doesn't define a pointer, so why would it be in "outs"?
Also, what's special about i16? You'd need to change the various other
*mr instructions, for instance the MOV8mr right above.

-Ahmed

Agree with Ahmed. Also all patches are supposed to go to llvm-commits not llvm-dev.

> Hi,
>
> This patch fixes outs/ins of MOV16mr instruction of X86.
>
> Thanks.
>
>
> diff --git a/lib/Target/X86/X86InstrInfo.td
b/lib/Target/X86/X86InstrInfo.td
> index e9a0431..f5b2064 100644
> --- a/lib/Target/X86/X86InstrInfo.td
> +++ b/lib/Target/X86/X86InstrInfo.td
> @@ -1412,7 +1412,7 @@ let SchedRW = [WriteStore] in {
> def MOV8mr : I<0x88, MRMDestMem, (outs), (ins i8mem :$dst, GR8 :$src),
> "mov{b}\t{$src, $dst|$dst, $src}",
> [(store GR8:$src, addr:$dst)], IIC_MOV_MEM>;
> -def MOV16mr : I<0x89, MRMDestMem, (outs), (ins i16mem:$dst, GR16:$src),
> +def MOV16mr : I<0x89, MRMDestMem, (outs i16mem:$dst), (ins GR16:$src),
> "mov{w}\t{$src, $dst|$dst, $src}",
> [(store GR16:$src, addr:$dst)], IIC_MOV_MEM>, OpSize16;

Why? i16mem here stands for the pointer, not the actual memory. A
store doesn't define a pointer, so why would it be in "outs"?

Then why does this "i16mem:$dst" belongs to "ins"? Is that wrong, correct?

Also, what's special about i16? You'd need to change the various other

*mr instructions, for instance the MOV8mr right above.

Yes, I would fix that with another patch, but now I am not sure if this is
needed.

Thanks,
Jun

Why? i16mem here stands for the pointer, not the actual memory. A
store doesn't define a pointer, so why would it be in "outs"?

Then why does this "i16mem:$dst" belongs to "ins"? Is that wrong, correct?

Think about a typical instruction:

    movw %ax, 1234(%rax, 4, %rbx)

The components making up the "i16mem" operand are "1234", %rax, 4 and
%rbx (+ a hidden segment register). None of those are defined/written
by the instruction (but they are read). So they need to be ins.

What's written is some nebulous memory location specified by those
operands. This isn't modelled by LLVM's basic instructions at all
(except possibly via mayLoad and mayStore).

The existing code is correct.

Cheers.

Tim.

<side rant>
AT&T syntax on x86: Ruining the eyes of everyone who read the Intel manuals and wrote in the standard syntax since... I'm too old to remember the year. I know LLVM has to be compatible with lots of GNU tools still, but I feel like now that clang has integrated AS it might do the world a favor and default to Intel syntax on input / output (especially since the internal assembler is mature now), while still accepting AT&T for compatibility reasons. I'm not saying ditch it altogether, but when you have to read an Intel manual to learn the instruction format then reverse the order of everything, consult a GAS manual, and insert bizarre tokens all over the place... I know it's Unix legacy and all, and there is probably more code than I've seen in my lifetime, but at the same time we aren't using AT&T syntax to write inline assembly for ARM, we're using ARM syntax.
</side rant>

The segment register thing is odd as a concern on modern x86_64 as I understand it. I see this patch seems to be targeting small-width moves in 64 bit mode. AFAIK all segment registers are zeroed... with GS / FS being special of course, I believe already modeled as such... and protection info pulled from the GDT. But *should* they be modelled?

<sidetracking again>
I bring this up because I'm wondering if LLVM models real mode correctly where segment registers and memory spaces are actually a concern (I haven't been paying attention to x86 backends much, more ARM and PPC). I assume override prefixes can handle most of this, but there were lots of implicit segments for some instructions and I'm not sure on the state of modelling that.

I saw some recent work towards getting real mode x86 working and would like to contribute if for nothing other than nostalgia's sake. Being at ring 0 all the time and having full system access was awful from any security standpoint, but allowed for some fun and weird programming as well. MS style inline assembly is close enough to Borland's old format that we could get old Turbo-C code compiling without a huge amount of effort beyond what has already been done, although the MS still needs a little work and I have a bug or two to file against it.

Sorry for derailing this thread, if anybody is interested in discussing off-list or starting a new thread on this topic feel free to email.

Cheers,
Gordon Keiser
Software Development Engineer
Arxan Technologies

Got it, thanks!

A question: I still dont really understand why & how LLVM uses these "outs"
& "ins" information from each instruction.
Any hint, please?

Thanks,
Jun

A question: I still dont really understand why & how LLVM uses these "outs"
& "ins" information from each instruction.
Any hint, please?

It's not really necessary information for the assembler, but CodeGen
uses it to track which registers are defined and used by each
instruction. The "outs" get defined, and "ins" get used.

Cheers.

Tim.

What does CodGen track these registers (defined & used) for?

Thanks,
Jun

What does CodGen track these registers (defined & used) for?

Almost all optimisations need that information. It determines which
instructions depend on the data from others and how the compiler can
move them around safely. Two instructions using (reading) a given
register have to be treated very differently from one defining
(setting) it and the second using it.

Cheers.

Tim.