x86 Intel Syntax and MASM 9.x

I would like to use the LLVM x86 code generator to emit Intel syntax that is compatible with Microsoft’s MASM 9.x. Working from TOT LLVM as of last week, I have found a number of changes that are required to make this work. Most of them are straightforward, but there are a couple I wanted to check with the group to see what people think is the best thing to do. I have made all the necessary changes, and they are mostly confined to the files:

X86IntelAsmPrinter.[h|cpp]

X86TargetAsmInfo.[h|cpp]

These changes make sure the emitted syntax follows Microsoft MASM requirements.

The main problem I have hit concerns the use of the CL register in the shift instructions. AT&T syntax says it should be referenced as “%cl”, while Intel syntax uses just “cl”, but these references occur in X86InstrInfo.td, which means they are shared between Intel and AT&T printing! For example, the shift rules:

let Uses = [CL] in {
def SHL8rCL  : I<0xD2, MRM4r, (outs GR8 :$dst), (ins GR8 :$src),
                 "shl{b}\t{%cl, $dst|$dst, %CL}",
                 [(set GR8:$dst, (shl GR8:$src, CL))]>;
def SHL16rCL : I<0xD3, MRM4r, (outs GR16:$dst), (ins GR16:$src),
                 "shl{w}\t{%cl, $dst|$dst, %CL}",
                 [(set GR16:$dst, (shl GR16:$src, CL))]>, OpSize;
def SHL32rCL : I<0xD3, MRM4r, (outs GR32:$dst), (ins GR32:$src),
                 "shl{l}\t{%cl, $dst|$dst, %CL}",
                 [(set GR32:$dst, (shl GR32:$src, CL))]>;
} // Uses = [CL]

Needs to be:

let Uses = [CL] in {
def SHL8rCL  : I<0xD2, MRM4r, (outs GR8 :$dst), (ins GR8 :$src),
                 "shl{b}\t{%cl, $dst|$dst, CL}",
                 [(set GR8:$dst, (shl GR8:$src, CL))]>;
def SHL16rCL : I<0xD3, MRM4r, (outs GR16:$dst), (ins GR16:$src),
                 "shl{w}\t{%cl, $dst|$dst, CL}",
                 [(set GR16:$dst, (shl GR16:$src, CL))]>, OpSize;
def SHL32rCL : I<0xD3, MRM4r, (outs GR32:$dst), (ins GR32:$src),
                 "shl{l}\t{%cl, $dst|$dst, CL}",
                 [(set GR32:$dst, (shl GR32:$src, CL))]>;
} // Uses = [CL]

It does not make sense to have separate rules for Intel and AT&T, so I wanted to get the list’s advice on the best approach to resolving this issue before I make the changes.

It is also worth noting that MASM does not allow

shr ESI

to mean shift by 1; instead I have to emit

shr ESI, 1

which I’m assuming is not an issue?

Finally, as far as I can tell from comments on the mailing list, the Intel syntax currently emitted by LLVM does not work with any particular Windows assembler, so making these changes will not cause another path to stop working. Is this correct?

Many thanks,

Ben

> I would like to use the LLVM x86 code generator to emit Intel syntax that is
> compatible with Microsoft’s MASM 9.x. Taking the TOT LLVM, from last week, I
> have found a number of changes that are required to make this work, most of
> which are straightforward but a couple I wanted to check with the group to
> see what people thought was the best thing to do. In particular, I have made
> all necessary changes and these are mostly constrained to the files:
>
>            X86IntelAsmPrinter.[h|cpp]
>
>            X86TargetAsmInfo.[h|cpp]

Sounds good; did you mean to attach a patch? It'll be easier to
discuss with that. (The output of "svn diff" is fine.)

> The main problem that I have hit is regarding the use of CL register in the
> shift instructions. The problem is that AT&T syntax states that it should be
> referenced as “%cl” while Intel says just “cl” but these references occur in
> X86InstrInfo.td and this means that it is shared between Intel and AT&T
> printing! For example, the shift rules:

We have two different output styles for precisely that reason.

> The problem is that it does not make sense to have separate rules for Intel
> and AT&T and as such I wanted to get the list’s advice on what people think is
> the best approach to resolving this issue so I can make the changes?

The changes just mentioned look correct.

> It is also worth noting that MASM does not allow:
> shr ESI
> to mean shift by 1 and instead I have to emit:
> shr ESI, 1
> which I’m assuming is not an issue?

That's fine.

> Finally, as far as I can tell from comments on the mailing list the current
> Intel syntax emitted by LLVM does not work with any particular Windows
> assembler and so making these changes will not cause another path to stop
> working, is this correct?

MASM is about as canonical as it gets for standard Intel syntax, and
the changes look reasonable.

-Eli

Hi Eli,

Thanks for the response I have one question inline.

Regards,

Ben

[...]

>> The main problem that I have hit is regarding the use of CL register in the
>> shift instructions. The problem is that AT&T syntax states that it should be
>> referenced as "%cl" while Intel says just "cl" but these references occur in
>> X86InstrInfo.td and this means that it is shared between Intel and AT&T
>> printing! For example, the shift rules:
>
> We have two different output styles for precisely that reason.
>
>> The problem is that it does not make sense to have separate rules for Intel
>> and AT&T and as such I wanted to get the list’s advice on what people think is
>> the best approach to resolving this issue so I can make the changes?
>
> The changes just mentioned look correct.

[bg] The problem is that I am not sure of the best approach to take here. For
example, one possible approach I can see is to follow the HasSSE2
constraint and introduce something like the following in X86.td:

def IsIntelAsmWriter : Predicate<"Subtarget->isFlavorIntel()">;
def IsATTAsmWriter : Predicate<"!Subtarget->isFlavorIntel()">;

and then make changes in X86InstrInfo.td along these lines:

  def SHL8mCLIntel : I<0xD2, MRM4m, (outs), (ins i8mem :$dst),
           "shl{b}\t{%cl, $dst|$dst, CL}",
           [(store (shl (loadi8 addr:$dst), CL), addr:$dst)]>,
           Requires<[IsIntelAsmWriter]>;

  def SHL8mCLATT : I<0xD2, MRM4m, (outs), (ins i8mem :$dst),
           "shl{b}\t{%cl, $dst|$dst, %CL}",
           [(store (shl (loadi8 addr:$dst), CL), addr:$dst)]>,
           Requires<[IsATTAsmWriter]>;

I can get this to work with additional changes to X86InstrInfo.cpp, but
the problem I have with this approach is that it introduces a lot of
duplication, when all I really want to do is parameterize the final
field in the string "shl{b}\t{%cl, $dst|$dst, %CL}". I was wondering
(hoping :-)) whether you knew of a better method for handling this?

[...]

I think you're missing the whole point of the "|" construct; the left
side is AT&T syntax, the right side is Intel syntax.

-Eli
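To make the "|" convention concrete, here is a minimal C++ sketch of how a printer could expand one shared asm string template into either output style. It assumes, based on the templates quoted in this thread, that variant 0 is AT&T and variant 1 is Intel, and that a brace group with no alternative for the requested variant (such as "{b}" in Intel mode) expands to nothing. The function name selectAsmVariant is hypothetical; this is not LLVM's actual AsmWriter code, and TableGen escape sequences are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not LLVM's API): expand every "{alt0|alt1|...}" group
// in an asm string template, keeping only the alternative for `variant`.
// Assumed convention: variant 0 = AT&T, variant 1 = Intel. A group with no
// alternative at that index expands to the empty string.
// Simplification: escape sequences such as "\{" or "\|" are not handled.
std::string selectAsmVariant(const std::string &tmpl, std::size_t variant) {
  std::string out;
  std::size_t i = 0;
  while (i < tmpl.size()) {
    if (tmpl[i] != '{') {
      out += tmpl[i++];  // ordinary character: copy through unchanged
      continue;
    }
    // Split the group's contents on '|' until the closing '}'.
    std::vector<std::string> alts(1);
    for (++i; i < tmpl.size() && tmpl[i] != '}'; ++i) {
      if (tmpl[i] == '|')
        alts.emplace_back();
      else
        alts.back() += tmpl[i];
    }
    ++i;  // step past the closing '}'
    if (variant < alts.size())
      out += alts[variant];
  }
  return out;
}
```

With this expansion the single SHL8rCL template "shl{b}\t{%cl, $dst|$dst, CL}" yields "shlb\t%cl, $dst" for AT&T and "shl\t$dst, CL" for Intel, so no duplicated, predicate-guarded definitions are needed. The same mechanism could also give MASM its explicit shift count: a hypothetical template "shr{l}\t{$dst|$dst, 1}" expands to "shr\t$dst, 1" in Intel mode while AT&T output stays "shrl\t$dst".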

Apparently the GAS Intel backend has flaws and does not work correctly anyway, so the X86IntelAsm backend is designed to target only MASM.

Aaron

Hi Eli,

Yep I was being stupid.

Please find attached a patch for initial changes to get MASM working.

There is still one problem that I am looking into around changing
alignments within SEGMENTS. MASM allows alignments of 2, 4, 16 and 256,
the default being 16, but LLVM sometimes generates an alignment of 32.
For example, consider the following code:

float bar(float fy, float fx)
{
  static const double foo[241] = {
    6.24188099959573430842e-02,
    6.63088949198234745008e-02,
    /* ... remaining entries omitted ... */
  };
  /* ... rest of the function omitted ... */
}

It generates the following data segment:

_data segment PARA 'DATA'
__2E_str: ; .str
        db 'out',0
__2E_str1: ; .str1
        db 'in',0
        public ___some_other_sruct_data
        ALIGN 4
___some_other_sruct_data: ;
       dd 7 ; 0x7
       dd 3 ; 0x3
       dd __2E_str
       db 12 dup(0)
       dd 7 ; 0x7
       dd 3 ; 0x3
       dd __2E_str1
       db 12 dup(0)
       db 12 dup(0)
sgv: ; sgv
       db 1 dup(0)
       ALIGN 4
lvgv: ; lvgv
       ALIGN 32
foo: ;
        dq 4589156319577832937 ; double value: 6.241881e-002
        dq 4589442480094401190 ; double value: 6.630889e-002

MASM reports the following error:

error A2189:invalid combination with segment alignment : 32

Regards,

Ben

masm.patch (15.4 KB)

Hello, Benedict

> There is still one problem that I am looking into around changing
> alignments within SEGMENTS. The problem is that MASM allows 2,4,16,256
> alignments, default being 16, but LLVM is sometimes generating 32
> alignment, for example, consider the following code:

That's correct. MASM is too weak to represent even slightly non-trivial
programs. In your particular case, LLVM IR can set up any alignment it
wants. Also note the FIXMEs in X86IntelAsmPrinter.cpp regarding
alignment.

You might try to round up the alignment to the highest allowed value,
but this might be overkill...

> Hi Eli,
>
> Yep I was being stupid.
>
> Please find attached a patch for initial changes to get MASM working.

Patch looks fine except that it has tabs (LLVM uses only spaces for
indentation). Also, can you generate the patch using "svn diff"?
It's currently in some unusual format which "patch" doesn't recognize.

> There is still one problem that I am looking into around changing
> alignments within SEGMENTS. The problem is that MASM allows 2,4,16,256
> alignments, default being 16, but LLVM is sometimes generating 32
> alignment, for example, consider the following code:

Huh. For correctness, I guess you'll have to round up the alignment,
and abort on anything higher than 256. (Also, perhaps the front-end
should avoid generating such constructs in the first place when
targeting Windows, but that's a separate issue.)

-Eli
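Eli's suggestion can be sketched as a small mapping function. This is a minimal C++ sketch under stated assumptions: the allowed set {2, 4, 16, 256} is taken from Ben's description of MASM earlier in the thread, requests of 0 or 1 byte are simply rounded up to 2, and the name roundUpToMasmAlignment is hypothetical rather than anything in X86IntelAsmPrinter.cpp.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not part of LLVM): round a requested alignment, in
// bytes, up to the smallest value in {2, 4, 16, 256}, the set of alignments
// MASM accepts according to this thread. Alignments above 256 cannot be
// expressed, so we abort, as suggested, rather than emit invalid assembly.
unsigned roundUpToMasmAlignment(unsigned requested) {
  static const unsigned kAllowed[] = {2, 4, 16, 256};
  for (unsigned allowed : kAllowed)
    if (requested <= allowed)
      return allowed;
  std::fprintf(stderr, "error: alignment %u exceeds MASM's maximum of 256\n",
               requested);
  std::abort();
}
```

Under this mapping the ALIGN 32 in the listing above would become ALIGN 256. As Anton notes, rounding up this far may be overkill, since every such object can pick up a large amount of padding.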

Hi Anton,

Indeed, your workaround of rounding up to a supported alignment is what I
have implemented, but it was not clear to me that it should be
submitted back. As you point out, there are many other FIXMEs in
X86IntelAsmPrinter, so maybe this is the "fix" for now.

Regards,

Ben

Hi, Eli

> Huh. For correctness, I guess you'll have to round up the alignment,
> and abort on anything higher than 256. (Also, perhaps the front-end
> should avoid generating such constructs in the first place when
> targeting Windows, but that's a separate issue.)

It depends on the frontend :) I don't see why, for example, llvm-gcc for
mingw32 should not generate such globals. Everything is perfect with
gas. Also, for example, masm does not support weak symbols, which are a
requirement for more or less non-trivial C++ code.

Hi Eli,

Sorry about that; Visual Studio seems to have inserted tabs, and I used an
internal diff tool. Anyway, I synced TOT LLVM, made the changes with
Emacs, and the svn diff is attached.

Regards,

Ben

masm.diff (14.1 KB)

gas Intel syntax is indeed broken in LLVM. I'd love to make it work but
my work has not (yet) allocated time for that. Maybe I can hack LLVM on
the weekends. :)

The above discussion leads me to believe there are fundamental conflicts
between MASM and gas syntax.

Is NASM any better than MASM?

I would hate for MASM to impose draconian restrictions on the Intel asm
printer for all targets.

Do we need a third asm printer?

                             -Dave

Personally, I'd rather just bring up a PE COFF writer and use the masm backend for "debugging".

-Chris

>> Apparently the GAS Intel backend has flaws and does not work correctly anyway
>> so the X86IntelAsm backend is designed only to target MASM anyway.
>
> gas Intel syntax is indeed broken in LLVM. I'd love to make it work but
> my work has not (yet) allocated time for that. Maybe I can hack LLVM on
> the weekends. :)

I think writing an assembler using LLVM TableGen and data, and using the DOCE (Direct Object Code Emission) backends (see the LLVM Wiki for details) when they are ready, is a much better solution. I am planning on doing a full tool set anyway (linker and librarian) to replace binutils for LLVM on Windows, and maybe for other binary formats. Although this will take time, hopefully more people will start working on it once there is something basic running as a proof of concept.

> The above discussion leads me to believe there are fundamental conflicts
> between MASM and gas syntax.
>
> Is NASM any better than MASM?

Probably if MASM does not support aligns or weak symbols properly.

> I would hate for MASM to impose draconian restrictions on the Intel asm
> printer for all targets.

Yes.

> Do we need a third asm printer?

I think so.

Aaron

You really need COFF object modules and a linker as you generally need other libraries linked in too.

Direct PE or COFF plus binutils linker could be stopgaps too.

Aaron

>> The above discussion leads me to believe there are fundamental conflicts
>> between MASM and gas syntax.
>>
>> Is NASM any better than MASM?
>
> Probably if MASM does not support aligns or weak symbols properly.

FWIW: masm does not support much more :)

Patch committed in r73753.

-Eli