Intel vs. AT&T Assembly.

Hi,

I notice `lli -print-machineinstrs -x86-asm-syntax=(att|intel)' both
prefix registers with `%'. Is this right? I thought AT&T did this and
Intel didn't. The GNU gas manual concurs.

    http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_16.html

Cheers,

Ralph.

The Intel version is just a clone of the AT&T version at this time. No one has yet taken the time to make it produce actual Intel syntax.

Ralph Corderoy wrote:

Hi Jeff,

> I notice `lli -print-machineinstrs -x86-asm-syntax=(att|intel)' both
> prefix registers with `%'. Is this right? I thought AT&T did this
> and Intel didn't. The GNU gas manual concurs.
>
> http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_16.html

The Intel version is just a clone of the AT&T version at this time.
No one has yet taken the time to make it produce actual Intel syntax.

It's a long way towards it:

    # AT&T.                         # Intel.
            .text                           .text
            .align 16                       .align 16
            .globl main                     .globl main
                                            .type main, @function
    main:                           main:
            subl $12, %esp                  sub esp, 12
            fnstcw 10(%esp)                 fnstcw word ptr [esp + 10]
            movb $2, 11(%esp)               mov byte ptr [esp + 11], 2
            fldcw 10(%esp)                  fldcw word ptr [esp + 10]
            movl 20(%esp), %eax             mov eax, dword ptr [esp + 20]
            movl 4(%eax), %eax              mov eax, dword ptr [eax + 4]

Just some little bits to go.
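
The differences in the listing above are mostly mechanical: Intel syntax swaps the operand order, drops the `$' and `%' sigils, and replaces the mnemonic size suffix with a `ptr' keyword on memory operands. A minimal Python sketch of that mapping follows. It assumes only the instructions and the simple disp(%base) addressing form that appear in the listing; the names MNEMONICS, att_operand_to_intel and att_to_intel are made up for illustration, and this is not how LLVM's printers are implemented.

```python
import re

# AT&T mnemonic -> (Intel mnemonic, operand size keyword).
# Only the instructions from the listing are covered. Note that the
# trailing "w" of fnstcw/fldcw is part of the name, not a size suffix,
# which is why a table is used instead of blindly stripping a letter.
MNEMONICS = {
    "subl": ("sub", "dword"),
    "movb": ("mov", "byte"),
    "movl": ("mov", "dword"),
    "fnstcw": ("fnstcw", "word"),
    "fldcw": ("fldcw", "word"),
}

# Matches the simple memory form "disp(%base)" or "(%base)".
MEM = re.compile(r"^(\d+)?\((%\w+)\)$")


def att_operand_to_intel(op, size):
    """Translate one AT&T operand: $imm, %reg, or disp(%base)."""
    op = op.strip()
    if op.startswith("$") or op.startswith("%"):
        return op[1:]  # immediates and registers just lose their sigil
    m = MEM.match(op)
    if m is None:
        raise ValueError(f"unsupported operand: {op!r}")
    disp, base = m.group(1), m.group(2)[1:]
    addr = f"[{base} + {disp}]" if disp else f"[{base}]"
    return f"{size} ptr {addr}"


def att_to_intel(line):
    """Translate one AT&T instruction line into Intel-style syntax."""
    mnemonic, _, rest = line.strip().partition(" ")
    intel_mnemonic, size = MNEMONICS[mnemonic]
    if not rest:
        return intel_mnemonic
    operands = [att_operand_to_intel(o, size) for o in rest.split(",")]
    operands.reverse()  # AT&T is "src, dst"; Intel is "dst, src"
    return f"{intel_mnemonic} {', '.join(operands)}"


print(att_to_intel("movl 20(%esp), %eax"))
```

Scaled-index addressing, segment overrides and the pseudo ops (.text, .globl and so on) are deliberately left out; those are the parts that differ most between assemblers.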

Cheers,

Ralph.

Hi,

It's a long way towards it:

    # AT&T.                         # Intel.
            .text                           .text
            .align 16                       .align 16
            .globl main                     .globl main
                                            .type main, @function
    main:                           main:
            subl $12, %esp                  sub esp, 12
            fnstcw 10(%esp)                 fnstcw word ptr [esp + 10]
            movb $2, 11(%esp)               mov byte ptr [esp + 11], 2
            fldcw 10(%esp)                  fldcw word ptr [esp + 10]
            movl 20(%esp), %eax             mov eax, dword ptr [esp + 20]
            movl 4(%eax), %eax              mov eax, dword ptr [eax + 4]

Whoops. I've provided my post-processed version of lli's Intel output,
which isn't a good example since I removed the `%' and lower-cased the
`DWORD PTR'. Still, you get the gist; there are already significant
differences between the two.
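
The post-processing described above (dropping the `%' and lower-casing `DWORD PTR') can be sketched in a few lines of Python. The function name tidy_intel and the sample input line are assumptions made for illustration; the exact text lli printed is not shown in this thread.

```python
import re


def tidy_intel(line):
    """Strip % register prefixes and lower-case the size keywords."""
    # Drop the % sigil in front of register names.
    line = re.sub(r"%(\w+)", r"\1", line)
    # Lower-case size keywords such as "DWORD PTR".
    return re.sub(
        r"\b(BYTE|WORD|DWORD|QWORD) PTR\b",
        lambda m: m.group(0).lower(),
        line,
    )


# Assumed shape of an untouched line of Intel-mode output.
print(tidy_intel("mov %eax, DWORD PTR [%esp + 20]"))
```

This is purely textual, so a `%' appearing in a comment or string would be rewritten as well.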

Cheers,

Ralph.

We know. Someone offered to do the Intel version, but did little more than a huge cut and paste of the AT&T version and then lost interest. No one else has had the time or inclination to finish the (barely begun) job. Patches accepted :)

Ralph Corderoy wrote:

Jeff,

I had a working MASM Writer backend but it never got committed. I still have the code so could redo it relatively quickly.

Aaron

We know. Someone offered to do the Intel version, but did little more than a huge cut and paste of the AT&T version and then lost interest. No one else has had the time or inclination to finish the (barely begun) job. Patches accepted :)

Actually, that's not true. The LLVM X86 backend started out emitting intel mode for use with GAS and its "intel syntax mode" (which does use registers with %'s). Unfortunately GAS has (or commonly available versions have) a number of bugs in intel syntax mode (e.g. you can't define a function named 'dword'), so we switched to using AT&T syntax.

Intel syntax mode was retained because it's nicer to read :), and because it may be useful in the future. As Jeff says, patches are welcome to make it do something useful, e.g. be assemblable with MASM or NASM.

-Chris


Jeff, many apologies. I slightly misread what you wrote, and came across more strongly than was appropriate: I'm sorry! :( :(

Let's see. Ralph correctly points out that LLVM isn't producing anything like Intel syntax. ...

This is not true. LLVM produces something very close to what GAS accepts in intel mode: that is, we produce fully intel syntax (e.g. "DWORD PTR", no opcode size suffixes, etc) but we prepend % onto registers. This is what GAS expects (i.e., it's gas intel mode). I believe there is a GAS option to turn off the % prefix, but we never used it because it had other bugs.

No, %reg is not Intel syntax, no matter what gas thinks.

There are several dialects of "intel mode", and GAS's is just one. Greater variation is due to differences in pseudo ops.

Last June, Aaron Gray offered to fix Intel mode so that it actually produced Intel syntax. What he offered was a gigantic cut and paste: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20050627/026872.html. Then he apparently lost interest, though he now claims he had finished it after all but simply neglected to give us a patch.

Yup, that's true. Note, however, that that patch just moved around existing functionality, it didn't change or add anything (hence the use of the term "refactor").

Specifically, when I said "Actually, that's not true.", what I meant is that this:

"Someone offered to do the Intel version, but did little more
than a huge cut and paste of the AT&T version and then lost interest."

... is not true. The current Intel version is based on our original support for GAS Intel mode, it is not based on AT&T syntax support at all (which, again, came after gas intel syntax support).

As was independently pointed out, we're quite close to supporting MASM (or whatever) intel syntax. Removing the %'s, for example, is trivial. Anyone who wants to do so is welcome to. The current -x86-asm-syntax=intel support is not currently compatible with any assembler that I know of. Patches to make it useful are welcome.

-Chris

Hi Chris,

The LLVM X86 backend started out emitting intel mode for use with GAS
and its "intel syntax mode" (which does use registers with %'s).
Unfortunately GAS has (or commonly available versions have) a number
of bugs in intel syntax mode (e.g. you can't define a function named
'dword'), so we switched to using AT&T syntax.

Ah, OK. The current gas manual says Intel register operands are
undelimited, i.e. no `%'. Perhaps they've changed.

    http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_16.html#IDX585

Intel syntax mode was retained because it's nicer to read :), and
because it may be useful in the future. As Jeff says, patches are
welcome to make it do something useful, e.g. be assemblable with MASM
or NASM.

NASM might be the nicer target since it's GNU LGPL and runs on multiple
OS. Its home page is broken at the moment, but the manual pages work.

    http://nasm.sourceforge.net/doc/html/nasmdoc0.html

You went on to write:

> Let's see. Ralph correctly points out that LLVM isn't producing
> anything like Intel syntax. ...

"Oh, no I didn't". :) It was me that was saying llc's att and intel
are already very different but both happen to have `%'.

This is not true. LLVM produces something very close to what GAS
accepts in intel mode: that is, we produce fully intel syntax (e.g.
"DWORD PTR", no opcode size suffixes, etc) but we prepend % onto
registers. This is what GAS expects (i.e., it's gas intel mode). I
believe there is a GAS option to turn off the % prefix, but we never
used it because it had other bugs.

OK, looks like they may have made that the default now.

As was independently pointed out, we're quite close to supporting MASM
(or whatever) intel syntax. Removing the %'s, for example, is
trivial. Anyone who wants to do so is welcome to. The current
-x86-asm-syntax=intel support is not currently compatible with any
assembler that I know of. Patches to make it useful are welcome.

OK. Thanks for clarifying.

Cheers,

Ralph.

The LLVM X86 backend started out emitting intel mode for use with GAS
and its "intel syntax mode" (which does use registers with %'s).
Unfortunately GAS has (or commonly available versions have) a number
of bugs in intel syntax mode (e.g. you can't define a function named
'dword'), so we switched to using AT&T syntax.

Ah, OK. The current gas manual says Intel register operands are
undelimited, i.e. no `%'. Perhaps they've changed.

   http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_16.html#IDX585

Though it really doesn't matter, IIRC, this was to work around bugs in GAS. In particular (again, as I recall, could be wrong), GAS accepted registers either with or without % prefixes. If you used % prefixes, it avoided some class of bug that I don't remember any longer. If you dig into the CVS history, looking at the commit that added the % prefixes, it will probably explain in further detail.

In any case, I committed a patch to remove the %'s from register names in intel mode.

Intel syntax mode was retained because it's nicer to read :), and
because it may be useful in the future. As Jeff says, patches are
welcome to make it do something useful, e.g. be assemblable with MASM
or NASM.

NASM might be the nicer target since it's GNU LGPL and runs on multiple
OS. Its home page is broken at the moment, but the manual pages work.

   http://nasm.sourceforge.net/doc/html/nasmdoc0.html

That's fine with me. The instructions are in true intel mode now, the hard part will be to get the pseudo ops to match what the assembler expects.

-Chris

We had this discussion last year. We need to support the assembler that is guaranteed to be present as part of a tool chain, not every assembler in existence. On Unix, where we build with gcc, that is gas. On Windows, that is either again gcc or Visual Studio. Visual Studio also comes with an assembler, ml.exe, and users of Visual Studio will not appreciate being forced to download a different assembler. I doubt anyone else would either. Gas is perfectly happy assembling AT&T syntax, so the only assembler that Intel syntax mode needs to support is Microsoft's ml.exe.

We had this discussion last year. We need to support the assembler that is guaranteed to be present as part of a tool chain, not every assembler in existence. On Unix, where we build with gcc, that is gas. On Windows, that is either again gcc or Visual Studio. Visual Studio also comes with an assembler, ml.exe, and users of Visual Studio will not appreciate being forced to download a different assembler. I doubt anyone else would either. Gas is perfectly happy assembling AT&T syntax,

I agree with the above :)

so the only assembler that Intel syntax mode needs to support is Microsoft's ml.exe.

I agree that "the most useful assembler for intel syntax mode to support is microsoft's ml.exe", but I don't think it's true that it is the only one we can/should support. If there is little cost to adding NASM support (i.e. the code isn't horrible) and if someone produces a patch, we would welcome it.

That said, support for ml.exe certainly sounds more *useful*. :)

-Chris

Hi,

There may be licensing problems with ML/MASM; we need to get someone to check this out if we are going to support them.

NASM and YASM were suggested.

Aaron

Absolutely not. I have just re-read the EULA for Visual Studio and it does not even mention ML or MASM. It places no restrictions on ML that do not also apply to everything else in Visual Studio, and that is the only license that controls my usage of ML. In other words, if ML has "licensing problems" (whatever those are supposed to be), then so does VC++ and using NASM would solve nothing.

Now I do not have a free and/or beta version of Visual Studio and it is possible (ok, probable) those versions have more restrictive licenses, but again those restrictions would apply to everything and not just ML.

And don't forget that the need for *any* assembler is just a short term limitation. The goal is for LLVM to produce object files directly. (Hmm... note to self... maybe I should finally get around to starting

It's not that I am dead set against supporting any other assembler, it's just that these sorts of discussions always seem to degenerate into "hey, we should support assembler X that I really like!", where X ranges over a large set of assemblers. The overwhelming majority of LLVM users couldn't care less which assembler is used so long as everything "just works." Heck, they don't care if an assembler is used at all. In fact, they would be happier with LLVM if none were, as LLVM would then run faster. I'd rather all this energy was focused on producing object files directly instead of supporting every assembler out there. Gas and ML are the two most useful to support, and the third most useful is a very distant third and a pain to verify that it continues to work with each new release.

Ok, less talk and more action. I just implemented proper Microsoft ML/MASM support. It probably has a few rough edges, so if anyone wants to try it out please do and let me know if you encounter any problems.

Note that you cannot take a bytecode file created by llvm-gcc on Unix, move it to Windows, translate it to Intel syntax assembler, assemble it with ML and expect it to work. You'll get an object file, but it won't link. It used to work, but something changed to make the C runtime libraries incompatible. lli cannot run the bytecode file either for the same reason, nor will using CBE work anymore either.

But for those of us writing our own front ends, this isn't a problem.

Jeff Cohen wrote:

Hi Jeff,

It's not that I am dead set against supporting any other assembler,
it's just these sort of discussions always seem to degenerate into
"hey, we should support assembler X that I really like!", where X
ranges over a large set of assemblers.

Personally, I'm not fussed if the Intel assembly output can be assembled
with a particular assembler; I'm interested in having a `readable'
assembly output to examine what x86 code is being produced by the
back-end.

Speaking of which, has there ever been any consideration of producing
hex constants in the various text outputs for values that look better in
hex, e.g. 65280 is 0xff00? There are probably some tests that could be
done before defaulting to decimal.
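
One possible shape for such a test, as a minimal Python sketch: treat a value as "better in hex" when it is not tiny and at least half of its hex digits are 0 or f, which catches masks and round powers of two such as 0xff00. The threshold, the digit rule and the names looks_better_in_hex and format_constant are all assumptions for illustration, not anything LLVM does.

```python
def looks_better_in_hex(value, min_value=256):
    """Heuristic: non-tiny value whose hex digits are mostly 0 or f."""
    if value < min_value:
        return False  # small (and negative) numbers stay decimal
    digits = format(value, "x")
    mask_like = sum(d in "0f" for d in digits)
    return mask_like * 2 >= len(digits)


def format_constant(value):
    """Render an integer constant as hex or decimal text."""
    return hex(value) if looks_better_in_hex(value) else str(value)


for v in (12, 1000, 4096, 65280):
    print(v, "->", format_constant(v))
```

With this rule 65280 and 4096 come out in hex while 12 and 1000 stay decimal; other rules, such as checking for a single contiguous run of set bits, would be just as defensible.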

Cheers,

Ralph.

I got the code I did from the mailing list.

It needs some reworking as I created another TableGen identity. It's attached.

If you can wait a week (as I am on another project at the moment) I will be able to do that.

Aaron

MASM.tar.gz (32.7 KB)

Already committed yesterday.

Aaron Gray wrote:

This looks different than what Jeff implemented, though he's far more of an expert on whether it is appropriate or not.

One request though Aaron: instead of attaching a bunch of files like this, please attach a cvs diff against CVS head. This makes it much easier for us to apply the changes and understand what you changed. To get a diff, use something like this from the top level of your llvm tree:

$ cvs diff -u

If you have other unrelated changes, you can use:

$ cvs diff -u lib/Target/X86/foo.cpp lib/Target/X86/blah.h ...

Thanks,

-Chris