GCC assembler rejects native code generated by LLVM

I successfully compiled CVS HEAD yesterday on my win32 machine using
Visual C++ Express (2005). I also have Mingw tools installed. I wrote
a simple hello world application and generated native assembly code
using llvm. When I tried to feed the code into GCC, it rejected it
with "junk at the end of line" error messages. Shouldn't GCC be able
to assemble this code? I realize the win32 port isn't complete, but I was
under the impression that this should work.

Thanks.

I'm not sure what the issue is, but if you send in the exact .s file produced and the errors you're getting, we can fix it. :)

-Chris

I'm at work so I don't have exact details at the moment. I'll send the
.s file and the errors tonight.

Thanks for the quick reply!

I'm confused. My understanding is that Visual C++ Express does not include Visual Studio, which is required to build LLVM. Anyway, assembly code generation is not yet supported using the Microsoft tool chain (as documented in the Getting Started with VS page), and when it is, it will be with NASMW and not GCC. Not that any of this explains the "junk at the end of line" you got.

Vyacheslav Akhmechet wrote:

> I'm confused. My understanding is that Visual C++ Express does not
> include Visual Studio, which is required to build LLVM.

Well, Visual C++ Express is a cut down version of Visual Studio. I'm
not sure about exact differences between editions but Visual C++
Express does read the .sln files and comes with an excellent C++
compiler. I didn't encounter any problems building llvm or running
various tools.

> Anyway,
> assembly code generation is not yet supported using the Microsoft tool
> chain (as documented in the Getting Started with VS page)

I've seen that document. It suggests that one cannot assemble native
executables using MS tools because they don't come with an assembler.
Another document (don't remember which one, exactly) says native code
generation is supported on Windows using Mingw tools. This is exactly
what I'm trying to do: assemble an executable using Mingw. I don't see
why it should matter whether llvm tools are built using gcc or MSVC,
the generated assembly code should be the same, shouldn't it?

Just to add a little more to this discussion ...

As part of the Cygwin port I was doing, I managed to get a similar (or
possibly exactly the same) symptom: assembly output was using keywords
or extensions that the GNU assembler didn't recognize. This was caused
by using the wrong assembly style (Intel vs. AT&T). To get this to work
correctly, I had to teach LLVM that for "cygwin" platforms it should
generate AT&T style x86 assembly. The same simple change might fix the
problem for mingw .. or this could be a red herring. I'll comment again
on this once I see the output from Vyacheslav ..

Reid.
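
To make the Intel vs. AT&T distinction concrete, here is a minimal sketch that renders the same instruction in both styles. The enum and function names are invented for illustration and are not part of LLVM.

```cpp
#include <string>

// Illustrative only: these names are hypothetical, not LLVM's.
enum class AsmSyntax { ATT, Intel };

// Render "move a 32-bit immediate into a register" in the given syntax.
// AT&T:  source operand first, '$' on immediates, '%' on registers,
//        and an 'l' size suffix on the mnemonic.
// Intel: destination operand first, no sigils, no size suffix.
std::string formatMovImm32(AsmSyntax syntax, int imm, const std::string &reg) {
  if (syntax == AsmSyntax::ATT)
    return "movl $" + std::to_string(imm) + ", %" + reg;
  return "mov " + reg + ", " + std::to_string(imm);
}
```

The GNU assembler expects the AT&T form by default, so Intel-style output fed to it unchanged will generally be rejected.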

Ok, I got home so I have more details. Here's the sample C program:
----------------- C program ---------------
   #include <stdio.h>
   int main() {
     printf("hello world\n");
     return 0;
   }
------------- end C program -------------

This is compiled using the LLVM online demo into the following LLVM code
(target removed):
----------------- LLVM code --------------
deplibs = [ "stdc++", "c", "crtend" ]
%.str_1 = internal constant [13 x sbyte] c"hello world\0A\00"; <[13 x sbyte]*> [#uses=1]

implementation ; Functions:

declare int %printf(sbyte*, ...)

int %main() {
entry:
call void %__main( )
%tmp.0 = call int (sbyte*, ...)* %printf( sbyte* getelementptr ([13 x sbyte]* %.str_1, int 0, int 0) ); <int> [#uses=0]
ret int 0
}

declare void %__main()
------------- End LLVM code -----------

which in turn produces the following assembly code:

------------- Assembly code -------------
  .text
  .align 16
  .globl main
  .type main, @function
main:
  subl $12, %esp
  fnstcw 10(%esp)
  movb $2, 11(%esp)
  fldcw 10(%esp)
  call __main
  movl $l1__2E_str_1, %eax
  movl %eax, (%esp)
  call printf
  movl $0, %eax
  #IMPLICIT_USE
  addl $12, %esp
  ret

  .data
  .align 1
  .type l1__2E_str_1,@object
  .size l1__2E_str_1,13
l1__2E_str_1: # [13 x sbyte]* %.str_1 = c"hello world\0A\00"
  .ascii "hello world\n\000"
---------- End assembly code ----------

When I try to assemble the above code using
gcc hello.c.s -o hello.exe
I get the following errors:

hello.c.s: Assembler messages:
hello.c.s:6: Warning: .type pseudo-op used outside of .def/.endef ignored.
hello.c.s:6: Error: junk at end of line, first unrecognized character is `m'
hello.c.s:24: Warning: .type pseudo-op used outside of .def/.endef ignored.
hello.c.s:24: Error: junk at end of line, first unrecognized character is `l'
hello.c.s:25: Warning: .size pseudo-op used outside of .def/.endef ignored.
hello.c.s:25: Error: junk at end of line, first unrecognized character is `l'

Sorry for the long email. I attach all relevant files for clarity.

hello.c.s (442 Bytes)

hello.c (406 Bytes)
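
All three reported errors point at `.type` and `.size` lines, which are written in the ELF style; the warnings say the assembler here only accepts those pseudo-ops inside `.def`/`.endef`. As a stopgap, one could filter those lines out before assembling. The sketch below is a hypothetical helper, not part of LLVM or binutils, and it assumes that dropping these directives is harmless for a small example like this one. It does not address other differences between ELF and COFF targets, such as symbol naming.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: true if the line, after leading whitespace, starts
// with the given directive followed by whitespace or end of line.
static bool startsWithDirective(const std::string &line,
                                const std::string &directive) {
  std::size_t pos = line.find_first_not_of(" \t");
  if (pos == std::string::npos)
    return false;
  if (line.compare(pos, directive.size(), directive) != 0)
    return false;
  std::size_t end = pos + directive.size();
  return end == line.size() || line[end] == ' ' || line[end] == '\t';
}

// Drop ELF-style .type and .size lines; keep every other line unchanged.
std::string stripElfSymbolDirectives(const std::string &assembly) {
  std::istringstream in(assembly);
  std::string line, out;
  while (std::getline(in, line)) {
    if (startsWithDirective(line, ".type") ||
        startsWithDirective(line, ".size"))
      continue;
    out += line;
    out += '\n';
  }
  return out;
}
```

The proper fix is for the code generator to stop emitting these directives for such targets, so this is only a way to confirm the diagnosis.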

Vyacheslav,

This is the same problem that I had with Cygwin .. nearly identical.
The issue was documented in PR492 if you want some background. I'm
currently trying to dig up what I did to fix this in December for Cygwin
and see if I can apply the same change for mingw.

Reid.

> Vyacheslav,
> This is the same problem that I had with Cygwin .. nearly identical.
> The issue was documented in PR492 if you want some background. I'm
> currently trying to dig up what I did to fix this in December for Cygwin
> and see if I can apply the same change for mingw.

You could extend the X86 backend's "getModuleMatchQuality" to match for both mingw and cygwin, instead of just cygwin...

-Chris
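
A rough sketch of what matching on both platforms could look like, keyed on the module's target triple. The function name and the example triple strings are assumptions made for illustration; this is not LLVM's real getModuleMatchQuality.

```cpp
#include <string>

// Hypothetical sketch: decide from a target triple string whether the
// Cygwin-style assembly choices should apply. Matches both "cygwin" and
// "mingw" instead of "cygwin" alone.
bool wantsCygwinStyleAsm(const std::string &targetTriple) {
  return targetTriple.find("cygwin") != std::string::npos ||
         targetTriple.find("mingw") != std::string::npos;
}
```

Keying the decision on the triple means it follows the module being compiled rather than the compiler used to build the backend.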

Vyacheslav,

I've tracked down the change and I have a fix for you to test. The
attached patch should be applied to the CVS head (version 1.132) of
X86AsmPrinter.cpp in llvm/lib/Target/X86. The patch just includes MINGW
targets in the same set of choices that it makes for Cygwin. Could you
please try the patch and let me know if it solves your problem? If it
does, I'll commit the patch.

Thanks,

Reid.

patch.txt (303 Bytes)

My first patch was a little premature, please use this one.

patch.txt (498 Bytes)

Reid,

This patch won't work for me. I compile the LLVM toolset with MSVC Express
(hence __MINGW32__ won't be defined for me at compile time). I only
try to feed the generated assembly into gcc (pretty much the GNU
assembler, I suppose). I don't use mingw tools at the earlier stage.
However, it's obvious to me how to modify the code now (just add MSVC
at that line), thanks! I'll try it as soon as I can.

Thanks,
- Vyacheslav.

Ok, I just tried the patch with some modifications (added msvc target
and used WIN32 instead of __MINGW32__ for preprocessor) and everything
worked beautifully. Thanks for the help!
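
For readers following along, a compile-time check of this kind might look roughly like the sketch below. It is not the actual patch: the function name is invented, and the set of macros tested is an assumption based on the description above.

```cpp
// Hypothetical sketch: reports whether the compiler building this code
// targets a Windows-family host (Cygwin, MinGW, or MSVC).
constexpr bool builtForWindowsFamilyHost() {
#if defined(__CYGWIN__) || defined(__MINGW32__) || defined(_WIN32) || defined(WIN32)
  return true;
#else
  return false;
#endif
}
```

Note that a preprocessor test like this ties the choice to how the tools were built, not to the module being compiled.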

Yes, but it won't work in the future because the VC++ build will use Intel syntax, not AT&T.

If you have mingw installed, why not use it to build LLVM? It's a lot more functional. Mixing and matching Microsoft and GNU tool chains is not good for your sanity.

Vyacheslav Akhmechet wrote:

I'm trying to turn some GenericValues into Constants in the interpreter
using code like this, in a switch statement:

     case Type::IntTyID:
         SI = ConstantSInt::get(FB->getType(), ArgVals[i].IntVal);
         params.push_back(UI);
         UI->dump(); //for testing
     break;

FB is a Function::ArgumentListType::iterator
ArgVals is a std::vector<GenericValue>
the switch is on FB->getType()->getTypeID()
so basically what I am doing is iterating through the formal argument list
and using that to know how to convert the GenericValues
(on a side note I can't get it to create iterators for ArgVals, hence the
often n time operator)

The dump() call causes a segfault and I can't figure out why. I looked at
the implementation of get and it is supposed to create the constant if it
does not already exist in the module, so I'm at a complete loss.

Never mind, I got burned by cut and paste, this works perfectly.

> Ok, I just tried the patch with some modifications (added msvc target
> and used WIN32 instead of __MINGW32__ for preprocessor) and everything
> worked beautifully. Thanks for the help!

Did you actually try the previous patch? If you compiled llvm-gcc with mingw, it should work, regardless of the compiler you use to compile the LLVM X86 backend with.

-Chris

> Yes, but it won't work in the future because the VC++ build will use
> Intel syntax, not AT&T.

I'm curious, why did you make that decision? It looks like the
infrastructure already supports GNU Assembler perfectly even under an
MSVC build. I understand that you may want to move Win32 users to NASM
but why would you impose artificial limitations? Why not let the users
control the native assembly syntax from the command line/API?

> If you have mingw installed, why not use it to build LLVM? It's a lot
> more functional. Mixing and matching Microsoft and GNU tool chains is
> not good for your sanity.

I have very strong antipathy towards GNU build tools. I try to avoid
them whenever I can. Besides, it looked like another MSVC tester could
help advance the state of LLVM on that platform so I decided to give
it a try. Anyway, mixing and matching didn't cause me any problems.
After making this small fix everything worked like a charm.

> Did you actually try the previous patch? If you compiled llvm-gcc with
> mingw, it should work, regardless of the compiler you use to compile
> the LLVM X86 backend with.

I didn't build llvm-gcc. I just used the front end provided by the
online demo on LLVM's webpage.

Ah, ok, well that won't work, at least not elegantly, because the web machine is a Linux box. :) If you edit the target-triple at the top of the file (s/linux/cygwin/), it will sorta work, but even then only for trivial examples. If you #include a standard library header, the .ll file will refer to glibc internal details that you won't have on Windows...

-Chris
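
The s/linux/cygwin/ edit can be done by hand, but a small sketch of it is below. The helper is hypothetical, and it assumes the triple sits on a line beginning with `target triple`; the exact line format in the demo's output is not shown in this thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: on any line that begins with "target triple",
// replace the first occurrence of "linux" with "cygwin". All other lines
// are passed through unchanged.
std::string retargetTripleToCygwin(const std::string &llvmAssembly) {
  std::istringstream in(llvmAssembly);
  std::string line, out;
  while (std::getline(in, line)) {
    if (line.compare(0, 13, "target triple") == 0) {
      std::size_t pos = line.find("linux");
      if (pos != std::string::npos)
        line.replace(pos, 5, "cygwin");
    }
    out += line;
    out += '\n';
  }
  return out;
}
```

As noted above, this only helps for trivial programs, since anything that pulls in standard headers will still carry glibc-specific details.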