Win32 COFF Support patch 5 (the final patch in the saga!)

Attached is the 5th and final patch of the beginning of COFF support
for MC. It simply makes the X86 backend use it on Win32 targets and
tests it.

- Michael Spencer

ms-coff-patch-5.patch (22.9 KB)

You probably want to add Cygwin and MinGW32 Triples as well :-

case Triple::Win32:

  • case Triple::Cygwin:
  • case Triple::MinGW32:
    return new WindowsX86AsmBackend (T);

Aaron

Can someone test this on those platforms? That change would effect
quite a few people.

- Michael Spencer

It should not have any side effects as object emission is not used on these platforms and only if enabled.

Also ELF which is the default, which is not used on these platforms, so is invalid anyway.

Aaron

Daniel somehow replied to one of my previous commits on llvm-commits
instead of this thread.

Hi Michael,

Two minor notes:
--

diff --git a/lib/Target/X86/X86AsmBackend.cpp b/lib/Target/X86/X86AsmBackend.cpp
index 2cf65c1..02ac2be 100644
--- a/lib/Target/X86/X86AsmBackend.cpp
+++ b/lib/Target/X86/X86AsmBackend.cpp
@@ -14,6 +14,7 @@
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCObjectWriter.h"
+#include "llvm/MC/MCSectionCOFF.h"
#include "llvm/MC/MCSectionELF.h"
#include "llvm/MC/MCSectionMachO.h"
#include "llvm/MC/MachObjectWriter.h"
@@ -212,6 +213,24 @@ public:
: ELFX86AsmBackend(T) {}
};

+class WindowsX86AsmBackend : public X86AsmBackend {
+public:
+ WindowsX86AsmBackend(const Target &T)
+ : X86AsmBackend(T) {
+ HasAbsolutizedSet = true;
+ HasScatteredSymbols = true;

These probably should be false for Win32.

What do they do?

diff --git a/test/MC/COFF/dg.exp b/test/MC/COFF/dg.exp
new file mode 100644
index 0000000..7b7bd4e
--- /dev/null
+++ b/test/MC/COFF/dg.exp
@@ -0,0 +1,5 @@
+load_lib llvm.exp
+
+if { [llvm_supports_target X86] } {
+ RunLLVMTests [lsort [glob -nocomplain $srcdir/$subdir/*.{ll}]]
+}

This should be *.{s}.

This has to be *.{ll} for now because llvm-mc cannot parse COFF .s files.

--

Otherwise looks good to me!

- Daniel

- Michael Spencer

Michael,

Thanks for taking the time to get this into the mainline.

  • Nathan

Daniel somehow replied to one of my previous commits on llvm-commits
instead of this thread.

Hi Michael,

Two minor notes:
--

diff --git a/lib/Target/X86/X86AsmBackend.cpp b/lib/Target/X86/X86AsmBackend.cpp
index 2cf65c1..02ac2be 100644
--- a/lib/Target/X86/X86AsmBackend.cpp
+++ b/lib/Target/X86/X86AsmBackend.cpp
@@ -14,6 +14,7 @@
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCObjectWriter.h"
+#include "llvm/MC/MCSectionCOFF.h"
#include "llvm/MC/MCSectionELF.h"
#include "llvm/MC/MCSectionMachO.h"
#include "llvm/MC/MachObjectWriter.h"
@@ -212,6 +213,24 @@ public:
: ELFX86AsmBackend(T) {}
};

+class WindowsX86AsmBackend : public X86AsmBackend {
+public:
+ WindowsX86AsmBackend(const Target &T)
+ : X86AsmBackend(T) {
+ HasAbsolutizedSet = true;
+ HasScatteredSymbols = true;

These probably should be false for Win32.

What do they do?

You don't really want to know, but:
(1) The Darwin tools conspire to allow the linker to reorder code, do
dead code stripping, etc. What this means is that any two "atoms" in
an assembly file might be moved in the final linked image. Therefore,
you can't expect that things like 'A - B' are constants.

(2) "Absolutized set" is something where Darwin 'as' will take a
directive like '.set foo, A - B', and make foo be the current absolute
expression for A - B, ignoring the fact that A and B might move. In
practice, this is because the assembler wasn't smart enough to figure
out which situations it could reliably compute A - B. This is related
to (1).

These two things end up impacting relaxation and the assembler
backend, which is why they unfortunately get exposed at this level.

diff --git a/test/MC/COFF/dg.exp b/test/MC/COFF/dg.exp
new file mode 100644
index 0000000..7b7bd4e
--- /dev/null
+++ b/test/MC/COFF/dg.exp
@@ -0,0 +1,5 @@
+load_lib llvm.exp
+
+if { [llvm_supports_target X86] } {
+ RunLLVMTests [lsort [glob -nocomplain $srcdir/$subdir/*.{ll}]]
+}

This should be *.{s}.

This has to be *.{ll} for now because llvm-mc cannot parse COFF .s files.

Oh, right.

- Daniel

Daniel somehow replied to one of my previous commits on llvm-commits
instead of this thread.

Hi Michael,

Two minor notes:
--

diff --git a/lib/Target/X86/X86AsmBackend.cpp b/lib/Target/X86/X86AsmBackend.cpp
index 2cf65c1..02ac2be 100644
--- a/lib/Target/X86/X86AsmBackend.cpp
+++ b/lib/Target/X86/X86AsmBackend.cpp
@@ -14,6 +14,7 @@
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCObjectWriter.h"
+#include "llvm/MC/MCSectionCOFF.h"
#include "llvm/MC/MCSectionELF.h"
#include "llvm/MC/MCSectionMachO.h"
#include "llvm/MC/MachObjectWriter.h"
@@ -212,6 +213,24 @@ public:
: ELFX86AsmBackend(T) {}
};

+class WindowsX86AsmBackend : public X86AsmBackend {
+public:
+ WindowsX86AsmBackend(const Target &T)
+ : X86AsmBackend(T) {
+ HasAbsolutizedSet = true;
+ HasScatteredSymbols = true;

These probably should be false for Win32.

What do they do?

You don't really want to know, but:
(1) The Darwin tools conspire to allow the linker to reorder code, do
dead code stripping, etc. What this means is that any two "atoms" in
an assembly file might be moved in the final linked image. Therefore,
you can't expect that things like 'A - B' are constants.

(2) "Absolutized set" is something where Darwin 'as' will take a
directive like '.set foo, A - B', and make foo be the current absolute
expression for A - B, ignoring the fact that A and B might move. In
practice, this is because the assembler wasn't smart enough to figure
out which situations it could reliably compute A - B. This is related
to (1).

These two things end up impacting relaxation and the assembler
backend, which is why they unfortunately get exposed at this level.

I tried setting these to false, but the resulting executable (made
with link.exe) crashed. I looked at the differences in the object
file, but couldn't figure out what was causing it. I'm guessing
there's a bug in the path when these are false.

Now that I know what these flags do, I'll try and figure out exactly
what the problem is.

- Michael Spencer

Daniel somehow replied to one of my previous commits on llvm-commits
instead of this thread.

Hi Michael,

Two minor notes:
--

diff --git a/lib/Target/X86/X86AsmBackend.cpp b/lib/Target/X86/X86AsmBackend.cpp
index 2cf65c1..02ac2be 100644
--- a/lib/Target/X86/X86AsmBackend.cpp
+++ b/lib/Target/X86/X86AsmBackend.cpp
@@ -14,6 +14,7 @@
#include "llvm/MC/MCAssembler.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCObjectWriter.h"
+#include "llvm/MC/MCSectionCOFF.h"
#include "llvm/MC/MCSectionELF.h"
#include "llvm/MC/MCSectionMachO.h"
#include "llvm/MC/MachObjectWriter.h"
@@ -212,6 +213,24 @@ public:
: ELFX86AsmBackend(T) {}
};

+class WindowsX86AsmBackend : public X86AsmBackend {
+public:
+ WindowsX86AsmBackend(const Target &T)
+ : X86AsmBackend(T) {
+ HasAbsolutizedSet = true;
+ HasScatteredSymbols = true;

These probably should be false for Win32.

What do they do?

You don't really want to know, but:
(1) The Darwin tools conspire to allow the linker to reorder code, do
dead code stripping, etc. What this means is that any two "atoms" in
an assembly file might be moved in the final linked image. Therefore,
you can't expect that things like 'A - B' are constants.

(2) "Absolutized set" is something where Darwin 'as' will take a
directive like '.set foo, A - B', and make foo be the current absolute
expression for A - B, ignoring the fact that A and B might move. In
practice, this is because the assembler wasn't smart enough to figure
out which situations it could reliably compute A - B. This is related
to (1).

These two things end up impacting relaxation and the assembler
backend, which is why they unfortunately get exposed at this level.

I tried setting these to false, but the resulting executable (made
with link.exe) crashed. I looked at the differences in the object
file, but couldn't figure out what was causing it. I'm guessing
there's a bug in the path when these are false.

Now that I know what these flags do, I'll try and figure out exactly
what the problem is.

If I had to guess, this is probably because the assembler isn't
relaxing something that should be. Scattered symbols will hide this,
because they will force most things to be relaxed, but I think we
probably need a new bit to model the actual semantics in play. I've
paged this stuff out, but I think the semantics the current
implementation isn't modeling is the possible relocation of static
symbols.

If you try your example without the scattered symbols bit, but with
-mc-relax-all, does the resulting binary work?

- Daniel

Can't do that with llc, but I'll just go change the code to force that
option in MC for this test.

Should the mc options be added to llc?

- Michael Spencer

I tried setting these to false, but the resulting executable (made
with link.exe) crashed. I looked at the differences in the object
file, but couldn't figure out what was causing it. I'm guessing
there's a bug in the path when these are false.

Now that I know what these flags do, I'll try and figure out exactly
what the problem is.

If I had to guess, this is probably because the assembler isn't
relaxing something that should be. Scattered symbols will hide this,
because they will force most things to be relaxed, but I think we
probably need a new bit to model the actual semantics in play. I've
paged this stuff out, but I think the semantics the current
implementation isn't modeling is the possible relocation of static
symbols.

If you try your example without the scattered symbols bit, but with
-mc-relax-all, does the resulting binary work?

- Daniel

Can't do that with llc, but I'll just go change the code to force that
option in MC for this test.

Err, I think it would be better to keep ScatteredSymbols=true than do
that. At least that is more likely to generate correct code. Adding a
FIXME above it is a fine start.

Should the mc options be added to llc?

Sure, if you like.

- Daniel

I tried setting these to false, but the resulting executable (made
with link.exe) crashed. I looked at the differences in the object
file, but couldn't figure out what was causing it. I'm guessing
there's a bug in the path when these are false.

Now that I know what these flags do, I'll try and figure out exactly
what the problem is.

If I had to guess, this is probably because the assembler isn't
relaxing something that should be. Scattered symbols will hide this,
because they will force most things to be relaxed, but I think we
probably need a new bit to model the actual semantics in play. I've
paged this stuff out, but I think the semantics the current
implementation isn't modeling is the possible relocation of static
symbols.

If you try your example without the scattered symbols bit, but with
-mc-relax-all, does the resulting binary work?

- Daniel

Can't do that with llc, but I'll just go change the code to force that
option in MC for this test.

Err, I think it would be better to keep ScatteredSymbols=true than do
that. At least that is more likely to generate correct code. Adding a
FIXME above it is a fine start.

I meant as a way to get the same behavior as -mc-relax-all on my local
tree, not to commit.

Should the mc options be added to llc?

Sure, if you like.

Will do.

- Michael Spencer

I tested it all out. Setting relax all to true doesn't fix the
problem, however, HasAbsolutizedSet was unneeded.

- Michael Spencer