Win32 COFF Support

Hi All,

I have created a minimally functional Win32 COFF Exporter using the new MC framework. I made some minor changes to other libraries to allow me to plug it in without building it as part of the LLVM project. I wanted to share it but wasn’t sure how to go about doing so, so I have attached the code to this message.

Any feedback on would be appreciated.

core-changes.patch (9.06 KB)

COFF.7z (5.8 KB)

Hi Nathan,

A couple of points :-

  1. Your code seems incomplete, there is nothing envoking ‘createWin32CoffStreamer()’ in you patch.
  2. You need to look at LLVM coding standards, i.e. maximum line length of 80 characters and tabs of 2. You need to look at the following document and existing code. Usage of spaces is quite strict.

http://llvm.org/docs/CodingStandards.html

I too have been working on a COFFWriter but with little success. Got a basic COFFWriter working but have had problems with the MC framework in regards to getting the right information to generate code properly.

It will be good if we can get something working first, so send me a proper change set, read this :-

http://llvm.org/docs/DeveloperPolicy.html

I generally use :-

utils/mkpatch patchname

from the root svn directory with out the ‘.patch’ to make patches. You need to include any files from ‘svn status’ with question marks, by doing an ‘svn add filename’.

Thanks for submitting a patch, most patches are welcomed, but they may need to be honed a bit.

Aaron

Hello, Nathan

First of all, I would like to mention that COFF MC backend will be a
really good contribution to LLVM!

I have created a minimally functional Win32 COFF Exporter using the new MC
framework. I made some minor changes to other libraries to allow me to plug
it in without building it as part of the LLVM project. I wanted to share it
but wasn't sure how to go about doing so, so I have attached the code to
this message.
Any feedback on would be appreciated.

As it was already mentioned before you should definitely read & follow
LLVM coding standards.

I'll let others comment on MC hooks, I just made a very brief look
over the contents of the archive. You can find the comments below.

First of all, please include all the files as attachments. If you
decide to put stuff into archive use something standard, e.g. .zip or
.tar.gz.

coff.h:
Please document the source of the information for this file - some
documents, links will be really good. If you copied stuff out of
somewhere - there should be some license disclaimer, etc.

#pragma once

Use standard include guards

       typedef unsigned char uint8;
       typedef unsigned short uint16;
       typedef unsigned long uint32;

Do not reinvent the wheel - use standard portable types e.g. uint8_t
and friends. "unsigned long" is not 32 bit everywhere.

#pragma pack (push, 1)

Do not use these pragma's. Not all compilers support them.

       struct symbol

Could you please add the comments describing the entities involved?

              typedef std::map <std::string, size_t> map;

Please read LLVM Programmer’s Manual — LLVM 18.0.0git documentation and use the
data structures already provided by LLVM. E.g.
LLVM Programmer’s Manual — LLVM 18.0.0git documentation should
definitely be used here.

#ifdef _MSC_VER
#include "stdafx.h"
#endif
#define report_fatal_error_dbg(x) __debugbreak ()

Generally it's prohibited to use target/compiler-specific defines
outside of libSystem.

       struct MCWin32CoffObjectWriter :
               MCObjectWriter,
               coff::file

Any reason for such inheritance?

MCSectionCOFF const & Section = dynamic_cast <MCSectionCOFF const &> (SectionData.getSection ());

LLVM normally compiles w/o RTTI, so this won't work

                               CoffSection->Header.Characteristics |=
                                       coff::IMAGE_SCN_CNT_CODE |
                                       coff::IMAGE_SCN_MEM_READ |
                                       coff::IMAGE_SCN_MEM_EXECUTE |
                                       false;

Was "| false" intentional? Why?

                       default:
                               report_fatal_error_dbg ("unsupported section alignment");

Use llvm_unreachable()

                       for (SmallVector<char, Len>::const_iterator i = Data.begin (); i != Data.end (); i++)
                       {
                               if (i != Data.begin ())
                                       dbgout (' ');

See llvm/Support/Debug.h for standard way of doing debug output in
LLVM code. There are bunch of examples in many places.

Hello, Nathan
I’ll let others comment on MC hooks, I just made a very brief look
over the contents of the archive. You can find the comments below.

Yes I looked at this and there are several ways but TargetAsmBackend looked the bast place to deal with the MCStreamer choice for different TargetTriples.

for (SmallVector<char, Len>::const_iterator i = Data.begin (); i != Data.end (); i++)
{
if (i != Data.begin ())
dbgout (’ ');
See llvm/Support/Debug.h for standard way of doing debug output in
LLVM code. There are bunch of examples in many places.

See :-

http://llvm.org/docs/ProgrammersManual.html#DEBUG

Also trailing commas on enums cause errors on GCC. There are more errors on GCC as well which I will forward you ASAIC.

Aaron

Thanks for both of your feedback, I will attempt to make it fit LLVM coding standards better before I resubmit my work.

As far as the hooks I put in, they are really only there so I could build the object exporter outside of LLVM, I think having it built directly in like others is preferable in the end. I do think it would be useful if projects had the option of extending supported formats without having to modify LLVM itself, but my changes main goal was minimal impact on LLVM sources.

Thanks for both of your feedback, I will attempt to make it fit LLVM coding standards better before I resubmit my work.

You can send me early drafts to run past GCC, send me something that works.

As far as the hooks I put in, they are really only there so I could build the object exporter outside of LLVM, I think having it built directly in like others is preferable in the end. I do think it would be useful if projects had the option of extending supported formats without having to modify LLVM itself, but my changes main goal was minimal impact on LLVM sources.

AFAICT The hooks you sent did not call your code at all.

Have a look at the MachOWriter, this is designed to be extendable to other writers. The MCStreamer stuff just needs the same doing probably in TargetAsmBackend. It would be good to have this as a separate patch. This will allow both COFFWriter and ELFWriter as well as the MachOWriter to coexist and be selected by TargetTriple. BTW could you have your Writer and Streamer in different source code files, and follow the MachO code format as far as is possible/reasonable.

JTFYI The COFF code I have written consists of Reader and Writer of COFF internal object modules to/from COFF format. I am looking at producing a linker with the addition of a PE Writer class, and Library Reader class. This should then make LLVM and CLANG more independant on the Windows Platform. I have a COFF Dumper class too which works with the reader and the internal Object Module format.

Regards,

Aaron

Thanks for both of your feedback, I will attempt to make it fit LLVM coding standards better before I resubmit my work.

You can send me early drafts to run past GCC, send me something that works.

I will test against GCC on my ubuntu virtual machine.

As far as the hooks I put in, they are really only there so I could build the object exporter outside of LLVM, I think having it built directly in like others is preferable in the end. I do think it would be useful if projects had the option of extending supported formats without having to modify LLVM itself, but my changes main goal was minimal impact on LLVM sources.

AFAICT The hooks you sent did not call your code at all.

The hooks I put in place allow my compiler to specify its own MCStreamer factory function that TargetMachine::addPassesToEmitFile calls when its time to create the streamer object. I also made MCAssembler use a virtual function to construct the MCObjectWriter which I then override in my streamer to make it construct my object writer. To me, this was important to allow the streamer & object writer to be built as part of my compiler and not require me to run a build of LLVM every time I changed something. This aproach also allows me to keep this code in source control as I bounce between development machines multiple times a day.

Have a look at the MachOWriter, this is designed to be extendable to other writers. The MCStreamer stuff just needs the same doing probably in TargetAsmBackend. It would be good to have this as a separate patch. This will allow both COFFWriter and ELFWriter as well as the MachOWriter to coexist and be selected by TargetTriple. BTW could you have your Writer and Streamer in different source code files, and follow the MachO code format as far as is possible/reasonable.

I have already re-factored the code a bit to better fit LLVM coding standard. The writer & streamer are now two different source files, and code for the coff::file class has been moved directly into the writer instead of multiply inheriting from it.

JTFYI The COFF code I have written consists of Reader and Writer of COFF internal object modules to/from COFF format. I am looking at producing a linker with the addition of a PE Writer class, and Library Reader class. This should then make LLVM and CLANG more independant on the Windows Platform. I have a COFF Dumper class too which works with the reader and the internal Object Module format.

Some time ago, I wrote a simple COFF based linker that would link into series of object files into a memory buffer then write out the linked and loaded image to a bin file for use with a OS project I was playing around with. I revived it to help me test some problems I was having with relocations. Perhaps some of that code could be useful.

hopefully, I can have a version that will compile as part of LLVM later today

Thanks,
Nathan

I cleaned up my code a bit, and integrated into the LLVM sources. While this code works for my purposes, it needs a bit more work before its generally useful.

Two aspects I think will cause the most trouble are not having support for uninitialized data sections, and common symbols. Uninitialized data sections should be relatively simple to add, but its not clear to me how common symbols should be implemented.

For normal symbols, I assume they are contained within the current section. From what I understand, COFF common symbols must be in their own section. It can’t tell if it is guaranteed that MCStreamer::SwitchSection will be called after the emission of a common symbol thus giving the exporter an opportunity to switch back to the section for normal symbols. On the other hand, there is a section pointer associated with the symbol, if this is reliable, what is the need for MCStreamer::SwitchSection, is it (or should it be) possible to emit data or code not tied to a symbol?

  • Nathan

win32coff.patch (29.8 KB)

Thanks, applied in r103150! llvm-mc -filetype=obj probably needs a similar patch.

cool!, I will make that change and submit it too.

Yes probably, I don’t know what the .def and .scl directives “do” :slight_smile:

I think I can figure out the right thing to do here.

Also, w.r.t. section handling stuff, there is this fixme in the asmprinter:

} else if (const char *LinkOnce = MAI->getLinkOnceDirective()) {
// .globl _foo
OutStreamer.EmitSymbolAttribute(GVSym, MCSA_Global);
// FIXME: linkonce should be a section attribute, handled by COFF Section
// assignment.
// http://sourceware.org/binutils/docs-2.20/as/Linkonce.html#Linkonce
// .linkonce discard
// FIXME: It would be nice to use .linkonce samesize for non-common
// globals.
OutStreamer.EmitRawText(StringRef(LinkOnce));
} else {

Basically, it seems like the MCSectionCOFF implementation should get the linkonce bit. I’m not an expert on COFF, but I think that’s the right place for it. “.linkonce” will also have to be added as a new MCStreamer api, which would set the bit.

linkonce is one option for COFF COMDAT sections. I think that this call should not exist at all, instead, when the AsmPrinter asks the TLOF what section to use for the global, TLOF should create a new unique section with the bit set already.

One thing that I don’t understand is why common symbols do not get sections assigned to them. I ended up assigning them to the ‘.bss’ section in my streamer (which I now know was wrong) but it still seems like they should be part of some section. At least in COFF, they will be handled as a COMDAT just like symbols with the linkonce linkage type.

I guess this all goes back to the assembly language view of things. I think I have enough information to handle these things correctly now.

Thanks for the feedback.

  • Nathan

Thanks, applied in r103150! llvm-mc -filetype=obj probably needs a similar patch.

cool!, I will make that change and submit it too.

Thanks!

Also, w.r.t. section handling stuff, there is this fixme in the asmprinter:

} else if (const char *LinkOnce = MAI->getLinkOnceDirective()) {
// .globl _foo
OutStreamer.EmitSymbolAttribute(GVSym, MCSA_Global);
// FIXME: linkonce should be a section attribute, handled by COFF Section
// assignment.
// http://sourceware.org/binutils/docs-2.20/as/Linkonce.html#Linkonce
// .linkonce discard
// FIXME: It would be nice to use .linkonce samesize for non-common
// globals.
OutStreamer.EmitRawText(StringRef(LinkOnce));
} else {

Basically, it seems like the MCSectionCOFF implementation should get the linkonce bit. I’m not an expert on COFF, but I think that’s the right place for it. “.linkonce” will also have to be added as a new MCStreamer api, which would set the bit.

linkonce is one option for COFF COMDAT sections. I think that this call should not exist at all, instead, when the AsmPrinter asks the TLOF what section to use for the global, TLOF should create a new unique section with the bit set already.

Yeah, I completely agree. That’s what I was thinking when I wrote that comment :). However, doing it that way would break .s emission, which is “bad” :slight_smile:

Also, to implement a COFF assembler, we’d need to support the .linkonce directive. To me, it seems most straight-forward to just support .linkonce as a MCStreamer callback.

One thing that I don’t understand is why common symbols do not get sections assigned to them. I ended up assigning them to the ‘.bss’ section in my streamer (which I now know was wrong) but it still seems like they should be part of some section. At least in COFF, they will be handled as a COMDAT just like symbols with the linkonce linkage type.

I guess this all goes back to the assembly language view of things. I think I have enough information to handle these things correctly now.

I’m not sure (all I know about COFF is what’s on this page: http://www.nondot.org/sabre/os/articles/ExecutableFileFormats/) but you can find out by making a .s file, running it through a coff assembler (like gas) and seeing what bits it produces. When working on the macho assembler, we found it very very useful to produce bitwise identical output to the system assembler, allowing us to just diff the .o files.

-Chris

Thanks, applied in r103150! llvm-mc -filetype=obj probably needs a similar patch.

cool!, I will make that change and submit it too.

Thanks!

After looking into this, I have found that AsmParser is hard coded to use MachO sections which causes my code to segfault, reworking this code to use TargetLoweringObjectFile or some equivalent is a larger change than I am comfortable with at the moment.

On the other hand, llc, clang, and (according to Aaron) llvm-gcc all work with my COFF backend, so I will continue to work on it using one of those to test it.

  • Nathan

Ok, sounds reasonable.

-Chris

...
Thanks, applied in r103150! llvm-mc -filetype=obj probably needs a
similar patch.

cool!, I will make that change and submit it too.

Thanks!

After looking into this, I have found that AsmParser is hard coded to use
MachO sections which causes my code to segfault, reworking this code to use
TargetLoweringObjectFile or some equivalent is a larger change than I am
comfortable with at the moment.

This is a bug. Feel free to file a PR for it, I will *try* to get to
it, but it may take me a while (>1 month).

- Daniel

Hi All,

After a week of working with several people on this forum, I have a much more functional version of my Win32 COFF object file support. I believe this should cover most use cases. Attached is a patch file against r103336. Feedback is welcome.

Thanks,
Nathan

changes.patch (63.4 KB)