lld-link with MSVC6 object files

Hi,

I have a question about lld-link. What obj file formats should it support? When I try to use an obj from msvc 6.0 it complains that the file magic is not valid.

However when running llvm-objdump it reports:

test1.obj:      file format COFF-i386

Disassembly of section .text:
0000000000000000 _main:
       0:       68 00 00 00 00  pushl   $0
       5:       e8 00 00 00 00  calll   0 <_main+0xa>
       d:       33 c0   xorl    %eax, %eax
       f:       c3      retl

Thanks,
Paul

MSVC 6 as in the Visual Studio released in 1989? Yes, I imagine that’s a bit outside the intended support window.

The CodeView library in LLVM only supports CodeView C13 types, that is, MSVC 7.0 / Visual Studio 2002 or later.

Sent: September 30, 2019 2:38 PM

MSVC 6 is 1998 not 1989 :slight_smile:

The latest MSVC linker can link these object files. Is this just because it has support for C13 types and some other code path for whatever MSVC6 uses? After some digging around it appears to be this format:

https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#coff-file-header-object-and-image

Is that the COFF object file format? Does lld-link support this format?

Ah, I just glanced briefly at the Wikipedia article ( https://en.wikipedia.org/wiki/Microsoft_Visual_C%2B%2B ) & misread the “C 6.0” and didn’t notice it was distinct from “Visual C++ 6.0” - thanks for the catch!

COFF is still the Windows object file format, and the Windows support in lld is COFF support, yeah. I guess there might be some format variations that haven’t been implemented in lld, though. It’s mostly an “on demand” sort of approach.

It sounds like it might mostly work with some tweaks, given that it’s complaining about bad file magic. I’ll see if I can get lld-link to build locally and hack out the magic checks to see if it works.

I would expect it to be able to link the object file, even if it ignored debug info. It’s a bit strange that it complains about bad file magic.

It might be tricky to get debug information working and produce a valid PDB file since that is pretty old and the format has changed both with how it was stored in the object file itself as well as the format of the PDB file.

My guess is that the “magic” it’s complaining about is not the magic of the object file itself but rather the first 4 bytes of the .debug$S (or was it the .debug$T?) section. Perhaps a simple fix in this case is that instead of erroring out if we encounter an “older” magic, we just link as if debug info was not present to begin with.
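To check which magic an object file actually carries in its debug sections, something like the following could be used. This is only a minimal sketch based on the COFF layout in the Microsoft PE format documentation (20-byte file header, 40-byte section headers); the function name `debug_section_magics` is made up for illustration and is not part of lld or LLVM.

```python
import struct

COFF_HEADER = struct.Struct("<HHIIIHH")         # 20-byte COFF file header
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")  # 40-byte section header


def debug_section_magics(data: bytes) -> dict:
    """Map each .debug$S/.debug$T section name to the first 32-bit words found.

    Hypothetical helper: assumes a plain COFF object (no bigobj, no long
    section names), which is what an old compiler would emit.
    """
    (_machine, nsections, _timestamp, _symptr,
     _nsyms, opt_size, _flags) = COFF_HEADER.unpack_from(data, 0)
    table = COFF_HEADER.size + opt_size  # section table follows the header
    magics = {}
    for i in range(nsections):
        fields = SECTION_HEADER.unpack_from(data, table + i * SECTION_HEADER.size)
        name = fields[0].rstrip(b"\0").decode("ascii", "replace")
        raw_size, raw_ptr = fields[3], fields[4]  # SizeOfRawData, PointerToRawData
        if name in (".debug$S", ".debug$T") and raw_size >= 4:
            (magic,) = struct.unpack_from("<I", data, raw_ptr)
            magics.setdefault(name, []).append(magic)
    return magics
```

Calling it as `debug_section_magics(open("test1.obj", "rb").read())` would list the signatures per section. As far as I know, 4 is the C13 signature, so any other value would point at an older CodeView format.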

This will at least make it work. If you want to actually consume the debug info though, you’re in for a fun ride :slight_smile:

Out of curiosity, why do you want to use lld-link with a compiler that was released 20 years ago?

I have the most edge of edge use cases :). I am recovering the lost source code to an application built with MSVC 6. However because I want to produce byte for byte exact output I need to ensure that the import table is in the same order as the original binary. Since the MSVC6 linker has no way of doing this I figured I could hack this feature into lld-link. I need to also set the PDB path in the debug data but a newer version of the MS linker can do this and I believe lld-link already supports this too.
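The ordering requirement itself is simple to state, independent of how it would be wired into a linker. The sketch below is not lld code, and `order_like_reference` is a hypothetical name; it only shows the idea of sorting a new import list so that it follows the order observed in the original binary, assuming that order has already been extracted somehow.

```python
def order_like_reference(imports, reference):
    """Sort (dll, symbol) pairs to follow the order seen in a reference binary.

    Entries that do not appear in the reference keep their relative order
    and are placed last (sorted() is stable). Names are compared exactly,
    which is a simplification: DLL names are case-insensitive on Windows.
    """
    rank = {entry: i for i, entry in enumerate(reference)}
    return sorted(imports, key=lambda entry: rank.get(entry, len(rank)))
```

For example, with a reference order of `ExitProcess`, `GetLastError`, `MessageBoxA`, a freshly linked list in any other order is rearranged to match, and any import that the original did not have ends up at the tail.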

I just tested building an object file with MSVC 6.0 and linking it with lld, and it mostly works fine.

At first I got errors like these though:

lld-link: error: /safeseh: hello.obj is not compatible with SEH

But by adding -safeseh:no, I was able to link the file just fine.

If the MSVC 6.0 built object file was built with debug info, I get lld warnings like these:

lld-link: warning: ignoring section .debug$S with unrecognized magic 0x2
lld-link: warning: ignoring section .debug$T with unrecognized magic 0x2

Is this what you got? Despite these warnings, linking works (but you won't get working debug info).

// Martin

I have the most edge of edge use cases :). I am recovering the lost source code to an application built with MSVC 6. However because I want to produce byte for byte exact output I need to ensure that the import table is in the same order as the original binary.

I’m not sure I follow this part: if you build an executable using lld-link and compare it with an executable built with the MSVC linker, they are almost always different. lld-link doesn’t attempt to produce byte-for-byte identical output to MSVC. So, if you want to compare lld-link-produced output, the other file needs to be built with lld-link too. But is that the case?

That isn’t the case, but my idea is that I can hack a copy of lld-link to produce the same output. The other option is to use the MSVC6 linker, which will do things like randomly re-order the imported functions and the like. I can’t change that without doing something crazy like reverse engineering the linker and patching something in there to force a particular ordering. I suspect that the imported function order isn’t the only thing that it might change on a rebuild.

I think it would be quite hard to hack lld so that the linker produces the same output as Microsoft link.exe. Although lld can produce executables that are semantically equivalent to those of link.exe, every detail is different. If you are working on it as a long-term project, it is probably doable, but it doesn’t seem like something you can easily hack.

Have you considered disassembling the original binary and your new binary and comparing the two as text files? If only the imported functions are different, the text outputs will be mostly the same, and you would be able to tell whether you succeeded in recovering the source code.
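A rough sketch of that comparison, assuming the two listings were produced as text beforehand (for example with `llvm-objdump -d`). The helper names are invented for illustration. The address column is stripped first so that code which merely moved does not show up as a difference; operands that encode addresses will still differ and need to be judged by eye.

```python
import difflib
import re

# Leading "   401000:" style address column, as printed by llvm-objdump -d.
_ADDRESS = re.compile(r"^\s*[0-9a-fA-F]+:\s*")


def normalize(listing: str) -> list:
    """Drop the address column so a shifted layout does not flag every line."""
    return [_ADDRESS.sub("", line).rstrip() for line in listing.splitlines()]


def diff_listings(original: str, rebuilt: str) -> list:
    """Return unified-diff lines between two disassembly listings."""
    return list(difflib.unified_diff(normalize(original), normalize(rebuilt),
                                     "original", "rebuilt", lineterm=""))
```

An empty result means the instruction streams match once addresses are ignored. Anything left over is where the rebuilt binary deviates.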

Yeah, ideally I wanted the tool chain to just produce the same binary. I suppose running a disassembly step could work to ensure that only offsets to imports have changed. But I think this would still give me issues with comparing data sections, since offsets to constant strings and globals could be swapped around too?

I believe in GCC this can be “fixed” by using a linker script. MSVC doesn’t have anything like this however.

I haven’t looked at lld-link’s binary output yet, but I would have imagined that the import table format and the way that global data is created must be done in the same way? It would just be orderings, the lack of a “rich” header, and other things that lld-link does differently?

Well, given that link.exe does not produce deterministic output (i.e. you run it twice and get different results), it would certainly be hard to make lld produce the same non-deterministic output :slight_smile:

But if you just want a linker that is compatible with link.exe and produces the same output compared to itself every time you run it, then yes, you can probably get there with lld-link (although, as mentioned earlier, if you want to produce a working PDB file you will have some pretty hairy work ahead of you).
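Whether a given linker is deterministic in this sense is easy to check by linking the same inputs several times and hashing the results. A minimal sketch, with `is_reproducible` as an illustrative name; it takes the file contents as bytes and leaves running the linker to the caller.

```python
import hashlib


def digest(blob: bytes) -> str:
    """SHA-256 of one link output, as a hex string."""
    return hashlib.sha256(blob).hexdigest()


def is_reproducible(outputs) -> bool:
    """True if every output in the iterable has the same digest."""
    return len({digest(blob) for blob in outputs}) <= 1
```

Feeding it the bytes of two or three executables linked from identical object files answers the question for that particular setup. It says nothing about where the outputs differ.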

lld’s section layout is not the same as MSVC’s, and even if you make the section layout the same, there are still many things that are not the same. For example, executables contain string tables for string constants (e.g. imported symbol names), and two string tables that contain the same set of strings don’t have to contain the strings in the same order.
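The string table point can be illustrated with a small sketch. It assumes a table is simply a run of NUL-terminated strings, which is a simplification of what a real executable contains, and the function names are made up for the example.

```python
def split_strings(table: bytes) -> list:
    """Split a run of NUL-terminated strings into entries, in stored order."""
    return [entry for entry in table.split(b"\0") if entry]


def compare_tables(a: bytes, b: bytes) -> str:
    """Classify two string tables as identical, reordered, or different."""
    if a == b:
        return "identical"
    if sorted(split_strings(a)) == sorted(split_strings(b)):
        return "same strings, different order"
    return "different strings"
```

Two tables holding `ExitProcess` and `MessageBoxA` in opposite orders are equivalent for the loader but fail a byte-for-byte comparison, and every offset pointing into them differs as well.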