Trying out lld to link windows binaries (using msvc as a compiler)

Hi all,

At work I have been experimenting with linking with lld. I can share all sorts of performance data so far if anyone wants to hear it, but for a large binary (around 100MB, with 800MB of symbols) it takes considerably more memory and time than the VS 2015 and 2017 linkers.

What I’ve been playing around with is the idea of a tool that generates those /DEBUG:GHASH sections on already-created .obj files. I saw the code for reading them in lld and started looking at the CodeView part of clang. Is there any documentation around that, and would anyone else be interested in such a tool? I wish we could just jump ship to clang, but right now that is not a possibility (it mostly revolves around windows.h and the copy of it we maintain).

PS: I have two patches that I needed to get it to finish linking; I can submit them Monday.

-lldb-dev, +llvm-dev

lldb-dev is specifically for the LLVM debugger (LLDB); llvm-dev is for everything else (including lld).

Did you compile the object files with clang and use -mllvm -emit-codeview-ghash-section?

It sounds like probably not. If you don’t do that, /DEBUG:GHASH will indeed be much slower. I wonder if we should emit a linker diagnostic in this case.

Hi,

No, I didn’t; I used cl.exe from the Visual Studio toolchain. What I’m proposing is a tool that processes .obj files in COFF format, reading them and generating the GHASH part.

To make our build faster we use hundreds of unity build files (.cpp files that include a lot of other .cpp files, a.k.a. munch files), but we still have a lot of single .cpp files as well (around 3.4k .obj files in total).

PS: sorry for sending to the wrong list; I was reading about the LLVM mailing lists and jumped when I saw what I thought was an lld-exclusive list.

Zachary,

Do you generally recommend using -mllvm -emit-codeview-ghash-section and /DEBUG:GHASH when linking for Windows?

A tool like this would be useful, yes. We’ve talked about it internally as well and agreed it would be useful, we just haven’t prioritized it. If you’re interested in submitting a patch along those lines though, I think it would be a good addition.

I’m not sure what the best place for it would be. llvm-readobj and llvm-objdump seem like obvious choices, but they are intended to be read-only, so perhaps they wouldn’t be a good fit.

llvm-pdbutil is kind of a hodgepodge of everything else related to PDBs and symbols, so I wouldn’t be opposed to making a new subcommand there called “ghash” or something that could process an object file and output a new object file with a .debug$H section.

A third option would be to make a new tool for it.

I don’t think it would be that hard to write. If you’re interested in trying to make a patch for this, I can offer some guidance on where to look in the code. Otherwise it’s something that we’ll probably get to, I’m just not sure when.

I definitely want you to try it out and let me know how it goes :) It’s behind the undocumented -emit-codeview-ghash-section flag for now, until we have some more data indicating it’s safe to make the default. I’d like to eventually make it the default, so that you won’t have to specify anything on the compiler side other than /Z[i|7].

So far I am not aware of any downsides of using -mllvm -emit-codeview-ghash-section and /DEBUG:GHASH (aside from a small increase in object file size). But if you find any let me know.

Great, OK thanks.

I will turn it on in Zig, and as long as our tests keep passing, I'll keep
it on. I'll let you know if it breaks anything.

I would love to write it and contribute it back, so please do tell. I did find
some of the GHASH code in lld, but I'm fuzzy on the LLVM CodeView part of
it and have never used llvm-readobj/llvm-objdump or llvm-pdbutil. I'm not
afraid to look, though :)

Another option would be to make it a feature of llvm-objcopy. That would
probably require adding COFF support to that tool though.

Peter

That’s a good idea too, I had forgotten about llvm-objcopy.

Luckily all of the important code is hidden behind library calls, and it should already just do the right thing, so I suspect you won’t need to know much about CodeView to do this.

I think Peter has the right idea about putting this in llvm-objcopy.

You can look at one of the existing CopyBinary functions there, which currently only work for ELF, but you can just make a new overload that accepts a COFFObjectFile.

I would probably start by iterating over each of the sections (getNumberOfSections / getSectionName) looking for .debug$T and .debug$H sections.
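In case it helps to see the shape of that scan, here is a standalone sketch of the decision logic. It uses hand-rolled structs for the 8-byte COFF section name field purely for illustration; the real patch would go through llvm::object::COFFObjectFile (getNumberOfSections / getSectionName) as described above.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for the 8-byte, not-necessarily-NUL-terminated Name field of
// a COFF section header. Real code would use llvm::object::COFFObjectFile.
struct SectionHeader {
  char Name[8];
};

// Recover the section name; names shorter than 8 bytes are NUL-padded.
// (Long names use a "/offset" string-table reference, ignored here.)
static std::string sectionName(const SectionHeader &S) {
  size_t Len = 0;
  while (Len < 8 && S.Name[Len] != '\0')
    ++Len;
  return std::string(S.Name, Len);
}

// An object file needs hashes added if it has type records (.debug$T)
// but no existing hash section (.debug$H).
static bool needsGHash(const std::vector<SectionHeader> &Sections) {
  bool HasDebugT = false, HasDebugH = false;
  for (const auto &S : Sections) {
    std::string N = sectionName(S);
    if (N == ".debug$T")
      HasDebugT = true;
    if (N == ".debug$H")
      HasDebugH = true;
  }
  return HasDebugT && !HasDebugH;
}
```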

If you find a .debug$H section then you can just skip that object file.

If you find a .debug$T but not a .debug$H, then basically do the same thing that LLD does in PDBLinker::mergeDebugT (create a CVTypeArray and pass it to GloballyHashedType::hashTypes). That will return an array of hash values (the format of .debug$H is the header, followed by the hash values). Then, when you’re writing the list of sections, just add the .debug$H section right after the .debug$T section.
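For reference, the .debug$H payload is just a small fixed header followed by one fixed-size hash per type record. A sketch of the layout is below, with the caveat that the exact magic/version/algorithm values are not spelled out here because the authoritative definition lives in LLVM’s CodeView headers; treat the field meanings as assumptions and check the real code.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the .debug$H section layout: a header, then N hash values,
// one per type record in the corresponding .debug$T section. See
// llvm/DebugInfo/CodeView for the authoritative definition and the
// actual magic/version/algorithm constants.
struct DebugHHeader {
  uint32_t Magic;         // identifies the section contents
  uint16_t Version;
  uint16_t HashAlgorithm; // which hash GloballyHashedType::hashTypes used
};
// Followed immediately by the hash values for each type record.
```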

Currently llvm-objcopy only writes ELF files, so it would need to be taught to write COFF files. We have code to do this in the yaml2obj utility (specifically, in yaml2coff.cpp in the function writeCOFF). There may be a way to move this code to somewhere else (llvm/Object/COFF.h?) so that it can be re-used by both yaml2coff and llvm-objcopy, but in the worst case scenario you could copy the code and re-write it to work with these new structures.

Lastly, you’ll probably want to put all of this behind an option in llvm-objcopy, such as -add-codeview-ghash-section.

Thanks for the tips. I now have something that reads the .obj file, finds .debug$T sections, and global-hashes them (proof-of-concept kind of code). What I can’t find is how clang itself writes COFF files with global hashes, as that might help me understand how to create the .debug$H section, how to update the file’s section count, and how to properly write everything back.

The code in yaml2coff expects to work on the YAML COFFParser struct, and I’m having quite a headache turning a COFFObjectFile into a COFFParser (or something compatible)… Tomorrow I might try the very inefficient path of coff2yaml followed by yaml2coff with the hashes header, but that seems way too inefficient and convoluted.

You probably don’t want to go down the same route that clang goes through to write the object file. If you think yaml2coff is convoluted, the way clang does it will just give you a headache. There are multiple abstractions involved to account for different object file formats (ELF, COFF, MachO) and output formats (Assembly, binary file). At least with yaml2coff

It’s true that yaml2coff is using the COFFParser structure, but if you look at the writeCOFF function in yaml2coff it’s pretty bare-metal. The logic you need will be almost identical, except that instead of checking the COFFParser for the various fields, you’ll check the existing COFFObjectFile, which should have similar fields.

The only thing you need to do differently is, when writing the section table and section contents, to insert a new entry. Since you’re injecting a section into the middle, you’ll also probably need to push back the file pointers of all subsequent sections so that they don’t overlap (e.g. if the original sections are 1, 2, 3, 4, 5 and you insert between 2 and 3, then the original sections 3, 4, and 5 need their FilePointerToRawData offset by the size of the new section).

If you need to know what values to put for the other fields in a section header, run dumpbin /headers foo.obj on a clang-generated object file that has a .debug$H section already (e.g. run clang with -emit-codeview-ghash-section, and look at the properties of the .debug$H section and use the same values).

The only invariant that needs to be maintained is that Section[N]->FilePointerOfRawData == Section[N-1]->FilePointerOfRawData + Section[N-1]->SizeOfRawData

> You probably don't want to go down the same route that clang goes
> through to write the object file. If you think yaml2coff is convoluted,
> the way clang does it will just give you a headache. There are multiple
> abstractions involved to account for different object file formats
> (ELF, COFF, MachO) and output formats (Assembly, binary file). At least
> with yaml2coff

I think your phrase got cut off there, but yeah, I just found AsmPrinter.cpp
and it is convoluted.

> It's true that yaml2coff is using the COFFParser structure, but if you
> look at the writeCOFF function in yaml2coff it's pretty bare-metal. The
> logic you need will be almost identical, except that instead of checking
> the COFFParser for the various fields, you'll check the existing
> COFFObjectFile, which should have similar fields.

> The only thing you need to do differently is, when writing the section
> table and section contents, to insert a new entry. Since you're
> injecting a section into the middle, you'll also probably need to push
> back the file pointers of all subsequent sections so that they don't
> overlap (e.g. if the original sections are 1, 2, 3, 4, 5 and you insert
> between 2 and 3, then the original sections 3, 4, and 5 need their
> FilePointerToRawData offset by the size of the new section).

I have the PE/COFF spec open here, and I'm happy that I've read a bit of
it, so I actually know what you're talking about... yeah, it doesn't seem
too complicated.

> If you need to know what values to put for the other fields in a
> section header, run `dumpbin /headers foo.obj` on a clang-generated
> object file that has a .debug$H section already (e.g. run clang with
> -emit-codeview-ghash-section, and look at the properties of the
> .debug$H section and use the same values).

Thanks, I will do that, and then also look at how the CodeView part of
the code does it if I can't understand some of it.

> The only invariant that needs to be maintained is that
> Section[N]->FilePointerOfRawData ==
> Section[N-1]->FilePointerOfRawData + Section[N-1]->SizeOfRawData

Well, that and all the sections need to end up in the final file... but
I'm hopeful.

Does anyone have timings for linking a big project like Chrome with this,
so that I at least know what kind of performance to expect?

My numbers are something like:

1 PDB per .obj file: link.exe takes ~15 minutes and 16GB of RAM;
lld-link.exe takes 2:30 minutes and ~8GB of RAM.
Around 10 PDBs per folder: link.exe takes 1 minute and 2-3GB of RAM;
lld-link.exe takes 1:30 minutes and ~6GB of RAM.
/DEBUG:FASTLINK: link.exe takes 40 seconds, but then there are 20 seconds
of loading at the first breakpoint in the debugger, and we lose DIA
support for listing symbols.
Incremental: link.exe takes 8 seconds, but that only happens for very
minor changes.

We have a non-negligible number of symbols used by some runtime systems.

Chrome is actually one of my exact benchmark cases. When building blink_core.dll and browser_tests.exe, I get anywhere from a 20-40% reduction in link time. We have some other optimizations in the pipeline that are not upstream yet.

My best time so far (including other optimizations not yet upstream) is 28s on blink_core.dll, compared to 110s with /debug

Generally speaking, a good rule of thumb is that /DEBUG:GHASH will be close to or faster than /DEBUG:FASTLINK, but with none of the penalties like slow debugging.

If we get to < 30s, I think most users would prefer it to link.exe. I'm just hoping there are still some more optimizations left to get closer to ELF linking times (around 10-15s here).

10-15s will be hard without true incremental linking.

At some point that’s going to be the only way to get any faster, but incremental linking is hard (putting it lightly), and since our full links are already really fast, we think we can get reasonably close to link.exe incremental speeds with full links. But it’s never enough and I will always want it to be faster, so you may see incremental linking in the future, after we hit a performance wall with full link speed :)

In any case, I’m definitely interested in seeing what kind of numbers you get with /DEBUG:GHASH after you get this llvm-objcopy feature implemented. So keep me updated :)

As an aside, have you tried building with clang instead of cl? If you build with clang you wouldn’t even have to do this llvm-objcopy work, because it would “just work”. If you’ve tried but ran into issues I’m interested in hearing about those too. On the other hand, it’s also reasonable to only switch one thing at a time.

I can totally see something like incremental linking working with simple padding between .obj files and a mapping file (which could also help with edit-and-continue, something we would also love to have).

We have another developer doing the port to clang-cl, but although most of our code also goes through a version of clang, migrating the rest to clang-cl has been a fight. From what I heard, the main problem is that we keep a copy of parts of windows.h (so as not to bring in the awful parts of it, like lower-case macros). That works fine with cl, but clang (at least 6.0) complains about two structs/vars with the same name, even though they are exactly the same. Making clang-cl as broken as cl.exe is not an option, I suppose? I would love to turn on a flag like --accept-that-cl-made-bad-decisions-and-live-with-it and keep it at least until this is completely fixed in our code base.

The biggest wins from moving to clang would be a better, more standards-compliant compiler, no 1-minute compiles on heavily templated files, and maybe the holy grail of ThinLTO.

Clang-cl maintains compatibility with MSVC even in cases where MSVC is not standards compliant (e.g. two-phase name lookup), but we try to keep those cases few and far between.

To help me understand your case: do you mean you copy windows.h and modify it? How does that lead to the same struct being defined twice? If I were to write this:

struct Foo {};
struct Foo {};

Is this a small repro of the issue you’re talking about?