Making LLD PDB generation faster

Hi,

Is anyone working on making PDB generation in LLD faster? Looking
at a trace for linking one of our binaries (it takes 1min6s-1min20s), I
see two things:

1) LookupBucketFor(Val, ConstFoundBucket); takes 35s, almost half of
the link time, mostly finding duplicates
2) There is no parallelization inside addObjectsToPDB

Is anyone working on those? Also, has anyone thought about merging .obj
files to deduplicate type information, so that linking a project could
generate something like a .lib file but with deduplicated debug
information? (As far as I know, an actual .lib just puts all PDBs or /Z7
debug info inside a file without dedup.)

Just looking at the code, it seems much more mature, and the
choice of SHA1_8 seems interesting (I still don't know why xxHash64
isn't used).

ps: My code to add ghashes to msvc compiled .obj files is almost ready
to be pushed as an option for llvm-objcopy.

+Reid and Alexandre, who have been doing work in this area recently

Leonardo, to answer your questions: yes to all of them. You can take a
look at this prototype/proposal: https://reviews.llvm.org/D55585

Overall, computing ghashes in parallel at link-time and merging Types with them
is less costly than the current approach to merging. The 35sec you’re seeing
for merging should go down to about 15sec. The patch doesn’t parallelize
(yet) the Type merging itself, but we have an alternate multithread-suitable
implementation of DenseHash which already supports lockless, wait-free
insert/fetch/resize.
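
As a rough illustration of that split (hashing in parallel, merging serially), here is a minimal Python sketch. It is not LLD's code: records are treated as opaque byte strings, `sha1_8`, `hash_object` and `merge_types` are hypothetical names, and the thread pool only shows the structure of the work rather than any real speed-up.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha1_8(record: bytes) -> bytes:
    """8-byte truncated SHA-1, in the spirit of the SHA1_8 scheme (assumption)."""
    return hashlib.sha1(record).digest()[:8]


def hash_object(records):
    """Hash every record of one object file; needs no data from other objects."""
    return [sha1_8(r) for r in records]


def merge_types(objects):
    """Hash each object in parallel, then merge serially, keyed on the hash.

    Returns (merged, remaps): the deduplicated record list and, per object,
    a list mapping each local record index to its index in the merged list.
    """
    with ThreadPoolExecutor() as pool:
        all_hashes = list(pool.map(hash_object, objects))

    index_of = {}  # hash -> index in the merged stream
    merged = []
    remaps = []
    for records, hashes in zip(objects, all_hashes):
        remap = []
        for record, h in zip(records, hashes):
            if h not in index_of:  # first time this type is seen
                index_of[h] = len(merged)
                merged.append(record)
            remap.append(index_of[h])
        remaps.append(remap)
    return merged, remaps


# Two toy "object files" sharing some types.
objs = [[b"int", b"struct Foo"], [b"struct Foo", b"float", b"int"]]
merged, remaps = merge_types(objs)
print(merged)
print(remaps)
```

The serial loop only compares short fixed-size hashes, which is where the saving over comparing whole records would come from; parallelizing that loop as well is what would need a concurrent hash table.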

The prototype allows for testing different hashing algorithms, and indeed
xxHash seems to be the best general-purpose choice. I’ve also added support
for more specialized hardware-based hashes, like Casey Muratori’s Meow Hash
(uses hardware AES SSE 4.2 instructions), which brings the figures down a bit.

Future changes could write the computed ghash stream back to the OBJs if
/INCREMENTAL is specified (just an idea). Incremental linking will be faster
that way when working with MSVC OBJs.

As for creating PDBs for independent projects, that would most likely help.
However, the ghash stream would need to be stored in the PDB in that case
(currently, ghashes are dropped after merging). That could help when using
rarely compiled projects, used along with network caches.

I will start sending smaller patches to converge towards the functionality of
the prototype above.

Best,

Alex.

More info inline. I think there are a couple of misconceptions about what I'm doing:

1) I already patch all my .obj files to contain .debug$H entries so it
is all ghashed already
2) All the 35s is spent adding to the DenseMap

Here are my current times (lld-link.exe compiled with -O2, so no
LTO/PGO); lld generates a 141 MB binary and a 1.2 GB PDB file:

  Input File Reading: 1724 ms ( 2.1%)
  Code Layout: 482 ms ( 0.6%)
  PDB Emission (Cumulative): 79261 ms ( 96.8%)
    Add Objects: 68650 ms ( 83.8%)
      Type Merging: 57534 ms ( 70.2%)
      Symbol Merging: 10822 ms ( 13.2%)
    TPI Stream Layout: 1501 ms ( 1.8%)
    Globals Stream Layout: 770 ms ( 0.9%)
    Commit to Disk: 7007 ms ( 8.6%)
  Commit Output File: 19 ms ( 0.0%)

How do you compile LLD? There's a big difference between building it with MSVC
and with Clang. The parallel ghash patch I mentioned is almost 2x as fast when
built with Clang 7.0+ vs. MSVC 15.9+; I don't know exactly why. I also suggest
you use the Release target. You should also grab this patch:
https://reviews.llvm.org/D55056 - I had to revert it because it was causing
issues with LLDB, but it will give an improvement for LLD.
Please let me know if that improves your timings.

The page faults are probably the OS loading from disk: most, if not all, of
the files are accessed by LLD by mmap'ing them.

The lockless DenseHash I was talking about will be published in an upcoming
patch. As for reproducibility, this can be an issue on build systems. But on
local machines, we could explicitly state that we want non-deterministic
builds, through some cmd-line flag. If your 57sec for "Type Merging"
transforms into 5sec when non-deterministic, I think that's worth it.

Alex.

My current patch for llvm-objcopy adds a --add-ghashes option, so you can
preprocess your .obj files before linking them. It should be an easy change
in something like Fastbuild to distribute the hash creation.
Unfortunately it doesn't work for .lib files you might be linking with,
or for pdb files.

With your patch for cmake, and reconfiguring with

cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64 -DLLVM_ENABLE_PDB=true -DLLVM_ENABLE_PROJECTS=lld ../llvm

we get these results:

  Input File Reading: 1602 ms ( 3.5%)
  Code Layout: 493 ms ( 1.1%)
  PDB Emission (Cumulative): 43127 ms ( 94.5%)
    Add Objects: 34577 ms ( 75.8%)
      Type Merging: 26709 ms ( 58.5%)
      Symbol Merging: 7598 ms ( 16.7%)
    TPI Stream Layout: 1107 ms ( 2.4%)
    Globals Stream Layout: 602 ms ( 1.3%)
    Commit to Disk: 5636 ms ( 12.4%)
  Commit Output File: 16 ms ( 0.0%)
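
To put the two timing dumps side by side, the small sketch below divides the earlier figures (the -O2 build) by the ones just above. The numbers are copied from the two listings; the `speedups` helper and the dictionary layout are only illustrative.

```python
# Timings in milliseconds, copied from the two listings in this thread.
before = {
    "PDB Emission (Cumulative)": 79261,
    "Type Merging": 57534,
    "Symbol Merging": 10822,
    "Commit to Disk": 7007,
}
after = {
    "PDB Emission (Cumulative)": 43127,
    "Type Merging": 26709,
    "Symbol Merging": 7598,
    "Commit to Disk": 5636,
}


def speedups(old, new):
    """Return old/new per phase, i.e. how many times faster the new run is."""
    return {phase: old[phase] / new[phase] for phase in old}


ratios = speedups(before, after)
for phase, ratio in ratios.items():
    print(f"{phase}: {ratio:.2f}x")
```

Type Merging improves the most (a bit over 2x), while Commit to Disk, which is mostly I/O, moves far less.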

Yes - I was suggesting a behavior like /Zi, but at librarian time. That requires a new option to llvm-lib.exe.
That way, you would have a pre-linked PDB in your LIB, and thus save time for the final link. LLD would use that PDB instead of the debug info in your OBJs (which are also in the LIB). That could work out well for rarely-changed LIBs. As an additional optimization, you/we could modify Fastbuild to call the librarian on a remote worker (which is currently done on your local host).

NB: Do you have a Phabricator account? (reviews.llvm.org) I can't seem to tag you.

That's good news. For having debug info, you could try adding /Z7 on the cmake cmd-line, such as -DCMAKE_CXX_FLAGS="/Z7". Or use the 'RelWithDebInfo' target instead of 'Release' and add -DCMAKE_CXX_FLAGS="/Ob2" (because that target uses /Ob1 as a default).

Can you please send a patch on Phabricator if you fix the LLVM_ENABLE_PDB issue with Clang? The goal is to have performance out-of-the-box.

Alex.

Ok so there's a lot of confusion in cmake regarding using llvm as a
toolset. It still does all its checks against cl.exe (not clang-cl)
and somehow overrides CMAKE_LINKER to be link.exe. I tried a couple
of things, including:

cmake -G "Visual Studio 15 2017" -A x64 -T"llvm",host=x64
-DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe"
-DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
-DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe"
-DLLVM_ENABLE_LTO=true -DLLVM_ENABLE_PDB=true
-DLLVM_ENABLE_PROJECTS=lld ../llvm

but it seems like the generator overrides it.

ps: Created a phabricator account

Can you please try using Ninja instead?

cmake -G Ninja f:/svn/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_OPTIMIZED_TABLEGEN=true -DLLVM_EXTERNAL_LLD_SOURCE_DIR=f:/svn/lld -DLLVM_TOOL_LLD_BUILD=true -DLLVM_ENABLE_LLD=true -DCMAKE_C_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="C:/Program Files/LLVM/bin/clang-cl.exe" -DCMAKE_LINKER="C:/Program Files/LLVM/bin/lld-link.exe" -DLLVM_ENABLE_PDB=true

It will be faster to compile. The setup I use is the above Ninja cmd-line for compiling optimized builds; and in addition, I keep the Visual Studio generator, as you do, but only for having a .sln to debug. It is a bit annoying to cmake twice, in two different build folders, but you can write a batch script.
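
In place of a batch script, the two configure steps could be driven from a small Python script like the sketch below. The build folder names (`build-ninja`, `build-vs`) and the helper names are hypothetical; the generator names, source path and -D flags are taken from the command lines quoted in this thread, trimmed for brevity. By default it only prints the commands (dry run).

```python
import os
import subprocess

LLVM_BIN = "C:/Program Files/LLVM/bin"  # install location used in this thread


def configure_commands(src="f:/svn/llvm"):
    """Return {build_dir: cmake argv} for the two configurations."""
    ninja = [
        "cmake", "-G", "Ninja", src,
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DCMAKE_C_COMPILER={LLVM_BIN}/clang-cl.exe",
        f"-DCMAKE_CXX_COMPILER={LLVM_BIN}/clang-cl.exe",
        f"-DCMAKE_LINKER={LLVM_BIN}/lld-link.exe",
        "-DLLVM_ENABLE_PDB=true",
    ]
    vs = [
        "cmake", "-G", "Visual Studio 15 2017", "-A", "x64",
        "-T", "llvm,host=x64", src,
        "-DLLVM_ENABLE_PDB=true",
    ]
    # Folder names are placeholders; pick whatever suits your tree.
    return {"build-ninja": ninja, "build-vs": vs}


def configure_all(run=False):
    """Print both commands; only invoke cmake when run=True."""
    for build_dir, argv in configure_commands().items():
        print(f"[{build_dir}] " + " ".join(argv))
        if run:
            os.makedirs(build_dir, exist_ok=True)
            subprocess.run(argv, cwd=build_dir, check=True)


configure_all()  # dry run: prints the two command lines
```

Passing the arguments as a list avoids quoting problems with the space in "Program Files".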

If the above works, maybe you should log the bug on https://bugs.llvm.org/ so it is not forgotten.

Alex.

I don’t think changing the compiler or linker is supported with the VS generator, but I also don’t think it’s a bug.

Shouldn’t -DLLVM_ENABLE_PDB=true work when targeting the “llvm” platform toolset with VS? The Release target doesn’t have debug info in that case. That seems to be the root issue. It works with Ninja but not with VS.

cmake -G"Visual Studio 15 2017 Win64" -T"llvm",host=x64 f:/svn/llvm -DLLVM_OPTIMIZED_TABLEGEN=true -DLLVM_EXTERNAL_LLD_SOURCE_DIR=f:/svn/lld -DLLVM_TOOL_LLD_BUILD=true -DLLVM_ENABLE_PDB=true

I think it's a huge bug that it doesn't raise any errors or warnings
about it. But I will open a ticket on cmake: they should be using
clang-cl.exe and lld-link.exe if -T"llvm" is given, and probably set
the host to 64-bit as well.

Is -Tllvm even supported? I thought the only thing you could pass for -T was -Thost=x64

Yes, -Tllvm works.

…however it is very slow to compile, because /MP isn’t currently supported by clang-cl. So each CPP is compiled sequentially, one after another. Thus my patch for adding /MP.

Times for lld compiled with LTO:

  Input File Reading: 1430 ms ( 3.3%)
  Code Layout: 486 ms ( 1.1%)
  PDB Emission (Cumulative): 41042 ms ( 94.6%)
    Add Objects: 33117 ms ( 76.4%)
      Type Merging: 25861 ms ( 59.6%)
      Symbol Merging: 7011 ms ( 16.2%)
    TPI Stream Layout: 996 ms ( 2.3%)
    Globals Stream Layout: 513 ms ( 1.2%)
    Commit to Disk: 5175 ms ( 11.9%)
  Commit Output File: 37 ms ( 0.1%)

For enabling large memory pages, see this link: How to enable large/huge memory pages in Windows

Meow hash isn't in the patch I posted, but you can use xxHash, it is good enough. Just add /hasher:xxhash to the LLD cmd-line.

I have to change it in llvm-objcopy, which is doing most of the hash
generation, just to make the hashes actually collide if they are
pointing to the same thing :)
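
To illustrate what "collide if they are pointing to the same thing" means, here is a small Python sketch of index-independent hashing. It assumes, as a simplification, that each record is a payload plus a list of local indices of earlier records; the `ghashes` helper and the record layout are hypothetical and not the actual .debug$H format. A record's hash mixes in the hashes of the records it references rather than their local indices, so the same type hashes identically in different object files.

```python
import hashlib


def ghashes(records):
    """Compute one 8-byte hash per record.

    records: list of (payload, refs), where refs holds local indices of
    earlier records in the same object (assumed layout, for illustration).
    """
    out = []
    for payload, refs in records:
        h = hashlib.sha1(payload)
        for r in refs:
            h.update(out[r])  # hash of the referenced type, not its index
        out.append(h.digest()[:8])
    return out


# "pointer to int" sits at a different local index in each object.
obj_a = [(b"int", []), (b"pointer", [0])]
obj_b = [(b"float", []), (b"int", []), (b"pointer", [1])]
obj_c = [(b"float", []), (b"pointer", [0])]  # pointer to float instead

ha, hb, hc = ghashes(obj_a), ghashes(obj_b), ghashes(obj_c)
print(ha[1] == hb[2])  # same type, different indices
print(ha[1] == hc[1])  # different pointee
```

Whichever hash function is chosen (SHA1_8, xxHash, ...), the tool that writes the hashes and the linker that consumes them have to agree on it, otherwise identical types will not match.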