[RFC] Add -build-id flag to lld-link

Currently, lld-link generates a build id in the following two cases, primarily to associate an executable with its debug info.

  1. When generating a PDB, the build id is stored in the debug directory, which is part of the .rdata section. The build id is a hash of the PDB content.
  2. When generating DWARF in MinGW mode, the build id is stored in the debug directory in the .buildid section. The build id is a hash of the executable content.

Proposal

I suggest adding a -build-id flag to lld-link to allow associating the executable with other metadata. It creates a .buildid section that contains only the hash of the PDB/executable content.

Here’s how it would work with the existing build id generation:

  1. With PDB

    • Without -build-id, it works as before: the build id is stored in the debug directory in .rdata.
    • With -build-id, it creates an extra .buildid section that contains only the hash of the PDB content.
  2. DWARF + MinGW. I don’t know how .buildid is used in this situation.
    Can we just put the build id into .buildid, without the other debug directory information?
    I guess only the hash in .buildid is actually useful for tools like debuggers to associate DWARF with the executable, since .note.gnu.build-id in ELF contains just the hash.

  3. Otherwise, with -build-id, it generates a .buildid section containing only the hash of the executable content.

Use case

When a .buildid section exists, we can associate lightweight raw profiles with the instrumented binaries.

Lightweight raw profiles are files containing a header (with an optional build id) and counter values. They are dumped at runtime. Later, llvm-profdata merges the raw profiles with other metadata from their corresponding instrumented binaries. Without a build id embedded in both the binaries and the lightweight raw profiles, it’s hard to associate raw profiles with the correct binaries.
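
To illustrate the association step, here is a minimal sketch of matching profiles to binaries by build id. It is not how llvm-profdata is implemented; the function name and the input shapes (a path-to-build-id mapping and a list of profile paths with optional build ids) are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def group_profiles_by_binary(
    binaries: Dict[str, bytes],
    profiles: Iterable[Tuple[str, Optional[bytes]]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group profile paths under the binary whose build id they carry.

    `binaries` maps a binary path to its build id (hypothetical input).
    `profiles` yields (profile path, build id or None) pairs (hypothetical).
    Returns (matched, unmatched), where unmatched lists profiles that have
    no build id or a build id that no known binary has.
    """
    by_id = {build_id: path for path, build_id in binaries.items()}
    matched: Dict[str, List[str]] = {path: [] for path in binaries}
    unmatched: List[str] = []
    for profile_path, build_id in profiles:
        binary = by_id.get(build_id) if build_id is not None else None
        if binary is None:
            unmatched.append(profile_path)
        else:
            matched[binary].append(profile_path)
    return matched, unmatched
```

Profiles without a build id end up in the unmatched list, which is the situation the proposal is trying to avoid.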

Context:

@mstorsjo @rnk

What’s the purpose of this, compared to the existing debug directory in .rdata? Is it just that the tools you want to read it would be able to read it easier, if it is in a separate section with a unique name, compared to looking it up in the middle of .rdata based on the debug directory via the PE header?

I don’t quite understand this question here - that’s what we’re doing right now, no? Yes, we can do that - and that’s enough for the use cases with GDB/LLDB and crash reporting tools, AFAIK.

Yes, pretty much.

FWIW, I’m not entirely sure whether the tools that use it in mingw environments have a good reason for keeping it in a separate .buildid section, or whether all of these tools would cope with it merged into .rdata as well. Honestly, I just think it’s been done this way because it was easiest that way in GNU ld, and then I tried to mimic it in LLD.

Yes, it’s easier for the runtime to find the build id and dump it if it’s in a separate section. Also, I want to have a build id even when building without debug info.

I don’t quite understand this question here - that’s what we’re doing right now, no? Yes, we can do that - and that’s enough for the use cases with GDB/LLDB and crash reporting tools, AFAIK.

There is a bit more stuff being stored in .buildid, timestamp etc.

FWIW, I’m not entirely sure whether the tools that use it in mingw environments have a good reason for keeping it in a separate .buildid section, or whether all of these tools would cope with it merged into .rdata as well. Honestly, I just think it’s been done this way because it was easiest that way in GNU ld, and then I tried to mimic it in LLD.

Are there no existing tools using .buildid in mingw? Then we can mimic GNU ld and have it contain just the hash.

I guess that’s reasonable.

Right - although that could be done even if it’d be placed in .rdata.

Ah, I see. TBH I believe most of the tools that look at it only look at the build id part of it, nothing else.

I think most of the existing tooling that reads the build id (as opposed to just writing it and forgetting it) would look it up via the debug directory, so it can be placed anywhere. .buildid is just one convention. I’m not sure exactly how much such tooling there is, though; I think the prime case is minidumps.
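
For reference, a lookup via the PE header might start like the sketch below. It finds only the debug data directory (its RVA and size) from the raw file bytes, following the published PE layout; the function name is made up for the example, and mapping the RVA to a file offset through the section table is left out.

```python
import struct

IMAGE_DIRECTORY_ENTRY_DEBUG = 6  # index of the debug entry in the data directories


def debug_data_directory(image: bytes):
    """Return (rva, size) of the debug data directory, or None if absent."""
    # e_lfanew at offset 0x3C of the DOS header points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic == 0x10B:      # PE32
        dirs = opt + 96
    elif magic == 0x20B:    # PE32+
        dirs = opt + 112
    else:
        raise ValueError("unknown optional header magic")
    # NumberOfRvaAndSizes sits immediately before the data directories.
    (count,) = struct.unpack_from("<I", image, dirs - 4)
    if count <= IMAGE_DIRECTORY_ENTRY_DEBUG:
        return None
    rva, size = struct.unpack_from(
        "<II", image, dirs + 8 * IMAGE_DIRECTORY_ENTRY_DEBUG)
    if size == 0:
        return None
    return rva, size
```

A tool that works this way never needs to know the section name, which is why the build id can live in .rdata or .buildid equally well.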

Hmm.

But on second thought, if you write tools that just look at the verbatim .buildid section, and only look at e.g. the first N bytes or so (assuming a specific hash length), then this will break if it turns out to be a .buildid section which is a full debug directory, if the hash is located further into the structure and the start of the structure is constant.

If you consider the whole section contents though, and ignore whether it’s a plain hash or a full debug directory, then it’s probably ok.

To avoid breaking existing tools that read .buildid, I think it’s better to just keep .buildid as it is now.

I found that the .buildid section is composed of a list of debug_directory entries, followed by the “build id” and an optional PDB alt path. The first debug_directory always describes the “build id” hash, with an RVA that points to it. So it’s not difficult to locate the “build id” hash once we have found .buildid. This is probably also how those tools read the hash.

There are two exceptions to the first debug_directory always describing the “build id” hash. In these cases .buildid contains other information but no build id hash:

  1. /lldmingw + /Brepro without /debug
  2. /lldmingw + /cetcompat without /debug.
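
The lookup described above can be sketched as follows. This is an illustration, not code from lld or from any existing tool: the function name is made up, the caller is assumed to supply the section bytes together with the section's RVA, and the build id payload is assumed to be a CodeView RSDS record (signature, 16-byte id, age, path). The two exceptions are handled by returning None when the first entry is not a CodeView entry.

```python
import struct

# IMAGE_DEBUG_DIRECTORY: Characteristics, TimeDateStamp, MajorVersion,
# MinorVersion, Type, SizeOfData, AddressOfRawData (RVA), PointerToRawData.
DEBUG_DIRECTORY = struct.Struct("<IIHHIIII")
IMAGE_DEBUG_TYPE_CODEVIEW = 2


def build_id_from_section(section: bytes, section_rva: int):
    """Return the 16-byte build id from a .buildid section, or None."""
    if len(section) < DEBUG_DIRECTORY.size:
        return None
    # Only the first debug directory entry is inspected.
    (_, _, _, _, dbg_type, size, data_rva, _) = DEBUG_DIRECTORY.unpack_from(
        section, 0)
    if dbg_type != IMAGE_DEBUG_TYPE_CODEVIEW:
        return None  # e.g. /Brepro or /cetcompat without /debug
    # Translate the RVA into an offset inside this section.
    start = data_rva - section_rva
    if start < 0 or start + size > len(section):
        return None
    record = section[start:start + size]
    # Assumed RSDS layout: "RSDS" + 16-byte id + 4-byte age + path.
    if len(record) < 24 or record[:4] != b"RSDS":
        return None
    return record[4:20]
```

Reading the first 16 bytes of the section instead would return part of the debug directory entry rather than the hash, which is the breakage discussed earlier in the thread.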

Thanks, that sounds like a very pragmatic choice!