[RFC] Add binary profile correlation to not load profile metadata sections into memory at runtime

Hi,

I’m creating an RFC for the pull request “[Profile] Add binary profile correlation to offload profile metadata at runtime” (ZequanWu, Pull Request #69493, llvm/llvm-project on GitHub) to allow for better discussion, as requested there.

Motivation

Since we don’t need the profile metadata sections at runtime (except that the data section is still needed for value profiling), we can mark them as metadata sections so that they are not loaded into memory at runtime. Initially, I explored debug info correlation, which is used for PGO with value profiling disabled. However, it currently only works with DWARF, and it would be hard to add such artificial debug info for every function to CodeView, which is used on Windows. So, offloading profile metadata sections at runtime seems to be a more platform-independent option. This is primarily used for reducing stripped binary size and disk I/O when using code coverage.

Design

The idea is to offload profile metadata (the profile name and data sections) from the binary by marking them as metadata sections when lowering to object files, controlled by a new LLVM flag. In this mode, the sections don’t have the SHF_ALLOC flag in ELF (or they have IMAGE_SCN_MEM_DISCARDABLE in COFF). So, they are not loaded into memory at runtime and can be stripped away as a post-link step. After the process exits, the generated raw profiles contain only headers and counters. llvm-profdata can then be used to correlate raw profiles with the unstripped binary to generate an indexed profile.
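To make the section flags concrete, here is a minimal sketch of the two checks a loader or strip tool effectively performs. The flag values come from the ELF and PE/COFF specifications; the function names are hypothetical and are not part of the PR.

```python
# Flag values as defined by the ELF gABI and the PE/COFF specification.
SHF_ALLOC = 0x2                          # ELF: section occupies memory during execution
IMAGE_SCN_MEM_DISCARDABLE = 0x02000000   # COFF: section can be discarded as needed


def elf_section_is_loaded(sh_flags: int) -> bool:
    """Return True if an ELF section with these sh_flags is mapped at runtime."""
    return bool(sh_flags & SHF_ALLOC)


def coff_section_is_discardable(characteristics: int) -> bool:
    """Return True if a COFF section with these characteristics is discardable."""
    return bool(characteristics & IMAGE_SCN_MEM_DISCARDABLE)


# A writable, allocated section (SHF_WRITE | SHF_ALLOC == 0x3) is loaded;
# a section with no flags set is metadata only.
print(elf_section_is_loaded(0x3))   # True
print(elf_section_is_loaded(0x0))   # False
```

Under the proposed mode, the name and data sections would fall into the second case on ELF, which is what allows them to be stripped after linking.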

Data

For Chromium base_unittests with code coverage on Linux, the binary size overhead due to instrumentation was reduced from 64M to 38.8M (a 39.4% reduction), and the total raw profile size was reduced from 128M to 68M (a 46.9% reduction).

$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +14.6Mi  [NEW] +14.6Mi    __llvm_prf_data
  [NEW] +10.6Mi  [NEW] +10.6Mi    __llvm_prf_names
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
  [ = ]       0   +65% +1.23Ki    .relro_padding
   +62% +1.20Ki  [ = ]       0    [Unmapped]
   +13%    +448   +19%    +448    .init_array
  +8.8%    +192  [ = ]       0    [ELF Section Headers]
  +0.0%    +136  +0.0%     +80    [7 Others]
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.5%     +80  +1.2%     +64    .plt
  [ = ]       0 -99.2% -3.68Ki    [LOAD #5 [RW]]
  +195% +64.0Mi  +194% +64.0Mi    TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
    FILE SIZE        VM SIZE
 --------------  --------------
  +121% +30.4Mi  +121% +30.4Mi    .text
  [NEW] +5.86Mi  [NEW] +5.86Mi    __llvm_prf_cnts
   +95% +1.75Mi   +95% +1.75Mi    .eh_frame
  +108%  +400Ki  +108%  +400Ki    .eh_frame_hdr
  +9.5%  +211Ki  +9.5%  +211Ki    .rela.dyn
  +9.2% +95.0Ki  +9.2% +95.0Ki    .data.rel.ro
  +5.0% +87.3Ki  +5.0% +87.3Ki    .rodata
  [ = ]       0   +13% +47.0Ki    .bss
   +40% +1.78Ki   +40% +1.78Ki    .got
   +12% +1.49Ki   +12% +1.49Ki    .gcc_except_table
   +13%    +448   +19%    +448    .init_array
  +0.1%     +96  +0.1%     +96    .dynsym
  +1.2%     +96  +1.2%     +96    .rela.plt
  +1.2%     +64  +1.2%     +64    .plt
  +2.9%     +64  [ = ]       0    [ELF Section Headers]
  +0.0%     +40  +0.0%     +40    .data
  +1.2%     +32  +1.2%     +32    .got.plt
  +0.0%     +24  +0.0%      +8    [5 Others]
  [ = ]       0 -22.9%    -872    [LOAD #5 [RW]]
 -74.5% -1.44Ki  [ = ]       0    [Unmapped]
  [ = ]       0 -76.5% -1.45Ki    .relro_padding
  +118% +38.8Mi  +117% +38.8Mi    TOTAL
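The percentages quoted above can be re-derived from the totals as a quick sanity check. The sketch below uses only numbers from this post; the helper name is hypothetical.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Binary size overhead from instrumentation (MiB, bloaty TOTAL rows).
print(f"{percent_reduction(64.0, 38.8):.1f}%")    # 39.4%
# Raw profile size (MiB).
print(f"{percent_reduction(128.0, 68.0):.1f}%")   # 46.9%

# The saved overhead matches the two sections that no longer appear:
# __llvm_prf_data (14.6 MiB) + __llvm_prf_names (10.6 MiB).
removed_sections = 14.6 + 10.6
print(round(removed_sections, 1), round(64.0 - 38.8, 1))  # 25.2 25.2
```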

Build ID

Since the generated raw profiles contain only headers and counters, we need a way to associate them with their corresponding binaries at the merging step. Build ID is already used by the Fuchsia toolchain team to fetch matching raw profiles from a symbol server. So, it makes sense to use it for matching raw profiles with the unstripped binaries when merging. The workflow I have in mind is to have scripts invoke llvm-profdata to get the binary IDs of all raw profiles, then select the raw profiles whose binary ID matches a given binary and pass them, together with that binary, to llvm-profdata for merging.
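The matching step of that workflow could look roughly like the sketch below. It assumes the binary IDs have already been extracted into plain dictionaries (how they are obtained from llvm-profdata and from the binaries is left out), and the function and file names are hypothetical.

```python
from collections import defaultdict


def group_profiles_by_binary(profile_ids, binary_ids):
    """Group raw profiles under the binary whose build ID they carry.

    profile_ids: {raw profile path: set of binary IDs (hex strings)}
    binary_ids:  {unstripped binary path: build ID (hex string)}
    Profiles that match no known binary are left out of the result.
    """
    id_to_binary = {bid.lower(): path for path, bid in binary_ids.items()}
    groups = defaultdict(list)
    for profile, ids in sorted(profile_ids.items()):
        for bid in sorted(ids):
            binary = id_to_binary.get(bid.lower())
            if binary is not None:
                groups[binary].append(profile)
                break
    return dict(groups)


groups = group_profiles_by_binary(
    {"a.profraw": {"ABCD01"}, "b.profraw": {"ffff02"}, "c.profraw": {"abcd01"}},
    {"base_unittests": "abcd01"},
)
print(groups)  # {'base_unittests': ['a.profraw', 'c.profraw']}
```

Each resulting group would then be handed to one llvm-profdata merge invocation along with its binary.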

Discussions

Build ID

There are still a few things unclear about build ID in COFF.

Currently, lld-link only generates a build ID when:

  1. Generating a PDB, in which case the build ID is stored in the .rdata section.
  2. Generating DWARF in MinGW mode, in which case the build ID is stored in the .buildid section.

Maybe we should add a -build-id flag that always places it in the .buildid section.

How can we dump build id at runtime?

On Linux, this is done by reading the program headers to find the .note.gnu.build-id section and dumping its contents into the raw profile. I’m not aware of a similar way to do it on Windows.
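For readers unfamiliar with the ELF side, the sketch below shows how a build ID can be picked out of the bytes of a note segment. It illustrates the ELF note layout only (three 32-bit header fields, then name and descriptor, each padded to 4 bytes); it is not the profiling runtime's code, it assumes little-endian data, and the function name is hypothetical.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for the GNU build ID


def find_gnu_build_id(notes: bytes):
    """Scan the contents of a note segment and return the build ID as hex."""
    offset = 0
    while offset + 12 <= len(notes):
        # Note header: n_namesz, n_descsz, n_type (little-endian assumed).
        namesz, descsz, ntype = struct.unpack_from("<III", notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3   # name is padded to a 4-byte boundary
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3   # descriptor is padded as well
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0":
            return desc.hex()
    return None


# A hand-built note carrying a 4-byte ID, for illustration only.
example = struct.pack("<III", 4, 4, NT_GNU_BUILD_ID) + b"GNU\0" + bytes([0xDE, 0xAD, 0xBE, 0xEF])
print(find_gnu_build_id(example))  # deadbeef
```

The open question for Windows is what the equivalent lookup would be for an ID stored in .rdata or .buildid.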

@ellishg @rnk @hansw2000 @ayzhao @petrhosek @gulfemsavrun @davidxl @MaskRay @evodius96

Nit: Can we not use the name “offload”? I find it very confusing for what this is (or what I think this is).
Especially since we are enabling PGO on GPUs, using “offload” in multiple places will confuse other people as well.

Neat! Is this relative to the existing lightweight instrumentation that uses debug info correlation (assuming that applies to code coverage), or regular PGO?

About the build ID for PE/COFF: there’s a CheckSum field in one of the headers (see “PE Format” on Microsoft Learn).
Maybe we could use that?
Or if there are no better options, could we just use the hash of the file?

This is a comparison against the size of regular code coverage. Debug info correlation currently doesn’t work with code coverage, but it should give results similar to what I got here if it did.

How about we extend llvm-profdata to support the functionality that you described with the scripts? This way other users can use binary correlation easily. We should aim to reduce the use of custom scripts wherever the functionality can be integrated into LLVM tools. For example, we integrated debuginfod into llvm-cov. We can consider adding the following new functionality to llvm-profdata:

  1. Integrate debuginfod into llvm-profdata, so that it can fetch the unstripped binaries via reading the build ids from raw profiles while merging profiles.
  2. Extend llvm-profdata merge command to take unstripped binaries as an optional argument.
  1. Integrate debuginfod into llvm-profdata, so that it can fetch the unstripped binaries via reading the build ids from raw profiles while merging profiles.

This is more complicated, we can extend in this direction in the future.

  2. Extend llvm-profdata merge command to take unstripped binaries as an optional argument.

This is what I am doing in the PR, but it only accepts one binary file as input at this time. We can later extend it to allow merging raw profiles from different binaries and to take multiple binaries at once.

We should definitely extend it to take multiple binary files because we are changing the functionality in merge command. When you merge multiple profiles, they typically correspond to different executables. IIUC, your implementation does not use build ids at all at the moment to correlate profiles with unstripped binaries, and it just uses the provided unstripped binaries.

Yes. I think a preparatory change that implements taking multiple files and makes use of build IDs is necessary as a base for the current PR.

I feel like we can extend it to take multiple correlation files later. For now, just let -debug-info and -binary-file take one argument, and later extend to support a list of files.

That sounds reasonable to me.