Hi,
I’m creating an RFC for the pull request “[Profile] Add binary profile correlation to offload profile metadata at runtime” (ZequanWu, Pull Request #69493, llvm/llvm-project on GitHub) for better discussion, as requested here.
Motivation
Since we don’t need the profile metadata sections at runtime (except the data section, which is still needed for value profiling), we can mark them as metadata sections so that they are not loaded into memory at runtime. Initially, I explored debug info correlation, which is used for PGO with value profiling disabled. However, it currently only works with DWARF, and it would be hard to add such artificial debug info for every function to CodeView, which is used on Windows. So, offloading profile metadata sections at runtime seems to be a more platform-independent option. This is primarily intended to reduce stripped binary size and disk I/O when using code coverage.
Design
The idea is to offload profile metadata (the profile name and data sections) from the binary by marking them as metadata sections when lowering to object files, controlled by a new LLVM flag. In this mode, the sections do not have the SHF_ALLOC flag in ELF (or have IMAGE_SCN_MEM_DISCARDABLE in COFF), so they are not loaded into memory at runtime and can be stripped away as a post-link step. After the process exits, the generated raw profiles will contain only headers and counters. llvm-profdata can then be used to correlate the raw profiles with the unstripped binary to generate an indexed profile.
Data
For Chromium’s base_unittests with code coverage on Linux, the binary size overhead due to instrumentation dropped from 64M to 38.8M (a 39.4% reduction), and the raw profile size dropped from 128M to 68M (a 46.9% reduction).
$ bloaty out/cov/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +14.6Mi [NEW] +14.6Mi __llvm_prf_data
[NEW] +10.6Mi [NEW] +10.6Mi __llvm_prf_names
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
[ = ] 0 +65% +1.23Ki .relro_padding
+62% +1.20Ki [ = ] 0 [Unmapped]
+13% +448 +19% +448 .init_array
+8.8% +192 [ = ] 0 [ELF Section Headers]
+0.0% +136 +0.0% +80 [7 Others]
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.5% +80 +1.2% +64 .plt
[ = ] 0 -99.2% -3.68Ki [LOAD #5 [RW]]
+195% +64.0Mi +194% +64.0Mi TOTAL
$ bloaty out/cov-cor/base_unittests.stripped -- out/no-cov/base_unittests.stripped
FILE SIZE VM SIZE
-------------- --------------
+121% +30.4Mi +121% +30.4Mi .text
[NEW] +5.86Mi [NEW] +5.86Mi __llvm_prf_cnts
+95% +1.75Mi +95% +1.75Mi .eh_frame
+108% +400Ki +108% +400Ki .eh_frame_hdr
+9.5% +211Ki +9.5% +211Ki .rela.dyn
+9.2% +95.0Ki +9.2% +95.0Ki .data.rel.ro
+5.0% +87.3Ki +5.0% +87.3Ki .rodata
[ = ] 0 +13% +47.0Ki .bss
+40% +1.78Ki +40% +1.78Ki .got
+12% +1.49Ki +12% +1.49Ki .gcc_except_table
+13% +448 +19% +448 .init_array
+0.1% +96 +0.1% +96 .dynsym
+1.2% +96 +1.2% +96 .rela.plt
+1.2% +64 +1.2% +64 .plt
+2.9% +64 [ = ] 0 [ELF Section Headers]
+0.0% +40 +0.0% +40 .data
+1.2% +32 +1.2% +32 .got.plt
+0.0% +24 +0.0% +8 [5 Others]
[ = ] 0 -22.9% -872 [LOAD #5 [RW]]
-74.5% -1.44Ki [ = ] 0 [Unmapped]
[ = ] 0 -76.5% -1.45Ki .relro_padding
+118% +38.8Mi +117% +38.8Mi TOTAL
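As a quick sanity check of the numbers above, here is a minimal Python sketch that recomputes the quoted percentages from the TOTAL rows and the raw profile sizes. It also checks that the two sections missing from the second bloaty run (__llvm_prf_data and __llvm_prf_names) account for the difference in overhead. The helper name `reduction_percent` is made up for illustration; all figures are copied from this post.

```python
def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# Instrumentation overhead on the stripped binary (TOTAL rows, in Mi).
binary_overhead = reduction_percent(64.0, 38.8)

# Raw profile size, in M, as quoted in the text.
raw_profile = reduction_percent(128.0, 68.0)

# __llvm_prf_data + __llvm_prf_names from the first bloaty run (in Mi),
# compared with the drop in total overhead between the two runs.
removed_sections = round(14.6 + 10.6, 1)
overhead_delta = round(64.0 - 38.8, 1)

print(f"binary overhead reduction: {binary_overhead:.1f}%")
print(f"raw profile reduction:     {raw_profile:.1f}%")
print(f"removed sections: {removed_sections} Mi, overhead delta: {overhead_delta} Mi")
```

The removed sections and the overhead delta agree, which is consistent with the saving coming entirely from the offloaded data and name sections.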
Build ID
Since the generated raw profiles contain only headers and counters, we need a way to associate them with their corresponding binaries at the merging step. The build ID is already used by the Fuchsia toolchain team to fetch matching raw profiles from a symbol server, so it makes sense to use it for matching raw profiles with the unstripped binaries when merging. The workflow I have in mind is to have scripts invoke llvm-profdata to get the binary IDs of all raw profiles, then pass the raw profiles with a matching binary ID, together with the binary, to llvm-profdata for merging.
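The matching step of that workflow could look roughly like the Python sketch below. It is only an illustration, not part of the PR: the function name `plan_merges`, the file paths, and the IDs are hypothetical, and the binary IDs are assumed to have been collected beforehand (for instance with `llvm-profdata show --binary-ids`, if your version supports it). The actual merge invocation is deliberately left out.

```python
from collections import defaultdict


def plan_merges(profile_ids, binary_ids):
    """Group raw profiles by the unstripped binary whose build ID they carry.

    profile_ids: dict mapping raw profile path -> iterable of binary IDs (hex)
    binary_ids:  dict mapping unstripped binary path -> its build ID (hex)
    Returns (plan, unmatched): plan maps binary path -> list of profile paths,
    unmatched lists profiles whose IDs match none of the known binaries.
    """
    id_to_binary = {bid.lower(): path for path, bid in binary_ids.items()}
    plan = defaultdict(list)
    unmatched = []
    for profile, ids in sorted(profile_ids.items()):
        matches = {id_to_binary[i.lower()] for i in ids if i.lower() in id_to_binary}
        if not matches:
            unmatched.append(profile)
        for binary in sorted(matches):
            plan[binary].append(profile)
    return dict(plan), unmatched


# Hypothetical inputs: two profiles from base_unittests, one from an unknown binary.
profiles = {
    "a.profraw": ["ABCD1234"],
    "b.profraw": ["abcd1234"],
    "c.profraw": ["ffff0000"],
}
binaries = {"out/cov-cor/base_unittests": "abcd1234"}

plan, unmatched = plan_merges(profiles, binaries)
print(plan)
print(unmatched)
```

Each entry in `plan` then corresponds to one llvm-profdata merge call that receives the listed raw profiles and the matching unstripped binary.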
Discussions
Build ID
There are still a few things unclear about the build ID in COFF.
Currently, lld-link generates a build ID only when:
- Generating a PDB: the build ID is stored in the .rdata section.
- Generating DWARF in MinGW mode: the build ID is stored in the .buildid section.
Maybe we should add a -build-id flag to always place it in the .buildid section.
How can we dump the build ID at runtime?
On Linux, this is done by reading the program headers and finding the .note.gnu.build-id section, whose contents are dumped into the raw profile. I’m not aware of a similar way to do it on Windows.
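For readers unfamiliar with the Linux side, here is a small Python sketch of how a GNU build ID can be located inside the raw bytes of an ELF note segment. It is not the profile runtime's code (which is written in C); it only illustrates the note layout: a 12-byte header (name size, descriptor size, type) followed by the name and the descriptor. The function name `find_gnu_build_id` is hypothetical, and the sketch assumes the common 4-byte alignment of note fields.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def _align4(n: int) -> int:
    """Round n up to the next multiple of 4 (assumed note field alignment)."""
    return (n + 3) & ~3


def find_gnu_build_id(notes: bytes, little_endian: bool = True):
    """Scan raw ELF note bytes and return the build ID as hex, or None."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += _align4(namesz)
        desc = notes[offset:offset + descsz]
        offset += _align4(descsz)
        # Accept only complete descriptors owned by "GNU" with the build ID type.
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00" and len(desc) == descsz:
            return desc.hex()
    return None


# Synthetic note data: an unrelated GNU note (type 1) followed by a build ID note.
other_note = struct.pack("<III", 4, 16, 1) + b"GNU\x00" + bytes(16)
build_id_note = struct.pack("<III", 4, 4, NT_GNU_BUILD_ID) + b"GNU\x00" + bytes.fromhex("deadbeef")

build_id = find_gnu_build_id(other_note + build_id_note)
print(build_id)
```

An equivalent for COFF would need a well-defined place to look, which is what the -build-id suggestion above is about.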
@ellishg @rnk @hansw2000 @ayzhao @petrhosek @gulfemsavrun @davidxl @MaskRay @evodius96