Has there been any analysis on where that gain is coming from? For instance, how many indirect call sites is this optimization firing on relative to original ICP and is this coming from a few hot callsites or more generally across a large number of callsites?
I haven’t collected aggregated stats, but the local change prints logs (probably worth gating behind LLVM_DEBUG). I can try to collect this data and report back.
For associating profiled loads with vtables, I’m wondering if enough information exists in the binary already to make that happen without a separate section. Vtable symbols are recorded in the ELF .symtab with their starting address and size, which seems to be everything in __llvm_prf_vtab.
This idea reminds me of the profile correlation work based on binary or DWARF debug information.
To match a profiled address collected at runtime against the addresses recorded in the binary, the raw profile also needs to record the runtime start address of the relevant segment (e.g. .data.rel.ro). This also means llvm-profdata needs to take the binary as input to process profiles (either to generate indexed profiles from raw profiles, or to show the profiled vtable information from a raw profile file).
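To make the lookup concrete, here is a rough sketch of what the binary-based correlation could look like. This is not code from the patch: `VTableSymbol`, `toStaticAddr`, and `findVTable` are hypothetical names, and it assumes the vtable symbols have already been read out of .symtab into a vector sorted by start address, and that the raw profile supplies the runtime start address of the segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one vtable symbol, as it might be read from .symtab.
struct VTableSymbol {
  std::string Name;
  uint64_t StaticAddr; // symbol start address in the binary
  uint64_t Size;       // symbol size in bytes
};

// Rebase a runtime address onto the binary's address space, given the
// segment's runtime start (assumed to be recorded in the raw profile) and
// its start address in the binary.
uint64_t toStaticAddr(uint64_t RuntimeAddr, uint64_t RuntimeSegStart,
                      uint64_t StaticSegStart) {
  assert(RuntimeAddr >= RuntimeSegStart && "address is below the segment");
  return RuntimeAddr - RuntimeSegStart + StaticSegStart;
}

// Find the vtable whose [StaticAddr, StaticAddr + Size) range contains Addr.
// Assumes Sorted is ordered by StaticAddr and the ranges do not overlap.
std::optional<std::string> findVTable(const std::vector<VTableSymbol> &Sorted,
                                      uint64_t Addr) {
  // First symbol that starts strictly after Addr; the candidate precedes it.
  auto It = std::upper_bound(
      Sorted.begin(), Sorted.end(), Addr,
      [](uint64_t A, const VTableSymbol &S) { return A < S.StaticAddr; });
  if (It == Sorted.begin())
    return std::nullopt;
  --It;
  if (Addr - It->StaticAddr < It->Size)
    return It->Name;
  return std::nullopt;
}
```

A range lookup rather than an exact match is used here on the assumption that a profiled address may point into the middle of a vtable rather than at the symbol's start.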