[RFC] VTable Type Profiling for SampleFDO

paschalis.mpeis · July 25, 2025, 1:03pm

Thanks for the proposal.

On needing raw perf traces:

Can you please elaborate on the Linux Perf limitation? Are there any plans for dealing with it other than using -D?

If attaching raw events is also needed for AArch64, this would attach Arm SPE’s native packets in textual format, making the text file bigger (by ~50-60% in a quick, rough test I did). We may be interested in improving the handling in Linux for that.

On the discussion of sampling bias:

the vtable counters are in the range of 200 - 4500, while the counters inferred from LBR are in the range of 500,000 - 700,000. The memory-load raw counters and LBR-inferred counters differ by orders of magnitude, and the ratio of virtual function target (from LBR) is often much closer to 3:1 than the ratio of vtables (from memory access events) if we repeat the experiment a couple of times. Presumably with continuous sampling (at much lower sampling rate but across the entire fleet with more machines in a real world setting), the bias is mitigated. I don’t have analysis result over real-world data points though. One way to do this analysis is to have a SampleFDO profile generated with vtable counters from continuous sampled data and analyze the profile.

That is interesting. Of course, this profiling is done in separate steps, and as mentioned, the sampling rate could be configured differently.

From the memory profiling, we are primarily interested in identifying vtable loads, but we also obtain a ratio to compare against the edge profile. In this isolated example, the edge profile appears to be more accurate.

Is the plan to use the memory-profiling ratio as a partial verification, or do you think matching ratios would be required?
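
To make the "partial verification" option concrete, here is a minimal sketch of how the two ratios could be compared once both counter sets are normalized to shares. Everything in it is an assumption for illustration: the class names, the counts (picked only to be of the same order of magnitude as the quoted ranges), and the tolerance are hypothetical and not taken from the RFC or from any LLVM tool.

```python
def normalize(counts):
    """Turn raw sample counts into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no samples")
    return {name: c / total for name, c in counts.items()}


def max_share_gap(a, b):
    """Largest absolute difference in share over all names seen in either profile."""
    fa, fb = normalize(a), normalize(b)
    return max(abs(fa.get(k, 0.0) - fb.get(k, 0.0)) for k in fa.keys() | fb.keys())


# Hypothetical counts; the names and numbers are made up for this sketch.
vtable_counts = {"Derived1": 4500, "Derived2": 900}          # memory-access samples, 5:1
lbr_counts = {"Derived1": 600_000, "Derived2": 200_000}      # LBR-inferred targets, 3:1

TOLERANCE = 0.10  # hypothetical threshold for "close enough"

gap = max_share_gap(vtable_counts, lbr_counts)
ratios_agree = gap <= TOLERANCE
print(f"largest share gap: {gap:.3f}, agree within tolerance: {ratios_agree}")
```

Comparing shares rather than raw counts sidesteps the orders-of-magnitude difference between the two sampling sources; whether any fixed tolerance is appropriate is exactly the open question.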

Some clarifications / naive questions:

I haven’t followed the instrumentation-based implementation (your prior work); does code emission support multiple target types for ICP?

drops by 0.4 ~ 0.5% after the vtable-based ICP is applied for instrumented PGO binaries,

How should the above be interpreted? Does this vtable improvement apply on top of binaries previously optimized with instrumentation-based PGO (where this was not used)?

Build a position dependent binary so runtime addr is the same as static virtual addr for parsing profiles.

Is this a limitation in supporting PIE/PIC code? If so, would that be part of #148013? I’ve left a comment on the patch.
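
For context on why PIE matters here, the following is a rough sketch of the usual translation from a sampled runtime address back to a link-time virtual address, given the mapping that perf records for the binary and the matching ELF load segment. The `Mapping` and `Segment` types, the helper name, and the example addresses are all hypothetical; this is not how llvm-profgen or the patch is known to implement it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    start: int   # runtime start address of the mapping (e.g. from a perf mmap event)
    length: int  # size of the mapping in bytes
    pgoff: int   # file offset at which the mapping begins


@dataclass(frozen=True)
class Segment:
    p_offset: int  # file offset of the ELF PT_LOAD segment
    p_vaddr: int   # link-time virtual address of the segment
    p_filesz: int  # number of bytes the segment occupies in the file


def runtime_to_static(addr: int, mapping: Mapping, segment: Segment) -> Optional[int]:
    """Map a runtime address to a static virtual address, or None if out of range."""
    if not (mapping.start <= addr < mapping.start + mapping.length):
        return None
    # Runtime address -> offset within the file on disk.
    file_off = addr - mapping.start + mapping.pgoff
    if not (segment.p_offset <= file_off < segment.p_offset + segment.p_filesz):
        return None
    # File offset -> link-time virtual address inside the segment.
    return file_off - segment.p_offset + segment.p_vaddr


# Hypothetical example values for a PIE loaded at a randomized base.
mapping = Mapping(start=0x55D000001000, length=0x2000, pgoff=0x1000)
segment = Segment(p_offset=0x1000, p_vaddr=0x1000, p_filesz=0x2000)

static_addr = runtime_to_static(0x55D000001234, mapping, segment)
outside = runtime_to_static(0x7F0000000000, mapping, segment)
```

With a position-dependent binary the mapping start already equals the link-time address, so the translation becomes the identity, which is presumably why the RFC's workflow asks for one.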

Thanks again for your work!

Paschalis
