Thanks for the detailed RFC! This is really exciting work and I think there is significant overlap with what @sharonxu and I have planned.
A while back I extended IRPGO to support “Temporal Profiling”.
The main motivation was to improve binary startup time (which is important in the mobile space) by choosing a function order that reduces the number of .text section page faults. We do this by instrumenting function timestamps (which record how early a function is called for the first time) and running a new Balanced Partitioning algorithm during linking to come up with an optimized function order using our profile data. We believe we can expand on this approach to reorder data. (Based on @Colibrow’s recent work, I believe they would be very interested in this.)
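To make the page fault argument concrete, here is a minimal Python sketch. It is not the Balanced Partitioning algorithm; it swaps in a much simpler heuristic (sort by first-call timestamp) just to show how layout changes the number of .text pages touched at startup. The function names, sizes, timestamps, and the 4 KiB page size are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def pages_touched(order, sizes, startup_funcs, page_size=PAGE_SIZE):
    """Count distinct pages covered by startup functions under a given layout."""
    touched = set()
    offset = 0
    for name in order:
        size = sizes[name]
        if name in startup_funcs and size > 0:
            first = offset // page_size
            last = (offset + size - 1) // page_size
            touched.update(range(first, last + 1))
        offset += size
    return len(touched)


def order_by_first_call(sizes, first_call):
    """Simplified stand-in for a real orderer: timestamped functions first
    (earliest first), then everything else in its original order."""
    timed = sorted((f for f in sizes if f in first_call), key=lambda f: first_call[f])
    untimed = [f for f in sizes if f not in first_call]
    return timed + untimed


# Hypothetical functions: name -> size in bytes, in original link order.
sizes = {"main": 1024, "cold_a": 4096, "init": 1024, "cold_b": 4096, "load_cfg": 1024}
# Hypothetical temporal profile: name -> first-call timestamp.
first_call = {"main": 0, "init": 1, "load_cfg": 2}

original = list(sizes)
reordered = order_by_first_call(sizes, first_call)

print(pages_touched(original, sizes, set(first_call)))
print(pages_touched(reordered, sizes, set(first_call)))
```

In this toy layout the startup functions are spread over three pages in the original order and fit in one page after reordering.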
Temporal PGO is similar to this work, but there are some key differences. Let me know if I get something wrong.
- This work improves throughput during steady-state execution, I assume by reducing data cache misses. Temporal PGO reduces page faults during startup.
- This work partitions hot data into a new section at compile time. Temporal PGO orders data in the linker, which has the effect of hot/cold splitting as well as colocating functions often used in sequence.
- This work uses sampled profile data collected in production, while Temporal PGO uses IRPGO-instrumented binaries to collect profile data.
How do you plan to extend the MemProf profile format to support this? I still need to study this profile format, but it would be greatly helpful to us if your extension could include a “timestamp” for each symbol, similar to how we extended IRPGO to support function timestamps. Since I assume `MEM_INST_RETIRED.ALL_LOADS` does not collect timestamps of data loads, you would probably use a value of zero for your case. However, if you were to collect this info, I believe you could further improve data locality by colocating data that are often used close in time during execution.
Have you considered using `--symbol-ordering-file` (or `-order_file` in Mach-O) to cluster these sections in the linker? Or you could add a new symbol orderer in LLD, similar to `--call-graph-ordering-file` or `--bp-startup-sort`? This would give you much more control over where these symbols are laid out.
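As a rough sketch of the ordering-file route, the Python below turns a per-symbol profile into the one-symbol-per-line text that `--symbol-ordering-file` expects. The profile shape (`symbol -> (timestamp, access_count)`), the symbol names, and the convention that a timestamp of zero means "not recorded" are assumptions for illustration, not the MemProf or IRPGO format.

```python
def build_symbol_order(profile):
    """profile maps symbol -> (timestamp, access_count).

    Assumed convention: timestamp 0 means no timestamp was recorded.
    Timestamped symbols come first (earliest first), followed by the
    remaining accessed symbols from hottest to coldest. Symbols that
    were never accessed are left out so the linker keeps its default
    placement for them.
    """
    timed = [s for s, (ts, _) in profile.items() if ts > 0]
    untimed = [s for s, (ts, count) in profile.items() if ts == 0 and count > 0]
    timed.sort(key=lambda s: profile[s][0])
    untimed.sort(key=lambda s: -profile[s][1])
    return timed + untimed


def render_order_file(symbols):
    """Render the ordering file contents: one symbol name per line."""
    return "".join(name + "\n" for name in symbols)


# Hypothetical data symbols and counts.
profile = {
    "g_table": (0, 900),
    "g_config": (5, 10),
    "g_cache": (2, 300),
    "g_unused": (0, 0),
    "g_stats": (0, 1200),
}

order = build_symbol_order(profile)
print(render_order_file(order), end="")
```

The rendered text could be saved to a file and passed to the linker, e.g. `ld.lld --symbol-ordering-file=order.txt`. Since the linker reorders sections rather than individual symbols, I would expect this to need each data symbol in its own section (e.g. built with `-fdata-sections`).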