All the PGO counters are using 64-bit values to store their respective counts, but the corresponding materializations in the IR in the form of branch weights are scaled to fit 32-bits.
The scaling factors are computed per branch operation, which minimizes the precision loss when computing branch probabilities. The disadvantage of the approach is that the absolute branch weights cannot be compared globally. For example, it is hard to detect a hot loop in a function. While this scaling approach preserves probabilities, it seems overly restrictive for other kinds of usages of the counter values.
While I could not find any recent discussions, I found two posts from 2015 (Loss of precision with very large branch weights - #15 by davidxl, RFC - Improvements to PGO profile support - #3 by dnovillo) that mentioned such a change might be possible.
The linked posts only raise memory concerns, and it’s unclear if these are still relevant today. Are there counter-arguments for changing the MD_prof metadata to hold i64 values?
To simplify such a transition, I opened ⚙ D141393 [llvm][ir] Purge MD_prof custom accessors, which aims to unify the way the branch weights are accessed. By reducing the number of places that access the branch weights, changing their representation should become less error-prone.
I’d attempted to use 64 bit branch weights in ⚙ D88609 Use uint64_t for branch weights instead of uint32_t. IIRC I ran into issues where we were doing math on 32 bit weights but with 64 bit ints to avoid overflow issues.
Thank you very much for pointing me to this revision. Indeed there are a few places where branch weights are multiplied and thus can overflow.
Such examples are most prominent in
SimplifyCFG; in almost all places where the weights are re-scaled, a multiplication happened before.
Given that the branch weights are either synthesized or produced by PGO counters, it should be reasonable to enforce that the sum of all weights is less than
UINT64_MAX. Note that even
PGOInstrumentation.cpp has code that assumes this. To model absolute branch weights, 64 bits are enough, assuming they are updated correctly by transformations.
I do see two possible ways of pushing for such a change:
- Manually checking and changing all usages of the weights to ensure that they do not rely on
uint32s to perform computations.
- Deprecate the accessor that spits out scaled weights and gradually move the usage away to work with the 64-bit version. Note that this still requires a manual check of some cases that directly load the metadata.