Problem
Several optimizations need a stable identifier for global values throughout compilation, thinlink, and recompilation. This notably includes PGO (instrumented and sampled). In addition, ThinLTO needs identifiers for indexing its summary, in both thin link and the ThinLTO backend.
Currently we use the GlobalValue
GUID, but these are not stable. They’re a hash constructed from the name, linkage, and source filename, and these properties are not stable in all circumstances. In particular, ThinLTO function importing can change them. If module A
defines an external function Foo
which references an internal symbol Bar
, then after importing Foo
into module B
we need to promote Bar
to external so that it’s visible from B
, and change its name to ensure uniqueness. Conversely, if ThinLTO finds a symbol with external linkage which doesn’t need to be external, it will change its linkage to internal.
This means that GUIDs are not guaranteed to be stable across different phases of the LTO / PGO pipeline. Without stable GUIDs, users have various workarounds. For instance, InstrProf
stores its own metadata and uses that to construct a stable unique identifier, and for sample PGO we use -funique-internal-linkage-names
to include a unique module ID in the name. This flag in particular is problematic as changing the names of internal symbols breaks assumptions (for instance, in the Linux kernel where assembly depends on symbol names).
These mechanisms are multiple, special-purpose, add a large space overhead, and do not work in all situations. Ideally we’d have a simple, small, unique identifier for any GlobalValue
that we can use wherever necessary.
Proposed solution
The proposal is to make the GlobalValue
GUID stable, and replace the special-purpose mechanisms with GUIDs. GUIDs are a good choice as they already exist for all GlobalValue
objects, and are compact at only 8 bytes each.
To ensure the GUID remains stable, we will compute it once, up-front, in the module which defines the GlobalValue
. The resulting value will be stored as metadata attached to the GlobalValue
. This metadata will be used when constructing the module summary index. If a function is imported, its GUID metadata will be imported alongside. ThinLTO will use the pre-computed GUIDs when building its index, rather than computing them itself. Any changes to linkage at the function importing stage will have no effect on the GUID, as it is never recomputed. Any later accesses of the GUID will use the value stored in the metadata.
A section will be added to the binary which associates symbols with their GUID, to enable sample-based profiling to use the same GUIDs. This will cost 8 bytes per symbol (but note however that for sample PGO, removing the existing metadata / name mangling will save at least that much).
Plan
This is a somewhat wide-reaching change, as GUIDs are used in dozens of places across the codebase. We intend to roll this change out via the following steps:
Milestone 1: ThinLTO
As noted above, there are several users who require unique identifiers. The plan is to migrate ThinLTO first, as a proof of concept.
- Add a new LTO pre-link pass which computes a GUID for each
GlobalValue
and saves it as metadata in the module. The GUID computation will remain unchanged for values with external linkage, so existing users will get a matching GUID in this case (otherwise, they might not, as is the existing behaviour today). - Change
GlobalValue::getGUID
to retrieve the value from the metadata (assert-failing if it’s not present). - Use the metadata when constructing the module summary index, and in the LTO backend.
- Note that this will require an extra side table in the bitcode. Function importing lazily loads functions, importing metadata only with the body. Before the body is imported, we won’t have the metadata and hence the GUID. This side table will provide the GUID immediately. This could be a new table, or an extra column in the existing value symbol table, details TBD.
Milestone 2: Other PGO users
With milestone 1 completed, we can be confident that this approach will work, and we can migrate other users piecemeal, as time permits. For each one we can remove its bespoke means of obtaining unique identifiers once they’re migrated. It is expected that not all users of GUIDs will need to be migrated to use this metadata, as the existing method will work unchanged for external symbols, and in many cases the assumption that the symbol is external will always hold.