Function merging in LLVM currently merges only identical functions, which largely overlaps with the linker’s identical code folding (ICF). Swift’s function merger goes slightly further and can merge similar functions, but it has not yet been upstreamed (see RFC for moving Swift’s merge function pass to LLVM). Both methods rely on IR comparators to identify identical or similar functions. However, this approach is less effective in a separate compilation environment, where each module offers only a limited scope for comparison.
This RFC proposes a stable hash-based function merging approach, which can be used in separate compilation environments. Similar to Swift’s function merger, it tracks differences in constants. However, instead of direct comparison, it encodes stable functions into a map during the analysis phase.
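As a rough illustration of the analysis phase, the sketch below hashes a function while ignoring its constant operands and records, per instruction position, a hash of each ignored constant. It is a toy model and not the LLVM implementation: the tuple-based IR, the `("const", value)` operand convention, and the names `stable_hash` and `build_map` are all assumptions made for this example.

```python
import hashlib
from collections import defaultdict

# Toy IR (an assumption for illustration, not LLVM IR): each instruction is
# (opcode, operand). An operand of the form ("const", value) is a constant
# that may differ between otherwise-similar functions.

def _h(text):
    """Deterministic digest, stable across runs and processes
    (unlike Python's built-in hash() for strings)."""
    return hashlib.sha256(text.encode()).hexdigest()[:16]

def stable_hash(body):
    """Return (hash ignoring constants, {instruction index: constant hash})."""
    shape = []
    const_hashes = {}
    for idx, (opcode, operand) in enumerate(body):
        if isinstance(operand, tuple) and operand[0] == "const":
            # Hash only the instruction shape; track the constant separately.
            shape.append(f"{opcode} <const>")
            const_hashes[idx] = _h(repr(operand[1]))
        else:
            shape.append(f"{opcode} {operand}")
    return _h("\n".join(shape)), const_hashes

def build_map(functions):
    """Group functions by stable hash: hash -> [(name, const_hashes)]."""
    table = defaultdict(list)
    for name, body in functions.items():
        h, consts = stable_hash(body)
        table[h].append((name, consts))
    return table

functions = {
    "f1": [("load", "arg0"), ("call", ("const", "g1")), ("ret", "v0")],
    "f2": [("load", "arg0"), ("call", ("const", "g2")), ("ret", "v0")],
    "f3": [("load", "arg0"), ("ret", "v0")],
}
table = build_map(functions)
# Entries with more than one function are merge candidates.
candidates = [entry for entry in table.values() if len(entry) > 1]
```

Here `f1` and `f2` land in the same map entry because they differ only in the callee constant at index 1, while `f3` has a different shape and stays alone. Because the map holds only hashes, it can be serialized and shared across modules without carrying the IR itself.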
During the merging step, it compares the hash summary with the current IR being processed. If a match is found, it optimistically creates a merging instance, using a thunk to supply the function’s specific context. Unlike traditional mergers, this method deliberately leaves call sites unmerged for safety. The actual size reduction occurs when identically created merging instances are folded by the linker. This approach is akin to the global function outlining introduced in a previous RFC, which has been fully upstreamed – RFC Enhanced Machine Outliner Part 2: ThinLTO & NoLTO. You can also refer to a technical paper which is the basis for this RFC – ACM Digital Library.
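The merging step can be pictured with a small sketch, again a toy model under stated assumptions rather than the actual pass. Each function is split into a parameterized body plus a thunk that passes in its own constants, and a stand-in for linker ICF then keeps one copy of each identical body. The names `merge_instance` and `fold_identical`, the `merged.N` symbol naming, and the tuple-based IR are hypothetical.

```python
# Toy model: a function body is a tuple of (opcode, operand) pairs.
# `differing` is the set of instruction indices whose constant operand
# varies between similar functions (taken from the hash summary).

def merge_instance(name, body, differing):
    """Split `body` into a parameterized merged body and a thunk.

    The thunk keeps the original name, so existing call sites are untouched;
    it only forwards the function-specific constants as arguments.
    """
    merged_body = []
    thunk_args = []
    for idx, (opcode, operand) in enumerate(body):
        if idx in differing:
            # Replace the differing constant with a parameter.
            merged_body.append((opcode, f"param{len(thunk_args)}"))
            thunk_args.append(operand)
        else:
            merged_body.append((opcode, operand))
    return tuple(merged_body), (name, tuple(thunk_args))

def fold_identical(instances):
    """Stand-in for linker ICF: keep one symbol per identical merged body."""
    unique = {}   # merged body -> symbol
    thunks = {}   # original name -> (symbol, constants passed by the thunk)
    for body, (name, args) in instances:
        symbol = unique.setdefault(body, f"merged.{len(unique)}")
        thunks[name] = (symbol, args)
    return unique, thunks

f1 = (("load", "arg0"), ("call", "g1"), ("ret", "v0"))
f2 = (("load", "arg0"), ("call", "g2"), ("ret", "v0"))

# Each instance is created independently, as it would be in separate modules.
instances = [merge_instance("f1", f1, {1}), merge_instance("f2", f2, {1})]
unique, thunks = fold_identical(instances)
```

In this model the two instances produce byte-for-byte identical merged bodies, so folding leaves a single copy, and `f1` and `f2` survive only as thunks that pass `g1` and `g2` respectively. If the optimistic guess is wrong and no identical instance exists elsewhere, the function still behaves correctly; it simply gains a thunk without any size win.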
Results
The implementation was tested by building LLD for arm64 MachO with various configurations:
- LM: Local merging (`-enable-global-merge-func`) from this RFC
- LO: Local outlining (`-enable-machine-outliner=always`, which is on by default)
- GM: Global merging (LM + using CG Data for merging) from this RFC
- GO: Global outlining (LO + using CG Data for outlining) from the prior RFC
- Gen: `-fcodegen-data-generate` generates CGData in each object file, which the linker merges into a .cgdata (same as the prior RFC)
- Use: `-fcodegen-data-use` uses the CGData to perform GM and/or GO (same as the prior RFC)
- Two-rounds: `-codegen-data-thinlto-two-rounds` is a special case of thin-LTO that performs the CGData generation and use in place (same as the prior RFC)
The results indicate that Global Merging (GM) achieves an additional 2-3% reduction in code size on top of the state-of-the-art Global Outlining (GO). Although GM may increase build times, particularly when code generation is repeated in place, this increase is minimal compared to the build times associated with full Link-Time Optimization (LTO). GM is also applicable in full-LTO scenarios, where it performs analysis and merging in place within a single large module.
Benchmark | Code Size Change |
---|---|
7zip/7zip-benchmark | -1.6% |
Bullet/bullet | -0.3% |
ClamAV/clamscan | 0.1% |
consumer-typeset/consumer-typeset | -0.2% |
kimwitu++/kc | -10.3% |
lencod/lencod | -0.3% |
mafft/pairlocalalign | -0.2% |
SPASS/SPASS | -0.3% |
sqlite3/sqlite3 | -0.1% |
tramp3d-v4/tramp3d-v4 | -2.2% |
Additional data from the LLVM test suite targeting arm64 MachO with `-Oz -flto=thin -codegen-data-thinlto-two-rounds -enable-global-merge-func` indicates potential savings of up to 10% further on top of the state-of-the-art Global Outlining (GO) for certain C++ applications. This pass could be particularly useful for languages like Swift or others that extensively use generics/templates or lambdas.
Planned PRs
- Refactoring structural hash [NFC]: PR #112621
- Structural hash tracking differences with a custom function: PR #112638
- Function merging summary data (serialization/deserialization): PR #112662
- llvm-cgdata integration for new stable function map data: PR #112664
- The main global merging function pass: PR #112671
- Mach-O LLD integration, currently plugging this pass into the precodegen pass to fit into the thinlto-two-rounds framework mentioned above: PR #112674