We have recently observed notable reductions in code size when applying Swift’s merge-function pass on C++/ObjectiveC++ source code. When running this pass with monoLTO, we’ve achieved size reductions ranging from 1% to 2% for some iOS apps. This approach is more effective than the previous efforts to merge similar functions or using IR outliner, in conjunction with enabling machine outlining. One of the reasons can be that a small similar section has its own overhead which also reduces the effectiveness of the machine outliner, while finding a large similar section (in this case, a whole function) and extracting different constants as parameters can be beneficial.
What is Swift’s merge-function pass?
Swift’s function merger is implemented in LLVMMergeFunctions.cpp in the Swift repo. In addition to merging completely identical functions, it can merge functions which only differ by a few constants in specific instructions. These varying constants are extracted and treated as parameters. A thunk function may be generated to call the merged function. The callsites to the original function can be converted to invoke the merged function.
Plan to upstream the patches
The size win is based on monoLTO and a lot of iOS apps are using thinLTO, due to build scalability issues with monoLTO. Kyungwoo introduced the global merger in his talk, for thinLTO, it uses stable hash for each candidate function and makes global decisions based on the hashes for all functions in the link unit. The idea can apply to no LTO as well.
We plan to divide the upstreaming into 3 steps:
Improve the existing FunctionComparator to handle the case where constants will be ignored, implement stable hashing that will be aligned with FunctionComparator, and will hash both opcodes and operands
Move Swift’s merge-function pass over to llvm and use callbacks for swift-specific things that are currently in LLVMMergeFunctions.cpp
Upstream the global merger
Patches for step 1 and 2 will follow shortly if folks are generally okay, and we will do another RFC for the global merger.
Just want to clarify a couple of points related to step 1.
By stable hashing, I’m assuming that you’re planning on using the StructuralHash as implemented in llvm/lib/IR/StructuralHash.cpp? I recently moved the hashing implementation from MergeFunctions to use that one since they were basically exact duplicates of each other. The new implementation should be flexible enough to do what you want to do, probably after some modifications. We might need to do something like more config options rather than what’s done currently where it’s a boolean that hashes everything or just some things that MergeFunctions (in its current state) needs.
When you mention adding instruction operands into the hash, what exact operand types are you interested in? Some of the major ones are already handled in StructuralHash, but others are not (like instruction operands). A lot of them probably need to be special cased due to how the hashing infrastructure in LLVM works. Just wondering what the implementation/changes will end up looking like, but I’m sure all will be clarified in the patches.
Also, just throwing it out there, we recently did a large scale analysis on function duplication as a part of https://arxiv.org/abs/2309.15432. We did this using StructuralHash with the detailed option enabled:
So, it definitely seems like MergeFunctions (and future extensions like this one) could definitely help with code size. There are a couple key points to note about these statistics though:
These are duplication figures across an entire corpus of projects rather than across an individual project where an optimization like this would help.
The raw duplication figures don’t take into account size information. So while there is a very high duplication rate, these functions are likely small and while merging them will result in size wins, they might be lower than what the raw numerical figures suggest.
This also doesn’t take into account the benefit of this specific change as for this analysis we’re treating functions that only differ by constants as different functions entirely.
These figures are collected from functions directly after the frontend rather than after optimizations. After optimization, from our testing, the duplication rate can be even higher as things get canonicalized, depending upon the language. However, we did these experiments with the opt O3 pipeline, but I think the conclusions should generalize to the pre-merge ThinLTO optimization pipeline. Post optimization duplication rate:
I don’t think it’s necessarily the opposite as it’s very language dependent (and probably should’ve pointed this out in my previous statement). This would definitely make sense for C/C++ (which is what I assume most of Meta’s code size sensitive projects are written in) where the duplication rate goes down by a decent amount (10+%) after optimization, and the duplication rate is probably even higher right before inlining after some optimization/canonicalization has taken place.
In addition to merging completely identical functions, it can merge functions which only differ by a few constants in specific instructions.
Note that ideally the pass should run after LLVM’s merge function pass. Citation from the comment:
// This pass should run after LLVM's MergeFunctions pass, because it works best
// if there are no _identical_ functions in the module.
// Note: it would also work for identical functions but could produce more
// code overhead than the LLVM pass.
I also have a question: do you have an idea how to generate the names of the new merged functions? In the swift compiler this pass uses the swift mangling scheme, which is probably not ideal for C++, etc.
We currently extended FunctionComparator and also implemented FunctionHashIgnoringConst according to the updated FunctionComparatorIgnoringConst. Sounds like StructuralHash will be a good fit for this purpose. How do you make sure that StructuralHash and FunctionComparator will not diverge?
I am going to upload preliminary patches some time this week. With the patch, it will be clear which instruction operands will be in the hash.
ideally the pass should run after LLVM’s merge function pass
In order to re-use the two-codegen framework for global outlining (slide 9 of this talk), we put it at end of the pipeline. For the actual discussion on global merging, we will defer it to the future RFC for global merger.
do you have an idea how to generate the names of the new merged functions?
Good question. We are using a different suffix, but have not thought about demangler support, similar to how we use suffix like __llvm etc.
I recently refactored MergeFunctions to just use StructuralHash and pulled out the FunctionHash originally in FunctionComparator, so we make sure they don’t diverge by just having a single hashing implementation and associated testing. There is not necessarily a guarantee that the two won’t diverge (other than failing tests), but I would expect anyone working on MergeFunctions to see the hashing implementation. The idea with StructuralHash is that it implements all the functionality and is configurable depending upon what exactly the use is (currently just MergeFunctions and pass change status validation in expensive checks).
We will need to refactor extension to FunctionComparator/FunctionHash to StructuralHash. We added a new pass which is ported from Swift, and will need to discuss on how to migrate Swift’s pass over after we landed this in llvm.
We define duplication using StructuralHash with the detailed option set to true. It’s decently lossy at the moment (not looking at things like instruction operands). If the hashes are the same we consider them to be duplicate functions. The way we’re detecting duplication, we can’t really find good merge/outline candidates. The main idea behind the dataset/associated analyses is that it’s a very large corpus of production code (that hopefully matches the distribution of internal applications that people actually care about) that we can look at to see how these sorts of changes work at a much bigger scale.
Reading through the patches, it definitely seems like the FunctionHashIgnoringConstant could be integrated mostly into StructuralHash. There are a couple elements like the isEligibleForConstantSharing call that might be more difficult to integrate, but the core is essentially the same. Not sure what the best paradigm is to make everything clean while not sacrificing performance, but I’m sure there’s something that will work.