A patch implementing function specialization is currently under review: D93838 [SCCP] Add Function Specialization pass.
The pass works under LTO. Although the patch has not been accepted yet, I would like to see whether it can also be made to work under ThinLTO.
As I understand it, the current ThinLTO framework imports functions into the current translation unit, where every pass can then use them. However,
the problem is that the current cost model only imports functions whose IR size is below a threshold, and that threshold is relatively small. I guess
it was designed with inlining in mind. My point is that, under the current cost model, most imported functions would simply be inlined, so there is little room
left for function specialization. My goal is to import more functions that inlining cannot handle but function specialization can.
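To make the gap concrete, here is a toy model of a size-only import decision. It is not ThinLTO's actual cost model: the threshold value, the `ToySummary` record, and the function names are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real limit and how it is scaled are not modeled.
IMPORT_INSTR_LIMIT = 100


@dataclass
class ToySummary:
    """Toy stand-in for a per-function summary entry."""
    name: str
    inst_count: int


def should_import(callee: ToySummary, limit: int = IMPORT_INSTR_LIMIT) -> bool:
    # Size-only decision: nothing about the call site is consulted.
    return callee.inst_count <= limit


small = ToySummary("get_flag", inst_count=12)
large = ToySummary("process", inst_count=450)

decisions = {f.name: should_import(f) for f in (small, large)}
print(decisions)
```

In this model `process` is rejected purely on size, even if every call to it from the importing module passes a constant that would make specialization worthwhile.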
My initial idea was simple. We could attach value information to the edges of the call graph, and then make the call graph bidirectional
(as far as I can tell, the graph currently only has edges from caller to callee). Finally, in the import stage, we could traverse the call graph to pick suitable
functions to import.
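The idea above can be sketched roughly as follows. This is a toy model in Python, not LLVM code: `CallEdge`, `build_reverse_edges`, `pick_imports`, and the two size limits are hypothetical names and numbers, and the "value information" is modeled as a tuple of (parameter index, constant) pairs on each edge.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEdge:
    caller: str
    callee: str
    # Hypothetical value information: ((param index, constant), ...).
    const_args: tuple = ()


def build_reverse_edges(edges):
    """Index edges by callee so the graph can be walked in both directions."""
    callers_of = defaultdict(list)
    for e in edges:
        callers_of[e.callee].append(e)
    return callers_of


def pick_imports(edges, sizes, defined_in, module,
                 inline_limit=100, spec_limit=1000):
    """Return external callees worth importing into `module`, with a reason."""
    picked = {}
    for e in edges:
        # Only consider calls made from `module` to functions defined elsewhere.
        if defined_in[e.caller] != module or defined_in[e.callee] == module:
            continue
        size = sizes[e.callee]
        if size <= inline_limit:
            picked[e.callee] = "inline"
        elif e.const_args and size <= spec_limit:
            # Too big to inline, but a constant argument is visible on the edge.
            picked.setdefault(e.callee, "specialize")
    return picked


edges = [
    CallEdge("main", "get_flag"),
    CallEdge("main", "process", const_args=((1, 8),)),
    CallEdge("helper", "process"),
    CallEdge("main", "huge", const_args=((0, 1),)),
]
sizes = {"main": 30, "helper": 20, "get_flag": 12, "process": 450, "huge": 5000}
defined_in = {"main": "a.o", "helper": "b.o", "get_flag": "b.o",
              "process": "b.o", "huge": "c.o"}

callers_of = build_reverse_edges(edges)
imports = pick_imports(edges, sizes, defined_in, module="a.o")
print(imports)
print([e.caller for e in callers_of["process"]])
```

The reverse index is what the bidirectional graph would provide: given a callee, it lists every call site and the constants passed there, without needing the callee's body.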
However, there seem to be some problems:
- We can’t see the function body before we import it.
- The call graph traversal would be repeated for every translation unit, which is very redundant.
- The same specialized version of a function may be created in several translation units, which would increase code size with redundant copies.
I have some possible solutions, in the same order as the problems above:
- We could extract the analysis part of the function specialization pass and use it as an analysis pass to generate summary information. However,
the downside of this approach is that it may make summary generation take longer (it looks like summary generation isn’t parallel).
- I can’t find a solution to the second problem. If we moved this part into summary generation, it would only make that slower.
- My solution is to add a special marker to specialized functions, so that we can eliminate the redundant ones at the end. However, this looks serial too,
and I don’t know whether traversing and merging the functions would be a time killer.
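The marker idea in the last item could look roughly like this. It is only a sketch under assumptions: the marker is modeled as the original function name plus the constant arguments the clone was specialized on, and `Specialization` and `merge_duplicates` are hypothetical names, not part of the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specialization:
    module: str        # translation unit that created the clone
    original: str      # function that was specialized
    const_args: tuple  # ((param index, constant), ...) used for the clone

    @property
    def marker(self):
        # Assumption: clones with the same marker have identical bodies.
        return (self.original, self.const_args)


def merge_duplicates(specs):
    """Keep the first clone per marker; map every clone to the module kept."""
    kept = {}
    redirect = {}
    for s in specs:
        winner = kept.setdefault(s.marker, s)
        redirect[(s.module, s.marker)] = winner.module
    return list(kept.values()), redirect


specs = [
    Specialization("a.o", "process", ((1, 8),)),
    Specialization("c.o", "process", ((1, 8),)),
    Specialization("c.o", "process", ((1, 16),)),
]
kept, redirect = merge_duplicates(specs)
print(len(kept))
```

This is one linear pass over all clones, so the work per clone is small, but it still runs serially over the whole program, which is the concern raised in the last item.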
What do you think about this?