Context
We are rearchitecting how we ship Python and C++ interop code at Facebook. Historically, the C/C++ code that serves as the Python/C++ interop layer (we call these python_cxx_extensions) has been compiled into individual shared libraries, while all second-level C/C++ dependencies are compiled into an omnibus library (libomnibus.so). The main problem with this approach is that we need to load thousands of python_cxx_extensions shared libraries at runtime, which can be very expensive. We are trying to reduce this runtime overhead by linking all python_cxx_extensions, C/C++ dependencies, and the Python runtime together into a static binary.
Since all python_cxx_extensions were separate shared libraries, we saw a large number of duplicate symbols when we tried to link them together. It is not feasible to change the code base to remove the duplicate symbols, so instead we use objcopy to rename external symbols in python_cxx_extensions to <symbol_name>.<unique_identifier>. We’ve successfully linked our binaries this way, but the approach does not work with ThinLTO because there is no easy way to rewrite symbol names in IR.
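To make the renaming step concrete, here is a minimal sketch of how a symbol map for objcopy's --redefine-syms option could be generated. It is not our actual tooling: the helper names, the use of a truncated MD5 of the extension name as the <unique_identifier>, and the example extension name are all assumptions for illustration. The only part taken from objcopy itself is the map-file format (one "old new" pair per line).

```python
import hashlib


def unique_suffix(extension_name: str) -> str:
    """Derive a short, stable identifier from an extension name.

    Assumption: hashing the extension name is just one possible way to
    obtain a <unique_identifier>; any per-extension unique string works.
    """
    return hashlib.md5(extension_name.encode("utf-8")).hexdigest()[:8]


def redefine_syms_lines(symbols, extension_name):
    """Build the 'old new' lines expected by objcopy --redefine-syms."""
    suffix = unique_suffix(extension_name)
    return [f"{sym} {sym}.{suffix}" for sym in sorted(symbols)]


def objcopy_command(object_file, map_file):
    """Return the objcopy invocation that applies the map in place."""
    return ["objcopy", f"--redefine-syms={map_file}", object_file]


if __name__ == "__main__":
    # Hypothetical extension and symbols, for illustration only.
    for line in redefine_syms_lines({"helper", "shared_state"}, "ext_a"):
        print(line)
    print(" ".join(objcopy_command("ext_a.o", "ext_a.syms")))
```

Writing the printed lines to a file and passing that file to objcopy renames both the definitions and the undefined references in the object, which is why this works for native objects but has no direct equivalent for bitcode inputs.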
Our system is such that:
- We need to be transparent to the user; we cannot change the source code
- We know in which files the symbols are defined and referenced
Proposal
We’d like to mirror the objcopy approach when using ThinLTO. The UniqueInternalLinkageName pass already appends an MD5 hash to internal symbol names. We would like to create a similar pass that, given a list of symbols and a list of IR files, rewrites all references to the specified symbols into <symbol_name>.<unique_identifier>.
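As a rough illustration of the intended transformation (not of how the pass would be implemented), the sketch below applies the rename to textual LLVM IR with a regular expression. The function name and the suffix value are hypothetical. A real pass would rename globals through the LLVM module APIs instead of editing text; this toy version ignores quoted names such as @"foo" and would also touch matching text inside string constants or metadata.

```python
import re


def rename_ir_symbols(ir_text, symbols, suffix):
    """Rewrite @sym to @sym.<suffix> for each listed symbol in textual IR."""
    if not symbols:
        return ir_text
    # Longest names first so the alternation prefers the full name.
    ordered = sorted(symbols, key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    # The negative lookahead rejects matches that continue into a longer
    # identifier (e.g. @helper2), using the characters LLVM allows in
    # unquoted global names.
    pattern = re.compile(r"@(" + alternation + r")(?![-a-zA-Z$._0-9])")
    return pattern.sub(lambda m: f"@{m.group(1)}.{suffix}", ir_text)


if __name__ == "__main__":
    example_ir = (
        "define i32 @helper() {\n"
        "  ret i32 0\n"
        "}\n"
        "define i32 @helper2() {\n"
        "  %r = call i32 @helper()\n"
        "  ret i32 %r\n"
        "}\n"
    )
    # "abc123" stands in for the per-extension <unique_identifier>.
    print(rename_ir_symbols(example_ir, {"helper"}, "abc123"))
```

Both the definition of @helper and the call to it are renamed, while @helper2 is left alone. Because the lookahead also rejects a following ".", running the rewrite twice does not append the suffix a second time.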
Questions
- Aside from extending the UniqueInternalLinkageName pass, are there any existing tools that we overlooked? There’s https://clang.llvm.org/extra/clang-rename.html but it doesn’t operate at the IR level.
- Are there alternatives we can use to avoid duplicate symbols at LTO time?
– with @LorenArthur