We are rearchitecting how we ship Python and C++ interop code at Facebook. Historically, the C/C++ code that serves as the Python/C++ interop layer (we call these python_cxx_extensions) has been compiled into individual shared libraries, while all second-level C/C++ dependencies are compiled into an omnibus library (libomnibus.so). The main problem with this approach is that we need to load thousands of python_cxx_extensions shared libraries at runtime, which can be very expensive. We are trying to reduce runtime overhead by linking all python_cxx_extensions, C/C++ dependencies, and the Python runtime together as a static binary.
Because python_cxx_extensions were separate shared libraries, we saw a ton of duplicate symbols when we tried to link them together. It is not feasible to change the code base to remove duplicate symbols, so instead we use objcopy to rename external symbols to the form <symbol_name>.<unique_identifier>. We’ve successfully linked our binaries using this approach, but it does not work with ThinLTO because there is no easy way to rewrite symbol names in IR.
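To make the renaming step concrete, here is a minimal sketch of how a per-extension rename map for objcopy could be generated. It relies on objcopy's `--redefine-syms=<file>` option, which reads one `old new` pair per line. The helper names and the choice of a truncated MD5 of the extension name as the unique identifier are assumptions for illustration, not a description of the actual build tooling.

```python
import hashlib


def unique_suffix(extension_name: str) -> str:
    """Derive a short, stable identifier from the extension name.

    Hypothetical scheme: first 8 hex digits of the MD5 of the name.
    """
    return hashlib.md5(extension_name.encode("utf-8")).hexdigest()[:8]


def redefine_syms_lines(extension_name, external_symbols):
    """Build the lines of an objcopy --redefine-syms file ('old new' per line)."""
    suffix = unique_suffix(extension_name)
    return [f"{sym} {sym}.{suffix}" for sym in sorted(external_symbols)]


def objcopy_command(object_file, map_file):
    """Command line that applies the rename map to one object file in place."""
    return ["objcopy", f"--redefine-syms={map_file}", object_file]


if __name__ == "__main__":
    # Hypothetical extension and symbol names.
    for line in redefine_syms_lines("ext_a", ["helper", "init_state"]):
        print(line)
    print(" ".join(objcopy_command("ext_a.o", "ext_a.syms")))
```

The same map has to be applied to every object file of a given extension so that definitions and references stay consistent after the rename.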
Our system is such that:
- We need to be transparent to the user; we cannot change the source code
- We know in which files the symbols are defined and referenced
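Since the defining files are known, the set of symbols that actually need renaming can be computed up front. The sketch below assumes that information is available as a mapping from file name to the symbols it defines; the data layout, function name, and file names are hypothetical.

```python
from collections import defaultdict


def find_duplicate_definitions(defined_in):
    """Return {symbol: sorted list of files} for symbols defined in more than one file.

    defined_in maps a file name to an iterable of the symbols it defines.
    """
    owners = defaultdict(set)
    for path, symbols in defined_in.items():
        for sym in symbols:
            owners[sym].add(path)
    return {sym: sorted(paths) for sym, paths in owners.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical inputs: two extensions that both define `helper`.
    defined_in = {
        "ext_a.o": ["helper", "init_a"],
        "ext_b.o": ["helper", "init_b"],
    }
    print(find_duplicate_definitions(defined_in))
```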
We’d like to mirror the objcopy approach when using ThinLTO. The UniqueInternalLinkageNames pass already appends an MD5 hash to internal symbol names; we would like to create a similar pass that, given a list of symbols and a list of IR files, rewrites all references to the specified symbols into the same <symbol_name>.<unique_identifier> form.
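As a rough illustration of the transformation we have in mind, the sketch below renames global identifiers in textual IR (.ll). It is only a stand-in for a real pass: it assumes the IR has been disassembled to text, it handles only unquoted `@name` identifiers, and it would also rewrite matches that happen to appear inside comments or string constants. The function name and the identifier format are hypothetical.

```python
import re

# Characters allowed after the first character of an unquoted LLVM identifier.
_IDENT_TAIL = r"[-a-zA-Z$._0-9]"


def rename_globals(ir_text, symbols, unique_id):
    """Rewrite every @sym in ir_text to @sym.<unique_id> for sym in symbols."""
    if not symbols:
        return ir_text
    # Longest names first so the alternation prefers the longest candidate.
    names = "|".join(
        re.escape(s) for s in sorted(symbols, key=len, reverse=True)
    )
    # The negative lookahead keeps @helper from matching inside @helper2
    # or inside an already-renamed @helper.<id>.
    pattern = re.compile(rf"@({names})(?!{_IDENT_TAIL})")
    return pattern.sub(lambda m: f"@{m.group(1)}.{unique_id}", ir_text)


if __name__ == "__main__":
    ir = (
        "define i32 @helper() {\n"
        "  ret i32 0\n"
        "}\n"
        "define i32 @helper2() {\n"
        "  %r = call i32 @helper()\n"
        "  ret i32 %r\n"
        "}\n"
    )
    print(rename_globals(ir, {"helper"}, "abc123"))
```

Because the lookahead rejects names that are already followed by `.`, running the rewrite twice with the same inputs leaves the text unchanged. A real implementation would operate on the in-memory module rather than on text.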
Our questions:
- Aside from extending the UniqueInternalLinkageNames pass, are there any existing tools that we overlooked? There’s clang-rename (https://clang.llvm.org/extra/clang-rename.html), but it doesn’t operate at the IR level.
- Are there alternatives we can use to avoid duplicate symbols at LTO time?
– with @LorenArthur