TL;DR: I am trying to combine two projects that each have their own out-of-tree dialects and corresponding Python bindings, currently built with Bazel. I am running into the problem that the symbols of some static variables, such as the TypeIDs of various dialects, ops, etc., are defined multiple times. These variables then exist several times at runtime, which causes problems. I know a solution based on shared libraries but am struggling to implement it in Bazel.
I have posted a more detailed question on StackOverflow but the essence is that (1) the symbols of Python extensions aren’t visible from different extensions (even if they have public linker visibility) and (2) symbols from common object files that are linked statically into multiple extensions exist in all of them. The two combined lead to multiple instances of static variables at runtime.
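Point (1) is consistent with how CPython loads extension modules on Unix-like systems: they are opened with dlopen, and the default flags do not include RTLD_GLOBAL, so symbols of one extension are not used to resolve references in another. A minimal check, assuming a Unix-like platform where sys.getdlopenflags and os.RTLD_GLOBAL are available (the helper name is made up for illustration):

```python
import os
import sys


def extensions_load_with_global_symbols() -> bool:
    """Return True if extension modules would be dlopen'ed with RTLD_GLOBAL.

    Hypothetical helper for illustration; it only inspects the flags that
    the interpreter passes to dlopen() when importing an extension module.
    """
    return bool(sys.getdlopenflags() & os.RTLD_GLOBAL)


if __name__ == "__main__":
    # Raw flag value, platform dependent.
    print("dlopen flags:", sys.getdlopenflags())
    # Whether symbols of newly loaded extensions become globally visible.
    print("RTLD_GLOBAL set:", extensions_load_with_global_symbols())
```

In an interpreter whose flags have not been changed with sys.setdlopenflags, I would expect the helper to report that RTLD_GLOBAL is not set.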
A solution that seems to work is to have the symbols of all static variables live in separate shared libraries that the extensions share. In that case, all Python extensions that link to the same set of shared libraries see the same set of instances of static variables.
The question is how to organize the build system to produce that. My question on StackOverflow contains details about why this isn’t trivial with Bazel.
For the MLIR C API, there is a solution: a custom rule called mlir_c_api_cc_library that defines (1) a target for “normal” consumers, (2) a *Header target containing only the headers, for use by the Python extensions, and (3) an *Objects target that only the (single instance of the) shared libraries would use. For example, the CAPIInterfaces target looks like this:
mlir_c_api_cc_library(
    name = "CAPIInterfaces",
    srcs = [
        "lib/CAPI/Interfaces/Interfaces.cpp",
    ],
    capi_deps = [
        ":CAPIIR",
    ],
    includes = ["include"],
    deps = [
        ":IR",
        # ...
    ],
)
With this rule, the source files from the srcs argument and the files from the (transitive) capi_deps argument exist in the *Objects target but not in the *Header target. However, the source files from any target that is pulled in via deps either exist in all targets (if the target has been defined with alwayslink, see my SO post for details) or aren’t exported by the shared libraries that depend on that target. This affects the symbols in the :IR target: by default, they aren’t exported, and if I set the target to alwayslink, they are exported in all extensions and exist several times at runtime.
The only solution that I am currently aware of that solves this problem technically is to apply the mlir_c_api_cc_library rule to all transitive dependencies of MLIR, including LLVM (among potentially other things, command line options are defined in static variables, and I ran into runtime issues with those being defined twice). This doesn’t sound very realistic, or is at least highly non-trivial. Among things I might not have thought about, it would require changing hundreds or thousands of targets more or less manually and convincing a lot of the people involved.
Before I consider embarking on that mission: is this really the only solution? Or is there at least a workaround that would unblock me on my current project while we look for a more sustainable solution?