Async runtime methods not being properly exported?

This is from the latest MLIR upstream as of Jun 21 (commit 74909e4b6e9bc0da6c197cf6c4419991a8dc335f).
I noticed that the async runtime library’s methods aren’t being exported from its shared library. lib/ExecutionEngine/CMakeLists.txt hides all symbols by default, and the methods that should be exported aren’t marked for export explicitly:

set_property(TARGET mlir_async_runtime PROPERTY CXX_VISIBILITY_PRESET hidden)

For example:

$ nm lib/libmlir_async_runtime.so | grep mlirAsyncRuntimeAwaitToken
000000000006e360 t mlirAsyncRuntimeAwaitToken
...

(Note the lowercase t above instead of T, i.e., the symbol is local rather than exported.)
I can confirm that trying to use the library from the Python bindings leads to errors about these symbols being missing. I am not sure, though, how mlir-cpu-runner is able to find them (check-mlir passes). Dropping the hidden setting exposes these methods, and they are then found via the Python bindings. But are these methods missing explicit visibility settings?

@antiagainst

We set up the runner libraries to rely on explicit loading of the symbols, independently of their visibility, so that the only two symbols a runner library needs to expose are these:

$ nm lib/libmlir_async_runtime.so  | grep " T "
0000000000005de0 T __mlir_runner_destroy
0000000000005580 T __mlir_runner_init

(not all of the runner libraries have been updated to this pattern though).

See here for the implementation: llvm-project/AsyncRuntime.cpp at main · llvm/llvm-project · GitHub
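Roughly, the pattern looks like this (a simplified sketch, not the actual code; the runtime function and its signature below are placeholders):

// Simplified sketch of the runner-library init/destroy pattern (the real code
// is in AsyncRuntime.cpp; the runtime function below is just a placeholder).
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include <cassert>

// Runtime entry point; stays hidden under CXX_VISIBILITY_PRESET hidden.
extern "C" void mlirAsyncRuntimeAwaitToken(void *token) { /* ... */ }

// The only two symbols the library needs to expose.
extern "C" __attribute__((visibility("default"))) void
__mlir_runner_init(llvm::StringMap<void *> &exportSymbols) {
  auto exportSymbol = [&](llvm::StringRef name, auto ptr) {
    assert(exportSymbols.count(name) == 0 && "symbol already exists");
    exportSymbols[name] = reinterpret_cast<void *>(ptr);
  };
  exportSymbol("mlirAsyncRuntimeAwaitToken", &mlirAsyncRuntimeAwaitToken);
  // ... register the rest of the runtime API the same way.
}

extern "C" __attribute__((visibility("default"))) void __mlir_runner_destroy() {
  // Tear down global runtime state (e.g. the thread pool) here.
}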

Thanks, so this was the cause. The Python shared lib loading path is missing handling for libraries that are set up with this approach. The relevant code is in JitRunner.cpp:

  llvm::StringMap<void *> exportSymbols;
  SmallVector<MlirRunnerDestroyFn> destroyFns;

  // Handle libraries that do support mlir-runner init/destroy callbacks.
  for (auto &libPath : libPaths) {
    auto lib = llvm::sys::DynamicLibrary::getPermanentLibrary(libPath.c_str());
    void *initSym = lib.getAddressOfSymbol("__mlir_runner_init");
    void *destroySim = lib.getAddressOfSymbol("__mlir_runner_destroy");

    // Library does not support mlir runner, load it with ExecutionEngine.
    if (!initSym || !destroySim) {
      executionEngineLibs.push_back(libPath);
      continue;
    }

    auto initFn = reinterpret_cast<MlirRunnerInitFn>(initSym);
    initFn(exportSymbols);

    auto destroyFn = reinterpret_cast<MlirRunnerDestroyFn>(destroySim);
    destroyFns.push_back(destroyFn);
  }

  // Build a runtime symbol map from the config and exported symbols.
  auto runtimeSymbolMap = [&](llvm::orc::MangleAndInterner interner) {
    auto symbolMap = config.runtimeSymbolMap ? config.runtimeSymbolMap(interner)
                                             : llvm::orc::SymbolMap();
    for (auto &exportSymbol : exportSymbols)
      symbolMap[interner(exportSymbol.getKey())] =
          llvm::JITEvaluatedSymbol::fromPointer(exportSymbol.getValue());
    return symbolMap;
  };

  ...

  auto engine = std::move(*expectedEngine);
  engine->registerSymbols(runtimeSymbolMap);

This is why the symbols were still found via the cpu-runner path. Can I ask why this additional setup with a special init function was chosen, instead of exposing the desired symbols via a visibility attribute like the following?

void __attribute__((__visibility__("default"))) mlirAsync...

I’ve seen problems with non-deterministic dynamic library unloading that lead to segfaults. In some cases pthread (the std::thread implementation) was unloaded before the async runtime, and then the async runtime library crashed when it was destroyed.

The async runtime library is problematic because it has a global static variable that owns a thread pool.

See D92368 ([mlir] AsyncRuntime: disable threading until test flakiness is fixed) and D94312 ([mlir:JitRunner] Use custom shared library init/destroy functions if available).
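To make the hazard concrete, here is a minimal illustration (not the actual runtime code) of the problematic shape:

// Illustration only: a global static that owns worker threads. Its destructor
// runs at library-unload/exit time in an unspecified order relative to other
// libraries, so joining the threads may call into code (e.g. the
// pthread-backed std::thread machinery) that has already been unloaded.
#include <thread>
#include <vector>

namespace {
struct ThreadPool {
  std::vector<std::thread> workers;
  ~ThreadPool() {
    for (std::thread &t : workers)
      if (t.joinable())
        t.join();
  }
};

ThreadPool threadPool; // destroyed whenever the library happens to be unloaded
} // namespace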

I see - thanks for the links and the historical context. If we follow this init/destroy approach, how would we actually link to these methods from “compiled MLIR” to create a binary and then execute it (i.e., outside of the ORC JIT)? Say one wishes to lower to LLVM, compile/assemble to native code, and then link with the runtime libraries: this works with the other runtime methods like print_memref, but with this approach we would have to either export the symbols or link in an additional stub that manually opens the shared library and calls init/destroy. I’m also not sure how the latter would work easily, since the init methods rely on LLVM ADT.
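To make that concrete, such a stub would have to look roughly like this (a hypothetical sketch, not existing MLIR code); even this minimal version needs llvm::StringMap just to call __mlir_runner_init, which is what makes it awkward for a plain AOT binary:

// Hypothetical loader stub: dlopen the runtime library and drive its
// init/destroy callbacks by hand.
#include "llvm/ADT/StringMap.h"
#include <cstdio>
#include <dlfcn.h>

using MlirRunnerInitFn = void (*)(llvm::StringMap<void *> &);
using MlirRunnerDestroyFn = void (*)();

int main() {
  void *lib = dlopen("libmlir_async_runtime.so", RTLD_NOW | RTLD_GLOBAL);
  if (!lib) {
    std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
    return 1;
  }

  auto initFn =
      reinterpret_cast<MlirRunnerInitFn>(dlsym(lib, "__mlir_runner_init"));
  auto destroyFn =
      reinterpret_cast<MlirRunnerDestroyFn>(dlsym(lib, "__mlir_runner_destroy"));
  if (!initFn || !destroyFn) {
    std::fprintf(stderr, "runner init/destroy symbols not found\n");
    return 1;
  }

  // The runtime hands back its function pointers by name; the AOT-compiled
  // code would still need some mechanism (e.g. a table of function pointers)
  // to be wired up to these addresses, since no dynamic linker does it for us.
  llvm::StringMap<void *> exportSymbols;
  initFn(exportSymbols);

  // ... run the AOT-compiled MLIR entry point here ...

  destroyFn();
  dlclose(lib);
  return 0;
}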

Good point, I didn’t think about that. These init/destroy functions are optional; it’s possible to add a flag to build the async runtime lib without them and make all functions visible.
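Something along these lines (a rough sketch; the flag and macro names here are made up):

// Hypothetical sketch: a build flag that exports the runtime API directly
// and skips the init/destroy hooks (flag and macro names are made up).
#ifdef MLIR_ASYNC_RUNTIME_EXPORT_ALL
#define ASYNC_RUNTIME_EXPORT __attribute__((visibility("default")))
#else
#define ASYNC_RUNTIME_EXPORT
#endif

extern "C" ASYNC_RUNTIME_EXPORT void mlirAsyncRuntimeAwaitToken(void *token);

#ifndef MLIR_ASYNC_RUNTIME_EXPORT_ALL
// Only build the init/destroy hooks when the API itself is kept hidden.
extern "C" __attribute__((visibility("default"))) void __mlir_runner_destroy();
#endif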

Internally at Google, instead of init/destroy functions we had an option to build a statically linked async runtime library (async runtime + all dependencies in a single library), but I was not able to make it work in OSS.

For the AOT case I’d look into statically linking the runtime library into the binary plus making all symbols visible. For “reasons” I had trouble making it work internally, but I suspect it’s just a problem with our toolchain.


Making all symbols visible is an easy workaround. For AOT, did you also mean that with shared-library loading one would run into race conditions leading to crashes, just like with the ORC JIT?

Yes, the problem was not in the ORC JIT itself but with dlopen. I don’t remember all the details, but I think the root cause was the non-deterministic execution order of global static destructors and library unloading.

For example, if library A depends on B via a global static, then B can be unloaded before A’s destructors are called, and then A will try to execute code that is no longer available to the binary.

This all goes away if libraries A and B are statically linked into a single binary.


It’ll work naturally if you build it as a static library and link it with your AOT-generated code.
