Failing tests in latest LLVM: mlir-cpu-runner: CommandLine Error

I don’t have an account on the bug tracker, so I’m reporting it in the forum here.

mlir-cpu-runner: CommandLine Error: Option 'debug-buffer-size' registered more than once!

To reproduce:
Just build the latest LLVM with MLIR:

mkdir build && cd build
cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS=mlir
ninja check-mlir


Failed Tests (4):
  MLIR :: mlir-cpu-runner/async-error.mlir
  MLIR :: mlir-cpu-runner/async-group.mlir
  MLIR :: mlir-cpu-runner/async-value.mlir
  MLIR :: mlir-cpu-runner/async.mlir
FAIL: MLIR :: mlir-cpu-runner/async-error.mlir (759 of 1031)
******************** TEST 'MLIR :: mlir-cpu-runner/async-error.mlir' FAILED ********************
: 'RUN: at line 1'; /tmp/llvm-project/build/bin/mlir-opt /private/tmp/llvm-project/mlir/test/mlir-cpu-runner/async-error.mlir -async-to-async-runtime -async-runtime-ref-counting -async-runtime-ref-counting-opt -convert-async-to-llvm -convert-linalg-to-loops -convert-scf-to-std -convert-linalg-to-llvm -convert-vector-to-llvm -convert-std-to-llvm | /tmp/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void -O0 -shared-libs=/tmp/llvm-project/build/lib/libmlir_c_runner_utils.dylib -shared-libs=/tmp/llvm-project/build/lib/libmlir_runner_utils.dylib -shared-libs=/tmp/llvm-project/build/lib/libmlir_async_runtime.dylib | /tmp/llvm-project/build/bin/FileCheck /private/tmp/llvm-project/mlir/test/mlir-cpu-runner/async-error.mlir --dump-input=always
Exit Code: 2

Command Output (stderr):
mlir-cpu-runner: CommandLine Error: Option 'debug-buffer-size' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
PLEASE submit a bug report to and include the crash backtrace.
Stack dump:
0.	Program arguments: /tmp/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void -O0 -shared-libs=/tmp/llvm-project/build/lib/libmlir_c_runner_utils.dylib -shared-libs=/tmp/llvm-project/build/lib/libmlir_runner_utils.dylib -shared-libs=/tmp/llvm-project/build/lib/libmlir_async_runtime.dylib
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  mlir-cpu-runner             0x0000000102a68710 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 68
1  mlir-cpu-runner             0x0000000102a68c04 PrintStackTraceSignalHandler(void*) + 28
2  mlir-cpu-runner             0x0000000102a66c78 llvm::sys::RunSignalHandlers() + 124
3  mlir-cpu-runner             0x0000000102a6b234 SignalHandler(int) + 220
4  libsystem_platform.dylib    0x00000001a9c4c4a4 _sigtramp + 56
5  libsystem_pthread.dylib     0x00000001a9c350a4 pthread_kill + 292
6  libsystem_c.dylib           0x00000001a9b72314 abort + 164
7  mlir-cpu-runner             0x00000001028f1a58 llvm::report_fatal_error(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) + 0
8  mlir-cpu-runner             0x00000001028f1914 llvm::report_fatal_error(llvm::Twine const&, bool) + 0
9  mlir-cpu-runner             0x00000001028a5a08 (anonymous namespace)::CommandLineParser::addOption(llvm::cl::Option*, llvm::cl::SubCommand*) + 512
10 mlir-cpu-runner             0x0000000102899d78 (anonymous namespace)::CommandLineParser::addOption(llvm::cl::Option*, bool) + 128
11 mlir-cpu-runner             0x0000000102898064 llvm::cl::Option::addArgument() + 52
12 mlir-cpu-runner             0x0000000102898018 llvm::cl::opt<unsigned int, false, llvm::cl::parser<unsigned int> >::done() + 28
13 libmlir_async_runtime.dylib 0x00000001172dd588 llvm::cl::opt<unsigned int, false, llvm::cl::parser<unsigned int> >::opt<char [18], llvm::cl::desc, llvm::cl::OptionHidden, llvm::cl::initializer<int> >(char const (&) [18], llvm::cl::desc const&, llvm::cl::OptionHidden const&, llvm::cl::initializer<int> const&) + 144
14 libmlir_async_runtime.dylib 0x00000001172d8818 llvm::cl::opt<unsigned int, false, llvm::cl::parser<unsigned int> >::opt<char [18], llvm::cl::desc, llvm::cl::OptionHidden, llvm::cl::initializer<int> >(char const (&) [18], llvm::cl::desc const&, llvm::cl::OptionHidden const&, llvm::cl::initializer<int> const&) + 60
15 libmlir_async_runtime.dylib 0x00000001172e18bc __cxx_global_var_init.2 + 116
16 libmlir_async_runtime.dylib 0x00000001172e19cc _GLOBAL__sub_I_Debug.cpp + 16
17 dyld                        0x00000001168d1230 invocation function for block in dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 164
18 dyld                        0x00000001168f9cbc invocation function for block in dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 340
19 dyld                        0x00000001168f06b8 invocation function for block in dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 532
20 dyld                        0x00000001168bdf98 dyld3::MachOFile::forEachLoadCommand(Diagnostics&, void (load_command const*, bool&) block_pointer) const + 168
21 dyld                        0x00000001168f045c dyld3::MachOFile::forEachSection(void (dyld3::MachOFile::SectionInfo const&, bool, bool&) block_pointer) const + 192
22 dyld                        0x00000001168f9704 dyld3::MachOAnalyzer::forEachInitializer(Diagnostics&, dyld3::MachOAnalyzer::VMAddrConverter const&, void (unsigned int) block_pointer, void const*) const + 516
23 dyld                        0x00000001168d1170 dyld4::Loader::findAndRunAllInitializers(dyld4::RuntimeState&) const + 172
24 dyld                        0x00000001168d1314 dyld4::Loader::runInitializersBottomUp(dyld4::RuntimeState&, dyld3::Array<dyld4::Loader const*>&) const + 208
25 dyld                        0x00000001168d13e0 dyld4::Loader::runInitializersBottomUpPlusUpwardLinks(dyld4::RuntimeState&) const + 124
26 dyld                        0x00000001168e083c dyld4::APIs::dlopen_from(char const*, int, void*) + 508
27 mlir-cpu-runner             0x0000000102a431ec llvm::sys::DynamicLibrary::HandleSet::DLOpen(char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) + 32
28 mlir-cpu-runner             0x0000000102a43504 llvm::sys::DynamicLibrary::getPermanentLibrary(char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) + 48
29 mlir-cpu-runner             0x0000000103180914 compileAndExecute((anonymous namespace)::Options&, mlir::ModuleOp, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig, void**) + 280
30 mlir-cpu-runner             0x000000010316c728 compileAndExecuteVoidFunction((anonymous namespace)::Options&, mlir::ModuleOp, llvm::StringRef, (anonymous namespace)::CompileAndExecuteConfig) + 228
31 mlir-cpu-runner             0x000000010316b5ac mlir::JitRunnerMain(int, char**, mlir::DialectRegistry const&, mlir::JitRunnerConfig) + 1296
32 mlir-cpu-runner             0x000000010245dea0 main + 200
33 dyld                        0x00000001168c10fc start + 520
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /tmp/llvm-project/build/bin/FileCheck /private/tmp/llvm-project/mlir/test/mlir-cpu-runner/async-error.mlir --dump-input=always

FWIW, I consistently get this as well. It looks like a .o file is being linked in multiple times somehow. I’m not sure what is going on.

Fwiw as well, this kind of error can result from memory smashing in unrelated initialization code that precedes option setup. The last time I saw it was from issues with pass statistics initialization (that specific thing can’t happen anymore). Because a lot of time can be spent looking for a link issue that may not exist, I’d recommend first running the given binary with ASAN to see if there are any initialization issues lurking.
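For reference, LLVM’s build system has a standard switch for sanitized builds, so a quick way to follow that advice on the repro above would look something like this (paths assumed from the original report):

```
mkdir build-asan && cd build-asan
cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS=mlir -DLLVM_USE_SANITIZER=Address
ninja check-mlir
```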

mlir-cpu-runner runs the JIT engine, which confuses ASAN, so it may not be that helpful.

Only the async tests are failing?

Only these 4 async ones are failing, yes.

This is because the async runtime .so is linking in libSupport (and is built with hidden visibility, I think).

I don’t know why libSupport is linked there, though; I was looking into this last week and couldn’t figure out where it is coming from: llvm-project/CMakeLists.txt at main · llvm/llvm-project · GitHub
(LLVM_PTHREAD_LIB only includes -pthread)
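To make the failure mode concrete: every `cl::opt` is a global object whose constructor adds its option name to a process-wide registry during static initialization, and the parser treats a duplicate name as fatal. A minimal self-contained sketch of that mechanism (a toy registry standing in for LLVM’s, not the actual libSupport code):

```cpp
#include <map>
#include <string>

namespace toycl {
// Process-wide registry, standing in for LLVM's CommandLineParser.
inline std::map<std::string, int> &registry() {
  static std::map<std::string, int> r;
  return r;
}

// Returns false on a duplicate name; at the equivalent point LLVM instead
// prints "Option '...' registered more than once!" and report_fatal_error's.
inline bool registerOption(const std::string &name) {
  return registry().emplace(name, 0).second;
}
} // namespace toycl
```

The statically linked copy of libSupport in `mlir-cpu-runner` has already registered `debug-buffer-size` at startup; when the dlopen’ed `libmlir_async_runtime.dylib` runs its own copy of the same static initializer in the same process, the duplicate check fires, which matches the stack trace above (the registration comes from `_GLOBAL__sub_I_Debug.cpp` inside the dylib during `dlopen`).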

Actually it is added implicitly here: llvm-project/AddMLIR.cmake at main · llvm/llvm-project · GitHub
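Illustrative only (not the actual upstream CMake): when a shared build helper unconditionally links a library, every target declared through it inherits the dependency, which is exactly how a runtime .so can silently pick up libSupport:

```
# Hypothetical helper: every library declared through it links LLVMSupport,
# whether or not it needs the option registry that comes along with it.
function(add_some_runtime_library name)
  add_library(${name} SHARED ${ARGN})
  target_link_libraries(${name} PRIVATE LLVMSupport ${LLVM_PTHREAD_LIB})
endfunction()
```

The fix direction discussed below is to drop the unconditional link from the helper and have each target state its real dependencies explicitly.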

Oof - we should remove that and specify it right. The time bomb has gone off.


(and I’m reasonably certain that this has bitten us in other esoteric cases but not been root caused)

Yes, we likely should fix this in CMake. That won’t fix the AsyncRuntime problem, though: it actually needs libSupport because it uses the LLVM thread pool facility, for example.

We’re paying the price of libSupport being too “fat”: it includes ADT and other generic facilities, but it also comes with a global registry for options, which is fairly intrusive.


I would daresay that in that form, then, the AsyncRuntime is only usable at best as a toy. This is why parts of the “real” compilers in LLVM have very specific runtime library boundaries: You have to guard the kinds of dependencies that such things take, and there needs to be a defensible boundary between the things used to build the compiler and those things that the compiler emits references to. Those two can share code, but it needs to be designed that way – and this has not been.

It would be better in the short term if the AsyncRuntime literally copied the code that it needs to function versus introducing this kind of dependency.

It’s fine, I think it was the intent :slight_smile:
It is just meant as a “reference implementation” to showcase (and to test) the compiler-generated code; other implementations can be used in production (like TFRT, for example, which implements the runtime APIs).

And indeed if we were to have a real thin runtime layer in MLIR, we should be very careful about layering it in isolation.

I agree.


Even if it is intended as a toy, it will not always be used as such. mlir-cpu-runner itself is a test vehicle, but I’ve seen people using it to report performance numbers.

Right. The question is “what is the fate of AsyncRuntime?” Is it just a testing vehicle, or is it something serious? If it is something serious, then rearchitecting libSupport would make sense. If it is a testing vehicle, then doing something minimal to fix the linkage issues would be reasonable.

OTOH, if it was a drive-by experiment, then we can also just remove it from the tree, or hoist it up to TFRT.

The origin of this work: when the work on async started, I pushed to make it happen upstream (instead of writing the entire async JIT compiler in TFRT) and to decouple the compiler from the runtime implementation by defining clear interfaces. We also wanted MLIR not to depend on TFRT and still be able to actually test the generated code, so having a “simple” implementation behind the API in-tree was fairly natural (and it isn’t a lot of code).

Beyond AsyncRuntime: none of the runtime code in lib/ExecutionEngine seems very principled to me; it looks more like a collection of convenient utilities for testing/experimenting.

Yeah, I agree about ExecutionEngine. It’s also not a great name (carried over from old LLVM stuff with similarly dubious design :slight_smile:).

I’m not super familiar with the async dialect: is it actually generically useful for something, and does it have multiple clients? If not, then it is probably better to move the dialect itself to TFRT as well.


I actually mentioned something related here: Aync runtime method not being properly exported? - #5 by bondhugula. Async runtime needs StringMap and ThreadPool.
+1 to copying the code it needs. Big NO from me on moving the dialect out – it’s the only path I know currently to realize parallel loop execution via MLIR upstream (on CPUs).
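On the copying point: the pieces pulled from libSupport are small enough that a private substitute seems plausible. For instance, here is a minimal stand-in for the thread-pool dependency, built only on the C++ standard library so no libSupport link is needed (a sketch, not the runtime’s actual code; `MiniThreadPool` is a made-up name):

```cpp
#include <algorithm>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A tiny private substitute for the part of llvm::ThreadPool that an async
// runtime needs: fire-and-forget task submission plus clean shutdown.
class MiniThreadPool {
public:
  explicit MiniThreadPool(unsigned n = 0) {
    if (n == 0)
      n = std::max(1u, std::thread::hardware_concurrency());
    for (unsigned i = 0; i < n; ++i)
      workers.emplace_back([this] { workerLoop(); });
  }

  // Drains all queued tasks, then joins the workers.
  ~MiniThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu);
      done = true;
    }
    cv.notify_all();
    for (std::thread &t : workers)
      t.join();
  }

  // Fire-and-forget submission, mirroring the shape of a runtime's
  // "execute this coroutine resumption on a worker" entry point.
  void async(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu);
      tasks.push(std::move(task));
    }
    cv.notify_one();
  }

private:
  void workerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu);
        cv.wait(lock, [this] { return done || !tasks.empty(); });
        if (tasks.empty())
          return; // done was set and nothing is left to run
        task = std::move(tasks.front());
        tasks.pop();
      }
      task();
    }
  }

  std::vector<std::thread> workers;
  std::queue<std::function<void()>> tasks;
  std::mutex mu;
  std::condition_variable cv;
  bool done = false;
};
```

A `StringMap` replacement is even easier (`std::unordered_map<std::string, V>` covers the runtime’s needs, at some cost in allocations), so the copied-code surface would stay small.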

I think the concepts modeled by the async dialect are generic enough that it is worth having as part of the ecosystem. It isn’t coupled to TFRT in any way (that said, I think there is a place for the lower-level aspects of TFRT in-tree as well, but that’s another topic…).
Also, there is a nice piece of technology we have in-tree that maps the async constructs to LLVM coroutines in a nice way (we’re the only in-tree client of LLVM coroutines other than Clang, as far as I know).

The async dialect is also used in the context of GPU launch operations now, to model the inherent asynchrony associated with them. I think folks are working on mapping it to multiple CUDA streams, but I haven’t followed the progress on this recently.

ExecutionEngine was basically me making sure the LLVM IR we emit can be turned into executable code. It’s also a testing facility that is now shared between the main code and the C and Python APIs. It is mostly convenience APIs for interfacing with memrefs. Any serious use is better off creating LLVM IR modules and using LLVM’s JIT directly.

GPU lowerings use it for host execution.

I have an out-of-tree project that uses it and prefers not to depend on TFRT. I’ve also seen other projects using it in the wild.

OpenMP parallel constructs and workshare loops work well. When somebody finds time to review reduction support, that will work too :slight_smile:
