Exception handling in MLIR

Hello all,

In the MLIR codebase, I mostly see two styles of error handling: asserts that check certain conditions, and LogicalResult values that are propagated back so that a pass can call signalPassFailure(), alongside diagnostics such as op->emitError. In the former case, the compiler stops abruptly when an assertion fails. A segmentation fault anywhere in the codebase stops the compilation just as abruptly. So I’m wondering whether there is a way to exit the compilation gracefully. A graceful exit can be really important if a compiler built with MLIR is part of a bigger software stack. One immediate thought is to use C++ exceptions, but I have seen comments that they increase compilation time. How is the community addressing this problem in their toolchains?

Thanks,
Prasanth

It isn’t clear to me how exceptions (or any other mechanism) would be a replacement for assertions and segfaults. In general these are bugs in the software, and almost by definition we can’t really manage or recover from a bug; we are also using an “unsafe” language, which will always leave open the possibility of memory corruption.
Assertions are just a development tool to validate invariants during testing and development.

The deployments I’ve seen in the past that needed to recover from a crash relied on process isolation and RPC: a service spawns the compiler in a separate process on demand for each compilation request. To avoid spawning a new process on each request, you can also run the compiler as a dedicated service that accepts compilation requests and auto-restarts on crashes.
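To make the per-request isolation concrete, here is a minimal sketch assuming a POSIX system. The names `CompileOutcome` and `runIsolated` are hypothetical and are not part of MLIR or LLVM; the command line passed in stands for whatever compiler binary the service wraps.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// How one isolated compilation request ended (hypothetical type).
struct CompileOutcome {
  bool crashed;  // true if the child was killed by a signal (SIGSEGV, SIGABRT, ...)
  int exitCode;  // exit code if the child exited normally, -1 otherwise
  int signal;    // terminating signal if crashed, 0 otherwise
};

// Runs the command line `argv` in a child process and reports how it ended.
// A crash in the child cannot take down the calling process.
CompileOutcome runIsolated(const std::vector<std::string> &argv) {
  assert(!argv.empty() && "need at least the program name");

  // execvp expects a null-terminated array of char pointers.
  std::vector<char *> cargv;
  for (const std::string &arg : argv)
    cargv.push_back(const_cast<char *>(arg.c_str()));
  cargv.push_back(nullptr);

  pid_t pid = fork();
  if (pid < 0)
    return {false, -1, 0};  // fork failed, nothing was run
  if (pid == 0) {
    execvp(cargv[0], cargv.data());
    _exit(127);  // only reached if exec itself failed
  }

  // Parent: wait for the child, retrying if interrupted by a signal.
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR)
      return {false, -1, 0};
  }
  if (WIFSIGNALED(status))
    return {true, -1, WTERMSIG(status)};
  if (WIFEXITED(status))
    return {false, WEXITSTATUS(status), 0};
  return {false, -1, 0};
}
```

The caller can then turn `crashed == true` into an ordinary error for that one request. A long-lived compile service with auto-restart follows the same idea, with a supervisor doing the `waitpid` and respawn.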

My intent is to use exceptions to exit gracefully rather than stop suddenly, as happens with assertions or segmentation faults. I view exceptions as complementary rather than as a replacement. I was also asking whether there are alternative ways, other than exceptions, to exit smoothly.

Sorry, I’m still puzzled about how you’d use exceptions to gracefully exit in the case of a segfault?

I don’t know whether @prasanth is asking about this, but there are crash handlers in LLVM and MLIR.
https://llvm.org/doxygen/PrettyStackTrace_8cpp.html#abf5a4258beed4edd8ab42ec0d375f51e
https://mlir.llvm.org/doxygen/PassCrashRecovery_8cpp_source.html

@mehdi_amini : My bad, I was incorrect. For segfaults, we may have to go with signal handlers that catch SIGSEGV and exit/return gracefully.
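A minimal sketch of what such a handler could look like, assuming POSIX `sigaction`. The function names and the exit code 70 are hypothetical choices for illustration, not anything MLIR provides. The handler only reports and exits; it does not try to continue the compilation, since the process state may already be corrupted at that point.

```cpp
#include <cassert>
#include <csignal>
#include <unistd.h>

// Hypothetical exit code reported to the parent when a fatal signal arrives.
constexpr int kCrashExitCode = 70;

// Handler body: restricted to async-signal-safe calls (write, _exit).
extern "C" void onFatalSignal(int) {
  static const char msg[] = "fatal signal in compiler, giving up on this request\n";
  ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
  (void)ignored;
  // _exit skips destructors and atexit handlers, which may touch corrupted state.
  _exit(kCrashExitCode);
}

// Installs the handler for one of the two signals discussed in this thread.
bool installCrashHandler(int signo) {
  assert((signo == SIGSEGV || signo == SIGABRT) && "unexpected signal");
  struct sigaction action = {};
  action.sa_handler = onFatalSignal;
  sigemptyset(&action.sa_mask);
  action.sa_flags = 0;
  return sigaction(signo, &action, nullptr) == 0;
}

// SIGSEGV covers segfaults; SIGABRT covers abort(), which a failed assert calls.
bool installCrashHandlers() {
  return installCrashHandler(SIGSEGV) && installCrashHandler(SIGABRT);
}
```

Calling `installCrashHandlers()` once at startup would turn both kinds of crash into a short message and a known exit code, but it still ends the whole process.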

Overall, my main question is: what are the approaches to keep a compiler from breaking suddenly in the middle of an execution (whether via segfaults, assert violations, or something else)?

@kiranchandramohan : thanks for the pass crash recovery link; I will look into it.

Spawning another process, as I mentioned above, is the usual robust way to recover from failures. This is a well-known technique that goes beyond compilers: it is how web servers manage their CGI; look into the PHP-FPM architecture, for example!

As Mehdi says, and since you asked what others do: IREE by default spawns a separate process for compilation and linking (the entry points for all tools are in the same shared library, so this doesn’t correspond to multiple large executables, and there are other tricks that can be done with such a setup). We also support an in-process compilation flow, but set things up so that an external tool can be used instead if desired. The debuggability and recoverability of an external process can’t be beat, but we let you trade that off for pure in-process compilation if needed.

An EH mechanism could replace the way that we use LogicalResult, but it would amount to, basically, a complete rewrite of the codebase. In my experience, the more insidious failures likely call for fail-fast behavior anyway, and trying to recover assumes that the process state has not been corrupted.
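To illustrate the difference in shape between the two styles, here is a small sketch. `Result` is a stand-in written for this example and is not `mlir::LogicalResult`; the function names and the "width" check are likewise hypothetical. Note that LLVM's coding standards advise against using C++ exceptions, so the second style is shown only for comparison.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in for a success/failure value; NOT mlir::LogicalResult.
struct Result {
  bool ok;
  static Result success() { return {true}; }
  static Result failure() { return {false}; }
};

// Style 1: every level returns a Result and the caller checks it.
Result verifyWidth(int width, std::string &diag) {
  assert(diag.empty() && "caller passes an empty diagnostic buffer");
  if (width <= 0) {
    diag = "width must be positive";
    return Result::failure();
  }
  return Result::success();
}

Result runPipeline(int width, std::string &diag) {
  if (!verifyWidth(width, diag).ok)
    return Result::failure();  // propagate explicitly, one level at a time
  return Result::success();
}

// Style 2: inner code throws, one boundary catches and converts.
void verifyWidthOrThrow(int width) {
  if (width <= 0)
    throw std::invalid_argument("width must be positive");
}

Result runPipelineWithExceptions(int width, std::string &diag) {
  try {
    verifyWidthOrThrow(width);
    return Result::success();
  } catch (const std::exception &e) {
    diag = e.what();
    return Result::failure();
  }
}
```

Both styles only cover failures the code anticipates. Neither one sees a segfault or a failed assert, which is why they do not replace process isolation.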

Maybe some day we don’t write such things in C++… Until then, I think this is as good as it gets.

Thanks, @mehdi_amini and @stellaraccident. It looks like spawning a process for the compilation is the cleaner way.