We’re currently working on an MLIR-based C++ compiler and found some unexpected behavior when dealing with exception-related operations in the llvm dialect. We know our use case might differ from what other people in the community are doing, so we understand that not many people may have run into these issues before. However, we don’t mind being the driving force behind fixing them, not only for our project but also for the MLIR community as a whole.
We have run into two issues so far, and we plan to report and address any others we find as long as our planning and human resources allow.
ResumeOp::verify() too restrictive
Currently, llvm.resume cannot be passed a value coming from an operation other than llvm.landingpad, as seen in ResumeOp::verify(). However, this is too restrictive. According to the LLVM language reference:
The ‘resume’ instruction requires one argument, which must have the same type
as the result of any ‘landingpad’ instruction in the same function.
In fact, compiling a program as simple as:
void f();
void foo() {
try {
f();
} catch(...) {
throw 1;
}
}
using clang++ -S -emit-llvm (clang 13.0.1) generates a resume instruction with a non-landingpad input:
%lpad.val4 = insertvalue { i8*, i32 } %lpad.val, i32 %sel, 1
resume { i8*, i32 } %lpad.val4
That’s why we propose changing the definition of this verifier function to something less restrictive and in line with the LLVM instruction. We already have a patch for that up for review.
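To make the relaxed rule concrete, here is a minimal C++ sketch of the check on a toy model. ToyOp, OpKind and verifyResumeTypes are hypothetical names invented for illustration, not MLIR APIs, and this is not the code from the patch. It also assumes that the language reference's wording means the resume operand type must match every landingpad result type in the same function.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for IR concepts; none of these are MLIR classes.
enum class OpKind { LandingPad, InsertValue, Resume, Other };

struct ToyOp {
  OpKind kind;
  std::string type; // result type, or the operand type for a Resume
};

// Relaxed rule (assumption): a resume is valid when its operand type equals
// the result type of every landingpad in the same function, regardless of
// which operation actually defines the operand.
bool verifyResumeTypes(const std::vector<ToyOp> &function) {
  for (const ToyOp &op : function) {
    if (op.kind != OpKind::Resume)
      continue;
    for (const ToyOp &other : function)
      if (other.kind == OpKind::LandingPad && other.type != op.type)
        return false;
  }
  return true;
}
```

Under this sketch, the clang output above would be accepted, since the insertvalue result has the same { i8*, i32 } type as the landingpad, while a resume of a differently typed value would still be rejected.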
Representing invoke to a block with phi instructions
Note: An issue with this operation was already reported by @bitwalker here.
The lack of a phi-like operation in MLIR makes the llvm.invoke operation awkward to represent. Because this operation produces a value while also being a terminator that affects control flow, it would somehow need to reference itself in order to add the value it produces to the list of block arguments passed to its “normal” successor.
Because of this, mlir-translate runs into issues when importing the following LLVM module:
declare i8 @__gxx_personality_v0(...)
declare i32 @foo()
define i32 @test(i1 %cond) personality i8 (...)* @__gxx_personality_v0 {
entry:
br i1 %cond, label %call, label %add
call:
%invoke = invoke i32 @foo() to label %bb0 unwind label %bb1
add:
%addition = add i32 10, 1
br label %bb0
bb0:
%ret = phi i32 [ %addition, %add ], [ %invoke, %call ]
ret i32 %ret
bb1:
%resume = landingpad i32 cleanup
resume i32 %resume
}
In the program above, the bb0 block should receive an i32 block argument, but there is currently no way to represent that with the llvm.invoke operation, as the value %invoke cannot be used in its own definition:
%invoke = llvm.invoke @foo() to ^bb0(%invoke) unwind ^bb1 : () -> i32
Import workaround
It is clear that llvm.invoke is a beast of its own, as it produces a value while being a terminator, so we might need to compromise on the “1-to-1”-ish translation in order to fix this issue. The above program can be expressed as follows in MLIR by creating a “dummy” normal successor block:
%invoke = llvm.invoke @foo() to ^dummy unwind ^bb1 : () -> i32
^dummy:
llvm.br ^bb0(%invoke : i32)
Note that mlir->llvm translation needs no change, as this transformation would only need to take place when importing an LLVM module. Also, if the “normal” successor block has no phi instructions, the translation should remain unchanged w.r.t. the current state, as there would be no need for block arguments.
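The import-time rewrite described above amounts to splitting the invoke's “normal” edge. Below is a rough C++ sketch on a toy CFG model; ToyBlock, ToyFunction, splitInvokeEdges and the ".dummy" naming are all hypothetical and only illustrate the idea, they are not part of mlir-translate.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy CFG model; these types are illustrative and not MLIR or LLVM classes.
struct ToyBlock {
  bool endsInInvoke = false;         // terminator is an invoke
  std::string dest;                  // "normal" successor or branch target
  std::vector<std::string> destArgs; // values the successor's phis take from this edge
};

using ToyFunction = std::map<std::string, ToyBlock>;

// For every invoke whose normal successor expects values from this edge,
// insert a dummy block that forwards those values with a plain branch.
// Invokes whose successor has no phi instructions are left untouched.
void splitInvokeEdges(ToyFunction &fn) {
  std::vector<std::string> toSplit;
  for (const auto &entry : fn)
    if (entry.second.endsInInvoke && !entry.second.destArgs.empty())
      toSplit.push_back(entry.first);

  for (const std::string &name : toSplit) {
    ToyBlock &invokeBlock = fn[name];
    ToyBlock dummy;
    dummy.dest = invokeBlock.dest;         // dummy branches to the old successor
    dummy.destArgs = invokeBlock.destArgs; // and forwards the block arguments
    const std::string dummyName = name + ".dummy";
    invokeBlock.dest = dummyName;          // the invoke now targets the dummy block
    invokeBlock.destArgs.clear();          // and no longer passes arguments itself
    fn[dummyName] = dummy;
  }
}
```

Applied to the module above, the "call" block would end up targeting "call.dummy", which in turn branches to "bb0" with %invoke as its argument, while the "add" block keeps branching to "bb0" directly.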
Replacing llvm.invoke
As an alternative solution, we could drop this operation altogether, as it in fact violates one of the principles of the llvm dialect. According to the llvm dialect documentation:
Unless explicitly stated otherwise, the semantics of the LLVM dialect
operations must correspond to the semantics of LLVM IR instructions
and any divergence is considered a bug. The dialect also contains
auxiliary operations that smoothen the differences in the IR structure,
e.g., MLIR does not have phi operations and LLVM IR does not have a
constant operation. These auxiliary operations are systematically prefixed
with mlir, e.g. llvm.mlir.constant where llvm. is the dialect namespace prefix.
We would then replace it with two operations:
- llvm.mlir.invoke_call (CallOpInterface): would represent the “call” semantics of the invoke instruction.
- llvm.mlir.invoke_br (Terminator, BranchOpInterface): would represent the control flow semantics of the invoke instruction.
These operations should always appear one right after the other, i.e., llvm.mlir.invoke_call’s verify function should check that the next operation in the block is an llvm.mlir.invoke_br, and llvm.mlir.invoke_br’s verify function should check that llvm.mlir.invoke_call is the previous operation.
Using these operations, we could represent the above LLVM module as:
%invoke = llvm.mlir.invoke_call @foo() : () -> i32
llvm.mlir.invoke_br to ^bb0(%invoke : i32) unwind ^bb1
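The pairing rule for the two verifiers could be sketched as follows. This is only a toy C++ model in which a block is a list of operation names; verifyInvokePairs is a hypothetical helper that merges both checks into one pass, whereas in MLIR each operation's own verify function would inspect its neighbor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A block is modeled as an ordered list of operation names (an assumption
// made for illustration; real verifiers would inspect neighboring operations).
bool verifyInvokePairs(const std::vector<std::string> &block) {
  const std::string call = "llvm.mlir.invoke_call";
  const std::string br = "llvm.mlir.invoke_br";
  for (std::size_t i = 0; i < block.size(); ++i) {
    // invoke_call must be immediately followed by invoke_br.
    if (block[i] == call && (i + 1 == block.size() || block[i + 1] != br))
      return false;
    // invoke_br must be immediately preceded by invoke_call.
    if (block[i] == br && (i == 0 || block[i - 1] != call))
      return false;
  }
  return true;
}
```

With this check, a block ending in an invoke_call directly followed by an invoke_br passes, while a block with any other operation between the two, or with either operation on its own, is rejected.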
In order to keep pairs of these operations tied together, we could make llvm.mlir.invoke_call return a dummy result if the function being called is void (or always, in addition to the regular result), leading to something like:
%invoke = llvm.mlir.invoke_call @foo() : () -> unit
llvm.mlir.invoke_br(%invoke : unit) to ^bb0 unwind ^bb1