[dialect conversion] How to truly erase an operation in conversion?

syheliel · January 23, 2024, 7:14am

In MLIR’s Q&A, it says

This operation can be marked “illegal” and you can just do speculatively rewriter.erase(op);. The operation won’t be actually removed right now, instead when you mark something as erased you are basically saying to the driver “I expect all uses of this to go away by the time everything is over”. The conversion will fail if the operation you marked as erased doesn’t actually get erased at the end.

I’m a bit confused by this. When exactly does the erase happen? Will the operation eventually be erased?

I have done some experiments. Suppose I want to delete fooOp. I write a conversion pattern that matches it and then call rewriter.eraseOp to remove it, but nothing happens. The op is only truly erased when I call op.erase().

rewriter.eraseOp does not seem to erase anything.

mehdi_amini · January 23, 2024, 8:40am

It depends on the driver. The base implementation of eraseOp() will assert that the op’s results have no uses, and erase it.

The dialect conversion only marks it to be replaced by a nullptr, and assuming all the users of its results get converted, it’ll be deleted after the conversion succeeds.

You can try to run with -debug to see a trace of what’s happening. Otherwise if you have a reproducer showing your issue, that would help.
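
To make the "mark now, delete after success" behaviour concrete, here is a minimal self-contained sketch in plain C++. It is a toy model, not MLIR’s implementation: ToyOp, ToyConversionRewriter, commit(), rollback() and countOps() are all hypothetical names invented for illustration. It only shows why counting ops in the middle of a conversion still sees an op that has already been passed to eraseOp.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for an operation; this is NOT mlir::Operation.
struct ToyOp {
  std::string name;
};

using ToyBlock = std::vector<std::unique_ptr<ToyOp>>;

// Toy stand-in for a conversion rewriter that defers erasure (hypothetical).
class ToyConversionRewriter {
public:
  explicit ToyConversionRewriter(ToyBlock &block) : block(block) {}

  // Only records the request; the op stays in the block for now.
  void eraseOp(ToyOp *op) { pendingErase.insert(op); }

  // Run by the toy driver once the whole conversion has succeeded:
  // this is the point where marked ops are physically removed.
  void commit() {
    block.erase(std::remove_if(block.begin(), block.end(),
                               [&](const std::unique_ptr<ToyOp> &op) {
                                 return pendingErase.count(op.get()) != 0;
                               }),
                block.end());
    pendingErase.clear();
  }

  // Run instead of commit() when the conversion fails: nothing is removed.
  void rollback() { pendingErase.clear(); }

private:
  ToyBlock &block;
  std::unordered_set<ToyOp *> pendingErase;
};

// Counts ops with the given name that are still physically in the block.
inline std::size_t countOps(const ToyBlock &block, const std::string &name) {
  std::size_t count = 0;
  for (const auto &op : block)
    if (op->name == name)
      ++count;
  return count;
}
```

In this model, calling eraseOp on a tensor.empty and then countOps(block, "tensor.empty") still reports the op until commit() runs, which mirrors the observation that the op appears untouched while patterns are still being applied.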

syheliel · January 23, 2024, 12:57pm

Thanks, your reply gets to the heart of it. What I’m trying to do is convert linalg into another CUDA-like IR. The problem is changing funcOp’s signature according to the ops inside funcOp.
In linalg, one can create a new tensor with tensor.empty and return it. In a CUDA kernel that is impossible: if you map tensor.empty to malloc, you end up creating an unmanaged memory region on the device side and returning it to the user.
So I want to add input arguments to the function, one per tensor.empty. In more detail, the transformation looks like this:

func.func (){
%0 = tensor.empty() : tensor<1024xf32> // the function allocates a tensor itself
... // do some math
return %0
}
  |
 \|/
// let users allocate arg0 themselves and pass it in as a parameter
func.func (%arg0: tensor<1024xf32>){
... // do some math
return %arg0
}

To achieve this, I mark func.FuncOp as legal only when it contains no tensor.empty operation. But as you mentioned, eraseOp does not delete the op immediately. So even if I mark every tensor.empty with eraseOp, the number of tensor.empty ops does not change during the legalization process.
Currently my solution is to call erase() on the tensor.empty op directly. I’d like to know if there are better solutions.

ftynse · January 30, 2024, 2:48pm

Don’t. You should not, under any circumstances, use the direct mutation API from inside patterns. This leaves the internal maps of the dialect conversion in an inconsistent state; specifically, there will be a dangling reference to the op you just deleted. The address behind this reference may, and likely will, be reused by the allocator for the next op that is created, leading to an error that is extremely difficult to catch and invisible to tooling, including ASan and Valgrind. Errors of this kind have cost me weeks of debugging time, and I wrote parts of the conversion framework…

You can play tricks with attributes, e.g., add an attribute on the tensor.empty indicating that it is scheduled for erasure, and declare as legal only those functions in which every tensor.empty carries that attribute. Or just do the entire conversion inside the pattern that rewrites func.func, marking it as valid, or even outside the conversion infra entirely.
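
The attribute trick boils down to a legality predicate that ignores tensor.empty ops already scheduled for erasure. The sketch below models that predicate in plain, self-contained C++. It is a toy model, not MLIR code: SketchOp, SketchFunc, isFuncLegal and the attribute name "pending_erase" are hypothetical. In real MLIR, a predicate of this shape would be registered with ConversionTarget::addDynamicallyLegalOp for func::FuncOp, and the attribute check would go through Operation::hasAttr.

```cpp
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an operation with a set of attribute names (hypothetical).
struct SketchOp {
  std::string name;
  std::set<std::string> attrs;
};

// Toy stand-in for a function body (hypothetical).
struct SketchFunc {
  std::vector<SketchOp> body;
};

// Hypothetical marker meaning "a pattern has already scheduled this op
// for erasure through the rewriter".
const std::string kPendingErase = "pending_erase";

// A function is legal when every tensor.empty it still contains carries
// the marker, i.e. no tensor.empty is left unhandled.
inline bool isFuncLegal(const SketchFunc &func) {
  for (const SketchOp &op : func.body)
    if (op.name == "tensor.empty" && op.attrs.count(kPendingErase) == 0)
      return false;
  return true;
}
```

With this shape, a function whose tensor.empty ops are all marked counts as legal even though those ops are still physically present until the conversion finishes.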
