This operation can be marked “illegal” and you can just speculatively call rewriter.erase(op). The operation won’t actually be removed right away; instead, when you mark something as erased you are basically saying to the driver “I expect all uses of this to go away by the time everything is over”. The conversion will fail if the operation you marked as erased doesn’t actually get erased at the end.
I’m a bit confused by this. When exactly does the erase happen? Will the operation eventually be erased?
I have done some experiments. Suppose I want to delete fooOp. I write a conversion pattern that matches it and then call rewriter.eraseOp to remove it, but nothing happens. Only calling op.erase() actually erases it.
rewriter.eraseOp does not seem to erase anything.
It depends on the driver. The base implementation of eraseOp() asserts that the op has no uses and then erases it.
The dialect conversion only marks it as replaced by nullptr; assuming all the users of its results get converted, it will be deleted after the conversion succeeds.
You can try running with -debug to see a trace of what’s happening. Otherwise, if you have a reproducer showing your issue, that would help.
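To make the deferred behaviour described above concrete, here is a minimal standalone sketch. It does not use the MLIR API: ToyOp, ToyConversionRewriter, countOps and finalize are hypothetical names invented for illustration, and the real driver also accepts users that get replaced rather than erased. The toy only models "eraseOp records the intent, the actual removal happens once at the end, and only if no live user remains".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an operation: a name plus the ops that use its result.
struct ToyOp {
  std::string name;
  std::vector<ToyOp *> users;
};

// Counts the ops with the given name that are currently in the block.
int countOps(const std::vector<ToyOp *> &block, const std::string &name) {
  int n = 0;
  for (const ToyOp *op : block)
    if (op->name == name)
      ++n;
  return n;
}

// Toy model of a conversion-style rewriter (assumption: heavily simplified).
class ToyConversionRewriter {
public:
  // Only records the intent to erase; the block is not modified here.
  void eraseOp(ToyOp *op) { pendingErase.insert(op); }

  bool isPendingErase(ToyOp *op) const { return pendingErase.count(op) != 0; }

  // Runs once at the end. Fails, leaving the block untouched, if an op
  // scheduled for erasure still has a user that is not going away too.
  bool finalize(std::vector<ToyOp *> &block) {
    for (ToyOp *op : pendingErase)
      for (ToyOp *user : op->users)
        if (!pendingErase.count(user))
          return false;
    std::vector<ToyOp *> kept;
    for (ToyOp *op : block)
      if (!pendingErase.count(op))
        kept.push_back(op);
    block = kept;
    pendingErase.clear();
    return true;
  }

private:
  std::set<ToyOp *> pendingErase;
};
```

In this model, counting tensor.empty ops between eraseOp and finalize still returns the old number, which matches the "nothing happens" observation from the experiment: the op is still in the IR while patterns and legality checks run.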
Thanks, your reply hits the point. What I’m trying to do is convert linalg into another CUDA-like IR. The problem is how to change the funcOp’s signature according to the ops inside it.
In linalg, one can create a new tensor with tensor.empty and use it as a return value. In a CUDA kernel that is impossible: if you map tensor.empty to malloc, you create an uncontrolled memory region on the device side and return it to the user.
So I want to add one input argument to the function for each tensor.empty. In more detail, the transformation looks like this:
func.func (){
%0 = tensor.empty() : tensor<1024xf32> // the function allocates the tensor itself
... // do some math
return %0
}
|
\|/
// let the user allocate %arg0 themselves and pass it in as a parameter
func.func (%arg0: tensor<1024xf32>){
... // do some math
return %arg0
}
To achieve this, I mark func.func as legal only when it doesn’t contain any tensor.empty operation. But as you mentioned, eraseOp does not delete the op immediately. So even if I call eraseOp on every tensor.empty, the number of tensor.empty ops does not change during the legalization process.
Currently my solution is to call erase() directly on each tensor.empty op. I’d like to know if there is a better solution.
Don’t. You should not under any circumstances use the direct mutation API from inside patterns. This leaves the internal maps of the dialect conversion in an inconsistent state; specifically, there will be a dangling reference to the op you just deleted. The address behind this reference may, and likely will, be reused by the allocator for the next op that is created, leading to an error that is extremely difficult to catch and invisible to tooling, including ASan and Valgrind. Errors of this kind have cost me weeks of debugging time, and I wrote parts of the conversion framework…
You can play tricks with attributes, e.g., add an attribute to the tensor.empty indicating that it is scheduled for erasure, and declare as legal only those functions in which every tensor.empty carries that attribute. Or just do the entire conversion inside the pattern that rewrites func.func, marking it as valid, or even outside the conversion infra entirely.
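The attribute trick can be sketched as a small standalone model. Again this is not MLIR code: MarkedOp, ToyFunc, scheduleForErase, isFuncLegal and the marker name "scheduled_for_erase" are all hypothetical, chosen only for illustration. In MLIR itself the legality rule would presumably be registered as a dynamic legality callback on the conversion target (addDynamicallyLegalOp), and the attribute would be set through the rewriter rather than by mutating the op directly.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical marker name; any attribute name not used elsewhere would do.
const std::string kEraseMarker = "scheduled_for_erase";

// Toy stand-in for an operation: a name plus a set of attribute names.
struct MarkedOp {
  std::string name;
  std::set<std::string> attrs;
};

// Toy stand-in for a function: just the list of ops in its body.
struct ToyFunc {
  std::vector<MarkedOp> body;
};

// What the tensor.empty pattern would do next to scheduling the erase:
// tag the op so that later legality checks can tell it is going away.
void scheduleForErase(MarkedOp &op) { op.attrs.insert(kEraseMarker); }

// Legality rule for the function: legal only if no unmarked tensor.empty
// remains. Marked ones may still be present in the body.
bool isFuncLegal(const ToyFunc &func) {
  for (const MarkedOp &op : func.body)
    if (op.name == "tensor.empty" && op.attrs.count(kEraseMarker) == 0)
      return false;
  return true;
}
```

The point of the design is that the legality check no longer depends on the op physically disappearing: the function becomes legal as soon as every tensor.empty is tagged, even though the ops are still in the body until the conversion finishes.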