[C API] Operation pointer re-use

If this has been discussed in the past, I apologize. I couldn’t find a discussion about it here.

TL;DR: I think we need something in the Operation C API to notify users when an Operation has been deleted.

The Python bindings cache a Python object for each MlirOperation indexed by the pointer within. I’d imagine that nearly all language bindings need to do a similar thing. This works as expected if Python is the only thing modifying the IR. If, however, we invoke a C++ transform which ends up deleting an Operation which happens to be in the cache (as I do in PyCDE), python doesn’t know that it’s “dead”. The object remains in the cache, which has two problems I’ve seen thus far:

  • Since the object doesn’t know that it is dead, it can cause segfaults if it does anything via the C API. Ideally, the object would know it’s dead and throw an exception instead of a segfault. (So that my python debugger doesn’t cause the process to die when it tries to display things automatically!)
  • More importantly, when the pointer gets re-used it can (and does) get resolved to the wrong object via the cache. This results in incorrect behavior which is pretty impossible to debug.

Ideally, there would be a C API which notified the user that an operation was deleted so the user could remove it from its cache and inform the object somehow. I’m assuming that Operation pointers don’t change during their lifetime (a quick skim of the Operation::create code indicates this), so I think we only need a delete notification.

I’m pretty sure that we just need a new function which allows users to register a function pointer to be called with the MlirOperation which has been deleted. The most straightforward (and foolproof) way to invoke it would probably be from the Operation destructor.

Am I understanding correctly? Does this make sense? Is there another mechanism which is intended to deal with this?

@River707 for an operation destructor hook. I assume this would need to be at the context level. I agree that such hooks are basically the only way to make completely memory safe bindings.

When we’ve discussed previously, we were still speculating based on the kind of bindings (build only, full IR access, etc). Since such a thing comes with a cost, best to consider it based on concrete need. I could still see reasons why we would deem the cost not worth it, but haven’t quantified it myself.

Just a warning here, but LLVM IR has a bunch of this sort of stuff to support the IR/ValueHandle.h functionality, including the ability to have callbacks through CallbackVH. This was well intended, but led to a lot of overly clever and incredibly fragile code in LLVM that doesn’t really work well. I hope we avoid such stuff in MLIR.

Bindings may well be a different issue, but we should be careful here :slight_smile:

Glad to see that I wasn’t missing anything.

Like per-context callback registrations? I don’t see why given that the user could handle routing to the correct handler (e.g. different handlers per-instance). Could be prudent to pass the MlirContext to the callback as well. In contrast to most MLIR features, I think this is a global, bound-language thing depending on how the bindings work (e.g. the Python bindings will want to be notified regardless of the context).

It is probably a good idea to support a set of callbacks to allow for multiple bindings in the same process.

When it comes to memory safety, my opinion is that we should err on the conservative side. That is, make the default assumption that every C API call could delete an Operation and some don’t – which would totally change the way the bindings work.

For the PyCDE use-case, a large number of C++ transforms are run during building. So we’d have to flush the cache and attempt to re-map it (if that’s even possible) constantly. I suspect this is going to be true for most “zero-abstraction” languages (as @clattner calls them). (Languages which are hosted in another language and just makes API calls into MLIR, as I understand it.) Not having this precludes a lot of use cases I suspect.

In terms of cost, I can see two potential expenses: code and performance. I don’t see the code to implement this being terribly complex, so I think the maintenance would be minimal. Performance: in the case where there are no callbacks registered, I think it’s just a nullptr check against a global pointer variable in the Operation destructor. (Again, I don’t think this is a context-specific thing.) It’s an easily predictable branch, so I think the performance overhead would be negligible.

Woof! I could see the ValueHandle stuff getting complex quickly. I’ve done stuff like that in the past. My experience is that it works fine at first, but when anything goes wrong it’s really hard to debug.

MLIR has a “listener” which can be attached to an OpBuilder. It’s observation-only, which I suspect keeps the complexity low. (And a C++ class which gets extended, something debuggers understand well.) The callback I’m talking about is merely a notification, just like the OpBuilder::Listener methods.

In Swift, MLIR concepts are represented by their pointer directly (meaning we strive for no extra state in the wrapper). We then push memory management to a higher level of abstraction, meaning the user of MLIRSwift is responsible for not accessing things that might have gone out of scope.

As you say, this is tricky to debug, so we try to not give end users the option to mess this up. One example is that Edith clones entire IR before, for instance, emitting Verilog. This is absolutely a trade-off but I think it is sensible for our use case.

So you never need to map Operations to Swift objects? I would have thought that Edith would keep around some pointers to MLIR Operations.

That is correct, we can always re-wrap an MLIR operation in a Swift type when we need it (since the Swift type is stateless). We will eventually need some way to refer to operations but that mechanism will be semantic, not pointer based. One example (which we don’t fully do yet) is we can run legalize-names which shouldn’t destroy any operations, then we can refer to things by their name since the rest of the pipeline shouldn’t modify those.

My personal philosophy/viewpoint is that MLIRSwift is “the thing you write a compiler pass with” and that Edith is “what a hardware designer uses”. The former isn’t memory safe, and I don’t see how you could (reasonably) make a fully safe MLIR binding with arbitrary IR mutations. The later is easy to make safe because you don’t allow unstructured mutations/deletions (generator frameworks are helpful/opinionated IRBuilder’s).

1 Like

However, I’m still not sold on whether maximum safety is what we want for this style of API :slight_smile: Previous discussions had allowed for a “global binding invalidation” after which any existing references to IR objects would be invalid and raise errors on use (vs illegal access). That is still an option to implement but puts limits on the style of API you want – I would view it more as a deflector for the foot gun vs a way to implement compiler internals in python.

This would force interaction models closer to how the swift bindings are operating, pushing things in more of functional/value semantics at boundaries.

I’m open to any of the options if we converge on something. All of my own python binding use cases are more in the build-and-forget category and therefore have some natural boundaries. I’m glad you’re speaking up about alternatives – the current state is a lazy compromise where we didn’t want to commit to something that lacked use cases and more detailed thought. These things are hard to roll back once committed to.