Ownership model in Python bindings

Context: [RFC] Starting in-tree development of python bindings

I’d like to clarify two points regarding the ownership model in python bindings. Ping @stellaraccident, @mehdi_amini and @zhanghb97 for visibility.

Lifetime of Context and Module

In C++, we have the following ownership expectations. The user of MLIR owns (at least one) MLIRContext. “Global” uniqued entities such as types and attributes are owned by that context and are live as long as the context itself. Non-uniqued things are owned by their parent objects: an operation owns its regions, which own their blocks, each of which owns further operations, and so on recursively. When operating on IR, the user owns one or more top-level modules through OwningModuleRef. Since a module is-a operation, by owning the top-level module the user owns the entire IR and can manipulate it as long as the module and the context are live. This is partially enforced at the API level by taking non-const references to context and module, i.e. a temporary is not allowed.

In Python, this becomes tricky as it is easy to write something like module = mlir.ir.parse_module("module {}", mlir.ir.create_context()) with little guarantees on the lifetime of the context. We need provide the guarantees ourselves. I propose the following convention:

  • any “global” object created in a context extends the lifetime of the context, i.e. the context remains live as long as the object itself (as if the object had a shared_ptr to the context);
  • any object that is created from scratch extends the lifetime of the context in which it is created, e.g. parsing or defining a top-level module;
  • any child object obtained from a parent object extends the lifetime of the said parent object, and transitively of all ascendants.

This boils down to, essentially, any live reference to an IR object maintaining the context alive and any live reference to a non-“global” IR objects additionally maintaining the top-level operation (module) alive. Such a model is common in garbage-collected languages.

Practically, this can be easily implemented with pybind11 by annotating the functions with py::keep_alive. Inside Python, it boils down to classes having an additional __keep_alive list that contains references to parent objects thus keeping them live for the GC.

Ownership of New IR Objects

The Limited use of globals section in the current bindings description proposes to create objects off their parent. This combines two currently separate actions in the C and C++ API: creation and insertion. Furthermore, the specific example it uses isn’t directly implementable: op.new_region() implies that a new region owned by the op will be created, but the list of op’s regions is fixed when the op itself is created and cannot be modified. Instead, the op constructor expects the caller to transfer the ownership of already-constructed regions.

My proposition here is to create new objects off the context instead, which seems in line with the rationale of limiting the use of globals. We could instead have op.ctx.new_region() and region.ctx.new_block() that create new IR objects that will be owned by the caller. These objects can be then attached to a parent IR object, with ownership transfer as usual: region.blocks.append(...) / region.blocks.insert(42, ...) or ctx.new_op("func", type, [region]).

Note that that section is mutable and has not been fully debated. It represents some experience I had on trying to do this previously.

I like this. This combined with the parent-child keep_alive rule means that the python reference to the block/op/etc keeps the producing context alive for at least as long as the reference itself is present. When ownership is transferred on the C++ side, the original python reference may still be keeping the context alive, but this is fine.

Do note that there is a further hazard here post transfer: post-transfer there is nothing stopping the backing child object being erased/freed/etc with no connection back to the original python reference. While not related to this post, we may want the Python wrapper for such transferable objects to contain a null pointer, which is checked on each access. On transfer operations, we could clear the pointer, rendering the original python object invalid for further use but not having a memory hazard.

I agree with these rules. Need to run to a meeting but have tried some examples and not found a hole. Note that the API needs to be structured so that keep_alive calls only extend the most immediate parent, as it is not generally possible to cast through a layer of indirection and, say, keep-alive a parent’s parent. If we find ourselves needing to do that, it likely means that we should rework or flatten the parent-child relationship somewhat to comply.

Small unrelated bike-shedding: I feel that such free-standing functions on the C++ side often layer nicer as a static method on the python class they primarily produce. In this case, I would just put a static parse method on the Module class. Alternatively, I might put the parse_module on the Context class. There are reasons why these are organized separately on the C++ side that have nothing to do with Python (more about object-file boundaries).

It may be a less “pythonic” way but I don’t really understand why is it more tricky than in C++? You need to keep a reference to the Context object that outlive the IR.

Region does not have any relationship with the Context, does it?

There is a general expectation that python should not segfault if a dependent object goes out of scope as a result of it being collected prior to its use. A lot of older, simple C wrapper bindings violate this and are quite finicky and non obvious to use as a result, ime. In older setups, this was often papered over by having a swig’d low level binding and then having pure python wrappers around it to help with these edges. In general, with pybind11, it is usually possible to get there without such a two layer solution (if structured right).

Having bindings that are structurally crash prone for such common idioms is more severe than just not being pythonic, imo.

There are of course limits: I don’t think we’re going to be successful at eliminating all of the illegal access hazards that come with allowing arbitrary IR manipulations of unmanaged memory (at least without significant API awkwardness), but getting lifetimes right for parent-child relationships like this should be possible and I think is important.

I guess the reason why it’s more tricky than in C++ is because the parseSourceString API for C++:

OwningModuleRef parseSourceString(llvm::StringRef moduleStr,
                                  MLIRContext *context);

assumes that user keeps a reference for MLIRContext object outlive the IR. However, in the Python side, user can use mlir.ir.create_context() as the parameter directly without keeping the reference outlive the IR. In this case, for the Python side, we should keep the context object alive as long as the module object is live, by using the py::keep_alive in pybind11.

I was thinking about explicit nullification post-transfer for a different reason: if no other piece of the IR owns a given object, we need to call mlirBlock/Region/OperationDestroy when the last Python reference goes out of scope. But we shouldn’t do that after transfer.

This does create some awkwardness in the API though:

block = op.ctx.create(...)
op.regions[0].append(block)
# cannot use block here

An alternative could be to add a reference to the parent when adding an object to the parent, and set up some internal “moved-out” flag without nulling the backing object. This will keep the object usable and make it consistent with objects obtained by inspection (i.e. after append, block will behave as if it were obtained by op.regions[0][-1]). This will not however remove the problem of attempting to add the same object to different parents, we can only check that it doesn’t already have a parent and throw otherwise.

I’m happy with static functions, I just followed the examples already in the doc :slight_smile: Mehdi has a good point:

regions and blocks don’t need a context to be constructed. So we can have mlir.ir.Block.create(...) and mlir.ir.Region.create(...). That being said, Block constructor accepts types defined in a context, so it needs to extend the lifetime of the context (or that of types), at least until the block is added to something that also extends the lifetime of the context.

Operations get the context transitively from Location, which is a “global” uniqued object that will extend the lifetime of the context, so maybe they should just extend the lifetime of the Location instead.

Python doesn’t make it easy to reason about lifetimes and ownership, unlike C++.

def foo():
  ctx = mlir.ir.Context()
  do_something(ctx)
  # is ctx still usable?
  # is the generational GC allowed to collect ctx here?
  do_something_else()

Wow, thanks for the example, I really thought that the ctx handle wasn’t released before foo() terminates.

Good point: I had avoided this in my previous work and hadn’t considered it here. Operation, Block, and Region can exist in this detached state. Any others?

Before reading this paragraph, I was thinking similarly. Taking PyOperation as an example, it would really need to represent either a detached operation or an attached operation. The former would have unique_ptr semantics. The latter, raw pointer. In previous work, I modeled this as different PyOperation subclasses with a virtual Operation* getOperation() method (actually, I only did this for blocks, but same difference). The issue with that is that you can’t have API niceties like you propose: the type can’t switch between the two states on the fly. I think just adding a detached flag to such wrapper classes is ok. The ability for a python reference to an Operation to switch from detached to attached as a result of being added to something (and throwing if already attached) seems like it could work. We could print the detached state in the repr for clarity.

There is an additional awkwardness: when appending to a parent, you can lifetime extend the receiving pseudo container but you can’t remove the existing keep-alive on the context. That doesn’t seem to cause any harm, though (aside from a bit of waste in the keep alive lists).

One potential improvement that diverges from the c/c++ API but I have found nice in practice: where you have methods like append that transfer ownership, also provide an append_new that acts like the context-create version but skips the detached state, returning the attached reference directly. It doesn’t cover all cases of IR construction but is by far the common case, ime. This is how I avoided the detached state switching in my previous work (and it yields a small efficiency win, I suppose). Having that pattern for the common case but preserving the ability to create detached and transfer in advanced ways (ie. Insert at position, etc), seems like a good API balance to me.

I actually don’t know, my comments are questions :slight_smile: From what I gather, most of the gc-related behavior is unspecified, and reference counting is not a required behavior. The basic CPython implementation does use reference counting and it would be reasonable to expect that it decreases the refcount on exiting the block. At the same time, name binding rules require the that the whole block is parsed completely, so one may be tempted to also inesrt refcount decrements after the last use in the same sweep. And there are other implementations that may or may not use refcounting. So I’m going for a solution that doesn’t require me to dig into all those details…

It is not (well) defined. In practice, if the value is bound to a named slot in a closure scope, it will persist until the scope is destroyed (or reassigned). However, if it is a pure local that is only realized on the stack, then its lifetime is indeterminate and it’s recount could drop to zero at the last point it is popped from the stack. Iirc, C-Python emits byte code to read and write such values to the local scope, which will then bound their lifetime, and it historically has been pretty conservative on its byte code optimizations at this level.