Context: [RFC] Starting in-tree development of python bindings
I’d like to clarify two points regarding the ownership model in the Python bindings. Ping @stellaraccident, @mehdi_amini and @zhanghb97 for visibility.
Lifetime of Context and Module
In C++, we have the following ownership expectations. The user of MLIR owns (at least one) MLIRContext. “Global” uniqued entities such as types and attributes are owned by that context and are live as long as the context itself. Non-uniqued things are owned by their parent objects: an operation owns its regions, which own their blocks, each of which owns further operations, and so on recursively. When operating on IR, the user owns one or more top-level modules through OwningModuleRef. Since a module is an operation, by owning the top-level module the user owns the entire IR and can manipulate it as long as the module and the context are live. This is partially enforced at the API level by taking non-const references to context and module, i.e. a temporary is not allowed.
In Python, this becomes tricky as it is easy to write something like module = mlir.ir.parse_module("module {}", mlir.ir.create_context()) with few guarantees on the lifetime of the context. We need to provide the guarantees ourselves. I propose the following convention:
- any “global” object created in a context extends the lifetime of the context, i.e. the context remains live as long as the object itself (as if the object had a shared_ptr to the context);
- any object that is created from scratch extends the lifetime of the context in which it is created, e.g. parsing or defining a top-level module;
- any child object obtained from a parent object extends the lifetime of the said parent object, and transitively of all its ancestors.
In essence, any live reference to an IR object keeps the context alive, and any live reference to a non-“global” IR object additionally keeps the top-level operation (module) alive. Such a model is common in garbage-collected languages.
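To make the convention concrete, here is a minimal pure-Python model of it. Nothing here is the real bindings API: Context, Type, Operation, get_type, child and parse_module are hypothetical stand-ins, and parse_module ignores its text argument. The only point is which object holds a strong reference to which.

```python
import gc
import weakref


class Context:
    """Hypothetical stand-in for an MLIR context."""

    def __init__(self):
        # Weak values: the Python-side cache must not keep wrappers alive.
        self._types = weakref.WeakValueDictionary()

    def get_type(self, name):
        # "Global" objects are uniqued per context.
        ty = self._types.get(name)
        if ty is None:
            ty = Type(self, name)
            self._types[name] = ty
        return ty


class Type:
    def __init__(self, context, name):
        self.context = context  # rule 1: a "global" object keeps its context alive
        self.name = name


class Operation:
    def __init__(self, context, name, parent=None):
        self.context = context  # rule 2: a new object keeps its context alive
        self.parent = parent    # rule 3: a child keeps its parent alive
        self.name = name

    def child(self, name):
        # Hypothetical accessor returning a wrapper for a nested operation.
        return Operation(self.context, name, parent=self)


def parse_module(text, context):
    # Hypothetical: a real implementation would parse `text`.
    return Operation(context, "module")


def lifetime_demo():
    module = parse_module("module {}", Context())  # context is a temporary
    ctx_ref = weakref.ref(module.context)
    mod_ref = weakref.ref(module)
    child = module.child("func")

    del module
    gc.collect()
    # The child alone keeps the module and, transitively, the context alive.
    alive_via_child = (mod_ref() is not None, ctx_ref() is not None)

    del child
    gc.collect()
    # With no references left, everything is released.
    released = (mod_ref() is None, ctx_ref() is None)
    return alive_via_child, released
```

In this model the temporary context from the one-liner above survives for as long as the module, or anything obtained from it, is reachable.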
Practically, this can be easily implemented with pybind11 by annotating the functions with py::keep_alive. Inside Python, it boils down to classes having an additional __keep_alive list that contains references to parent objects, thus keeping them live for the GC.
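As a rough Python-level analogy of the keep-alive list described above (not pybind11's actual implementation), a helper can tie one object's lifetime to another. The keep_alive helper, the _keep_alive attribute name (single underscore, to avoid Python's name mangling of double-underscore names inside classes), and the Context, Module, Region, parse_module and get_region names are all assumptions made for illustration.

```python
import gc
import weakref


def keep_alive(nurse, patient):
    """Keep `patient` alive at least as long as `nurse` is alive."""
    if not hasattr(nurse, "_keep_alive"):
        nurse._keep_alive = []
    nurse._keep_alive.append(patient)
    return nurse


class Context:
    pass


class Module:
    pass


class Region:
    pass


def parse_module(text, context):
    # Hypothetical: the returned module keeps its context alive.
    return keep_alive(Module(), context)


def get_region(op):
    # Hypothetical: a child wrapper keeps its parent alive.
    return keep_alive(Region(), op)


def keep_alive_demo():
    module = parse_module("module {}", Context())
    ctx_ref = weakref.ref(module._keep_alive[0])
    mod_ref = weakref.ref(module)
    region = get_region(module)

    del module
    gc.collect()
    # The region's list pins the module, whose list pins the context.
    pinned = mod_ref() is not None and ctx_ref() is not None

    del region
    gc.collect()
    freed = mod_ref() is None and ctx_ref() is None
    return pinned, freed
```

The wrapped functions stay unaware of lifetimes here; the policy lives entirely in the helper, which is roughly the role the annotation plays on the binding side.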
Ownership of New IR Objects
The “Limited use of globals” section in the current bindings description proposes to create objects off their parent. This combines two currently separate actions in the C and C++ API: creation and insertion. Furthermore, the specific example it uses isn’t directly implementable: op.new_region() implies that a new region owned by the op will be created, but the list of op’s regions is fixed when the op itself is created and cannot be modified. Instead, the op constructor expects the caller to transfer the ownership of already-constructed regions.
My proposal here is to create new objects off the context instead, which seems in line with the rationale of limiting the use of globals. We could have op.ctx.new_region() and region.ctx.new_block() that create new IR objects owned by the caller. These objects can then be attached to a parent IR object, with ownership transfer as usual: region.blocks.append(...) / region.blocks.insert(42, ...) or ctx.new_op("func", type, [region]).
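Here is a small pure-Python sketch of how such an API could behave. It is only a model under stated assumptions: "owned by the caller" is represented by parent being None, attaching sets parent and fails if the object already has an owner, and an op's regions are stored in a tuple so they cannot change after construction. The class names, the _attach_to helper and the op_type parameter name are hypothetical.

```python
class _Owned:
    """Base for IR objects owned either by the caller or by a parent."""

    def __init__(self, ctx):
        self.ctx = ctx
        self.parent = None  # None means "still owned by the caller"

    def _attach_to(self, parent):
        # Ownership transfer: an object can have only one owner.
        if self.parent is not None:
            raise ValueError("object is already owned by a parent")
        self.parent = parent


class Block(_Owned):
    pass


class BlockList:
    """List-like view of a region's blocks that takes ownership on insert."""

    def __init__(self, region):
        self._region = region
        self._blocks = []

    def append(self, block):
        self.insert(len(self._blocks), block)

    def insert(self, index, block):
        block._attach_to(self._region)
        self._blocks.insert(index, block)

    def __len__(self):
        return len(self._blocks)

    def __getitem__(self, index):
        return self._blocks[index]


class Region(_Owned):
    def __init__(self, ctx):
        super().__init__(ctx)
        self.blocks = BlockList(self)


class Operation(_Owned):
    def __init__(self, ctx, name, op_type, regions):
        super().__init__(ctx)
        self.name = name
        self.op_type = op_type
        regions = tuple(regions)
        for region in regions:
            region._attach_to(self)
        self.regions = regions  # fixed at construction time


class Context:
    def new_region(self):
        return Region(self)

    def new_block(self):
        return Block(self)

    def new_op(self, name, op_type, regions=()):
        return Operation(self, name, op_type, regions)


# Creation and insertion are separate steps.
ctx = Context()
region = ctx.new_region()           # owned by the caller
region.blocks.append(ctx.new_block())
func = ctx.new_op("func", "() -> ()", [region])  # region now owned by func
```

Keeping creation on the context and insertion on the parent's container mirrors the separation that already exists in the C and C++ API.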