Context:
Per discussion on the above PR and other observations, I think we need to revise our approach to ownership and validity in the python bindings. The pattern we have been following has been to use the py::keep_alive
facility provided by pybind11 and avoid keeping explicit references back to the python-land hierarchy. This has resulted in a couple of problems:
- It really only models direct parent-child relationships where the parent must always be an argument of the function (incl. this/self). This makes the API awkward and divergent from the C++ API and ownership model where, typically, objects are dependent on the
MlirContext
regardless of how they are created (i.e. if the creation method takes an implictMlirContext
or can derive it through another means). To get around this, we introducepy::keep_alive
on intermediate objects, which has the desired effect but is wasteful and imprecise, extending the lifetime of various temporaries and introducing more accounting lists and activities behind the scenes. - Without being able to track back from an
MlirContext
(which can always be derived) to aPyContext
(the python-land wrapper), we are missing a place to perform other accounting, interning and the like. This makes it impossible to provide robust lifetime management for mutable objects, since there is no python-API side management point where we can add logic to, say, invalidate all mutable python wrappers after a bulk-mutation, or intern (Operation|Value)->python wrappers in a way that lets us invalidate wrappers in the presence of python-initiated erase calls.
I’m proposing that we make all objects in the Python API be able to directly retain a reference to the owning PyContext
(which wraps an MlirContext
), eschewing the py::keep_alive
mechanism. I’m further proposing that we introduce new base classes for immutable/uniqued context-owned objects and mutable client-unowned objects. For the former, they have no restrictions on being accessed for their lifetime. For the latter, we would enforce a “generation check” at the PyContext
level that will let us perform bulk invalidation when crossing boundaries where we can no longer be certain that pointers will not be left dangling (i.e. a good example would be when invoking the PassManager).
This also gives us a place to add more interning/accounting as needed with the goal of supporting arbitrary python mutations in a way that is safe/cannot produce dangling pointers, although elaborating on that is left to followups that introduce such mutations.
Here is some rough sample code of the class hierarchy:
// Holds a C++ PyContext and associated py::object, making
// it convenient to have an auto-releasing C++-side reference
// to a PyContext.
class PyContextRef {
public:
PyContextRef(PyContext &referrent, py::object object)
: referrent(referrent), object(std::move(object)) {
}
PyContext &operator->() { return referrent; }
PyContext &referrent;
private:
py::object object;
};
// Bound context, the root of the ownership hierarchy.
// MlirContexts are interned so that there can only be one
// PyContext for an MlirContext, and it is always possible to
// track back from an arbitrary MlirContext to its associated
// PyContext.
class PyContext {
public:
// Returns a PyContext for an associated MlirContext, creating if
// necessary.
static PyContextRef forContext(MlirContext context) {
// ... details to avoid spurious copies/conversions ...
py::handle existing = ... lookup ...;
if (!existing) {
// create
}
return PyContextRef(nativeRef, pyRef);
}
~PyContext() {
liveContexts.erase(context.ptr);
mlirContextDestroy(context);
}
// The current "mutation generation". Mutable objects managed by the
// context can only be validly accessed when the generation has not
// changed.
int getScopedGeneration() { return scopedGeneration; }
// Invalidates all scoped objects. Should be called for any bulk
// mutation that leaves the IR in an unknown state where pointers
// may be left dangling.
void invalidate() {
scopedGeneration += 1;
}
MlirContext context;
private:
PyContext(MlirContext context, py::handle handle)
: context(context) {
liveContexts[context.ptr] = handle;
}
int scopedGeneration = 0;
// Interns the mapping of live MlirContext::ptr to PyContext instances,
// preserving the relationship that an MlirContext maps to a single
// PyContext wrapper. This could be replaced in the future with an
// extension mechanism on the MlirContext for stashing user pointers.
// Note that this holds a handle, which does not imply ownership.
// Mappings will be removed when the context is destructed.
static llvm::DenseMap<void *, py::handle> liveContexts;
};
// Base class for all objects that directly or indirectly depend on an
// MlirContext. The lifetime of the context will extend at least to the
// lifetime of these instances.
// Immutable objects that depend on a context extend this directly.
class BaseContextObject {
public:
BaseContextObject(MlirContext context) : context(PyContext::forContext(context)) {}
PyContext &getContext() { return context.referrent; }
private:
PyContextRef context;
};
class PyAttribute : BaseContextObject { ... };
// Base class for any "scoped" objects owned by some top level object
// associated with a context. This includes all mutable objects in the
// operation hierarchy whose state is indeterminate when a bulk IR mutation
// takes place. Such objects must guard all accesses with `checkValid()`
// which will ensure that the context has not had its managed, mutable
// object invalidated.
class ScopedContextObject : public BaseContextObject {
public:
ScopedContextObject(MlirContext context)
: BaseContextObject(context),
currentGeneration(getContext()->getScopedGeneration()) {}
// Checks that the object is still valid to access.
void checkValid() {
if (currentGeneration != getContext()->getScopedGeneration())
throw py::raiseException(...);
}
private:
int currentGeneration;
};
class PyOperation : public ScopedContextObject { ... };
Alternatives considered:
- Using a C+±side shared_ptr/shared_from_this to wrap the MLIRContext. I took this approach with the IREE and npcomp iteration of these bindings, and it turns into a real mess, making some of the lifetime issues slightly better but still leaving corner cases. Also shared_ptr is… not great… and then you end up with double reference counting. Also, the magic that makes it work only applies if directly wrapping the C++ API, not the C API.
- Using a context manager and thread local tracking of activations, restricting visibility of mutable IR objects to withing the
with context:
block. From an API standpoint, I would rather avoid this if possible since it makes the API hard to use. On the implementation side, it also tends to require even heavier accounting, creates implicit bindings to the current thread, etc. I was able to convince myself that the above approach gives us the tools we need and we don’t need to use the bigger hammer of awith
context manager to scope IR access. (note: we may still want a context manager for higher level APIs like EDSCs, etc, but it gets more natural at that level).
Any ideas/preferences? @ftynse @mehdi_amini