[RFC] First-step Python bindings for the C API

Context: Next steps on python bindings
Prototype: D85481 ([mlir][WIP] Initial python bindings for C API)

First Step Requirements

Following the "Next steps on python bindings" topic by @stellaraccident, I implemented a first patch, D85481, that adds Python bindings for the C API and meets the first-step requirements. The patch provides the following:

  • Binding for MlirContext

I use a wrapper class, PyMlirContext, and bind it to Python as MlirContext so that the underlying context object is created and destroyed correctly:

/// Wrapper around MlirContext.
class PyMlirContext {
public:
  PyMlirContext() { context = mlirContextCreate(); }
  ~PyMlirContext() { mlirContextDestroy(context); }
  // ...

  MlirContext context;
};
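One hazard the wrapper does not yet guard against is copying: a copy of PyMlirContext would hold the same handle and destroy it a second time. Below is a minimal, self-contained sketch of the same RAII idea with copying disabled. So that it compiles without MLIR, it uses hypothetical stand-ins (StubContext, stubContextCreate, stubContextDestroy and a liveContexts counter) in place of MlirContext, mlirContextCreate and mlirContextDestroy. PyMlirContextSketch is likewise a hypothetical name, not part of the patch.

```cpp
// Hypothetical stand-ins for the MLIR C API so this compiles without MLIR.
// Like the real MlirContext, StubContext is a small struct around a pointer.
struct StubContext { void *ptr; };
static int liveContexts = 0;  // Contexts created but not yet destroyed.

StubContext stubContextCreate() {
  ++liveContexts;
  return StubContext{new int(0)};
}

void stubContextDestroy(StubContext ctx) {
  delete static_cast<int *>(ctx.ptr);
  --liveContexts;
}

/// RAII wrapper: creates the context on construction and destroys it exactly
/// once on destruction.
class PyMlirContextSketch {
public:
  PyMlirContextSketch() : context(stubContextCreate()) {}
  ~PyMlirContextSketch() { stubContextDestroy(context); }

  // A copy would share the handle and destroy it a second time, so forbid
  // copying. (Declaring these also suppresses the implicit move operations.)
  PyMlirContextSketch(const PyMlirContextSketch &) = delete;
  PyMlirContextSketch &operator=(const PyMlirContextSketch &) = delete;

  StubContext get() const { return context; }

private:
  StubContext context;
};
```

Because the class is neither copyable nor movable, each Python-side MlirContext maps to exactly one C++ object and therefore to exactly one mlirContextDestroy call.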

  • Binding for MlirModule

As with MlirContext, I define a wrapper class, PyMlirModule, around MlirModule:

/// Wrapper around MlirModule.
class PyMlirModule {
public:
  // MlirModule is a small handle struct, so it is taken by value.
  PyMlirModule(MlirModule moduleRef) : module(moduleRef) {}
  ~PyMlirModule() { mlirModuleDestroy(module); }
  // ...

  MlirModule module;
};

Then I bind PyMlirModule to Python as MlirModule:

py::class_<PyMlirModule>(m, "MlirModule")
    /* ... */;

  • MlirContext.parse() method that returns an MlirModule

parse() is a member of PyMlirContext; it takes the module's textual assembly (ASM) as a const std::string & and returns a PyMlirModule *:

PyMlirModule *PyMlirContext::parse(const std::string &module) {
  auto moduleRef = mlirModuleCreateParse(context, module.c_str());
  return new PyMlirModule(moduleRef);
}

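Then I bind PyMlirContext::parse() as MlirContext.parse(). To keep the context alive for as long as the returned module exists, I use py::keep_alive<0, 1>(), which keeps argument 1 (the implicit self, i.e. the context) alive while the return value (index 0) is alive: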

py::class_<PyMlirContext>(m, "MlirContext")
    // ...
    .def("parse", &PyMlirContext::parse, py::keep_alive<0, 1>());
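py::keep_alive<0, 1>() ties the context's lifetime to the returned module on the Python side. The sketch below expresses the same guarantee in plain C++ by having each module hold a std::shared_ptr to its context; it is an analogy, not how pybind11 implements keep_alive. It also checks for a failed parse: as far as I know, mlirModuleCreateParse reports failure by returning a null MlirModule (testable with mlirModuleIsNull), and wrapping that null handle would hand Python an unusable object. StubContextState, StubModuleHandle, stubModuleCreateParse, ModuleSketch and parseSketch are all hypothetical; the stub parser treats empty input as a failure purely for illustration.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a context: records its own destruction.
struct StubContextState {
  explicit StubContextState(bool *flag) : destroyedFlag(flag) {}
  ~StubContextState() { *destroyedFlag = true; }
  bool *destroyedFlag;
};

// Hypothetical stand-in for a module handle; a null ptr means "parse failed".
struct StubModuleHandle { void *ptr; };

// Hypothetical parser: treats empty input as a failure and returns null.
StubModuleHandle stubModuleCreateParse(const std::string &asmText) {
  if (asmText.empty())
    return StubModuleHandle{nullptr};
  return StubModuleHandle{new std::string(asmText)};
}

/// Each module shares ownership of its context, so the context outlives
/// every module parsed from it (a C++ analogue of keep_alive<0, 1>).
class ModuleSketch {
public:
  ModuleSketch(std::shared_ptr<StubContextState> ctx, StubModuleHandle h)
      : context(std::move(ctx)), handle(h) {}
  // The handle is freed in the destructor body, before the context member
  // is released, so the module always goes away first.
  ~ModuleSketch() { delete static_cast<std::string *>(handle.ptr); }
  ModuleSketch(const ModuleSketch &) = delete;
  ModuleSketch &operator=(const ModuleSketch &) = delete;

private:
  std::shared_ptr<StubContextState> context;
  StubModuleHandle handle;
};

/// Parses, refusing to wrap a null handle.
std::unique_ptr<ModuleSketch> parseSketch(std::shared_ptr<StubContextState> ctx,
                                          const std::string &asmText) {
  StubModuleHandle h = stubModuleCreateParse(asmText);
  if (!h.ptr)
    throw std::runtime_error("failed to parse MLIR module");
  return std::make_unique<ModuleSketch>(std::move(ctx), h);
}
```

In the actual binding, a std::runtime_error thrown from parse() would be translated by pybind11 into a Python RuntimeError, which is usually friendlier than returning a module wrapper around a null handle.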

  • MlirModule.dump() method

As suggested in the Next steps on python bindings topic, I implement MlirModule.dump() using mlirModuleGetOperation and mlirOperationDump:

void PyMlirModule::dump() { mlirOperationDump(mlirModuleGetOperation(module)); }

Then I bind PyMlirModule::dump() as MlirModule.dump().

Code Structure and Style

Following the "Composable modules" and "Submodules" sections of the MLIR Python Bindings document, I organized the code as follows:

  • Keep the populate functions (such as populateIRSubmodule) separate from the PYBIND11_MODULE global constructor. The populate functions and wrapper classes are declared in IRModules.h and implemented in IRModules.cpp.
  • Place the IR bindings in a submodule, mlir.ir: the outer module defines the submodule (irModule) and passes it to the populate functions.
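The split between the top-level module constructor and the populate functions can be sketched without pybind11. FakeModule is a hypothetical stand-in for pybind11::module that only records submodules and binding names; populateIRSubmoduleSketch and initMainModuleSketch are hypothetical names that mirror the roles of populateIRSubmodule and the MainModule.cpp constructor.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for pybind11::module: records nested submodules and
// the names of the bindings registered on each one.
struct FakeModule {
  std::string name;
  std::map<std::string, std::unique_ptr<FakeModule>> submodules;
  std::vector<std::string> bindings;

  // Stand-in for def_submodule(): returns the named child, creating it on
  // first use.
  FakeModule &defSubmodule(const std::string &child) {
    std::unique_ptr<FakeModule> &slot = submodules[child];
    if (!slot) {
      slot = std::make_unique<FakeModule>();
      slot->name = name + "." + child;
    }
    return *slot;
  }

  void addBinding(const std::string &binding) { bindings.push_back(binding); }
};

// IRModules.h/.cpp role: a populate function only registers bindings on the
// submodule it is given; it never creates the top-level module itself.
void populateIRSubmoduleSketch(FakeModule &irModule) {
  irModule.addBinding("MlirContext");
  irModule.addBinding("MlirModule");
}

// MainModule.cpp role (PYBIND11_MODULE in the real code): define the
// submodule and hand it to each populate function.
void initMainModuleSketch(FakeModule &m) {
  FakeModule &irModule = m.defSubmodule("ir");
  populateIRSubmoduleSketch(irModule);
}
```

Because the populate function depends only on the module object it receives, it can live in a separate translation unit and be reused by other top-level modules.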

My current patch uses the following naming conventions on the Python side:

  • Use snake_case for methods, such as parse and dump.
  • Use CamelCase for classes, such as MlirContext and MlirModule.

Questions and problems

  1. Memory model

My current patch uses wrapper classes to handle memory management: each wrapper's constructor creates the underlying object and its destructor destroys it. For example, for the context object, the constructor calls mlirContextCreate() and the destructor calls mlirContextDestroy(MlirContext):

PyMlirContext() { context = mlirContextCreate(); }
~PyMlirContext() { mlirContextDestroy(context); }

I have some questions about this approach:

  • Should we define a wrapper class for every opaque type?
  • Is a wrapper class the proper way to handle memory management?
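On the first question, one option (an assumption on my part, not something the bindings document prescribes) is a single generic owner template instead of a hand-written class per opaque type. The sketch assumes each handle is a small struct with a ptr field, as the MLIR C API structs are, released by a single destroy function. OwningHandle, ContextOwner, ModuleOwner, the Stub* types and the destroy functions are hypothetical names.

```cpp
#include <utility>

/// Generic move-only owner for a C API handle type `Handle` (a struct with a
/// `ptr` field) that is released by calling `Destroy`.
template <typename Handle, void (*Destroy)(Handle)>
class OwningHandle {
public:
  explicit OwningHandle(Handle h) : handle(h) {}
  ~OwningHandle() { reset(); }

  OwningHandle(const OwningHandle &) = delete;
  OwningHandle &operator=(const OwningHandle &) = delete;

  // Moving transfers ownership and leaves the source holding a null handle.
  OwningHandle(OwningHandle &&other) noexcept
      : handle(std::exchange(other.handle, Handle{nullptr})) {}
  OwningHandle &operator=(OwningHandle &&other) noexcept {
    if (this != &other) {
      reset();
      handle = std::exchange(other.handle, Handle{nullptr});
    }
    return *this;
  }

  Handle get() const { return handle; }

  // Destroys the owned handle, if any, and leaves this owner empty.
  void reset() {
    if (handle.ptr)
      Destroy(handle);
    handle = Handle{nullptr};
  }

private:
  Handle handle;
};

// Hypothetical stand-in handle types and destroy functions.
struct StubContext { void *ptr; };
struct StubModule { void *ptr; };
static int destroyCalls = 0;  // Total destroy calls across both types.

void destroyStubContext(StubContext c) {
  delete static_cast<int *>(c.ptr);
  ++destroyCalls;
}
void destroyStubModule(StubModule m) {
  delete static_cast<int *>(m.ptr);
  ++destroyCalls;
}

// One alias per owned opaque type replaces a hand-written wrapper class.
using ContextOwner = OwningHandle<StubContext, destroyStubContext>;
using ModuleOwner = OwningHandle<StubModule, destroyStubModule>;
```

Not every opaque type is owned, though: the MlirOperation returned by mlirModuleGetOperation belongs to its module, so as far as I can tell it should be exposed through a non-owning wrapper (if any) rather than one that destroys it.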

  2. Linking libraries

Based on the Python bindings boilerplate, I wrote IRModules separately from MainModule. MainModule defines the submodule and passes it to the populateIRSubmodule function defined in IRModules:

// MainModule.cpp - IR submodule.
// Define and populate IR submodule.
auto irModule = m.def_submodule("ir", "MLIR IR Bindings");
populateIRSubmodule(irModule);

For MainModule and IRModules, I'm not sure whether I should build them as separate libraries and link them together. When I tried building them separately and linking IRModules into MainModule, the link failed with: undefined reference to 'populateIRSubmodule(pybind11::module&)'.