Flushing the queue on C/Python API work

Hey @ftynse, not super time sensitive because I’m context switching to something else for a few days, but would appreciate your review on these diffs (they are stacked and this is the right order):

After these, there are still some ownership nits, but the biggest missing functional diff w.r.t. the C API is mapping Values. I’ll likely do that in one or two more diffs and then checkpoint about where to go next.

I suspect that the next piece that needs design work is dialect registration in the C/Python API. On the Python side, I think that also intertwines with the mechanism for providing custom Op subclasses that (ultimately) can be tablegen’d. I’m working through the design in my head and there are crossovers with your recent RFC.

Beyond that, I think we’re nearly at the point where we just need a list of things that need to be plumbed through. Maybe we can write those down and solicit help?

  • Stella

Done. Values sound a bit special because they also need to keep their ancestor tree live, but their immediate parent may be either a block or an operation. We don’t have the BlockArgument / OpResult hierarchy exposed in the C API yet, but it should be easy to add.
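To illustrate the keep-alive concern on the Python side, here is a minimal sketch (all class names are illustrative stand-ins, not the actual binding API): each Value wrapper holds a strong reference to its immediate owner, which is a Block for a BlockArgument and an Operation for an OpResult, so the whole ancestor chain stays live as long as the Value is reachable.

```python
# Hypothetical sketch of ownership keep-alive in Python wrappers.
# Names are illustrative, not the real MLIR Python API.

class Operation:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # keep-alive link up the ancestor tree

class Block:
    def __init__(self, parent_op):
        self.parent = parent_op  # a block is owned by an operation

class Value:
    """Base wrapper; subclasses record their immediate owner."""
    def __init__(self, owner):
        # Strong reference: Block for BlockArgument, Operation for OpResult.
        self.owner = owner

class BlockArgument(Value):
    pass

class OpResult(Value):
    pass

# Usage: the result's keep-alive chain reaches the enclosing function op.
func = Operation("func")
entry = Block(func)
arg = BlockArgument(entry)
addi = Operation("std.addi", parent=entry)
res = OpResult(addi)
assert res.owner.parent.parent is func
assert arg.owner is entry
```

The point of the sketch is only that the two Value kinds need different owner types, which is why exposing the BlockArgument / OpResult split in the C API matters.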

For dialects in C, it looks like the easiest way is to have each dialect provide a registration function, e.g., mlirContextRegisterStandardOpsDialect(MlirContext).

Custom op subclasses are tricky, and doubly so if we need to go through C. We have a lot of string-based manipulation of C++ code, especially in places like interfaces and injected methods. I don’t think having the same level of API detail as C++ in Python is achievable automatically, and we probably don’t need that anyway. Autogenerated (from input/output lists) builders and getters look feasible provided some type translation mechanism. We can probably bind ArrayRef and various range types, but anything beyond that is probably too annoying. As for custom builders and custom functions: I’d say if somebody needed to write them in C++, they can also write them in Python.

Once the basic infra is in place, it feels like the rest should be need-driven. The use case I have for Python bindings requires AffineExpr/AffineMap/IntegerSet, ops from the Affine dialect and built-in FunctionOp.

Bigger chunks of work with a design component are:

  • pass registration and management;
  • diagnostics;
  • execution engine.

On the Python side, we can get away with that for things in core, but we will likely need some other mechanism for link-time discovery of additional native dialects. I hesitate to bring it up given that we just scalpeled that out of the C side, but in my opinion, the linkage issues surrounding the Python ecosystem in the wild, and the sharp edges that come from requiring every dialect to expose a Python registration hook, are going to be rough if we don’t have some link-time mechanism.

In Python, I can imagine the following scheme. We already keep track of all context objects. We additionally expose a hook that registers dialects with all known contexts. Any new dialect is built as a Python package which, in its __init__.py, registers itself with all contexts in the process. So import my_mlir_dialect makes the dialect automagically available.

In C, on the other hand, doing something at link time sounds nasty. We would essentially have to replicate the static registration mechanism we used to have in C++ and compile it into the library we use from C.

I guess we’ll need to do one to see how it shakes out. There is going to be a fair bit of boilerplate and build support to get it right, and in the interest of not reinventing that wheel a bunch of times, we should spend some time making it common/reusable.

I started a spreadsheet with the work to complete, as I see it: MLIR C/Python Bindings TODO - Google Sheets

(feel free to request edit access - just restricted due to sharing with the internet)