Personally, I’d like whatever we do next on this to be something we see as reasonably principled and able to carry forward. As someone who has had to write this twice, quickly and out of tree, just to have something, I would consider a project in this area a success even if it only provided a seed with good patterns that others can build on.
There is no question in my mind that a good cffi-based API for some of the higher-level parts is completely possible and straightforward. The parts I am thinking of here include context creation, asm printing/parsing, pass invocation, and diagnostic handling. IREE has a pybind11-based implementation of this level of things here. Such an API is useful for tools that want to invoke MLIR-based tools. In IREE’s case, we link in some project-specific translators to run the compiler pipeline and generate artifacts. The linkage story here is non-trivial: we will want to do this in a way that lets projects extend MLIR on the C/C++ side, have those symbols made public, and bolt on additional language bindings dynamically.
What I can’t quite see is the path to a principled and usable cffi-based setup for IR manipulation. Probably some combination of C APIs for OpState (for construction), Operation*, Region, Block, Attribute, Type to at least get the core data structures is a prerequisite. There are some thick mechanics there but the data structures are simple. Also, it seems like anything usable would need a language-specific tablegen story to match.
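To make "the data structures are simple" concrete, here is a rough Python sketch of the nesting those core types form: an operation holds regions, a region holds blocks, and a block holds operations. All class and field names below are illustrative assumptions, not an existing or proposed MLIR binding, and the sketch deliberately leaves out operands, results, successors, and block arguments. Types and attributes are modeled as opaque values that only carry their textual form.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Type:
    # Hypothetical: opaque to the binding, only the printed form is known.
    asm: str


@dataclass(frozen=True)
class Attribute:
    # Hypothetical: opaque to the binding, only the printed form is known.
    asm: str


@dataclass
class Operation:
    name: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    result_types: List[Type] = field(default_factory=list)
    regions: List["Region"] = field(default_factory=list)


@dataclass
class Block:
    operations: List[Operation] = field(default_factory=list)


@dataclass
class Region:
    blocks: List[Block] = field(default_factory=list)


def walk(op: Operation) -> Iterator[Operation]:
    """Yield op, then every operation nested under it, in pre-order."""
    yield op
    for region in op.regions:
        for block in region.blocks:
            for nested in block.operations:
                yield from walk(nested)
```

The "thick mechanics" (ownership, use-def chains, insertion points) are exactly what this sketch skips, which is where most of the real binding work would go.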
The issue I ran into pretty quickly was how to handle the extensibility of the Type/Attribute system, which is all c++ APIs and I don’t see a way around having implementation specific APIs to handle them (with, say, baseline parsing/printing coming for free). For construction, my attempt at this with pybind11 was to create a DialectHelper base class and implement accessors for the std stuff directly (see a c++ custom subclass and python side). It works well enough for me for a single project, but I’m not thrilled with how it would layer in a more open system (and it relies on python multiple inheritance and dynamic class construction – sure signs that a python programmer has been here).
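For readers who have not seen that pattern, here is a minimal pure-Python sketch of a helper base class extended through mixins and assembled with dynamic class construction. It is not the actual IREE code: every name apart from DialectHelper is hypothetical, and the base class only records ops in a list as a stand-in for a real builder.

```python
class DialectHelper:
    """Stand-in base class: records each op instead of building real IR."""

    def __init__(self):
        self.ops = []

    def op(self, name, *operands, **attrs):
        # Record the op and return its index as a fake value handle.
        self.ops.append((name, operands, attrs))
        return len(self.ops) - 1


class StdOpsMixin:
    """Hypothetical accessors for a few std dialect ops."""

    def constant_op(self, value):
        return self.op("std.constant", value=value)

    def addf_op(self, lhs, rhs):
        return self.op("std.addf", lhs, rhs)


class MyProjectOpsMixin:
    """Hypothetical accessors added by an out-of-tree project."""

    def weird_op(self, operand):
        return self.op("myproject.weird", operand)


def make_helper_class(*mixins):
    # Dynamic class construction: combine whichever dialect mixins are
    # available at runtime with the shared base class.
    return type("CombinedDialectHelper", (*mixins, DialectHelper), {})
```

Usage would look like `Helper = make_helper_class(StdOpsMixin, MyProjectOpsMixin)` followed by `Helper().addf_op(...)`. The layering concern is visible even here: every project has to agree on the base class and avoid method-name collisions across mixins.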
We now have three different pybind11 out-of-tree bindings, each built with admittedly pragmatic aims. I would welcome whatever comes next in tree at least getting some foundational pieces in place that we and others can build on and extend. At this point, I would prioritize foundations over completeness: as an outside project, I can always add a new ad-hoc entrypoint to create or introspect my weird new type, but in the current state there isn’t enough there for me to just show up with that new entrypoint. I have to start by defining… what is a context, how do I parse, etc.
Regarding the c vs pybind11 angle – as a python developer looking to wrap something, I will always choose pybind11 (for many reasons). However, MLIR is not principally a python project, and I have observed that when people stand up and say “we should have a good C API and base things on that” they’ve rarely been wrong. It has higher startup costs but is the only way to solve various issues that arise when trying to scale usage outside of a single project/binary. If we think that is the right direction for MLIR, I would consider the GSOC project successful if it got the pattern established and got to the point that we could start rebasing some of our uses on top of it. I’d also suggest taking some time to prototype it – these things tend to be highly sensitive to getting some basics right.
If we’d rather take an in-tree pybind11 approach, then we now have several examples to choose from and should probably have a discussion about what worked, what didn’t, and what to carry forward.