Lately, some of the folks at CIRCT have been working on bindings. We currently have Swift, NodeJS, and Python bindings to CIRCT, and we are getting to the point where we would like to collaborate on some upstream improvements.
For example, I would like to expose some of CIRCT’s custom types to Python. I have an idea of how I can do that with some refactoring of the upstream Python binding code, but I would also be interested in making a more general improvement that can benefit the other host languages like Swift.
We (the CIRCT folks and @stellaraccident) threw out the idea of having a face-to-face meeting to get to know the various stakeholders interested in bindings to MLIR projects, discuss our thoughts, and come up with some concrete next steps we can collaborate on to improve the situation.
Would it be worth discussing this in an ODM? Should we break out and have a separate session just for this?
I’m curious to hear who else (if anyone) is interested in these topics, and when a good time would be.
If we could find a way to improve this area, it would be very welcome. Here are some thoughts I had on it last year, though getting everything brought up took priority over putting more time/thought into it: Structured, custom attributes and types (for non-C++ bindings)
I don’t hold the MLIR ODM schedule but would be +1 on either that forum or another one. I think there is plenty to talk about and enough people have had their hands in it that it would be good to sync in some fashion.
If it came to it, I could prepare a few slides on the state of the world as I know it upstream. That is going to be incomplete. Maybe if Mike/the CIRCT folks did the same and we did the join in real time?
That sounds good to me. I can put together some slides of how CIRCT has been following the documented approach for Python bindings and what we are interested in supporting.
Thanks all for participating in this discussion. I will type up some of my notes here for the record. Please correct me if I’m misrepresenting something.
- It would be great to have the ability to serialize from tablegen in a generic way (e.g. to YAML)
  - This should include all the juicy information that ODS has worked out
  - As opposed to serializing raw tablegen records, which don’t have this information
- From there, it should be possible to deserialize that information and create C++ classes in a completely mechanical way
  - We can test this approach in-tree
  - Other host languages could also consume the serialized information to do their own thing
- Complementary to the above, we can start looking into generating C APIs for things in ODS
  - We were specifically talking about custom Types and Attributes
  - For example, to generate constructors, accessors, and isa checks
  - This could be pretty specific, but it might be nice to have a generic API, like the way you can generically build operations
- We can start small with a simple use case and find the rough edges where we might need more structure and less C++ (e.g. a declarative assembly format for types)
- If we have a pressing need for Python binding support for custom types and attributes, there are ways we can better expose this in a Python-specific way without doing any of the above
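To make the serialize-then-generate idea above concrete, here is a rough Python sketch. Everything in it is invented for illustration: the dict stands in for YAML emitted from ODS, and the type (`FIRRTLUInt`) and the generated C API names are hypothetical, not real functions. The point is only that, given structured ODS information, declarations for an isa check, a constructor, and parameter accessors fall out mechanically.

```python
# Hypothetical sketch of "serialize ODS info from tablegen, then generate
# a C API mechanically". The dict below stands in for deserialized YAML;
# the type name and all generated function names are invented.

def generate_c_api_decls(type_info):
    """Mechanically emit C declarations for a custom type:
    an isa check, a constructor, and one accessor per parameter."""
    name = type_info["cppClassName"]
    params = ", ".join(
        f'{p["cType"]} {p["name"]}' for p in type_info["parameters"])
    decls = [
        f"bool mlirTypeIsA{name}(MlirType type);",
        f"MlirType mlir{name}TypeGet(MlirContext ctx, {params});",
    ]
    for p in type_info["parameters"]:
        decls.append(
            f'{p["cType"]} mlir{name}TypeGet{p["name"].capitalize()}'
            "(MlirType type);")
    return decls

# Stand-in for serialized ODS output (a FIRRTL-style sized integer type).
uint_info = {
    "cppClassName": "FIRRTLUInt",
    "parameters": [{"name": "width", "cType": "int32_t"}],
}

for decl in generate_c_api_decls(uint_info):
    print(decl)
```

Other host languages could consume the same serialized description and emit Swift or TypeScript wrappers instead of C declarations, which is what makes the serialization step more general than a tablegen backend per language.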
Am I missing anything else?
I’ll be learning more about tablegen so I can actually write an intelligible RFC about this.
There’s a lot of really useful pybind11 stuff in llvm-project/mlir/lib/Bindings/Python. Is the plan to ditch pybind11 in favor of the C API plus a pile of Python (autogenerated from tablegen as much as possible)? Or is it to replace the Python currently generated by tablegen with autogenerated C APIs and a YAML spec for the ops/types/attributes, and have the Python continue to be built on the pybind11-exposed C++ classes? (I’m pretty sure it’s the latter, but I wanted to make sure it’s not the former.)
If I’m assuming correctly, it’d be useful to have the classes in IRModule.h exposed publicly so out-of-tree users could build pybind11 modules with functions that accept those types. Is this just a simple matter of moving that header file into the ‘include’ directory, or is there also a non-trivial amount of cmake work involved?
To be clear: all of the pybind11 stuff is only using the MLIR C API and never the C++ API.
In terms of layering, MLIR builds lib/libMLIRPublicAPI.so, which exposes the C API and not the C++ API (and acts as the only interface to MLIR native code); the pybind11 bindings are built as a few Python extensions (the main one is _mlir.cpython-38-x86_64-linux-gnu.so on my machine, for example) which link to lib/libMLIRPublicAPI.so.
(and I don’t think we plan to change this layering right now)
Exposing IRModule.h is possible, with the caveat that we were aiming to have the C API be the point of stability; IRModule.h looked to me more like an internal implementation detail of the Python API.
The issue is that cross-module C++-level dependencies are still tricky: possible, but when you go there, you ingest a lot of complexity and support costs. As is, IRModule.h has too much stuff, much of which reaches into what should be the private implementation details of the _mlir.so module itself.
What I was thinking of doing was providing some template classes similar to PyConcreteType and co., plus some similar casting helpers, but intended for external use and built purely on the C API types (i.e. MlirType vs. the PyType C++ wrapper). This isn’t as hard as it sounds, I think, and I partially implemented it once in npcomp (in a patch that we abandoned for other reasons).
The basic idea is that outside of the core module, you use the capsule API to interop, providing type casters and some other scaffolding that avoids direct C++ linkage to cross module internals. I didn’t actually implement all of the niceties but am pretty sure that I could produce a public IRPybindHelpers.h which outside projects could use to mostly just have things work.
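For reference, the capsule interop described above is built on CPython’s PyCapsule objects: the producing module boxes a raw C API pointer in a named capsule, and the consuming module recovers it without any C++-level linkage between the two. Here is a minimal, stdlib-only sketch of the mechanism; the capsule name and pointer value are made up for illustration (MLIR’s bindings wrap real handles like MlirOperation this way, surfaced through a `_CAPIPtr` attribute):

```python
# Stdlib-only sketch of the PyCapsule mechanism that capsule-based interop
# relies on. The capsule name and pointer value are invented; real bindings
# box actual C API handles (MlirType, MlirOperation, ...).
import ctypes

PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

NAME = b"example.ir.Type._CAPIPtr"  # hypothetical capsule name

def wrap(raw_ptr):
    """Producer side: box a raw C pointer in a named capsule (no destructor)."""
    return PyCapsule_New(ctypes.c_void_p(raw_ptr), NAME, None)

def unwrap(capsule):
    """Consumer side: recover the pointer, verifying the capsule name."""
    return PyCapsule_GetPointer(capsule, NAME)

cap = wrap(0xDEADBEEF)
assert unwrap(cap) == 0xDEADBEEF
```

Because both sides agree only on the capsule name and a C struct layout, the two extension modules can be built with different compilers or pybind11 versions and still interoperate.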
There might be other subsets of the IRModule.h which would be safe to export but I feel like there are also singletons and other things in there that should be segmented.
It is also possible to throw caution to the wind and allow cross-module C++ deps. Some projects do that (e.g. PyTorch), but it is very fragile, and I bias towards approaches that minimize the amount of weird platform-specific linking issues I need to triage, as a matter of construction.
Yes, by “C++ classes” I meant the classes within the Python bindings specifically intended for pybind11 bindings (e.g. PyBlock). Not the MLIR C++ classes themselves. Sorry about the confusion.
When you say cross-module C++-level dependencies, do you mean cross-Python-module dependencies? Not having done it myself, I don’t really understand the issues at play, so I’m happy to defer to your expertise.
I’m actually not too worried about the Type and Attribute stuff. I’m confident that we (you) have a workable plan there. What I want access to (in addition to a types/attributes solution) are:
class PyBlock;
class PyInsertionPoint;
class PyLocation;
class PyMlirContext;
class PyValue;
My use case is that I have a random function (wrapped using pybind11) which mutates the IR. It takes a builder (for the insertion point), a location, and the op to transform. In the future, I might want to provide an API which takes PyValue arguments or returns a PyValue.
Are these classes safe to export?
Despite my employer, I gave up on supporting Windows some time ago. It’s too different for a lifelong Linux user like myself to understand! I barely understand complex linking issues on Linux; it’s black magic sometimes. A million thanks to you for enduring that torture on behalf of the rest of us!
The problem is that cross-module support in pybind11 does require some careful alignment of stars (and exact compiler versions, runtime library, etc) to function, and whether you know it or not, trying to use these directly will trigger that path.
Would it be ok if your function was declared like:
I can give you type casters in a header file that will make that just work with no weird linking stuff (i.e. you can just .def("my_function", &myFunction) and it will do the right thing).
Believe it or not, once this stuff goes bad, Linux is a pretty bad offender too, ime. The dynamic linker allows a lot of things to happen at runtime that break C++ in various ways that only happen sometimes in the field.
Oh, I believe it. That’s the reason I said “too different”. Not necessarily worse, just different. Linking is one of those things which conceptually seems like it should be easy (and is often glossed over and taken for granted), but the details kill you. It works fine if you use it in standard ways, and breaks if you look at it funny. So it’s baling wire and duct tape unless you take the time to understand it.
I copied your PybindUtils.h and #included it, and the pybind type casters seem to be getting called, but when I run the Python below, I get the following error:
```
Traceback (most recent call last):
  File "/home/jodemme/circt/../rtl.py", line 15, in <module>
    esi.buildWrapper(m.operation)
AttributeError: '_mlir.ir.Operation' object has no attribute '_CAPIPtr'
```
```python
import circt
from circt import esi
from circt.dialects import rtl
from mlir.ir import *
from mlir.dialects import builtin

with Context() as ctx, Location.unknown():
    circt.register_dialects(ctx)
    m = builtin.ModuleOp()
    esi.buildWrapper(m.operation)  # C++: void pyBuildWrapper(MlirOperation)
```
I also tried just m instead of m.operation. Am I doing something stupid or is there some detail I’m missing?