Hi all – @mehdi_amini and @ftynse were offline discussing next steps for the work on MLIR Python bindings that @zhanghb97 has started as part of a GSoC assignment. We realized that the discussion around this had taken a somewhat twisty path through the forum and was never raised specifically as an RFC. This message seeks to rectify that.
Background on prior threads:
- Original discussions on the beginners channel
- Prior message on approach via a C-API
- This week’s message that prompted this RFC
Approach
We would like to begin taking incremental steps to in-tree Python bindings that:
- Seek to provide a Python-flavored but largely isomorphic mapping of the low-level MLIR API surface area (IR, Pass, etc.).
- Are structured as a native pybind11 extension.
- Are built on top of a co-developed C-API.
I would be happy to discuss/defend any of those design points, but in previous interchanges, there have not been serious objections, so I will leave further elaboration to be done by request versus just writing a long RFC if there is already general agreement on approach.
We would like to approach this incrementally, starting with committing the boiler-plate code to build/test, then defining the Context facilities to parse asm, query the IR and traverse the core Operation structure. Then we will seek to add mutation/construction and builders. Finally, interop with outside/custom types and attributes will be something that likely requires a more detailed design/prototype and will be easier to see once the basics are in place.
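As a rough picture of what "traverse the core Operation structure" involves, the sketch below models the nesting of operations, regions and blocks with plain dataclasses and walks it. The classes and the `walk` helper are hypothetical stand-ins written for this post, not the bindings themselves, and the pre-order traversal is just one choice made for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Block:
    # A block holds an ordered list of operations.
    operations: List["Operation"] = field(default_factory=list)


@dataclass
class Region:
    # A region holds an ordered list of blocks.
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Operation:
    # An operation has a name and zero or more nested regions.
    name: str
    regions: List[Region] = field(default_factory=list)


def walk(op: Operation) -> Iterator[Operation]:
    """Yield `op`, then every operation nested inside it (pre-order)."""
    yield op
    for region in op.regions:
        for block in region.blocks:
            for nested in block.operations:
                yield from walk(nested)


# Build a tiny module by hand: module { func { constant; return } }
body = Block([Operation("std.constant"), Operation("std.return")])
func = Operation("func", [Region([body])])
module = Operation("module", [Region([Block([func])])])

names = [op.name for op in walk(module)]
```

Read-only traversal like this needs nothing beyond accessors for regions, blocks and operations, which is why it is a natural first milestone before mutation and builders.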
Potential issues
We are concerned about, and keeping an eye on, how this evolves on a couple of fronts:
- If we only provide the low-level interfaces without any of the ODS generated code, this starts to feel like a “JSON of IRs” which is not deemed a great final state (i.e. it may be appropriate for some integrations to use these low level APIs but there should be a high-level ODS-integrated story to be considered complete). There are a number of ways these higher level parts of the API can be constructed and it isn’t clear yet what the right answers are.
- It may be prohibitive to model everything with a pure C-API, and it is trivially easy to get started by wrapping some of the C++ classes. However, this creates long-term ABI issues that would be best avoided. We may, however, prove out some aspects with a C++ wrapping while co-developing the C-API. For “core” parts, such intermediate states should not be considered indicative of where this will go.
- It is clear that higher-level APIs geared towards easy meta-programming are likely to be quite attractive. We would like to revisit these after the core API layer is established.
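The first concern above can be illustrated with a small contrast. In the sketch below, `GenericOp` is the kind of untyped, dictionary-like view that low-level bindings alone would give, and `ConstantOp` is a hand-written stand-in for what an ODS-integrated layer might generate. Both classes, the `toy.constant` op name and the `value` attribute are assumptions made up for this example.

```python
class GenericOp:
    """Untyped view: a name, operands, and a bag of attributes."""

    def __init__(self, name, operands=(), attributes=None):
        self.name = name
        self.operands = list(operands)
        self.attributes = dict(attributes or {})


class ConstantOp:
    """Typed view over a GenericOp, standing in for generated code."""

    OP_NAME = "toy.constant"  # hypothetical op name

    def __init__(self, op):
        # Refuse to wrap an operation of a different kind.
        if op.name != self.OP_NAME:
            raise TypeError(f"expected {self.OP_NAME}, got {op.name}")
        self._op = op

    @property
    def value(self):
        # Named accessor instead of a string-keyed lookup.
        return self._op.attributes["value"]


raw = GenericOp("toy.constant", attributes={"value": 42})

# "JSON of IRs" style: the caller must know the attribute key.
low_level_value = raw.attributes["value"]

# ODS-style: the op kind is checked once and accessors are named.
typed_value = ConstantOp(raw).value
```

Both paths read the same data; the difference is where knowledge of the op's structure lives, which is the part that still needs design work.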
Related work
- Some pragmatic bindings have been written in npcomp. It is a goal to eventually get to functional parity or at least be able to redirect these to depend on/augment core bindings exported by MLIR directly.
- The original OSS repo had some Python bindings developed around EDSCs that have served as some inspiration (these represent something that we would like to enable the creation of, built on top of the lower level work we are proposing here).
- I wrote down integration and style principles here and testing principles here.
Prototypes
- Initial boiler-plate for Python bindings patch – I wrote this last night to identify how to integrate build/testing for such a thing. If there are no objections to this RFC, I would like to decide on final names/directory structures and commit this patch.
- “Reboot” of the CAPI boilerplate – Alex wrote this last night as an example and will be following up with his own RFC. Presented here to give an idea of naming/structure.
Thanks!