In [mlir python] Port Python core code to nanobind. by hawkinsp · Pull Request #118583 · llvm/llvm-project · GitHub MLIR’s python bindings is being ported to nanobind. Repeating the discussion there here for visibility. Please try out the PR locally and give feedback if you run into any issues. We will let the final PR sit for at least five days post review to await for any feedback from downstream user testing during this period.
Why? Why another binding library? - nanobind documentation says it better than but the primary motivation for this change is to improve MLIR IR construction time.
For a complicated Google-internal LLM model in JAX, this change improves the MLIR lowering time by around 5s (out of around 30s), which is a significant speedup for simply switching binding frameworks.
To a large extent, this is a mechanical change, for instance changing pybind11::
to nanobind::. Folks using PybindAdaptors.h should require no to little changes. But it may also be an opportunity to port your dialects and eliminate pybind dependency once it lands.
There are some notable changes in the above PR:
- The above PR does not port the in-tree dialect extension modules. They can be ported in a future PR.
- Removed the
py::sibling() annotations from def_static and def_class in PybindAdapters.h. These ask pybind11 to try to form an overload with an existing method, but it’s not possible to form mixed pybind11/nanobind overloads this ways and the parent class is now defined in nanobind. Better solutions may be possible here.
- nanobind does not contain an exact equivalent of pybind11’s buffer protocol support. It was not hard to add a nanobind implementation of a similar API.
- nanobind is pickier about casting to
std::vector, expecting that the input is a sequence of bool types, not truthy values. In a couple of places I added code to support truthy values during casting.
- nanobind distinguishes bytes (
nb::bytes) from strings (e.g., std::string). This required nb::bytes overloads in a few places.
7 Likes
@mikeurbach , @jdd for circt
@makslevental for bindings generally
@marbre to make sure this is on IREE’s integrate calendar to test
2 Likes
philass
3
Thank you @jpienaar, @stellaraccident for broadcasting this change.
I am the owner of Groq’s LLVM dependency. We are a heavy user of the python bindings. I am actively working on trying this change out in our codebase, and hopefully it will be straight forward.
I’d like to raise some concerns that we have with the choice to move to nanobind. These concerns are around the ubiquity, robustness, and ownership of a third-party package that all downstream users must depend on.
- What other projects depend on nanobind. Is there a project of similar scale to MLIR that uses nanobind.
- This pull request sets a dependency on nanobind 2.4 which was only released in the last week. Is that level of churn healthy? Are there any other examples of LLVM setting such a tight window on a third-party dependency versions? Will this window be changed frequently when/if other bugs are found?
Though the code changes may be mechanical, the packaging of a third-party dependency isn’t always simple. In our case we use a nix based package management system which long predates our compiler. I’m hoping that getting nanobind into our monorepo is straight forward. But I will likely have to work cross functionally with our internal infrastructure team based on the gotcha’s of our build system.
For a complicated Google-internal LLM model in JAX, this change improves the MLIR lowering time by around 5s (out of around 30s), which is a significant speedup for simply switching binding frameworks
I do want to stress that I am appreciative of the work done to improve the usability on the python bindings. The wins of this pull request are clear. My concerns are really about the handling of third-party packages and possibly starting a dialogue on some rules for MLIR depending on third-party packages
1 Like
Hey,
I’ll start with the latter half questions as that part is easier (also I see others typing that may know more).
The only reason for this short time frame is that a feature was introduced to enable a simple conversion from Pybind here and makes it simpler for downstream folks. If bugs are found, then I think an update is good. I doubt it would be frequent: if I recall since Jax first transitioned to nanobind last year November, the version has been updated 3 times.
Nix side I don’t really know ( I see python3.11-nanobind - MyNixOS, but have no idea how these fit). So please let us know how it goes.
I think you are considering the python bindings as part of MLIR, and I can understand that, others would consider that separate. I believe only third party dependencies (but could be shown wrong) for the core is in testing on gtest, and I don’t see that changing and others being added. Keeping it self contained. I won’t say never as I don’t think that’s documented.
Now python side has been different from start, mostly be necessity to not reinvent wheel here (folks would point out that one could create Python bindings with plain C here and no extra library, but it would have been a larger endeavor). But even with that, I think Pybind was only real dep, with optional additional features if one had installed numpy and ml_types. This indeed changes it over from one 3p dependency to one or two, mostly to enable folks to migrate at their pace/if they see benefit migrating their dialects.
It is a valuable discussion, but definitely don’t consider this as changing the current status quo of keeping dependencies minimal.
Let us know how it goes. The project has no set policy on any of this, and I somewhat arbitrarily requested that the author include a 5 day testing window specifically because I wanted to err on the side of stress-reduction while not imposing a tax on the contributor to account for all usage. Consider “5” an adjective vs a ticking clock. If we’re still trying to figure out how to make it work or right up against the holidays or something, my personal opinion is that we should aim to not add stress to people’s lives unnecessarily. But let’s make decisions based on feedback/experience vs speculation.
A few other comments inline.
IREE and its derivatives switched 1.5 years go. It has been one of the single best dep decisions I’ve made. Not only is it much-much faster to compile, it produces smaller binaries and has a much more lean interface to the underlying Python machinery that all adds up to significant performance improvements. Worked exactly like it said on the tin.
I’m not there, but based on back-channels in lead up to this, I’m told that Google has had significant, positive experiences too.
It won’t be the last, unfortunately: the Python ecosystem has entered a period of high churn as all of the binding tooling adapts to free-threaded python, the various memory management improvements, and new features, but it had previously been in a pretty long valley of relative stability (and maybe will again – hard to predict).
For both pybind and nanobind projects, we’ve had to move to the latest release as soon as available a couple of times in the last year in order to continue to target the newest Python runtimes and features. I’m sorry to say but I think this churn is more the cost of the Python ecosystem being in a phase of rapid evolution and we’re probably just going to have to keep up. We also have a pending patch that would have required a bump to head on pybind in order to advertise compatibility with certain Python 3.13 features, so there would have been a version bump around now no matter what.
I can’t say I’m fond of this and I know it causes overhead. The only thing I will add is that it is a relatively simple thing to integrate from a packaging standpoint and the author of nanobind is a pro who seems to have a good track record of producing solid releases. There’s really not that much to it, compared to some libraries, so I’m pretty sure there will be a path to integrate it anywhere needed.
philass
6
I appreciate you mentioning this. That reduces my stress levels with this change greatly.
I spent some hours today trying to tackle the nanobind packaging from the nix side. I’ll give it a few more hours of effort tomorrow before I escalate it to our nix infra team.
I’ll continue to be responsive with our status on testing this change internally.
I’m going to be taking a look at this from the CIRCT side shortly. Apologies for the delay… was out last week. In general, we are very happy to switch to nanobind.
One question came up when we discussed today… I’m reviewing the PR now to try and understand this for myself, but we were under the impression upstream bindings could switch to nanobind, and if we wanted, we could keep using pybind downstream. Not that we want to, but more that we don’t have to block on our nanobind migration to integrate LLVM past the upstream switch.
Specifically, it looks like at least in the initial PR things we use in PybindAdaptors.h like mlir_attribute_subclass remain… but I’m not sure that would actually work with the upstream switch to nanobind. Is it expected folks like us using those facilities will need to migrate to nanobind to integrate LLVM past the point upstream switches? Again, that’s not huge deal; we want to, I’m just trying to understand how urgently we’ll need to work on it if so.
I’m not the author of the patch, but I believe what we are attempting here is allowing downstream dialect extension libraries to be built with either pybind or nanobind. That support was added a couple of weeks ago. Then the author went all the way and ported the _mlir extension itself to nanobind, which is what makes nanobind non optional. I believe that even in tree, it is left for follow-up work to port dialect extension libraries.
The intent is that those activities are completely decoupled for both in tree and out of tree use since the core extension module and dialect extensions, by design, only ever interop via capsules and the C API – so effectively, how any piece is built is independent. Someone could code a compatible extension library in pure C for example. This is different from some other projects which share state between extensions via the binding framework itself. If that were done here, it would be a hard, atomic switch for everyone.
Now granted, we’ve never before put weight on this specific design point. I don’t think there should be issues… But you know.
The intent is no. But you will need the nanobind dep satisfied in order to compile the core _mlir extension. The choice of binding framework does not escape any individual extension binary. In theory.
FYI – looks like the iree team has concluded that the PR works and the mixing works. I don’t think they’ve run the full test suite yet (which may shake out actual bugs if any): Discord
If you all have problems, can ask there since they have state.
1 Like
Awesome, thanks for confirming @stellaraccident.
Hey @philass ,
Do you have any update here? Others have reported low friction when integrating but wanted to check.
philass
13
Thanks for checking in!
I was able to build LLVM with nanobind in our monorepo. So we are unblocked on that end!
I still need to do a little work to convert our nix rules for internal bindings from pybind to nanobind. But that is not a blocker for our uplifts.
Thanks for the patience and collaboration @jpienaar and @stellaraccident. No complaints on our end if you merge this.
1 Like
I’ve finished testing on the CIRCT side and it’s all good. We had to avoid some uses of multiple inheritance that nanobind wasn’t happy with, and there is a little awkwardness with some of our pybind11 functions no longer automatically casting enums that moved to nanobind in the core library, but nothing substantial. I’ve updated CIRCT already and will integrate with LLVM now that nanobind has been enabled for the core library. Up next we’ll switch to nanobind ourselves!
1 Like