Study route for MLIR Python bindings

Hi all,

I would like to take part in one of the MLIR projects for Google Summer of Code 2020.
After learning the concepts of MLIR and looking through the open projects, I’m interested in the MLIR Python bindings project.
Could you suggest a study route for it? And what should I prepare before the GSoC 2020 application period?




On the path to get started with this project, I can think of two items:

  • understanding MLIR concepts: the Toy tutorial is likely a good start.
  • understanding how Python bindings work in general and what the possible options are. LLVM exposes a C API that is then wrapped in Python using ctypes. But there are other possibilities; for example, LLDB uses SWIG. More recent approaches include CLIF and pybind11.

I’m interested in getting a good comparison of these options before moving forward with one, and I’m sure there are people in the community who have experience with some of these frameworks and have opinions on the pros and cons.
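To make the ctypes route mentioned above a bit more concrete, here is a minimal sketch of wrapping a C function in Python. It is not taken from LLVM's bindings: it calls `Py_GetVersion` from the CPython C API through `ctypes.pythonapi`, only because that library is always available in a CPython process. The wrapper name `c_python_version` is made up for this illustration.

```python
import ctypes
import sys

# ctypes.pythonapi exposes the C API of the running CPython interpreter,
# so this example needs no external shared library.
get_version = ctypes.pythonapi.Py_GetVersion
get_version.argtypes = []               # the C function takes no arguments
get_version.restype = ctypes.c_char_p   # it returns a `const char *`


def c_python_version() -> str:
    """Hypothetical thin Python wrapper around the C function."""
    return get_version().decode("utf-8")


print(c_python_version())
print(sys.version_info[:2])
```

The same pattern (load a library, declare argument and return types, add a small Python wrapper) is what a ctypes-based binding repeats for every exported C function, which is one of the trade-offs to weigh against SWIG, CLIF, and pybind11.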

Thanks for your suggestion!
Actually, I’m already following the Toy tutorial to understand the MLIR concepts. Next I will study the different approaches to Python bindings and figure out how they differ. If I have further questions, I will bring them up in the community.

Thanks for your interest!

Please follow what @mehdi_amini suggested. @nicolasvasilache and I are listed as mentors for the project, don’t hesitate to ping us should you decide to work on this project.

A couple of other pointers: at some point, we explored using pybind11 to have Python APIs closer to the C++ ones, but it was focused on specific parts of MLIR and we did not push it further. The main hurdle for constructing the IR is the templated Op::build APIs, which are non-trivial to replicate: on one hand, we don’t want to write new bindings code for every operation; on the other hand, using the “generic” Operation seems too verbose.

Also, I’ve stumbled upon spcl/pymlir on GitHub (“Python interface for MLIR - the Multi-Level Intermediate Representation”), which does not seem to be a set of bindings but rather an MLIR parser implemented in Python.


Thanks for your response!

I will do my best to get familiar with pybind11 and pymlir. If I have questions or ideas, I will discuss them in the community, and I will contact you as soon as I’m ready for the project.

Hi @ftynse,

After a quick study of the Toy tutorial and pybind11, I think I understand the main hurdle for constructing the IR. When we define a dialect, it comes with many operations, and each operation has its own templated Op::build APIs. In the Python bindings file, we would then have to create a binding for each Op::build API inside PYBIND11_MODULE, which duplicates work.

Is my understanding correct? If so, how can I learn more about it? Also, I can’t find any MLIR Python bindings examples in llvm-project. Is there a demo I could use to try out the Python bindings?


Yes, your evaluation goes in the right direction. Op::build methods are different for all operations and, furthermore, they are called indirectly through the OpBuilder::create function template. Users are not expected to call the *Op::build APIs themselves. In C++, we rely on templates to forward arguments from OpBuilder::create to the relevant Op::build function, but it is unclear how this can be achieved in Python.
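The forwarding pattern described above can be modeled in pure Python, as a rough sketch only. Every class here (`Operation`, `ConstantOp`, `TransposeOp`, `OpBuilder`) is a hypothetical toy stand-in, not an MLIR API. Note that this sidesteps the real difficulty: in actual bindings each `build` lives in C++ and the template is instantiated at compile time, so something still has to expose every op-specific `build` across the language boundary.

```python
class Operation:
    """Toy generic operation: a name plus operands and attributes."""

    def __init__(self, name, operands=(), attributes=None):
        self.name = name
        self.operands = list(operands)
        self.attributes = dict(attributes or {})


class ConstantOp:
    name = "toy.constant"

    @staticmethod
    def build(value):
        # Op-specific signature: takes the constant value.
        return Operation(ConstantOp.name, attributes={"value": value})


class TransposeOp:
    name = "toy.transpose"

    @staticmethod
    def build(operand):
        # Op-specific signature: takes a single operand.
        return Operation(TransposeOp.name, operands=[operand])


class OpBuilder:
    """Toy builder: forwards arbitrary arguments to the op's build()."""

    def __init__(self):
        self.ops = []

    def create(self, op_cls, *args, **kwargs):
        op = op_cls.build(*args, **kwargs)
        self.ops.append(op)
        return op


builder = OpBuilder()
c = builder.create(ConstantOp, value=[[1, 2], [3, 4]])
t = builder.create(TransposeOp, c)
print([op.name for op in builder.ops])
```

On the Python side, `*args` and `**kwargs` play the role that variadic templates play in C++; the open question in the thread is how to get from the C++ `build` overloads to something callable like this without hand-writing a binding per operation.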

Also, many of the Ops are generated from ODS (see “Operation Definition Specification (ODS)” in the MLIR documentation), which we could try to use for generating Python bindings as well.
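As a loose illustration of what generating bindings from a declarative op description could look like, the sketch below emits Python classes from a small table of op specs. The `OP_SPECS` format, the template, and `emit_class` are all invented for this example; real ODS records are written in TableGen and carry much more information (types, attributes, traits, custom builders).

```python
# Hypothetical, heavily simplified stand-in for ODS records.
OP_SPECS = [
    {"class_name": "TransposeOp", "op_name": "toy.transpose", "operands": ["input"]},
    {"class_name": "AddOp", "op_name": "toy.add", "operands": ["lhs", "rhs"]},
]

TEMPLATE = '''class {class_name}:
    OP_NAME = "{op_name}"

    def __init__(self, {params}):
        self.operands = [{params}]
'''


def emit_class(spec):
    """Render the Python source of one op class from its spec."""
    params = ", ".join(spec["operands"])
    return TEMPLATE.format(
        class_name=spec["class_name"], op_name=spec["op_name"], params=params
    )


# Generate the source for all ops and load it into a fresh namespace.
source = "\n".join(emit_class(spec) for spec in OP_SPECS)
namespace = {}
exec(source, namespace)

add = namespace["AddOp"]("a", "b")
print(add.OP_NAME, add.operands)
```

A real generator would more likely run as a TableGen backend and write files at build time instead of calling `exec`, but the shape of the problem is the same: one template, many op records.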

And I can’t find MLIR python bindings examples in the llvm-project, could I have a demo to try out the python bindings?

We only explored it; there is no publicly available code for the bindings, hence the open project. :wink:

Please also consider looking at the different ways of exposing the bindings, as @mehdi_amini mentioned:

LLVM is exposing a C API that is then wrapped in python using ctypes. But there are other possibilities, for example LLDB is using swig. More recent approaches include clif and pybind11.

We experimented with pybind11 because that was the one we knew best.

I am so excited to have found the right direction! I also found that the ODS framework uses TableGen to define dialect operations, and I’m considering how to automatically emit Python bindings that play the role of OpBuilder::create and mlirGen().

I will keep studying the different binding approaches to explore how to solve the problem. Could I apply for GSoC 2020 with this project?

As Mehdi mentioned before, we would like to have a rationale for choosing which bindings library to use, and for whether or not to generate the op-level bindings. (Personally, I would consider starting with the generic IR concepts such as Type and Operation before tackling Op-specific constructors.) There are further potential issues with auto-generating bindings that I’ll let you discover by looking at how ODS works.

It is a bit too early to talk about GSoC. We will have to wait for Google to approve the participation of the LLVM organization in GSoC and allocate (or not) the slots to the organization. Then we will need to decide, within LLVM, how many of the slots are available for MLIR-related projects and ultimately which projects and candidates to accept. If the LLVM organization is accepted, the student application period will open on March 16. You can find a more detailed timeline on the “How it Works” page of the Google Summer of Code site. Personally, I would prefer to endorse an applicant who has already contributed to the project before the application is due.

Hi @ftynse,

I have worked through the Toy tutorial systematically and now understand the design principles of MLIR. Meanwhile, I read several papers about Python bindings to look for ideas, and I am currently learning the ODS framework. I also tried pybind11 and embedded it into the Toy source directory. Now I’m a little confused about the binding level, as you said before:

Does this mean that we should start by binding the operation classes, like TransposeOp, rather than specific functions such as TransposeOp::build?

Apart from that, I would also like to understand the motivation for the project. What is the purpose of implementing the Python bindings, and in which scenarios will they be used? As far as I know, once the bindings exist, users can use the MLIR infrastructure from Python, but why would they want to use Operation information in a Python application?


No, it means binding the generic MLIR classes (Operation, Attribute, etc.) directly, and not the specific dialects to begin with.

In general, creating and manipulating IR without having to write C++ is appealing when all your code is in Python. It allows rapid prototyping and is more compelling for Python programmers.
At some point someone has to bridge the Python level to the native code; if we don’t have bindings, then every single Python project has to re-invest in its own.

OK, I got the point. It means that we should bind the libraries in the llvm-project/mlir/include/mlir/IR/ directory. And in this case, we can use python to realize dialects with the python bindings. Is that correct?

I’m not sure what you mean by “realize dialects”, but the point is being able to inspect or construct IR. Defining new operations and dialects from python is at a whole different level of complexity.
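To give a feel for what "inspect or construct IR" through generic classes might look like from the Python side, here is a toy model. It is purely illustrative: this `Operation` dataclass and its `walk` method are invented for the example and simplify the real structure (in MLIR, regions contain blocks, which in turn contain operations).

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Toy generic operation; not the MLIR class of the same name."""

    name: str
    operands: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)
    # Simplification: each region is just a flat list of operations.
    regions: list = field(default_factory=list)

    def walk(self):
        """Yield this operation, then all nested operations, pre-order."""
        yield self
        for region in self.regions:
            for op in region:
                yield from op.walk()


# Construct a small piece of IR using only the generic class.
func = Operation(
    "func",
    attributes={"sym_name": "main"},
    regions=[[
        Operation("toy.constant", attributes={"value": [1, 2, 3]}),
        Operation("toy.print"),
    ]],
)

# Inspect it.
names = [op.name for op in func.walk()]
print(names)
```

Nothing here knows about a specific dialect: operations are identified by name and carry generic operands, attributes, and regions, which is why binding these generic concepts first avoids the per-op `build` problem.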

Sorry for my ambiguous wording. I mean that the libraries in the llvm-project/mlir/include/mlir/IR/ directory are responsible for manipulating the elements of dialects. If we bind those libraries, we can inspect or construct IR from Python.

Take the TransposeOp in the Toy tutorial as an example. I drew a sketch of the relationship between the files:

For the example above, we would bind Builder.h and generate a shared library. We could then import the shared library and implement MLIRGen in Python. Did I find the right binding point?

The suggested classes to look at were mentioned above:

Which are defined in,,

Thanks! I know the classes now, and I will try it out with those python binding tools.

I have another question about GSoC 2020: I found that MLIR is not in the technologies list in Google Summer of Code

But MLIR also belongs to the GSoC2020 project in

Is MLIR still among the GSoC 2020 projects? Could I take part in GSoC 2020 with a proposal about the MLIR Python bindings?

MLIR is a part of the LLVM project and is clearly listed in the description on the page you linked. There is a description with a list, point 5 is about MLIR. The “technologies” list seems there for cataloging, and I wouldn’t call clang a “technology” anyway.

Let me reiterate what I said above.

Please do consider following the advice from above

OK, I got it! I’m currently trying to summarize the features of these Python binding tools and the differences between them.

Sorry for the beginner-level question; this is my first time applying for a GSoC project and for an LLVM project. Does this mean I should contribute code through Phabricator before the application deadline, or is it enough to submit a proposal together with a running Python binding demo?

There are no requirements.
However, your application may be stronger if you show some contributions, and it helps flesh out the project and the milestones.

MLIR is now a “technology” as well on GSOC :wink:

Hi @mehdi_amini @ftynse,

I am considering how to compare these Python binding frameworks in a clear way, and I have set up a repository on GitHub to illustrate what pybind11 can do for the MLIR Python bindings: zhanghb97/pybind11_test (“Test pybind11 with simple examples”).

First, I extracted the MLIR requirements from Operation.h, Attributes.h, and Types.h. Then I searched for solutions and listed references in a table. After that, I used some simple examples to simulate the requirements of MLIR and implemented Python bindings for them with pybind11. Finally, I built test cases to verify that those bindings work.

Is this a feasible way to test frameworks for our Python bindings project? If so, I will test more frameworks (SWIG, CLIF, etc.) and compare their capabilities. If not, could you please give some suggestions?