Maybe what I’m about to ask for is completely out of the scope of MLIR; I’m a bit lost about what MLIR should and should not do.
I’ve worked with passes in LLVM, where I used clang to generate LLVM IR from C/C++ code and then ran LLVM passes on that IR. In MLIR, however, there seems to be no way to generate MLIR IR from C/C++ code. Maybe this is because MLIR, unlike LLVM, is not focused on that kind of thing. In the Toy example, a new language (called Toy) is defined and later transformed into the Toy IR, so the MLIR workflow starts from a custom language, not a standard one like C/C++.
The only way I can think of to accomplish this is to generate LLVM IR from C/C++ code and, since MLIR has an LLVM dialect, use that as the starting point for successive transformations and lowerings. But this looks discouraged to me, because the whole point is to arrive at LLVM IR by lowering from a previous IR, so that this final LLVM IR can be translated into machine code.
Is it possible to use C/C++ as the “source language”, or is it necessary to define a custom language and semantics to work with MLIR?
From what I understand, MLIR was designed so that you could build a hypothetical C/C++ compiler on top of MLIR, and it integrated many learnings from clang development. That said, I am not aware of a C/C++ compiler utilizing MLIR for the pieces currently implemented in clang (the AST and related transforms being one of the big pieces), this would be cool though!
Depending on your use case, you may be interested in generating MLIR from a DSL embedded in C++, for which there are some facilities. Such an effort would share some design space with current binding efforts in Swift and Python, but a C++ variant would benefit from not having to go through the C bindings.
Alright, thanks for your thoughts.
So another point of view in this LLVM/MLIR comparison could be:
- LLVM: clang (compiler) → LLVM IR → executable
- MLIR: MLIR IR → LLVM IR → executable
Where MLIR lacks a C/C++-to-IR compiler (which would be cool, as you said) and MLIR IR is designed to be multi-level (which is only one of many differences between the two IRs).
@wsmoses, @kumasento, @ftynse, and I built a tool to enter the MLIR lowering pipeline from C or C++ source code. At the moment, we walk the Clang AST and emit the corresponding MLIR constructs; in the future, we will move to a more dialect-oriented approach where C or C++ constructs are modeled within MLIR as a dialect (contributions and ideas are welcome in this direction).
See wsmoses/Polygeist on GitHub and https://github.com/wsmoses/MLIR-GPU/tree/main/mlir/tools/mlir-clang for installation instructions.
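To make the "walk the AST and emit MLIR constructs" idea concrete, here is a minimal sketch in Python. It is not Polygeist code and does not use the Clang AST API: `Var`, `IntLit`, `BinOp`, `Emitter`, and `emit` are hypothetical names for a toy AST and visitor. The emitted lines are only MLIR-like text in the style of the upstream `arith` and `func` dialects of recent MLIR versions, and every value is assumed to be an `i32`.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical miniature AST, standing in for the far richer Clang AST.
@dataclass
class Var:
    name: str


@dataclass
class IntLit:
    value: int


@dataclass
class BinOp:
    op: str  # "+" or "*"
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, IntLit, BinOp]


class Emitter:
    """Walks the toy AST and appends one MLIR-like line per operation."""

    # Assumed mapping from source operators to op names (all values are i32).
    OPS = {"+": "arith.addi", "*": "arith.muli"}

    def __init__(self) -> None:
        self.lines: List[str] = []
        self.counter = 0

    def fresh(self) -> str:
        """Returns a new SSA value name: %0, %1, ..."""
        name = f"%{self.counter}"
        self.counter += 1
        return name

    def visit(self, node: Expr) -> str:
        """Emits ops for `node` and returns the SSA name holding its value."""
        if isinstance(node, Var):
            # Assumption: variables are function arguments named after themselves.
            return f"%{node.name}"
        if isinstance(node, IntLit):
            result = self.fresh()
            self.lines.append(f"{result} = arith.constant {node.value} : i32")
            return result
        if isinstance(node, BinOp):
            lhs = self.visit(node.lhs)
            rhs = self.visit(node.rhs)
            result = self.fresh()
            self.lines.append(f"{result} = {self.OPS[node.op]} {lhs}, {rhs} : i32")
            return result
        raise TypeError(f"unsupported AST node: {node!r}")


def emit(expr: Expr) -> List[str]:
    """Emits the body of a function that returns the value of `expr`."""
    emitter = Emitter()
    result = emitter.visit(expr)
    emitter.lines.append(f"func.return {result} : i32")
    return emitter.lines


if __name__ == "__main__":
    # Corresponds to the C expression `a + b * 2`.
    for line in emit(BinOp("+", Var("a"), BinOp("*", Var("b"), IntLit(2)))):
        print(line)
```

A real frontend additionally has to handle types, control flow, memory, and ABI details, which is where most of the effort goes.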
The “core” MLIR living in the LLVM monorepo indeed doesn’t have a C++ compiler. Writing one is non-trivial, and although many folks have expressed interest, nobody has had the time to invest in doing it at production quality so far.
There is an experimental project mentioned above that emits a combination of LLVM, Standard, SCF and Affine dialects.
There are two high-level points I want to make regarding this question:
- There is no “MLIR IR”, or at least not in the sense there is LLVM IR; MLIR lets you define arbitrary collections of attributes, types and operations and convert between them.
- MLIR was designed to cover the representation domains that are not already covered by LLVM IR, both at a higher level (e.g., TOSA is at a significantly higher abstraction level than C) and at a lower level (I know of projects that use MLIR to emit target-specific ISA). Many languages afford a higher-level abstraction at which to perform language-specific transformations. Those could benefit from MLIR, and that’s why there are DSLs targeting different dialects. Working at exactly the same level as LLVM IR sounds meaningless at this point, since LLVM does that perfectly fine. So emitting only the LLVM dialect from the Clang AST will only give you overhead: there are virtually no transformations MLIR does at that level.
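As a rough illustration of the first point (collections of operations and conversions between them), here is a toy model in Python. It is a sketch only and does not use any MLIR API: the `toyhl` and `toyll` dialect names, the `Op` class, and the `convert` helper are all made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Op:
    name: str            # "dialect.op", e.g. "toyhl.square" (hypothetical dialects)
    operands: List[str]  # SSA-style value names
    result: str


# A conversion pattern rewrites one op into a list of lower-level ops.
Pattern = Callable[[Op], List[Op]]


def lower_square(op: Op) -> List[Op]:
    # toyhl.square %x  ->  toyll.mul %x, %x
    x = op.operands[0]
    return [Op("toyll.mul", [x, x], op.result)]


def lower_axpy(op: Op) -> List[Op]:
    # toyhl.axpy %a, %x, %y  ->  toyll.mul %a, %x ; toyll.add %tmp, %y
    a, x, y = op.operands
    tmp = op.result + "_t"
    return [Op("toyll.mul", [a, x], tmp), Op("toyll.add", [tmp, y], op.result)]


PATTERNS: Dict[str, Pattern] = {
    "toyhl.square": lower_square,
    "toyhl.axpy": lower_axpy,
}


def convert(ops: List[Op], patterns: Dict[str, Pattern]) -> List[Op]:
    """Applies a matching pattern to each op; ops without a pattern are kept."""
    out: List[Op] = []
    for op in ops:
        pattern = patterns.get(op.name)
        out.extend(pattern(op) if pattern else [op])
    return out


if __name__ == "__main__":
    program = [
        Op("toyhl.axpy", ["%a", "%x", "%y"], "%r"),
        Op("toyhl.square", ["%r"], "%s"),
    ]
    for op in convert(program, PATTERNS):
        print(f"{op.result} = {op.name} {', '.join(op.operands)}")
```

The point of the sketch is that nothing privileges one set of ops as "the IR": each dialect is a namespace of ops, and the lowering pipeline is a sequence of such conversions.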
Thanks for your clarifications, ftynse. I find both of your points quite interesting.
I probably explained the “MLIR IR” concept badly. It’s important to note that, as you said, there is no “MLIR IR” as such. Sorry for the confusion.
I have a much clearer picture of MLIR now, and my question is answered.
I think there is a use case for supporting a Clang dialect in MLIR. It would enable HLS-like optimizations targeting custom hardware such as ASICs and FPGAs.
One such example is progressively transforming computer vision applications written in C++ into the Clang dialect, then into the hw dialect (from the CIRCT project), and eventually targeting FPGAs and ASICs.
Just food for thought!
Some of us are doing exactly that with the Polygeist project right now. If you’re curious, the most recent CIRCT design meeting had a brief summary of the current status. I can’t speak to anyone’s plans for a Clang dialect, but the MLIR that Polygeist emits (Affine, SCF, and Standard dialects, among others) is already quite useful for HLS.
I think MLIR is not really a concrete IR. Instead, it is infrastructure for constructing IRs, which are called dialects. So I think the problem is that there is no dialect serving as a high-level IR that expresses C/C++ features better than using LLVM IR directly. Either we need to design a high-level IR above LLVM IR that can be lowered to LLVM IR, or the existing dialects are already good enough to express C/C++ semantics, so that we can combine them to express the different language features of C/C++.
A high-level IR can make some optimization passes easier to write and apply, because LLVM IR is too fine-grained for some analyses and transformations. Every kind of pass has an abstraction level that suits it best.
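A small sketch of why the abstraction level matters, using a toy structured loop in Python. `ForOp`, `trip_count`, and `lower_to_cfg` are hypothetical names, and the lowered form is schematic text rather than real LLVM IR. With the structured op, the trip count is a direct computation on the op's fields; after lowering, the same facts are spread over a compare, a branch, and an increment in different blocks and have to be rediscovered by analysis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForOp:
    """Toy structured loop: for (i = lower; i < upper; i += step)."""
    lower: int
    upper: int  # exclusive bound
    step: int


def trip_count(loop: ForOp) -> int:
    """Trip count read directly off the structured op."""
    if loop.step <= 0:
        raise ValueError("this sketch only handles positive steps")
    span = loop.upper - loop.lower
    return max(0, -(-span // loop.step))  # ceiling division, clamped at zero


def lower_to_cfg(loop: ForOp) -> List[str]:
    """Schematic CFG form: bounds and step are now scattered across blocks."""
    return [
        f"entry:     br header({loop.lower})",
        f"header(i): cond_br (i < {loop.upper}), body, exit",
        f"body:      ...; br header(i + {loop.step})",
        "exit:",
    ]


if __name__ == "__main__":
    loop = ForOp(lower=0, upper=10, step=3)
    print(trip_count(loop))
    print("\n".join(lower_to_cfg(loop)))
```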
Other modern programming languages such as Swift and Rust have high-level IRs (SIL and MIR, respectively) on the lowering path before LLVM IR. So the analogous IR for C/C++ language features might be called CIR. Does anybody know of a project that is building such a CIR?
I think the key point is a design or spec document that describes the IR concretely and clearly. Is there any documentation describing the Polygeist IR?
There is a paper: “Polygeist: Raising C to Polyhedral MLIR” (IEEE Xplore). Note that, as @mikeurbach said above, Polygeist emits (mostly) a composition of upstream MLIR dialects, which are documented on the MLIR “Dialects” page. There are only a handful of operations proper to Polygeist, most of which are related to memref/pointer interfacing (casts) and pointer-style address calculation on memrefs (
Documentation can always be improved.
Swift goes AST → SIL → LLVM
Rust goes AST → a couple of IRs → LLVM
Flang goes AST → FIR+OpenMP+OpenACC → LLVM
Clang actually handles the whole C family of languages, so I would expect it to go from the AST to OpenMPIL, OpenACCIL, OpenCLIL, CudaIL, CPPIL, CIL, and more. Are ABIs passes or dialects?
Do the CUDA and OpenCL modes have different sets of passes?
Do you lower ACLE SVE intrinsics to the CIL, vector, or SVE dialect?
Do you always go through LLVM or could you use the SPIR-V infrastructure?