MLIR and different views

I am new to MLIR.

  1. CFG (Control Flow Graph)
    I assume MLIR can represent a program in CFG form, with blocks of code linked together by branches.
  2. AST (Abstract Syntax Tree)
    Generally the first step in compiling a program written by a person. It retains structure such as for/next and while loops, etc.
    For example, the Clang compiler parses the program written by the user into an AST, and then lowers this to the CFG form that LLVM uses.
  3. Data Flow Graph
    A view that shows the data flow at a high level and can allow more efficient algorithms to be applied. For example, recognizing different ways to partition the data set so that more work can be done in parallel, or reducing the amount of data passed around needlessly.
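To make the first two views concrete, here is a minimal toy sketch (hypothetical Python structures, not MLIR APIs): the same while loop as a structured AST node, and as the flat basic blocks a CFG would use.

```python
from dataclasses import dataclass, field

# AST view: the loop structure is explicit in the node type.
@dataclass
class While:
    cond: str    # e.g. "i < n"
    body: list   # list of statement strings

# CFG view: the same loop flattened into blocks linked by branches.
@dataclass
class Block:
    name: str
    stmts: list
    succs: list = field(default_factory=list)  # successor block names

def lower_while(loop: While) -> list:
    """Lower a While AST node into header/body/exit basic blocks."""
    header = Block("header", [f"br_if {loop.cond} -> body, exit"], ["body", "exit"])
    body = Block("body", loop.body + ["br -> header"], ["header"])
    exit_ = Block("exit", [])
    return [header, body, exit_]

loop = While("i < n", ["i = i + 1"])
cfg = lower_while(loop)
print([b.name for b in cfg])  # ['header', 'body', 'exit']
```

Note how the lowering is easy but lossy: the CFG no longer says "this is a while loop", which is exactly what makes raising harder than lowering.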

Can MLIR work at all these levels?
I am interested in passes that do progressive raising, as opposed to progressive lowering, i.e. taking a CFG and transforming it into a higher-level data-flow form to apply transformations before lowering it again,
or taking a CFG and transforming it into an AST to make it more easily reusable.
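One small sketch of what such raising might start from (a hypothetical illustration, not Polygeist's actual algorithm): finding back edges in a CFG, i.e. edges whose target dominates their source. Each back edge identifies a natural loop that could then be rebuilt as a structured while.

```python
def dominators(succs, entry):
    """Iterative dominator sets for a CFG given as {node: [successors]}."""
    nodes = set(succs)
    preds = {n: [p for p in nodes if n in succs[p]] for n in nodes}
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def back_edges(succs, entry):
    """Edges u -> v where v dominates u: each marks a natural loop header v."""
    dom = dominators(succs, entry)
    return [(u, v) for u in succs for v in succs[u] if v in dom[u]]

cfg = {"entry": ["header"], "header": ["body", "exit"],
       "body": ["header"], "exit": []}
print(back_edges(cfg, "entry"))  # [('body', 'header')]
```

Recovering the loop skeleton is the easy part; reconstructing conditions and bodies faithfully is where raising gets hard, which is what the papers below address.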


If you are interested in raising, you can also take a look at the paper Progressive Raising in Multi-level IR (Inria), which is the spiritual predecessor of Polygeist.

That being said, I’m not sure I see the interest of raising to an AST specifically. It contains plenty of syntax-related information that doesn’t sound very useful for transformations, unless you want to reconstruct the high-level language from the IR. I’d be interested to learn more, though.


In Verona [1], we did try to use MLIR as an AST, so we could do type inference, reification, etc. on it, and after trying it for a bit, we gave up.

This was 2 years ago, and MLIR had more cumbersome dialect types (which was a problem for us), but we also found that the overall infrastructure wasn’t suitable to replace an AST because of the weight it carries around.

An AST is usually a simple structure of lists of structures that can be easily accessed via field or offset (depending on what’s typed or untyped), and you have the freedom to model it however you want. Compilers can optimise that kind of code really well.
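The kind of lightweight structure meant here might look like the following (a hypothetical sketch, not Verona's actual AST): plain typed nodes, direct field access, and transformations as simple recursion, with no context objects or indirection in the way.

```python
from dataclasses import dataclass

# A plain AST: typed nodes accessed directly by field.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    lhs: object  # Num or BinOp
    rhs: object

def fold(e):
    """Constant-fold a tree by plain recursion over typed fields."""
    if isinstance(e, BinOp):
        l, r = fold(e.lhs), fold(e.rhs)
        if isinstance(l, Num) and isinstance(r, Num) and e.op == "+":
            return Num(l.value + r.value)
        return BinOp(e.op, l, r)
    return e

print(fold(BinOp("+", Num(2), Num(3))))  # Num(value=5)
```

Everything here is a direct struct access the compiler can see through, which is the contrast being drawn with MLIR's heavier machinery in the next paragraph of the post.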

In contrast, MLIR needs multiple levels of declarations (TableGen, C++ structures), carries around contexts, lists, and maps that aren’t relevant, and “forces” you to use attributes and operands that aren’t easy to optimise and don’t quite fit the semantics you want or need.

Before Verona, in Knossos [2], we had a similar problem: the transformations in MLIR were slower than the ones on a direct hand-coded LISP IR, because field access and graph navigation were cumbersome and needed multiple levels of indirection / function calls. For reinforcement learning, the number of cheap transforms you can do per second really matters.



I also feel that while you can write your AST with MLIR, it does not mean you should: it’ll likely be quite unpleasant.