About MLIR polyhedral optimization


I have recently become interested in polyhedral technology in MLIR.

I have read the paper “High Performance Code Generation in MLIR: An Early Case Study with GEMM” by Uday. In it, @bondhugula mentions:

There are two ways of achieving this tiling in MLIR: one by calling [mlir::tile] (which is also what the loop tiling pass in MLIR uses), and then performing the desired loop interchange via mlir::interchangeLoops. The other is by implementing a higher order polyhedral (HOP) approach based on domains and schedules. We use this latter approach here since MLIR’s affine analysis machinery does not yet have the necessary simplification to get rid of certain redundant bounds resulting from tiled code generation in advanced cases. The HOP approach depends on an external library, ISL, and we implement this as part of the -hopt pass.

Looking at the MLIR repo, it seems that MLIR tends to exclude ISL and explore a new approach to the code optimization/generation problem.

I have a few questions about that, and I am a newcomer to this area. Any reply/suggestion would be much appreciated.

  • What is the current state of polyhedral code generation in MLIR?

    • Are there test cases or documentation about auto-scheduling? – a comparison with Tensor Comprehensions, for example.
    • What is your current thinking on code optimization/generation?
  • What is the difference between MLIR’s new approach and the ISL approach?

    • Or is there no essential difference?
    • What problems of the Pluto + ISL approach does MLIR aim to solve?
  • What is the plan?

    • Will there be new algorithms based on the new IR? – a comparison with Pluto, for example.
    • Will there be an auto-scheduling implementation?

isl is able to generate code given any arbitrary affine schedule. MLIR doesn’t have such functionality. Instead, each affine transformation utility / pass in MLIR has the code generation step it needs built into it (say tiling, fusion, permute, unroll, unroll-and-jam, separate full and partial tiles, if/else hoisting). This also means that each of those is limited in what it is currently able to handle when compared to a generic polyhedral code generator like isl that can deal with any affine schedule. For example, fusion currently doesn’t work with non-constant loop bounds, tiling of non-rectangular loop nests isn’t available, there is no code generator for loop skewing, permute for triangular loop nests isn’t available, etc.
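To make the “redundant bounds” point concrete, here is a plain Python sketch (not MLIR code; the names and sizes are made up for illustration) of what rectangular loop tiling does conceptually. The `min()` in the inner bounds handles partial tiles at the boundary; it is exactly this kind of bound that a polyhedral code generator must prove redundant and simplify away when, say, the tile size divides the trip count:

```python
def original(n):
    # untiled 2-D iteration space
    return [(i, j) for i in range(n) for j in range(n)]

def tiled(n, t):
    # tiled: iterate over tiles, then over points within each tile
    pts = []
    for ii in range(0, n, t):
        for jj in range(0, n, t):
            # min() guards partial tiles; redundant when t divides n
            for i in range(ii, min(ii + t, n)):
                for j in range(jj, min(jj + t, n)):
                    pts.append((i, j))
    return pts

# tiling must visit exactly the same set of points
assert sorted(original(10)) == sorted(tiled(10, 4))
```

Both the built-in MLIR utilities and a generic generator like isl produce code of this shape; the difference is how general a schedule each can handle.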

There is currently no auto scheduler / auto transformation framework in MLIR. There is only infrastructure to allow you to build one.


Thanks very much for the reply. This makes the problem much clearer to me.

I do not fully understand the design decision. Why does MLIR design/implement each pass/utility separately? Will MLIR eventually cover all of ISL’s abilities?

Compared with ISL, is there any essential difference in building such an auto-scheduler / auto-transformation framework? What is the benefit of MLIR’s new approach?

What is the design boundary of MLIR polyhedral technology? The passes / transformations in the Linalg / Affine dialects are awesome resources for this question. XLA is a good reference for studying where MLIR is going, but are there other references?

Any feedback would be appreciated.

First of all, we have a rationale document for MLIR’s take on polyhedral compilation. It uses extremely outdated syntax (basically, ignore all the code in it), but the arguments are still valid.

MLIR does not use ISL since the question of including ISL in LLVM has been discussed repeatedly in the past, and the community was never supportive. ISL is tens of thousands of lines of mathematics-heavy C code written with coding standards very different from LLVM’s, so having somebody adapt, review and maintain that code seems almost as complex as writing everything from scratch.

Uday already said there is no auto-scheduling, but there is not that much polyhedral scheduling in TC either. The Linalg dialect is a spiritual successor to TC, developed mostly by the same people, and with some amount of glue, it should be possible to implement a search strategy similar to TC’s that combines tiling/fusion/specialization/device-mapping.

Everybody has their own :wink:

Mine (and currently predominant in the community) is that there is no “one size fits all” approach to optimization and we should have infrastructural support for optimization techniques ranging from peephole transformations on loops-as-instructions to transformation strategies a la Halide, to full-fledged automated polyhedral frameworks. You can look at Chris’ talk about this.

Not using isl.

Practically, most polyhedral source-to-source compilers cannot inspect the internals of a statement and instead treat it as, essentially, text. MLIR changes that and offers opportunities to combine polyhedral and more classical IR-based transformations. A trivial example is splitting a complex arithmetic expression into several with intermediate caches and pipelining.
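A minimal sketch of that trivial example, in plain Python rather than MLIR (the function names and data are invented for illustration): a source-to-source tool that treats each statement body as opaque text must keep the expression in one piece, while an IR-based compiler can rewrite it into staged statements with intermediate buffers.

```python
def fused(a, b, c):
    # one opaque statement per iteration: (x + y) * (x - y) + z
    return [(x + y) * (x - y) + z for x, y, z in zip(a, b, c)]

def split(a, b, c):
    # same computation split into stages with intermediate "caches";
    # this rewrite requires seeing inside the statement body
    s = [x + y for x, y in zip(a, b)]   # intermediate 1
    d = [x - y for x, y in zip(a, b)]   # intermediate 2
    return [si * di + z for si, di, z in zip(s, d, c)]

assert fused([1, 2], [3, 4], [5, 6]) == split([1, 2], [3, 4], [5, 6])
```

Once split, the stages expose reuse and pipelining opportunities that the single fused statement hides.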

Large-scale compilers are usually structured into passes. That makes them easier to develop, maintain, debug, replace and combine. One of the recurring critiques of polyhedral compilation is that it is essentially a black box: it takes some code, returns some transformed code, and gives you no understanding of what it did to the code, why, or what could be fixed if the code does not run as fast as you wanted.

Isl is primarily a library for manipulating integer sets, as its name indicates. MLIR is a compiler infrastructure. It’s hard to compare the two. MLIR may eventually have an affine scheduler, or even an infrastructure for spawning various forms of schedulers. I am not aware of any mid-term plan to implement this upstream.

You have to write your own ILP solver. And a schedule data structure. :slight_smile: I’d say it’s easier to say what is common than what is different. The common part would ultimately be the representation of loop nests as polyhedra, and the way of casting the scheduling problem as an optimization problem. The rest is likely to differ in various ways.
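For readers new to the area, the common part ("loop nests as polyhedra") can be sketched in a few lines of plain Python; this has nothing to do with MLIR's or isl's actual data structures, it only illustrates the idea that a triangular loop nest is the set of integer points satisfying affine inequalities:

```python
# The triangular nest   for i in [0, N): for j in [0, i]
# is the integer set  { (i, j) : 0 <= i < N, 0 <= j <= i },
# encoded as affine constraints  A @ (i, j) + b >= 0.

N = 6
# rows: i >= 0,  N - 1 - i >= 0,  j >= 0,  i - j >= 0
A = [(1, 0), (-1, 0), (0, 1), (1, -1)]
b = [0, N - 1, 0, 0]

def in_domain(i, j):
    return all(ai * i + aj * j + bi >= 0 for (ai, aj), bi in zip(A, b))

# enumerating the set recovers exactly the loop's iterations
domain = [(i, j) for i in range(N) for j in range(N) if in_domain(i, j)]
loop = [(i, j) for i in range(N) for j in range(i + 1)]
assert domain == loop
```

An affine scheduler then searches (typically via an ILP, using tools such as the Farkas lemma) for new affine functions of `(i, j)` that reorder these points while preserving dependences.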

To answer what the benefits are, we would need to do it first :wink: This sounds like a good research paper.

I wouldn’t say XLA is where MLIR is going, quite the inverse.


Thanks for the detailed reply, @ftynse. It is very helpful.

This argument is very reasonable. I have learned the differences and commonalities between the MLIR polyhedral representation and the ISL schedule tree.

As far as I know, the Linalg / Affine dialects have different objectives from TC / an affine scheduler. Is there an existing or potential project based on the MLIR polyhedral representation that targets end-to-end code generation? I mean, we already have the awesome infrastructure, but where are the users? As a deep learning compiler, XLA is a great user of MLIR’s basic infrastructure. What about others, especially on the road where MLIR is going? Or are we still looking forward to them?

Thanks for teaching me that.

It depends on the perspective. The ultimate goal is to generate code that performs well without spending too much time on compilation. We are not there yet. For context, it took 16 years of research between Feautrier’s formulation of the Farkas-lemma-based affine scheduler and Pluto, which did enough transformations for performance. If we can make a similar step in 4 years, I would consider it an achievement…

Affine schedulers are also known to be expensive in terms of compile time, sometimes prohibitively so, which Linalg attempts to address. There is a growing trend in the literature on the polyhedral model that the model itself can be used and useful without affine scheduling, e.g. for memory/dependence analysis, task splitting, etc.

I am aware of some work-in-progress research that attempts source-to-binary compilation with various MLIR flows.

4 years… That will be a long trip. Let’s look forward to it…

May I ask for some links or references about these topics?

Thanks again. I have gathered a great deal of information…

What specifically are you interested in? Affine scheduling? Uses of the polyhedral model that are not scheduling?

I am not at liberty to discuss the work-in-progress I am aware of, unless all concerned people agree.

I am interested in the new trend…

  • the polyhedral model without affine scheduling, e.g. for memory/dependence analysis, task splitting, etc.
  • compilation with various MLIR flows (or MLIR DL compiler users)

Some published references / links would be enough, if they exist…

Here’s a random assortment of recent things:

MLIR is relatively new for the academic publication turnaround, so there is not much yet. We don’t even have the MLIR paper published yet :wink: I expect diligent authors to cite the preprint [2002.11054] “MLIR: A Compiler Infrastructure for the End of Moore’s Law”; you can monitor citations of that. I’ve also seen people citing the presentation “MLIR Primer: A Compiler Infrastructure for the End of Moore’s Law” (Google Research) before the preprint was available. Here’s a recent preprint using MLIR: [2005.13014] “Domain-Specific Multi-Level IR Rewriting for GPU”.

That is very kind of you, @ftynse. I need to dive into these references for a while. Thanks again.