How do you start matmul optimization with MLIR for my Nvidia GTX 1650?

The first step in LLVM was to produce LLVM IR using a command like

clang++ -O3 -emit-llvm -S matmul.cpp -o matmul.ll

but what is the first step in MLIR? How do I get my .cpp code to appear as a module with blocks with regions with ops (is that the correct hierarchy?) in the top-level dialect?

Do I have to rewrite my code in MLIR syntax myself?

In a convo with an LLVM expert, I was told:

Me:

“What is the point of MLIR if I cannot get MLIR representation of my own source code in Python, C++, etc?”

"to rewrite existing high-level language code (like Python or C++) directly into MLIR. Instead, the idea is to:

  • Use frontend compilers (like Clang for C++) to convert the high-level language code to LLVM IR.
  • Optionally, use tools (under active development) to translate LLVM IR to MLIR for advanced optimizations, especially those that LLVM might not handle well (like certain loop transformations or targeting specific accelerators).
  • Lower the optimized MLIR back to LLVM IR or directly to machine code for execution.
  1. Current State of Tooling: As of now, tooling for converting LLVM IR to MLIR and vice versa is in development. This means that for many practical applications, especially in common programming environments, MLIR is not yet a drop-in solution for optimization.
  2. Why MLIR Exists: The purpose of MLIR is to explore and enable optimizations that are difficult or impossible to express in LLVM IR or high-level languages. It’s particularly useful for research, compiler development, and specialized domains like high-performance computing, machine learning, and hardware design.
  3. Relevance to General Programming: For most developers, especially those working in high-level languages for general-purpose applications, MLIR might not yet offer direct benefits. Its advantages become apparent in specialized areas and advanced compiler research.

In summary, MLIR isn’t intended to replace traditional compilers or make you rewrite existing high-level language code into MLIR directly. Its strengths lie in specialized optimization scenarios and as a research tool in compiler technology. For general programming needs, traditional compilation and optimization workflows remain the go-to approach."

Is that the gist of it?

So, for example: I learn how to optimize matmul for matrices of varying sizes on my GTX 1650, and then the final MLIR is what I read before going back to refactor the CUDA C++ (or whatever I wrote the original matmul.cpp in), using the MLIR as a blueprint?

Hey,

You are trying to compare:

  • Clang - a compiler, with
  • MLIR - a framework for building compilers.

It’s important to realise that these are two very different things :slight_smile: In particular, there’s no one canonical way to “compile” something with MLIR - it’s for you to decide what path to take. For matrix multiplication, you can either:

  • write it yourself using one of the dialects available upstream, or
  • use linalg.matmul (sketched below).
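
To give you a flavour, a hand-written linalg.matmul at the tensor level looks roughly like this (the shapes are made up for illustration - a sketch, not a tuned kernel):

func.func @matmul(%A: tensor<128x256xf32>, %B: tensor<256x512xf32>,
                  %C: tensor<128x512xf32>) -> tensor<128x512xf32> {
  // linalg.matmul carries the "this is a matmul" semantics explicitly,
  // which is what later transformations (tiling, vectorization) key off.
  %0 = linalg.matmul ins(%A, %B : tensor<128x256xf32>, tensor<256x512xf32>)
                     outs(%C : tensor<128x512xf32>) -> tensor<128x512xf32>
  return %0 : tensor<128x512xf32>
}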

In either case you will need to “define” how that’s lowered to LLVM. For linalg.matmul there are plenty of examples within MLIR, e.g.:

Note, it’s an example for CPU - I’m not familiar with GPU lowering pipelines. In the example that I shared, we used the Transform dialect to define the “optimization” part of the compilation pipeline.
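
To give a flavour of that: a minimal Transform-dialect script that tiles a linalg.matmul might look like the sketch below. Be aware that the op names (e.g. transform.structured.tile_using_for) have been renamed across MLIR releases, so treat this as an approximation for a recent version rather than copy-paste-ready code:

module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    // Find every linalg.matmul in the payload IR.
    %mm = transform.structured.match ops{["linalg.matmul"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Tile 32x32x32 - placeholder sizes you would tune for your target.
    %tiled, %loops:3 = transform.structured.tile_using_for %mm tile_sizes [32, 32, 32]
        : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op)
    transform.yield
  }
}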

use tools (under active development) to translate LLVM IR to MLIR for advanced optimizations,

If possible, I’d avoid that :slight_smile: LLVM IR is “lower” level than MLIR. When lowering from some higher-level representation to LLVM IR, you are losing information. Lifting that to MLIR means trying to recover that information, but that’s challenging. It’s better to optimize as much as possible at higher levels and only then lower to LLVM IR. Of course, sometimes that’s not possible. Also, like you hinted, there are projects that can do this for you.
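
To make the information-loss point concrete: after something like mlir-opt -convert-linalg-to-loops (post-bufferization), a linalg.matmul is just a loop nest - nothing in the IR says “matmul” any more, and a lifting tool would have to rediscover that pattern. A rough sketch of a fragment (pass and op spellings drift between releases):

scf.for %i = %c0 to %c128 step %c1 {
  scf.for %j = %c0 to %c512 step %c1 {
    scf.for %k = %c0 to %c256 step %c1 {
      // The multiply-accumulate is still here, but the high-level
      // "matrix multiplication" semantics are gone.
      %a = memref.load %A[%i, %k] : memref<128x256xf32>
      %b = memref.load %B[%k, %j] : memref<256x512xf32>
      %c = memref.load %C[%i, %j] : memref<128x512xf32>
      %p = arith.mulf %a, %b : f32
      %s = arith.addf %c, %p : f32
      memref.store %s, %C[%i, %j] : memref<128x512xf32>
    }
  }
}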

HTH,
-Andrzej

@banach-space thanks for the help! I found: https://polygeist.llvm.org/

but will check out those examples you linked to.
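
From skimming the Polygeist docs, their cgeist front end seems to lift a plain C++ triple loop into affine-dialect IR along these lines (my paraphrase, not verified output - names and shapes are made up):

func.func @matmul(%A: memref<64x64xf32>, %B: memref<64x64xf32>, %C: memref<64x64xf32>) {
  // A C++ triple loop becomes an affine.for nest, which the affine
  // dialect's loop transformations (tiling, unrolling, etc.) can work on.
  affine.for %i = 0 to 64 {
    affine.for %j = 0 to 64 {
      affine.for %k = 0 to 64 {
        %a = affine.load %A[%i, %k] : memref<64x64xf32>
        %b = affine.load %B[%k, %j] : memref<64x64xf32>
        %c = affine.load %C[%i, %j] : memref<64x64xf32>
        %p = arith.mulf %a, %b : f32
        %s = arith.addf %c, %p : f32
        affine.store %s, %C[%i, %j] : memref<64x64xf32>
      }
    }
  }
  return
}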

@banach-space already provided a lot of good info, just wanted to add this other good in-tree example of matmul on Hopper. It won’t work on your Turing-era GPU, but it may serve as a source of inspiration for how folks are playing with this.
