Request from a complete noob coming to the LLVM/MLIR world, asking for advice/suggestions

Hello

I am a complete noob to the LLVM/MLIR world. My goal is to transform Python/C++ source code into CUDA and HLS C/C++ (for Xilinx FPGAs) using MLIR.

First, let me give you an idea of my current understanding

  1. I know how to write CUDA and HLS (for Xilinx FPGA) accelerators for hardware environments, and how to manage those environments.
  2. Basic understanding of CMake configuration, and also of C++ applications.
  3. Python scripting and web application development.
  4. A very high-level (bird's-eye view) idea of how a compiler works, more theory than practice. I have never built or tested anything of this kind, and I do not have a Computer Science background at the bachelor's level.
  5. In recent days I have been setting up LLVM/MLIR for learning purposes. Work in progress.

Now, what I am trying to do

  1. Work on accelerating algorithms (e.g. GEMM, backpropagation).
  2. Select a source language (C++/Python) or ML framework (e.g. PyTorch, TensorFlow).
  3. Apply transformations (e.g. polyhedral, Linalg) to the algorithms through MLIR.
  4. Emit optimized kernel code for the CUDA and Xilinx HLS environments.

How I am planning to do it

  1. First, go through the Toy example and understand how MLIR works.
  2. Then play with the dialect examples.
  3. Search for an MLIR template that will help me build my work on top of it.
  4. Develop.

Constraints I have

  1. Time
  2. and time. :unamused:

I have to develop a proof of concept with just one algorithm (e.g. GEMM), first for CUDA and then for HLS, within 2 months. It is not mandatory that the transformations be overly complex or highly optimized.

Now my questions are

  1. Is it doable from my current level of skills/understanding? I know such a question is tough to answer, but if there is hope, or a particular way to pick up the ideas faster, I would just like to know.
  2. Do you have any advice or suggestions regarding my goal, to-do list, and constraints?

Where I work, I am the first one who is going to work on this kind of stuff, so I have no one to ask for guidance, advice, or suggestions. And while I am going through the MLIR YouTube channel and reading blogs, I sometimes get scared. I need some direction to go faster. :pray: :pray:

I do not know if such questions are appropriate here. If not, I will delete this.

Thanks to everyone in advance. Hopefully this post will act as a lighthouse for someone else who comes in with no prior knowledge at all.

Kind Regards.

Glad to hear your interest! I happen to have some familiarity with MLIR on the HLS side of things, so I can share some pointers to existing work in that area that might be an inspiration. Others might be able to chime in on the CUDA side.

ScaleHLS seems very close to what you’re trying to do: GitHub - hanchenye/scalehls: A scalable High-Level Synthesis framework on MLIR. This even includes emitting C++ suitable for Xilinx HLS flows.

SODA is also interesting in this area: GitHub - pnnl/soda-opt, but I am not as familiar with SODA.

There is also the CIRCT project, which I contribute to: GitHub - llvm/circt: Circuit IR Compilers and Tools. CIRCT can do lots of things related to hardware compilers, one of which is supporting HLS flows that want to use MLIR to directly emit Verilog (instead of C++ for HLS). There are various HLS flows, and you can bring your intermediate representation into CIRCT (after optimizations) at different levels of abstraction, and get out different kinds of hardware designs. This talk gives an overview: [LATTE 22] #2: HLS from PyTorch to System Verilog with MLIR and CIRCT - YouTube.


Thanks @mikeurbach for all the resources and your advice.

Actually, I have been trying to understand ScaleHLS for the last couple of weeks. It's a bit challenging though. :smiling_face_with_tear:

  1. I am also looking for resources that can help me, as a beginner, understand polyhedral transformations. For example, I see that in a lot of research, engineers use polyhedral mapping for their ops. I am also following the MLIR YouTube channel to grab the ideas. Are there any other resources that explain, in a guided way, how to start implementing affine transformations in MLIR?

  2. How do I choose a frontend (e.g. PyTorch, TensorFlow)? Is there any guideline, or does it depend on the application? I am confused because both give us a powerful framework for running ML algorithms, so where is the difference?

Thanks in advance.

For 1., I am no polyhedral expert myself… I’m trying to remember what resources were helpful as I was getting started. I don’t remember where I went for the basics, but to see the kinds of transformations that are useful for HLS flows, AutoSA was useful. There are good examples in the paper, and if you squint, you can see how their nomenclature for iteration domains, access relations, and schedules could map to the MLIR Affine dialect. Not sure if you saw it, but there are some good talks at LLVM dev meetings about polyhedral analysis in MLIR, for example, this one. That talk mentioned loop fusion, so looking at how those kinds of transformations are implemented in MLIR might help.
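
If it helps, here is a tiny hand-written sketch (the function name, shapes, and values are all made up) of how those concepts surface in MLIR: the affine.for bounds describe the iteration domain, and the affine subscript expressions in the loads/stores are essentially the access relations.

// A small loop nest in the Affine dialect. The affine.for bounds form the
// iteration domain {(i, j) : 1 <= i < 100, 0 <= j < 100}, and the subscripts
// of affine.load/affine.store play the role of access relations.
func.func @copy_shifted(%src: memref<100x100xf32>, %dst: memref<100x100xf32>) {
  affine.for %i = 1 to 100 {
    affine.for %j = 0 to 100 {
      // Reads src[i - 1][j], writes dst[i][j].
      %v = affine.load %src[%i - 1, %j] : memref<100x100xf32>
      affine.store %v, %dst[%i, %j] : memref<100x100xf32>
    }
  }
  return
}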

For 2., I think the beauty of MLIR is you can focus your efforts where they make sense. If you want to innovate at the level of polyhedral transformations, you can work at the level of the Affine dialect, and whatever frontends can lower to the Affine dialect can work with your project. For example, there should be paths to the Affine dialect from both PyTorch and TensorFlow.


Thank you very very much. :grinning:

I will definitely have a look at the resources you shared.

BTW, a few more questions, if you don't mind. :sweat_smile:

  1. Is there any “simple” library/application/repo/online web app that can take a loop as input and spit out loops with affine transformations applied? I say “simple” because I have seen some solutions from different research labs (e.g. APOLLO), but they are too complex to deal with from a beginner's perspective.

  2. This is just my understanding; correct me if I am wrong. When engineers write an affine transformation with MLIR, first they have to have the affine transformation map in hand, and then they just write the implementation in the dialect. Am I correct? If so, how do they get the affine transformation in the first place? Do they do it manually, or do they use some kind of tool? You shared one resource, Precise Polyhedral Analyses For MLIR using the FPL Presburger Library, which is pretty close to the question I am asking. To clarify, let me give you an example:

/* Source Loops */
/* Sequential */

for (i = 1; i < N; i++) {
    for (j = 1; j < M; j++) {
        a[i][j] += a[i - 1][j] * a[i][j - 1];
    }
}

/* After affine skewing (x = i + j, y = j) */
/* Parallel using OpenMP */
for (x = 2; x <= N + M - 2; x++) {
    #pragma omp parallel for
    for (y = max(1, x - N + 1); y <= min(x - 1, M - 1); y++) {
        a[x - y][y] += a[x - y - 1][y] * a[x - y][y - 1];
    }
}

So do engineers do this kind of thing manually first, or do they use a tool to do it?
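
For reference, here is my own rough attempt (quite possibly wrong) at writing the original sequential loop above in the Affine dialect, so you can see where I am coming from; the memref type and function name are just made up, and N and M are passed in as index values:

// Hand-written sketch of the sequential loop nest in the Affine dialect.
func.func @seq(%a: memref<?x?xf32>, %N: index, %M: index) {
  affine.for %i = 1 to %N {
    affine.for %j = 1 to %M {
      %up   = affine.load %a[%i - 1, %j] : memref<?x?xf32>
      %left = affine.load %a[%i, %j - 1] : memref<?x?xf32>
      %cur  = affine.load %a[%i, %j] : memref<?x?xf32>
      %mul  = arith.mulf %up, %left : f32
      %add  = arith.addf %cur, %mul : f32
      affine.store %add, %a[%i, %j] : memref<?x?xf32>
    }
  }
  return
}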

I am not marking any of your answers as the solution because I want to keep this discussion going, not just for myself but also for the noobs who are stepping into this world.

But if you want, I can mark your reply as the solution. :pray:
Thanks!

I’m not sure if there is a simple playground to try these things out. I imagine you’d probably do some manual experiments with the kinds of transformations you can do first, and then encode them in a tool that applies the transformations for your use case. If you are interested in ultimately using MLIR for polyhedral transformations, I would suggest running some of the passes in the Affine dialect to see how they work. These transformations might ultimately be used via their C++ APIs in your tool, but you can try them out manually in mlir-opt. For example, you could look at tiling: https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Affine/loop-tiling.mlir#L1
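
For instance, something like the following GEMM loop nest (a sketch I just wrote; the names and fixed 64x64 shapes are made up) can be fed to mlir-opt and tiled:

// matmul.mlir -- a plain GEMM loop nest in the Affine dialect.
func.func @gemm(%A: memref<64x64xf32>, %B: memref<64x64xf32>, %C: memref<64x64xf32>) {
  affine.for %i = 0 to 64 {
    affine.for %j = 0 to 64 {
      affine.for %k = 0 to 64 {
        %a = affine.load %A[%i, %k] : memref<64x64xf32>
        %b = affine.load %B[%k, %j] : memref<64x64xf32>
        %c = affine.load %C[%i, %j] : memref<64x64xf32>
        %p = arith.mulf %a, %b : f32
        %s = arith.addf %c, %p : f32
        affine.store %s, %C[%i, %j] : memref<64x64xf32>
      }
    }
  }
  return
}

Running something like mlir-opt --affine-loop-tile="tile-size=32" matmul.mlir (exact pass and option names may vary across LLVM versions) and comparing the output against the input is a quick way to build intuition; --affine-loop-fusion and --affine-parallelize are also worth trying on suitable inputs.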


Absolutely wonderful idea!! :smiley: :smiley:
Thanks a lot. :pray:


Related to going from C/C++ to MLIR, you could try VAST. There’s also Macroni, where we are playing with the idea of using C macros as lightweight syntax extensions to C / C++. Macroni, however, relies on PASTA, which uses a patched version of Clang, built as part of cxx-common, so it isn’t as easy to reuse/distribute.


Thanks for your suggestion. :pray:

Now my objective is to automate polyhedral transformations with MLIR, generating either source code -> polyhedral transform -> source code, or source code -> polyhedral transform -> device binary (for GPU).

Right now I am trying to understand how Polygeist works in conjunction with MLIR. If I understood correctly, they have used 4 external libraries to automate polyhedral transformation with MLIR. The transformation flow is something like MLIR -> OpenScop -> Pluto -> OpenScop -> CLooG (not sure about the CLooG part) -> MLIR.

Pardon my noob question in advance: can I achieve a similar kind of thing in a faster and easier way by using the libraries you have suggested? Or are there other ways?

Thanks in advance!

> Pardon my noob question in advance: can I achieve a similar kind of thing in a faster and easier way by using the libraries you have suggested? Or are there other ways?

> Select a source language (C++/Python) or ML framework (e.g. PyTorch, TensorFlow).

If you want to do anything with C or C++ as your source language, then I believe VAST and ClangIR are the only things right now that can convert those to an MLIR representation. Both VAST and ClangIR are works-in-progress, though, and so the level of support for those languages and their features varies. VAST provides more high-level data in its IRs than ClangIR, but has much less support for C++. ClangIR has more support for C++, but the representation of data types is closer in spirit to those of LLVM types, I think.
