Searching for an open-source MLIR end-to-end Nvidia GPU lowering example project

Hello

I am searching for an end-to-end open-source project/example repo (on GitHub or GitLab) that demonstrates:

  1. Taking simple C or Python code as input for a sample operation (e.g. vector add), or even just MLIR (i.e. a .mlir file) as input.
  2. Progressively lowering it using different features of the GPU dialect to exploit Nvidia GPU hardware resources.
  3. Finally, generating the cubin.

I would be really grateful if someone could share a repository like this.

There are TensorFlow and PyTorch implementations available, but I want something “simpler”, to understand how a custom end-to-end GPU implementation (Nvidia, or even AMDGPU) works through the MLIR compilation pipeline.

Also, to share some more context on why I am searching for this, consider the following code example (shared from here):

// example.cu
__global__ void fill(int n, float a, float *x) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  if (i < n)
    x[i] = a;
}
int main(void) {
  int n = 1 << 20;
  float *x;
  cudaMallocManaged(&x, n * sizeof(float));
  fill<<<(n + 255) / 256, 256>>>(n, 2.0f, x);
  cudaDeviceSynchronize();
  cudaFree(x);
}

All the information I can find in the MLIR docs focuses on MLIR features for the GPU kernel only (e.g. the fill kernel in the example code), plus some information on GPU offloading, such as how to call this kernel from the host. But it would be really nice to see an example GPU offloading repository showing how MLIR can be used from the host side to manage memory buffers (similar to cudaMalloc), the number of blocks or threads, etc.
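For concreteness, here is a hypothetical host-side sketch of what I mean, written directly in the upstream gpu dialect. The op names (gpu.alloc, gpu.launch, gpu.dealloc) exist upstream, but the exact assembly may differ between LLVM versions; the sizes simply mirror the CUDA example above (n = 1 << 20 = 1048576, so 4096 blocks of 256 threads, which divides evenly and needs no bounds check):

```mlir
func.func @main() {
  %c1 = arith.constant 1 : index
  %c256 = arith.constant 256 : index
  %c4096 = arith.constant 4096 : index
  %a = arith.constant 2.0 : f32
  // Host-side allocation, playing the role of cudaMallocManaged.
  %x = gpu.alloc () : memref<1048576xf32>
  // The launch region carries the grid/block configuration that
  // <<<(n + 255) / 256, 256>>> encodes in CUDA.
  gpu.launch blocks(%bx, %by, %bz) in (%gx = %c4096, %gy = %c1, %gz = %c1)
             threads(%tx, %ty, %tz) in (%sx = %c256, %sy = %c1, %sz = %c1) {
    // i = blockIdx.x * blockDim.x + threadIdx.x
    %tmp = arith.muli %bx, %sx : index
    %i = arith.addi %tmp, %tx : index
    memref.store %a, %x[%i] : memref<1048576xf32>
    gpu.terminator
  }
  gpu.dealloc %x : memref<1048576xf32>
  return
}
```

What I would like to see is a repo that takes something like this, outlines the kernel, lowers it through NVVM, and produces a cubin plus the host-side runtime calls.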

In simple words, I am searching for something that gives me a compact idea of how to build an end-to-end pipeline.

Thanks in advance! :slightly_smiling_face:

What about our integration tests? Like this one for example: https://github.com/llvm/llvm-project/blob/main/mlir/test/Integration/GPU/CUDA/printf.mlir

1 Like

Wonderful!!! :innocent:
Thanks a lot! :heart: :heart: :heart:
Just bothering you a little more with some related questions:

  1. Should I use the latest MLIR/LLVM release (e.g. currently 17.0.6) to get all the new features for MLIR + the GPU dialect?
  2. Could you please provide some expert suggestions for newbies like me? For example, things to consider and dos/don’ts while developing this GPU compilation pipeline?

Thanks again :pray:

Probably the current main branch, depending on what you’re looking for (Hopper support is steadily progressing, for example, and sparse compilation as well).

Sorry: this is a bit too vague a question for me to answer right now :slight_smile:
Maybe others will chime in?