Hello
I am searching for an end-to-end open-source project/example repo (from GitHub or GitLab) that demonstrates:
- Take simple C or Python code as input for a sample operation (e.g. vector add), or even take just MLIR (i.e. a .mlir file) as input.
- Progressively lower it using different features of the GPU dialect to exploit NVIDIA GPU hardware resources.
- Finally, generate the cubin.
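The closest thing I have found so far is the upstream integration tests under mlir/test/Integration/GPU/CUDA in llvm-project, whose RUN lines chain roughly the pipeline below. Pass names have changed across MLIR versions, so treat this as a sketch rather than a working command:

```shell
# Sketch of the lowering chain used by the upstream CUDA integration tests
# (pass names and library paths vary by MLIR version).
mlir-opt example.mlir -gpu-kernel-outlining | \
mlir-opt -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin)' | \
mlir-opt -gpu-to-llvm | \
mlir-cpu-runner \
    --shared-libs=libmlir_cuda_runtime.so,libmlir_runner_utils.so \
    --entry-point-result=void
```

But these tests only show the pass invocations, not a narrated end-to-end walkthrough, which is why I am asking here.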
I would be really grateful if someone could share a repository like this.
There are TensorFlow and PyTorch implementations available, but I want something simpler, to understand how a custom end-to-end GPU implementation (NVIDIA or even AMDGPU) works through the MLIR compilation pipeline.
Also, to give more context on why I am searching for this, consider the following code example (shared from here):
// example.cu
__global__ void fill(int n, float a, float *x) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        x[i] = a;
}

int main(void) {
    int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    fill<<<(n + 255) / 256, 256>>>(n, 2.0f, x);
    cudaDeviceSynchronize();
    cudaFree(x);
}
All the information I can find in the MLIR docs focuses on MLIR features for the GPU kernel only (e.g. the fill kernel in the example code), plus some information on GPU offloading, such as how to call this kernel from the host. But it would be really nice to see an example GPU offloading repository showing how to use MLIR to manage memory buffers (similar to cudaMallocManaged), set the number of blocks or threads from the host, etc.
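To make concrete what I imagine the host side would look like in the gpu dialect, here is my rough guess based on the dialect docs. The op syntax is from memory and may not parse on a given MLIR version, and @kernels::@fill assumes the kernel has already been outlined into a gpu.module named kernels:

```
func.func @main() {
  %c1 = arith.constant 1 : index
  %n = arith.constant 1048576 : index        // 1 << 20
  %a = arith.constant 2.0 : f32
  // Roughly the counterpart of cudaMallocManaged: device allocation.
  %x = gpu.alloc (%n) : memref<?xf32>
  %blocks = arith.constant 4096 : index      // (n + 255) / 256
  %threads = arith.constant 256 : index
  // Roughly the counterpart of fill<<<blocks, threads>>>(n, a, x):
  // the grid/block sizes are explicit host-side values here.
  gpu.launch_func @kernels::@fill
      blocks in (%blocks, %c1, %c1)
      threads in (%threads, %c1, %c1)
      args(%n : index, %a : f32, %x : memref<?xf32>)
  gpu.dealloc %x : memref<?xf32>
  return
}
```

If a repository walks through IR like this and lowers it all the way down, that is exactly what I am after.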
In simple words, I am searching for something from which I can get a compact idea of how to build an end-to-end pipeline.
Thanks in advance!