Converting CUDA program to mlir (gpu, linalg etc.)

RSCERE · May 15, 2025, 6:07pm

Hi all,

I am trying to convert a CUDA program (.cu) that does vector addition. I was able to lower the CUDA (.cu) to llvm (.ll) and then to llvm-mlir using clang and mlir-translate. But I need to lower it to mlir with other dialects like linalg or gpu etc. Is there any way of doing so? Thanks!

PS: I tried Polygiest but was not able to figure out.

bcardosolopes · May 19, 2025, 8:32pm

ClangIR (incubator, upstream hasn’t reached that point yet) has support to go from CU → CIR → LLVM, see tests in clangir/clang/test/CIR/CodeGen/CUDA at main · llvm/clangir · GitHub

Since ClangIR also has a CU → CIR → Core Dialects pipeline, you could try playing with that and see what you can get (note sure anyone has tested that yet though).

ftynse · May 20, 2025, 7:22am

Also @wsmoses @chelini for Polygeist, it should support CUDA. It would have been more helpful if you specifically explained what you tried and what the problem was.

RSCERE · May 20, 2025, 8:38am

Thank you. Will explore it.

RSCERE · May 20, 2025, 8:44am

Sure, so I did this using Polygeist:

One of the .cu file:

#include <stdio.h>
__global__ void hello() {
    printf("Hello from GPU!\n");
}
int main() {
    hello<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}

Polygeist build commands:

cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS="mlir;clang" \
  -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" \
  -DMLIR_ENABLE_CUDA_RUNNER=ON \
  -DMLIR_ENABLE_EXECUTION_ENGINE=ON \
  -DLLVM_ENABLE_RTTI=ON \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  -DCMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES=/usr/local/cuda/include

And,

cmake -G Ninja .. \
  -DMLIR_DIR=$PWD/../llvm-project/build/lib/cmake/mlir \
  -DCLANG_DIR=$PWD/../llvm-project/build/lib/cmake/clang \
  -DPOLYGEIST_ENABLE_CUDA=1 \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_ENABLE_RTTI=ON \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES=/usr/local/cuda/include

After building the cgeist and polygeist-opt, I run:

Exp/Polygeist/build/bin/cgeist --immediate --cuda-gpu-arch=sm_86 --cuda-path=/usr/lib/cuda  --nocudainc  -I/usr/lib/gcc/x86_64-linux-gnu/11/include -I/usr/include test2.cu

I get this error:

test2.cu:4:1: error: unknown type name '__global__'
    4 | __global__ void hello() {
      | ^
test2.cu:9:10: error: use of undeclared identifier __cudaPushCallConfiguration
    9 |     hello<<<1, 1>>>();
      |          ^
test2.cu:10:5: error: use of undeclared identifier 'cudaDeviceSynchronize'
   10 |     cudaDeviceSynchronize();
      |     ^

RSCERE · May 20, 2025, 1:37pm

Got it working. Had to set the environment variable as export CPLUS_INCLUDE_PATH=Exp/Polygeist/llvm-project/build/lib/clang/18/include and then ran Exp/Polygeist/build/bin/cgeist test2.cu --cuda-gpu-arch=sm_86 --cuda-path=/usr/local/cuda --nocudalib -I/usr/local/cuda/include -I/usr/lib/gcc/x86_64-linux-gnu/11/include --immediate

The IR consists of llvm, arith, scf, gpu etc. dialects though.

ftynse · May 21, 2025, 6:56am

I was going to say --nocudainc looks suspicious, you don’t want to remove CUDA definitions from consideration. You may also need to pass --resource-dir to cgeist explicitly by first querying it from clang --print-resource-dir using the clang you built.

This is correct. What did you expect?

RSCERE · May 21, 2025, 7:43am

Ok, will run with the modified flags. I was expecting these only except the llvm one. Thanks.

RSCERE · May 21, 2025, 8:04am

Ran with the modified flags, it generates the same code with or without the --nocudalib, I added the --resource-dir. Also, Polegeist dialect op also appears.

ftynse · May 21, 2025, 8:22am

nocudalib disables linking with libdevice, so it will not have any effect on compilation itself. The original compilation string had nocudainc, which disables inclusion of system-level CUDA headers leading to undeclared symbol errors.

LLVM dialect remains for things that cannot be easily (or at all) converted to a higher-level dialect. This proejct GitHub - EnzymeAD/Enzyme-JAX: Custom Bindings for Enzyme Automatic Differentiation Tool and Interfacing with JAX. has more raising passes, namely -affine-cfg (which incidentally does other kinds of raising) and -libdevice-funcs-raise that converts libdevice calls to arith/math dialect.

Topic		Replies	Views
Request from a complete noob who is coming to LLVM/MLIR world. Asking for advice/suggestions MLIR gpu , clang	9	640	August 23, 2023
'cuStreamSynchronize(stream)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS' on MLIR CUDA regression test MLIR	2	899	November 8, 2021
MLIR GPU libdevice linking support MLIR gpu	12	935	May 31, 2023
[RFC] Extending MLIR GPU device codegen pipeline MLIR gpu , llvm	65	4292	October 30, 2023
Polygeist and OpenCL MLIR gpu , clang	3	606	June 24, 2022

Converting CUDA program to mlir (gpu, linalg etc.)

Related topics