I’d like to introduce a simple end-to-end MLIR example that runs on the GPU via OpenCL (i.e., for non-NVIDIA GPUs).
There are some existing solutions that use an OpenCL runtime to run a GPU kernel generated by MLIR, but not many are publicly available (yeah, I’ve just googled ‘mlir opencl’ again and got nothing new). So I’ve made an example that is easy to try.
This doesn’t add any implementation to the MLIR core; instead, MLIR compiles the OpenCL kernel binary and Python runs it. It’s not a perfectly standard way, and there could be more and better options.
There’s only one test available under the examples/ folder for now; it simply adds up two 1024xf32 memrefs using linalg.elemwise_binary.
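For reference, the input at this level looks roughly like the following. This is a sketch, not copied from the repo; the function name and argument names are made up:

```mlir
func.func @memref_add(%a: memref<1024xf32>, %b: memref<1024xf32>,
                      %out: memref<1024xf32>) {
  // Element-wise add over the two inputs, writing into %out.
  linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
      ins(%a, %b : memref<1024xf32>, memref<1024xf32>)
      outs(%out : memref<1024xf32>)
  return
}
```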
It lowers and parallel-maps the computation to GPU threads using upstream passes, then lowers down to the llvm dialect via the rocdl dialect so it can be consumed by the AMD ROCm OpenCL stack. (It shouldn’t be hard to make a spirv pipeline for Intel GPUs.)
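The upstream passes involved are roughly these; this is a sketch of the pipeline, and the exact pass set and order in the repo may differ:

```
--convert-linalg-to-parallel-loops
--gpu-map-parallel-loops
--convert-parallel-loops-to-gpu
--gpu-kernel-outlining
--convert-gpu-to-rocdl
--gpu-to-llvm
```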
I’ve encountered an issue using the gpu-module-to-binary pass in Python, so I used the mlir-opt host tool for that last step and had to manually convert the stringified GPU binary in Python. Now that the GPU executable is obtained, everything in this example is done from MLIR.
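About the “manually convert” step: MLIR prints the serialized object as a string attribute in which non-printable bytes appear as `\XX` hex escapes. A minimal sketch of turning that back into raw bytes (`decode_mlir_string` is my own name for it, and it ignores the other escape forms MLIR can emit, such as `\\` and `\"`):

```python
def decode_mlir_string(s: str) -> bytes:
    """Decode an MLIR string-literal payload (e.g. '\\7FELF...') to raw bytes.

    MLIR escapes non-printable bytes as a backslash followed by two hex digits;
    everything else is taken as a literal character.
    """
    hexdigits = "0123456789abcdefABCDEF"
    out = bytearray()
    i = 0
    while i < len(s):
        if (s[i] == "\\" and i + 2 < len(s)
                and s[i + 1] in hexdigits and s[i + 2] in hexdigits):
            out.append(int(s[i + 1:i + 3], 16))  # \XX -> one byte
            i += 3
        else:
            out.append(ord(s[i]))  # plain printable character
            i += 1
    return bytes(out)
```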
Finally, it creates an OpenCL kernel from the binary and runs it on the GPU with NumPy input/output data using pyopencl (the inputs are random).
```
(mlirdev) $ python3 ./memrefAdd.ocl.bin.rocm.py
A : [0.45386615 0.15114269 0.76626986 ... 0.8001651  0.37151784 0.60125226]
B : [0.69195175 0.74433124 0.36322945 ... 0.03635408 0.842453   0.9205129 ]
Validating A + B ...
Numpy : [1.1458179  0.89547396 1.1294993  ... 0.8365192  1.2139709  1.5217652 ]
GPU   : [1.1458179  0.89547396 1.1294993  ... 0.8365192  1.2139709  1.5217652 ]
Pass : True
```
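The “Validating” part of that run boils down to a NumPy comparison like this; a sketch, where `gpu_out` stands in for the buffer copied back from the device via pyopencl:

```python
import numpy as np

rng = np.random.default_rng()
a = rng.random(1024, dtype=np.float32)  # random input A
b = rng.random(1024, dtype=np.float32)  # random input B

expected = a + b   # reference result computed on the host
gpu_out = a + b    # placeholder for the result copied back from the GPU

# float32 results from the GPU should match NumPy to within rounding
ok = np.allclose(expected, gpu_out)
print("Pass :", ok)
```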
Hope this helps, and any questions are welcome here.
The gpu-module-to-binary pass fails via the Python binding while it runs fine with mlir-opt:
```
pm.run(module.operation)
mlir._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: unknown: cannot be converted to LLVM IR: missing `LLVMTranslationDialectInterface` registration for dialect for op: gpu.module
```
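The workaround of shelling out to the mlir-opt host tool for that one pass can be sketched like this (`serialize_gpu_module` is my own helper name, not part of the bindings):

```python
import shutil
import subprocess

def serialize_gpu_module(mlir_text: str, opt_binary: str = "mlir-opt"):
    """Run the gpu-module-to-binary pass through the mlir-opt host tool.

    Returns the transformed IR as text, or None when the tool is not on PATH.
    """
    if shutil.which(opt_binary) is None:
        return None  # mlir-opt not installed / not on PATH
    proc = subprocess.run(
        [opt_binary, "--gpu-module-to-binary"],
        input=mlir_text, capture_output=True, text=True, check=True)
    return proc.stdout
```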