OpenCL example

Hello everyone,

I’d like to introduce a simple end-to-end MLIR example that runs on the GPU via OpenCL (i.e., on non-NVIDIA GPUs).
There are some existing solutions that use an OpenCL runtime to run a GPU kernel generated by MLIR, but few are publicly available (yeah, I’ve just googled ‘mlir opencl’ again and found nothing new). So I’ve made an example that is easy to try.
This doesn’t add any implementation to the MLIR core; instead, MLIR compiles the OpenCL kernel binary, which is then run from Python. It’s not a perfectly standard approach, and there could be more and better options.

There’s only one test under the examples/ folder for now; it simply adds up two 1024xf32 memrefs using linalg.elemwise_binary.
It lowers the op and maps the parallel loops to GPU threads using upstream passes, then lowers down to the LLVM dialect via the ROCDL dialect so it can be consumed by the AMD ROCm OpenCL stack. (It shouldn’t be hard to build a SPIR-V pipeline for Intel GPUs.)
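For reference, the lowering described above could be driven by an upstream pass pipeline along these lines; the exact pass names and their ordering here are my assumption, not necessarily the example’s actual invocation:

```
mlir-opt add.mlir \
  -convert-linalg-to-parallel-loops \
  -gpu-map-parallel-loops \
  -convert-parallel-loops-to-gpu \
  -gpu-kernel-outlining \
  -convert-gpu-to-rocdl \
  -gpu-module-to-binary
```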

I’ve encountered an issue using the gpu-module-to-binary pass from Python [1], so I used the mlir-opt host tool for the last step and had to manually convert the stringified GPU binary in Python. With that, the GPU executable is obtained entirely from MLIR in this example.
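A note on the “manual conversion” step: mlir-opt prints the GPU binary as a quoted string attribute in which non-printable bytes are escaped as a backslash followed by two hex digits (e.g. \7FELF…). A minimal, hedged sketch of decoding that back into raw bytes; the escape format here is my reading of MLIR’s printer, so treat it as an assumption:

```python
def decode_mlir_string(attr_text: str) -> bytes:
    """Decode an MLIR-printed string attribute body into raw bytes.

    Non-printable bytes (and '"' and backslash) are printed as a
    backslash followed by two hex digits; printable characters
    appear literally.
    """
    out = bytearray()
    i = 0
    while i < len(attr_text):
        if attr_text[i] == "\\":
            # Two hex digits follow the backslash.
            out.extend(bytes.fromhex(attr_text[i + 1:i + 3]))
            i += 3
        else:
            out.append(ord(attr_text[i]))
            i += 1
    return bytes(out)

# An ELF image starts with 0x7F 'E' 'L' 'F'.
print(decode_mlir_string("\\7FELF"))  # b'\x7fELF'
```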

Finally, it creates an OpenCL kernel from the binary and runs it on the GPU with NumPy input/output data using pyopencl (inputs are random).

(mlirdev) $ python3 ./ 
A     :  [0.45386615 0.15114269 0.76626986 ... 0.8001651  0.37151784 0.60125226]
B     :  [0.69195175 0.74433124 0.36322945 ... 0.03635408 0.842453   0.9205129 ]
Validating A + B ...
Numpy :  [1.1458179  0.89547396 1.1294993  ... 0.8365192  1.2139709  1.5217652 ]
GPU   :  [1.1458179  0.89547396 1.1294993  ... 0.8365192  1.2139709  1.5217652 ]
Pass  : True
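The validation shown above boils down to an elementwise comparison of the NumPy reference against the array read back from the device. A minimal NumPy-only sketch; gpu_result is a stand-in here, since no OpenCL device is assumed:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a = rng.random(1024, dtype=np.float32)
b = rng.random(1024, dtype=np.float32)

reference = a + b      # NumPy reference result
gpu_result = a + b     # stand-in for the buffer read back via pyopencl

# Device floating-point results may differ in the last ulp, so compare
# with a tolerance rather than exact equality.
passed = bool(np.allclose(reference, gpu_result, rtol=1e-6))
print("Pass  :", passed)  # Pass  : True
```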

Hope this helps; any questions are welcome here.

[1] The gpu-module-to-binary pass fails via the Python binding while it runs fine with mlir-opt:
mlir._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: unknown: cannot be converted to LLVM IR: missing `LLVMTranslationDialectInterface` registration for dialect for op: gpu.module

The error is hinting at the issue: “missing LLVMTranslationDialectInterface”. The mlir-opt tool registers this translation interface, but your Python environment does not. This is done by calling mlir::registerGPUDialectTranslation.


There doesn’t seem to be an existing code path to mlir::registerGPUDialectTranslation from the Python binding at the moment.

Some random findings:

  1. RegisterEverything for the Python binding looks different from what we can do in the C API (mlir/lib/CAPI/RegisterEverything/RegisterEverything.cpp). It lacks a call to mlirRegisterAllLLVMTranslations, but I’m not sure whether that’s the intended behaviour. Also, the calls to mlirRegisterConversionPasses() and mlirRegisterTransformsPasses() are redundant, since the underlying functions are already called by mlirRegisterAllPasses().

  2. It felt a bit hacky and I’m unsure where the best place for the function is, but I tried calling the C-API function directly from Python as below, and it works.

import os
import ctypes
from ctypes import CDLL, c_char_p
from mlir._mlir_libs import get_lib_dirs
from mlir.ir import Context

# Load the MLIR C-API shared library (library filename elided here).
capi_so = os.path.join(get_lib_dirs()[0], "")
capi_functions = CDLL(capi_so)

ctx = Context()
ctx_capsule = ctx._CAPIPtr

# The capsule name string (elided here) must match what the binding used.
MLIR_PYTHON_CAPSULE_CONTEXT = c_char_p("".encode())
ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
ctx_ptr = ctypes.pythonapi.PyCapsule_GetPointer(ctx_capsule, MLIR_PYTHON_CAPSULE_CONTEXT)

# Register all LLVM translations on the unwrapped MlirContext pointer.
capi_functions.mlirRegisterAllLLVMTranslations.argtypes = [ctypes.c_void_p]
capi_functions.mlirRegisterAllLLVMTranslations(ctx_ptr)
  3. I can’t clearly understand the intended usage of the _CAPIPtr capsule. All the tests only check whether they can recreate the same object from the capsule, and I suppose that’s not all we can do with it. Is the above case included in the expected use cases?
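On the _CAPIPtr question: a capsule is just a named wrapper around a raw pointer, so extracting the pointer for a foreign C call is exactly what PyCapsule_GetPointer exists for. A self-contained sketch of the mechanism using ctypes alone; no MLIR is involved, and the capsule name and pointer value below are made up for the demo:

```python
import ctypes

pyapi = ctypes.pythonapi

# Prototype the CPython capsule API we use.
pyapi.PyCapsule_New.restype = ctypes.py_object
pyapi.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]
pyapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
pyapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

name = b"demo.Context._CAPIPtr"   # made-up capsule name for the demo
value = 0x1234                    # made-up pointer value for the demo

# Wrap the pointer in a named capsule, then read it back: the name
# passed to PyCapsule_GetPointer must match the one used at creation.
capsule = pyapi.PyCapsule_New(ctypes.c_void_p(value), name, None)
ptr = pyapi.PyCapsule_GetPointer(capsule, name)
print(hex(ptr))  # 0x1234
```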

It’s likely that “register everything” should also register the translations; feel free to send a PR!


Hope it’s in the correct place:
Register all translations in the RegisterEverything for python by jungpark-mlir · Pull Request #70428 · llvm/llvm-project