I’d like to introduce a simple end-to-end MLIR example that runs on the GPU via OpenCL (i.e., for non-NVIDIA GPUs).
There are some existing solutions that use an OpenCL runtime to run a GPU kernel generated by MLIR, but not many are publicly available (yeah, I’ve just googled ‘mlir opencl’ again and got nothing new). So I’ve made an example that is easy to try.
This doesn’t add any implementation to the MLIR core; instead, MLIR compiles the OpenCL kernel binary and Python runs it. It’s not a perfectly standard way, and there could be more and better options.
There’s only one test available under the examples/ folder for now; it simply adds up two 1024xf32 memrefs using linalg.elemwise_binary.
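For reference, the input at this level looks roughly like the following. This is a sketch, not copied from the repo; the function name and argument names are made up:

```mlir
func.func @memref_add(%a: memref<1024xf32>, %b: memref<1024xf32>,
                      %out: memref<1024xf32>) {
  // Element-wise add over the two inputs, writing into %out.
  linalg.elemwise_binary {fun = #linalg.binary_fn<add>}
      ins(%a, %b : memref<1024xf32>, memref<1024xf32>)
      outs(%out : memref<1024xf32>)
  return
}
```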
It lowers and parallel-maps the computation to GPU threads using upstream passes, then lowers down to the llvm dialect via the rocdl dialect so it can be consumed by the AMD ROCm OpenCL stack. (It shouldn’t be hard to make a spirv pipeline for Intel GPUs.)
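The upstream passes involved are roughly these; this is a sketch of the pipeline, and the exact pass set and order in the repo may differ:

```
--convert-linalg-to-parallel-loops
--gpu-map-parallel-loops
--convert-parallel-loops-to-gpu
--gpu-kernel-outlining
--convert-gpu-to-rocdl
--gpu-to-llvm
```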
I’ve encountered an issue using the gpu-module-to-binary pass in Python, so I used the mlir-opt host tool for that last step and had to manually convert the stringified GPU binary in Python. Now that the GPU executable is obtained, everything in this example is done from MLIR.
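About the “manually convert” step: MLIR prints the serialized object as a string attribute in which non-printable bytes appear as `\XX` hex escapes. A minimal sketch of turning that back into raw bytes (`decode_mlir_string` is my own name for it, and it ignores the other escape forms MLIR can emit, such as `\\` and `\"`):

```python
def decode_mlir_string(s: str) -> bytes:
    """Decode an MLIR string-literal payload (e.g. '\\7FELF...') to raw bytes.

    MLIR escapes non-printable bytes as a backslash followed by two hex digits;
    everything else is taken as a literal character.
    """
    hexdigits = "0123456789abcdefABCDEF"
    out = bytearray()
    i = 0
    while i < len(s):
        if (s[i] == "\\" and i + 2 < len(s)
                and s[i + 1] in hexdigits and s[i + 2] in hexdigits):
            out.append(int(s[i + 1:i + 3], 16))  # \XX -> one byte
            i += 3
        else:
            out.append(ord(s[i]))  # plain printable character
            i += 1
    return bytes(out)
```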
Finally, it creates an OpenCL kernel from the binary and runs it on the GPU with NumPy input/output data using pyopencl (the inputs are random).
```
(mlirdev) $ python3 ./memrefAdd.ocl.bin.rocm.py
A : [0.45386615 0.15114269 0.76626986 ... 0.8001651  0.37151784 0.60125226]
B : [0.69195175 0.74433124 0.36322945 ... 0.03635408 0.842453   0.9205129 ]
Validating A + B ...
Numpy : [1.1458179  0.89547396 1.1294993  ... 0.8365192  1.2139709  1.5217652 ]
GPU   : [1.1458179  0.89547396 1.1294993  ... 0.8365192  1.2139709  1.5217652 ]
Pass : True
```
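The “Validating” part of that run boils down to a NumPy comparison like this; a sketch, where `gpu_out` stands in for the buffer copied back from the device via pyopencl:

```python
import numpy as np

rng = np.random.default_rng()
a = rng.random(1024, dtype=np.float32)  # random input A
b = rng.random(1024, dtype=np.float32)  # random input B

expected = a + b   # reference result computed on the host
gpu_out = a + b    # placeholder for the result copied back from the GPU

# float32 results from the GPU should match NumPy to within rounding
ok = np.allclose(expected, gpu_out)
print("Pass :", ok)
```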
Hope this helps, and any questions are welcome here.
The gpu-module-to-binary pass fails via the Python binding while it runs fine with mlir-opt:
```
pm.run(module.operation)
mlir._mlir_libs._site_initialize.<locals>.MLIRError: Failure while executing pass pipeline:
error: unknown: cannot be converted to LLVM IR: missing `LLVMTranslationDialectInterface` registration for dialect for op: gpu.module
```
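The workaround of shelling out to the mlir-opt host tool for that one pass can be sketched like this (`serialize_gpu_module` is my own helper name, not part of the bindings):

```python
import shutil
import subprocess

def serialize_gpu_module(mlir_text: str, opt_binary: str = "mlir-opt"):
    """Run the gpu-module-to-binary pass through the mlir-opt host tool.

    Returns the transformed IR as text, or None when the tool is not on PATH.
    """
    if shutil.which(opt_binary) is None:
        return None  # mlir-opt not installed / not on PATH
    proc = subprocess.run(
        [opt_binary, "--gpu-module-to-binary"],
        input=mlir_text, capture_output=True, text=True, check=True)
    return proc.stdout
```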