How to run ExecutionEngine with bf16 dtype in MLIR python bindings?

When invoking the ExecutionEngine in Python, I noticed we need to convert NumPy arrays into ctypes arguments, for example:

    # Imports assumed by this snippet (MLIR Python bindings; `compiler` is a
    # local helper wrapping the ExecutionEngine).
    import ctypes
    import numpy as np
    from mlir import runtime as rt

    # Compile.
    engine = compiler.compile_and_jit(module)

    # Set up numpy input and buffer for output.
    a = np.array(
        [
            [1.1, 2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1],
            [1.2, 2.2, 3.2, 4.2, 5.2, 6.2, 7.2, 8.2],
            [1.3, 2.3, 3.3, 4.3, 5.3, 6.3, 7.3, 8.3],
            [1.4, 2.4, 3.4, 4.4, 5.4, 6.4, 7.4, 8.4],
            [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5],
            [1.6, 2.6, 3.6, 4.6, 5.6, 6.6, 7.6, 8.6],
            [1.7, 2.7, 3.7, 4.7, 5.7, 6.7, 7.7, 8.7],
            [1.8, 2.8, 3.8, 4.8, 5.8, 6.8, 7.8, 8.8],
        ],
        np.float64,
    )
    b = np.ones((8, 8), np.float64)
    c = np.zeros((8, 8), np.float64)

    mem_a = ctypes.pointer(ctypes.pointer(rt.get_ranked_memref_descriptor(a)))
    mem_b = ctypes.pointer(ctypes.pointer(rt.get_ranked_memref_descriptor(b)))
    mem_c = ctypes.pointer(ctypes.pointer(rt.get_ranked_memref_descriptor(c)))

    # Allocate a MemRefDescriptor to receive the output tensor.
    # The buffer itself is allocated inside the MLIR code generation.
    ref_out = rt.make_nd_memref_descriptor(2, ctypes.c_double)()
    mem_out = ctypes.pointer(ctypes.pointer(ref_out))

    # Invoke the kernel and get numpy output.
    # Built-in bufferization uses in-out buffers.
    engine.invoke("main", mem_out, mem_a, mem_b, mem_c)

My question is how to handle MLIR code with bf16 inputs, given that bf16 is supported by neither numpy nor ctypes.
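For context, bf16 is just the upper 16 bits of an IEEE-754 float32, which is why a 16-bit integer container can stand in for it even without native numpy/ctypes support. A minimal pure-NumPy sketch of the bit-level conversion (round-to-nearest-even assumed; no extra dependency):

```python
import numpy as np

def f32_to_bf16_bits(x):
    # View the float32 bits as uint32, then round-to-nearest-even while
    # truncating to the upper 16 bits (the bf16 bit pattern).
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    rounding = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + rounding) >> 16).astype(np.uint16)

def bf16_bits_to_f32(b):
    # Widen the bf16 bit pattern back to float32 by zero-filling the low bits.
    return (np.asarray(b, dtype=np.uint32) << 16).view(np.float32)

# These values are exactly representable in bf16, so the round trip is lossless.
vals = np.array([1.0, 1.5, 3.140625], dtype=np.float32)
roundtrip = bf16_bits_to_f32(f32_to_bf16_bits(vals))
```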

I’ve heard tell of people having success with this in some fashion: GitHub - jax-ml/ml_dtypes: A stand-alone implementation of several NumPy dtype extensions used in machine learning.

My group doesn’t use the ExecutionEngine for anything real, so I couldn’t say beyond that.

@stellaraccident Thanks, I also noticed that library, but it seems it cannot fix the ctypes issue: in code like the following, there is no ctypes.c_bfloat16 in Python.

 ref_out = rt.make_nd_memref_descriptor(2, ctypes.c_double)()
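One workaround (a sketch of the general idea, not necessarily the mechanism a fix would use) is to exploit the fact that bf16 and uint16 have the same size and alignment: describe the element type to ctypes as `c_uint16` and keep the data as raw bf16 bit patterns, e.g. hypothetically `rt.make_nd_memref_descriptor(2, ctypes.c_uint16)()` in place of the line above. The buffer-aliasing part in isolation:

```python
import ctypes
import numpy as np

# bf16 and uint16 are both 16 bits wide, so a bf16 buffer can be handed to
# ctypes as an array of c_uint16 without copying. This array holds the bf16
# bit patterns for [1.0, 2.0, 3.0].
bf16_bits = np.array([0x3F80, 0x4000, 0x4040], dtype=np.uint16)

# Alias the NumPy buffer as a ctypes array (no copy; writes through either
# view are visible in the other).
c_buf = (ctypes.c_uint16 * bf16_bits.size).from_buffer(bf16_bits)
```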

Yeah, I don’t know. Haven’t looked at that ExecutionEngine code in years, and it is really just a testing tool.

I did approve torch-mlir taking a dep on the ml_dtypes library a while back for this reason. I think someone got this to work there, but the mechanism is different. Might be a lead, though.

@stellaraccident
Hi there,
I haven't found any PR related to fixing this with a different mechanism yet, but I have a simple fix with the help of ml_dtypes. I think introducing this third-party library is reasonable in the current situation, where the MLIR Python bindings rely on numpy but numpy is missing bf16 support.
Could you help review this? [MLIR][Python] add ctype python binding support for bf16 by xurui1995 · Pull Request #92489 · llvm/llvm-project · GitHub