How to pass Memref or Numpy arrays to ExecutionEngine.invoke through python binding?

Hi,

I am developing a dialect, and I am trying to create a C++ API in it and expose it at the Python level.

Basically, I want to pass in a list of MLIR modules from Python. For each MLIR module, I also pass in a list of input arguments (i.e., module_args) and invoke the execution engine one by one. Here is the C++ API I am trying to expose through pybind11:

static bool executeMods(
    std::vector<MlirModule> &modules,
    std::vector<std::vector<py::array_t<float>>> &module_args) {
      
      for (auto &mlir_mod : modules) {
        auto mod = unwrap(mlir_mod);
        auto maybeEngine = mlir::ExecutionEngine::create(mod);
        assert(maybeEngine && "failed to construct an execution engine");
        auto &engine = maybeEngine.get();

        // How should I pass the args sent from Python into the execution engine here?
        auto invocationResult = engine->invoke("top", .....);
    }

  return true;
}

.....
my_dialect_m.def("execute_multiple_mods", &executeMods);

My question is: how should I pass these numpy arguments in from Python? I tried different ways but none of them actually works (the dialect compiles, but when calling this C++ API I run into a SegFault).

PS: When using MLIR ExecutionEngine’s Python binding, I can use ctypes to create packed args like this. I tried to pass these packed_args into my C++ API, but it does not work.

ctypes_args = [ctypes.pointer(ctypes.pointer(
        get_ranked_memref_descriptor(arg))) for arg in numpy_args]

packed_args = (ctypes.c_void_p * len(ctypes_args))()
for argNum in range(len(ctypes_args)):
  packed_args[argNum] = ctypes.cast(ctypes_args[argNum], ctypes.c_void_p)

execution_engine.invoke(name, packed_args)

For the python solution, did you look at our tests? We have some examples here:

For the native one, if you invoke your C++ with numpy array you need to extract the data from there: 14.5. C++ and the Numpy C API — Python Extension Patterns 0.1.0 documentation
And then you can take inspiration from these unit-tests to see how to wrap this in Memref: llvm-project/Invoke.cpp at fdec50182d85ec0b8518af3baae37ae28b102f1c · llvm/llvm-project · GitHub

I’m not sure we have helpers to do this, but it would be worth adding to our Python native infra! (It may exist in torch-mlir otherwise.)

@mehdi_amini thanks, the information is helpful. I was able to invoke the execution engine in Python. But I still need to get the C++ version of the execution engine working in order to connect with the rest of my project (which is written in C++).

I found a good example to follow here: llvm-project/ExecutionEngine.cpp at fdec50182d85ec0b8518af3baae37ae28b102f1c · llvm/llvm-project · GitHub

Here is the code I am using right now, after referring to the examples and many rounds of trial and error.

static bool runTaskFlowExecutor(
    std::map<std::string, MlirModule> &modules,
    std::map<std::string, std::vector<py::array_t<float>>> argsMap) {

      for (auto& [stage, mlir_mod]: modules ) {
        auto mod = unwrap(mlir_mod);
        mlir::registerLLVMDialectTranslation(*mod->getContext());

        auto maybeEngine = mlir::ExecutionEngine::create(mod);
        if (!maybeEngine)
          throw std::runtime_error("maybeEngine failed");

        auto engine = std::move(*maybeEngine);
        auto entryPoint = StringRef("top");
        auto expectedFPtr = engine->lookupPacked(entryPoint);
        if (!expectedFPtr)
          throw std::runtime_error("not found entryPoint top");

        auto modArgs = argsMap[stage];
        void** args = (void **) malloc(sizeof(void *) * modArgs.size());;

        size_t index = 0;
        for (auto tensor: modArgs) {
          py::buffer_info buf = tensor.request();
          args[index] = buf.ptr;
          index++;
        }

        // void (*fptr)(void **) = *expectedFPtr;
        // (*fptr)((void**)args);

        llvm::Error error = engine->invokePacked(entryPoint, 
            llvm::MutableArrayRef<void *>{args, (size_t)0});
        if (error)
          return false;
      }

      return true;
}

But I still run into this mysterious SegFault. Here is the stack trace. I cannot really understand the trace; it does not provide much useful information about why invokePacked failed.

 #0 0x00007f048a9f8b3f PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x00007f048a9f682c SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f0512060630 __restore_rt sigaction.c:0:0
 #3 0x00007f05129e5096 
 #4 0x00007f048ae4d089 mlir::ExecutionEngine::invokePacked(llvm::StringRef, llvm::MutableArrayRef<void*>) (/scratch/users/sx233/hcl-dialect-prototype/build/tools/hcl/python_packages/hcl_core/hcl_mlir/_mlir
_libs/libHCLMLIRAggregateCAPI.so.15+0x15c6089)
 #5 0x00007f0488d459b2 runTaskFlowExecutor(std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, MlirModule>>>&, std::map<std::string, std::vector<pybind11::
array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<pybind11::array_t<float, 16>, std::allocator<pybind11::arr
ay_t<float, 16>>>>>>) //scratch/users/sx233/hcl-dialect-prototype/lib/Bindings/Python/HCLModule.cpp:192:0
 #6 0x00007f0488d955c3 bool pybind11::detail::argument_loader<std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, MlirModule>>>&, std::map<std::string, std
::vector<pybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<pybind11::array_t<float, 16>, std::allo
cator<pybind11::array_t<float, 16>>>>>>>::call_impl<bool, bool (*&)(std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, MlirModule>>>&, std::map<std::strin
g, std::vector<pybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<pybind11::array_t<float, 16>, std
::allocator<pybind11::array_t<float, 16>>>>>>), 0ul, 1ul, pybind11::detail::void_type>(bool (*&)(std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, MlirMo
dule>>>&, std::map<std::string, std::vector<pybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<pybi
nd11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>>>>), std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && /work/shared/users/phd/sx233/conda/envs/py36/l
ib/python3.6/site-packages/pybind11/include/pybind11/cast.h:1441:0
 #7 0x00007f0488d8ed63 _ZNO8pybind116detail15argument_loaderIJRSt3mapISs10MlirModuleSt4lessISsESaISt4pairIKSsS3_EEES2_ISsSt6vectorINS_7array_tIfLi16EEESaISE_EES5_SaIS6_IS7_SG_EEEEE4callIbNS0_9void_typeERPF
bSB_SJ_EEENSt9enable_ifIXntsrSt7is_voidIT_E5valueESS_E4typeEOT1_ /work/shared/users/phd/sx233/conda/envs/py36/lib/python3.6/site-packages/pybind11/include/pybind11/cast.h:1410:0
 #8 0x00007f0488d842f5 void pybind11::cpp_function::initialize<bool (*&)(std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, MlirModule>>>&, std::map<std::
string, std::vector<pybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<pybind11::array_t<float, 16>
, std::allocator<pybind11::array_t<float, 16>>>>>>), bool, std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, MlirModule>>>&, std::map<std::string, std::v
ector<pybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<pybind11::array_t<float, 16>, std::allocat
or<pybind11::array_t<float, 16>>>>>>, pybind11::name, pybind11::scope, pybind11::sibling>(bool (*&)(std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, Mli
rModule>>>&, std::map<std::string, std::vector<pybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<p
ybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>>>>), bool (*)(std::map<std::string, MlirModule, std::less<std::string>, std::allocator<std::pair<std::string const, MlirModule>>>&
, std::map<std::string, std::vector<pybind11::array_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<pybind11::ar
ray_t<float, 16>, std::allocator<pybind11::array_t<float, 16>>>>>>), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::'lambda1'(pybind11::detail::function_call&)::operator()(pybind
11::detail::function_call&) const /work/shared/users/phd/sx233/conda/envs/py36/lib/python3.6/site-packages/pybind11/include/pybind11/pybind11.h:249:0

Did you look at the unit-test I sent above for how to use the execution engine with MemRef in C++?

I think something fairly key in the ABI with the JIT depends on the actual content of the module (the MLIR module). Attributes like attributes { llvm.emit_c_interface } impact the ABI as well.

I did, but those unit-test cases are not too helpful. They use the OwningMemRef struct to instantiate an input MemRef, and finally call invoke(name, memrefs...) to run it.

In my case, the number of input memrefs is variable, so I switched to invokePacked instead.

Thanks for the info.

hmmm… I printed out the IR module in Python before calling this C++ API. It looks good though (it has that emit_c_interface attribute).

I was able to create an execution engine from that module and run it smoothly (entirely on the Python side). But it becomes problematic once it goes into the C++ side.

Did you have a look at the sparse compiler integration tests, for PyTACO as well as for plain Python (for example, test_SpMM.py)? This suite contains many end-to-end tests that invoke the execution engine with dense and sparse parameters (so the former are of interest to you). I just noticed that the suite needs some cleaning up in the sense that we still build some IR through strings (it really should be building IR through our Python builder API for everything), but that part is not relevant to your question.

This looks suspicious: the second argument to the MutableArrayRef constructor is the size of the array, and zero is passed even though args appears non-empty from the code above.

This also looks suspicious: invokePacked accepts a list of pointers to actual arguments. If the argument itself is a pointer, then you need to pass a pointer to pointer into invokePacked. The code seems to be just passing the pointer to data rather than the pointer to pointer to data.

Finally, memref is not (just) a pointer. The callee may expect more information, in particular the sizes and strides, either in the form of a struct or as separate arguments (see LLVM IR Target - MLIR). OwningMemRef and friends take care of this; bypassing them requires user code to take care of this.