How to transfer data to the GPU?

In the file memref.cpp, I call the function ‘_mlir_ciface_main_graph’, which is defined in lenet-2.host.so. I compiled it with the command “g++ --std=c++11 -O3 memref.cpp ./lenet-2.host.so -o lenet-2.noentry -I $ONNX_MLIR_INCLUDE”. When I run ‘lenet-2.host’, the following errors occur:

‘cuMemHostRegister(ptr, sizeBytes, 0)’ failed with ‘CUDA_ERROR_INVALID_VALUE’
‘cuMemHostRegister(ptr, sizeBytes, 0)’ failed with ‘CUDA_ERROR_INVALID_VALUE’
‘cuMemAlloc(&ptr, sizeBytes)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleLoadData(&module, data)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleGetFunction(&function, module, name)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuModuleUnload(module)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuMemAlloc(&ptr, sizeBytes)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuMemAlloc(&ptr, sizeBytes)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleLoadData(&module, data)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleGetFunction(&function, module, name)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuModuleUnload(module)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuMemAlloc(&ptr, sizeBytes)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleLoadData(&module, data)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleGetFunction(&function, module, name)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuModuleUnload(module)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuStreamSynchronize(stream)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuStreamDestroy(stream)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuMemHostRegister(ptr, sizeBytes, 0)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuMemHostRegister(ptr, sizeBytes, 0)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuStreamCreate(&stream, CU_STREAM_NON_BLOCKING)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuMemAlloc(&ptr, sizeBytes)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuMemAlloc(&ptr, sizeBytes)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleLoadData(&module, data)’ failed with ‘CUDA_ERROR_ILLEGAL_ADDRESS’
‘cuModuleGetFunction(&function, module, name)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuModuleUnload(module)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuStreamSynchronize(stream)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
‘cuStreamDestroy(stream)’ failed with ‘CUDA_ERROR_INVALID_HANDLE’
Segmentation fault (core dumped)

I tried ‘gpu.host_register’ on the inputs of the function ‘_mlir_ciface_main_graph’ and got the following error:

lenet-2.host: /data/login_home/licc/llvm-project/mlir/lib/ExecutionEngine/CudaRuntimeWrappers.cpp:189: void mgpuMemHostRegisterMemRef(int64_t, StridedMemRefType<char, 1>*, int64_t): Assertion `strides[i] == denseStrides[i] && "Mismatch in computed dense strides"' failed.
Aborted (core dumped)

The function ‘_mlir_ciface_main_graph’ is defined as:

llvm.func @_mlir_ciface_main_graph(%arg0: !llvm.ptr<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)>>, %arg1: !llvm.ptr<struct<(ptr, ptr, i64, array<4 x i64>, array<4 x i64>)>>) attributes {input_names = ["image"], output_names = ["prediction"]}
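Each parameter in this signature is a pointer to a struct of (ptr, ptr, i64, array<N x i64>, array<N x i64>), which lines up field by field with a MemRefDescriptor<T, N> struct like the one in memref.cpp. Because the parameters are pointers, a matching C++ declaration would normally take ‘MemRefDescriptor<float, 2> *’ and ‘MemRefDescriptor<float, 4> *’ rather than the structs by value. The sketch below checks the layout correspondence at compile time; it assumes a 64-bit target where pointers and intptr_t are 8 bytes, and the aliases ‘OutputDesc’ and ‘InputDesc’ are illustrative names, not part of MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// C++ mirror of !llvm.struct<(ptr, ptr, i64, array<N x i64>, array<N x i64>)>,
// assuming a 64-bit target (8-byte pointers and intptr_t).
template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;              // ptr: pointer returned by the allocator
  T *aligned;                // ptr: aligned pointer used for indexing
  std::intptr_t offset;      // i64: offset in elements from `aligned`
  std::intptr_t sizes[N];    // array<N x i64>
  std::intptr_t strides[N];  // array<N x i64>
};

using OutputDesc = MemRefDescriptor<float, 2>;  // %arg0: rank-2 "prediction"
using InputDesc = MemRefDescriptor<float, 4>;   // %arg1: rank-4 "image"

// Three 8-byte scalars, followed by two arrays of N 8-byte integers.
static_assert(sizeof(OutputDesc) == (3 + 2 * 2) * 8, "rank-2 layout mismatch");
static_assert(sizeof(InputDesc) == (3 + 2 * 4) * 8, "rank-4 layout mismatch");
static_assert(offsetof(InputDesc, sizes) == 3 * 8, "sizes should follow offset");
static_assert(offsetof(InputDesc, strides) == 7 * 8, "strides should follow sizes");

// Both parameters are !llvm.ptr<...>, so this declaration takes pointers;
// a call would then look like _mlir_ciface_main_graph(&output, &input).
extern "C" void _mlir_ciface_main_graph(OutputDesc *output, InputDesc *input);
```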

The contents of memref.cpp:

#include <iostream>
#include <cstdint>
using namespace std;

template<typename T, size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  intptr_t offset;
  intptr_t sizes[N];
  intptr_t strides[N];
};

extern "C" void _mlir_ciface_main_graph(MemRefDescriptor<float,2> output, MemRefDescriptor<float,4> input);

float img_data[] = {…};
float prediction[10] = {0,0,0,0,0,0,0,0,0,0};
intptr_t offset = 1;
MemRefDescriptor<float, 4> input{img_data, img_data, offset, {1,1,28,28}, {1,1,1,1}};
MemRefDescriptor<float, 2> output{prediction, prediction, offset, {1,10}, {1,1}};

int main() {
  _mlir_ciface_main_graph(output, input);
  for (int i = 0; i < 10; i++)
    cout << prediction[i];
  return 0;
}
So I suspect the input arguments are wrong. Could someone tell me how to fix this?

You create your MemRefDescriptor with all strides set to 1. This is likely not what you are trying to express. For the identity (row-major) layout, each stride is the product of the sizes of all inner dimensions. So for a memref of rank k, the i-th stride (counting from 0) is the product of sizes i+1 through k-1, which makes the innermost stride 1.
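The rule above amounts to a running product taken from the innermost dimension outwards. A minimal sketch; ‘denseStrides’ is a hypothetical helper written for illustration, not an MLIR function:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Strides of the identity (row-major) layout for the given sizes:
// stride[i] = sizes[i+1] * ... * sizes[k-1], and the last stride is 1.
std::vector<std::int64_t> denseStrides(const std::vector<std::int64_t> &sizes) {
  std::vector<std::int64_t> strides(sizes.size());
  std::int64_t running = 1;  // product of the inner sizes visited so far
  for (std::size_t i = sizes.size(); i-- > 0;) {
    strides[i] = running;
    running *= sizes[i];
  }
  return strides;
}
```

For example, denseStrides({1, 10}) gives {10, 1} and denseStrides({1, 1, 28, 28}) gives {784, 784, 28, 1}.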

In your example, the memref with sizes 1, 10 needs strides 10, 1. For sizes 1, 1, 28, 28 the strides are 1 * 28 * 28, 28 * 28, 28, 1, that is, 784, 784, 28, 1.
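Applied to memref.cpp, a sketch that builds both descriptors with dense strides might look like the following. Two changes go beyond the answer and are assumptions on my part: the offset is set to 0, because the offset counts elements from the aligned pointer and the question's offset of 1 would start each memref one float into its buffer; and the call passes pointers to the descriptors, as the ‘!llvm.ptr<struct<...>>’ parameters in the signature of ‘_mlir_ciface_main_graph’ suggest. ‘makeDenseDescriptor’ and ‘hasDenseStrides’ are hypothetical helpers; the latter only imitates the dense-stride comparison named in the ‘mgpuMemHostRegisterMemRef’ assertion and is not MLIR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>
struct MemRefDescriptor {
  T *allocated;
  T *aligned;
  std::intptr_t offset;
  std::intptr_t sizes[N];
  std::intptr_t strides[N];
};

// Hypothetical helper: describe a contiguous buffer with the identity layout.
template <typename T, std::size_t N>
MemRefDescriptor<T, N> makeDenseDescriptor(T *data, const std::intptr_t (&sizes)[N]) {
  MemRefDescriptor<T, N> d{};
  d.allocated = data;
  d.aligned = data;
  d.offset = 0;  // the data starts at aligned[0]
  std::intptr_t running = 1;
  for (std::size_t i = N; i-- > 0;) {  // innermost dimension first
    d.sizes[i] = sizes[i];
    d.strides[i] = running;
    running *= sizes[i];
  }
  return d;
}

// Hypothetical helper: true if every stride equals the dense (row-major)
// stride, the comparison the "Mismatch in computed dense strides" check names.
template <typename T, std::size_t N>
bool hasDenseStrides(const MemRefDescriptor<T, N> &d) {
  std::intptr_t running = 1;
  for (std::size_t i = N; i-- > 0;) {
    if (d.strides[i] != running) return false;
    running *= d.sizes[i];
  }
  return true;
}

// Buffers shaped as in the question (the image contents are elided there).
float img_data[1 * 1 * 28 * 28];
float prediction[10];
const std::intptr_t inputSizes[4] = {1, 1, 28, 28};
const std::intptr_t outputSizes[2] = {1, 10};

// With lenet-2.host.so linked and a pointer-taking declaration, the call
// would be roughly:
//   auto input = makeDenseDescriptor(img_data, inputSizes);
//   auto output = makeDenseDescriptor(prediction, outputSizes);
//   _mlir_ciface_main_graph(&output, &input);
```

The question's original descriptors, with all strides 1, fail hasDenseStrides, which is consistent with the assertion reported when ‘gpu.host_register’ was added.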

Thank you for your help!