When using MLIR to generate GPU code, I noticed that a lot of host-side global data is passed to gpu.launch_func as input parameters, which seems to cause some issues. (Someone kindly answered a question about this for me before.)
After examining the examples in the test directory, it looks like this is only resolved by manually adding data transfers from the host side to the device side. Is there a way to avoid modifying the auto-generated IR by hand, perhaps through some passes or another mechanism?
In other words, I am looking for a more automated way to handle host-to-device data transfers when lowering MLIR to GPU, rather than having to modify the IR manually. Has anyone encountered similar issues or developed passes to handle this automatically?
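For reference, this is roughly what the manual workaround looks like in my case. It is only a simplified sketch: the @global_data symbol, the @kernels::@my_kernel name, and the sizes are placeholders, and the gpu.module that defines the kernel is omitted.

```mlir
// Host-side global (placeholder).
memref.global "private" @global_data : memref<10xf32> = dense<0.0>

func.func @run() {
  %c1 = arith.constant 1 : index
  %c10 = arith.constant 10 : index
  %host = memref.get_global @global_data : memref<10xf32>

  // Manually allocate device memory and copy the host data over.
  %dev = gpu.alloc () : memref<10xf32>
  gpu.memcpy %dev, %host : memref<10xf32>, memref<10xf32>

  // Launch the kernel on the device copy instead of the host global.
  gpu.launch_func @kernels::@my_kernel
      blocks in (%c1, %c1, %c1) threads in (%c10, %c1, %c1)
      args(%dev : memref<10xf32>)

  // Copy the results back and release the device buffer.
  gpu.memcpy %host, %dev : memref<10xf32>, memref<10xf32>
  gpu.dealloc %dev : memref<10xf32>
  return
}
```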
Additionally, this does not seem to be limited to global data: a value created with memref.alloc cannot be passed directly to gpu.launch_func as an input parameter either; doing so results in errors like the following:
```
'cuStreamSynchronize(stream)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuStreamDestroy(stream)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
'cuModuleUnload(module)' failed with 'CUDA_ERROR_ILLEGAL_ADDRESS'
```
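For completeness, here is a minimal sketch of the pattern that fails for me; the kernel name is a placeholder and the gpu.module defining it is omitted:

```mlir
%c1 = arith.constant 1 : index
%c10 = arith.constant 10 : index

// Plain host allocation, not visible to the device.
%buf = memref.alloc() : memref<10xf32>

// Passing it straight to the kernel leads to CUDA_ERROR_ILLEGAL_ADDRESS
// once the stream is synchronized.
gpu.launch_func @kernels::@my_kernel
    blocks in (%c1, %c1, %c1) threads in (%c10, %c1, %c1)
    args(%buf : memref<10xf32>)
```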
Managing GPU-CPU data transfers automatically can require a significant amount of compiler work, and sometimes it is not even possible without full visibility into the entire program. It depends on what you want to support. My 5 cents.
One way of handling copies automatically is to use the vendor’s unified memory or managed memory solutions. Unified memory might require specific systems, but managed memory is widely available.
In MLIR, you can allocate a memref with `%memref = gpu.alloc host_shared () : memref<10xf32>`, which gives both the GPU and the CPU read-write access to the buffer. The underlying driver and hardware manage the virtual pages and handle the data transfers automatically.
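A minimal sketch of how that fits together (the @kernels::@my_kernel name and sizes are placeholders, and the gpu.module defining the kernel is omitted):

```mlir
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%c10 = arith.constant 10 : index
%cst = arith.constant 1.0 : f32

// One allocation that both the host and the device can access.
%memref = gpu.alloc host_shared () : memref<10xf32>

// The host writes to it directly; no explicit gpu.memcpy is needed.
memref.store %cst, %memref[%c0] : memref<10xf32>

// The same memref is passed to the kernel; the driver migrates pages as needed.
gpu.launch_func @kernels::@my_kernel
    blocks in (%c1, %c1, %c1) threads in (%c10, %c1, %c1)
    args(%memref : memref<10xf32>)

// After the launch completes, the results are readable from the host.
%v = memref.load %memref[%c0] : memref<10xf32>
gpu.dealloc %memref : memref<10xf32>
```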