Proposal to add stream/queue as an optional argument to a few GPU dialect ops

We are still going to use async tokens and gpu.wait for dependency tracking; the stream itself doesn't introduce any dependency info. All GPU runtimes need some sort of 'context' to which you submit your kernels, and the goal of this proposal is to make it explicit. Without it you would need to either use global variables or pass the stream as an async token (which is a hack and also doesn't allow diamond dependencies). An explicit stream also allows more flexibility, e.g. outlining and caching stream creation, or interleaving execution on multiple devices.

On the Level Zero side we are going to implement streams via zeCommandQueueCreate/zeCommandListCreateImmediate and async tokens as events. Queues can be asynchronous in L0 (i.e. execute submitted kernels in a different order), so we need to synchronize them via events in any case.
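To make the mapping a bit more concrete, here is a rough C sketch of what this could look like on the Level Zero side. It is only illustrative: the helper names (createStream, launchDiamond), the handles assumed to exist (ctx, dev, the kernel handles, grid), and the omission of kernel argument setup and error checking are all assumptions for brevity. The stream becomes an immediate command list, each async token becomes an event, and a diamond dependency (B and C depend on A, D depends on both) is expressed purely through events on a single stream.

```c
#include <level_zero/ze_api.h>

// The "stream": an immediate command list created from a queue descriptor.
ze_command_list_handle_t createStream(ze_context_handle_t ctx,
                                      ze_device_handle_t dev) {
  ze_command_queue_desc_t qDesc = {0};
  qDesc.stype = ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC;
  qDesc.ordinal = 0;
  qDesc.index = 0;
  qDesc.mode = ZE_COMMAND_QUEUE_MODE_ASYNCHRONOUS; // may reorder submissions
  ze_command_list_handle_t stream = NULL;
  zeCommandListCreateImmediate(ctx, dev, &qDesc, &stream);
  return stream;
}

// Async tokens become events; dependencies are expressed through them,
// not through the stream itself.
void launchDiamond(ze_context_handle_t ctx, ze_command_list_handle_t stream,
                   ze_kernel_handle_t kA, ze_kernel_handle_t kB,
                   ze_kernel_handle_t kC, ze_kernel_handle_t kD,
                   ze_group_count_t grid) {
  ze_event_pool_desc_t poolDesc = {0};
  poolDesc.stype = ZE_STRUCTURE_TYPE_EVENT_POOL_DESC;
  poolDesc.count = 3; // one event per async token (A, B, C)
  ze_event_pool_handle_t pool = NULL;
  zeEventPoolCreate(ctx, &poolDesc, 0, NULL, &pool);

  ze_event_handle_t tok[3];
  for (uint32_t i = 0; i < 3; ++i) {
    ze_event_desc_t eDesc = {0};
    eDesc.stype = ZE_STRUCTURE_TYPE_EVENT_DESC;
    eDesc.index = i;
    zeEventCreate(pool, &eDesc, &tok[i]);
  }

  // A signals tok[0]; B and C wait on it; D waits on both B and C.
  zeCommandListAppendLaunchKernel(stream, kA, &grid, tok[0], 0, NULL);
  zeCommandListAppendLaunchKernel(stream, kB, &grid, tok[1], 1, &tok[0]);
  zeCommandListAppendLaunchKernel(stream, kC, &grid, tok[2], 1, &tok[0]);
  ze_event_handle_t deps[2] = {tok[1], tok[2]};
  zeCommandListAppendLaunchKernel(stream, kD, &grid, NULL, 2, deps);
}
```

The exact split between queues and command lists is an implementation detail; the point is just that ordering comes from the events (async tokens), while the stream only says where the work is submitted.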
