Proposal to add stream/queue as an optional argument to a few GPU dialect ops

We are still going to use async tokens and gpu.wait for dependency tracking; the stream itself doesn't introduce any dependency info. All GPU runtimes need some sort of 'context' to which you submit your kernels, and the goal of this proposal is to make it explicit. Without it you would need to either use global variables or pass the stream as an async token (which is a hack and also doesn't allow diamond dependencies). An explicit stream also allows more flexibility, e.g. outlining and caching stream creation, or interleaving execution on multiple devices.

On the Level Zero side we are going to implement streams via zeCommandQueueCreate/zeCommandListCreateImmediate and async tokens as events. Queues can be asynchronous in L0 (i.e. execute submitted kernels in a different order), so we need to synchronize them via events in any case.
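To make the mapping a bit more concrete, here is a rough C sketch of what this could look like on the Level Zero side. It is only illustrative: the helper names (createStream, launchDiamond), the handles assumed to exist (ctx, dev, the kernel handles, grid), and the omission of kernel argument setup and error checking are all assumptions for brevity. The stream becomes an immediate command list, each async token becomes an event, and a diamond dependency (B and C depend on A, D depends on both) is expressed purely through events on a single stream.

```c
#include <level_zero/ze_api.h>

// The "stream": an immediate command list created from a queue descriptor.
ze_command_list_handle_t createStream(ze_context_handle_t ctx,
                                      ze_device_handle_t dev) {
  ze_command_queue_desc_t qDesc = {0};
  qDesc.stype = ZE_STRUCTURE_TYPE_COMMAND_QUEUE_DESC;
  qDesc.ordinal = 0;
  qDesc.index = 0;
  qDesc.mode = ZE_COMMAND_QUEUE_MODE_ASYNCHRONOUS; // may reorder submissions
  ze_command_list_handle_t stream = NULL;
  zeCommandListCreateImmediate(ctx, dev, &qDesc, &stream);
  return stream;
}

// Async tokens become events; dependencies are expressed through them,
// not through the stream itself.
void launchDiamond(ze_context_handle_t ctx, ze_command_list_handle_t stream,
                   ze_kernel_handle_t kA, ze_kernel_handle_t kB,
                   ze_kernel_handle_t kC, ze_kernel_handle_t kD,
                   ze_group_count_t grid) {
  ze_event_pool_desc_t poolDesc = {0};
  poolDesc.stype = ZE_STRUCTURE_TYPE_EVENT_POOL_DESC;
  poolDesc.count = 3; // one event per async token (A, B, C)
  ze_event_pool_handle_t pool = NULL;
  zeEventPoolCreate(ctx, &poolDesc, 0, NULL, &pool);

  ze_event_handle_t tok[3];
  for (uint32_t i = 0; i < 3; ++i) {
    ze_event_desc_t eDesc = {0};
    eDesc.stype = ZE_STRUCTURE_TYPE_EVENT_DESC;
    eDesc.index = i;
    zeEventCreate(pool, &eDesc, &tok[i]);
  }

  // A signals tok[0]; B and C wait on it; D waits on both B and C.
  zeCommandListAppendLaunchKernel(stream, kA, &grid, tok[0], 0, NULL);
  zeCommandListAppendLaunchKernel(stream, kB, &grid, tok[1], 1, &tok[0]);
  zeCommandListAppendLaunchKernel(stream, kC, &grid, tok[2], 1, &tok[0]);
  ze_event_handle_t deps[2] = {tok[1], tok[2]};
  zeCommandListAppendLaunchKernel(stream, kD, &grid, NULL, 2, deps);
}
```

The exact split between queues and command lists is an implementation detail; the point is just that ordering comes from the events (async tokens), while the stream only says where the work is submitted.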
