If not, how does MLIR launch multiple GPU kernels in parallel?
'gpu' Dialect - MLIR (gpu.launch with async)