Where do GPU async tokens come from?

In the GPU dialect, most operations can optionally be made to wait on previous operations completing, through async tokens. At the same time, most GPU operations appear to require a token in order to be lowered to LLVM (see llvm-project/mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp on the main branch of llvm/llvm-project).

From looking at the dialect documentation, I don’t see a way to conjure a token for an operation with no predecessors. Consider the following pseudocode:

func.func() {
  %0 = gpu.alloc() ...
  gpu.launch (something that uses %0)
}

I can predicate something on the result of the gpu.launch, but I don’t know how to give a token to that first gpu.alloc() – without it, the AllocOp cannot be lowered to LLVM. I’ve been working around this by hacking the GPU to LLVM converter to pass null pointers into the gpu runtime layer if an async dependency isn’t present, but this doesn’t seem like the best way to do things.

In general, when I’m wondering about this kind of thing, I look for tests in the repo, like this one for example: llvm-project/mlir/test/Conversion/GPUCommon/lower-sparse-to-gpu-runtime-calls.mlir on the main branch of llvm/llvm-project.


The gpu.wait op: in its async form it returns a new token, even when it is given no dependencies.
It’s present all over the test cases and well documented too.
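As a minimal sketch of how this could look for the alloc case from the question (the function name, the %size argument, and the memref type are made up for illustration, and the kernel launch is left out):

```mlir
func.func @alloc_with_token(%size: index) {
  // gpu.wait async with no operands yields a fresh token with no predecessors.
  %t0 = gpu.wait async
  // The allocation depends on %t0 and returns its own token alongside the buffer.
  %buf, %t1 = gpu.alloc async [%t0] (%size) : memref<?xf32>
  // Work that uses %buf (e.g. a kernel launch) would be chained on %t1 here.
  %t2 = gpu.dealloc async [%t1] %buf : memref<?xf32>
  // Without the async keyword, gpu.wait blocks the host until %t2 is ready.
  gpu.wait [%t2]
  return
}
```

The final non-async gpu.wait is there so that the chain of tokens is synchronized with the host before the function returns.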
