Correct MLIR passes to compile `test/Integration/GPU/CUDA/async.mlir`

I was trying to compile the examples in https://github.com/llvm/llvm-project/tree/main/mlir/test/Integration/GPU/CUDA to executables. I can compile all of them except async.mlir.

My passes and build commands for async.mlir are:

mlir-opt async.mlir \
    --gpu-kernel-outlining \
    --pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin)' \
    --gpu-async-region \
    --gpu-to-llvm \
    --async-to-async-runtime \
    --async-runtime-ref-counting \
    --convert-async-to-llvm \
    --convert-std-to-llvm 2>&1 >bin/async.mlir.out

mlir-translate bin/async.mlir.out --mlir-to-llvmir \
    | opt -O3 -S | llc -O3 | as - -o bin/async.o

clang++-11 bin/async.o -lcuda \
    $HOME/opt/llvm/lib/libmlir_cuda_runtime.so \
    $HOME/opt/llvm/lib/libmlir_runner_utils.so \
    $HOME/opt/llvm/lib/libmlir_c_runner_utils.so \
    -o bin/async

When I run .\async, I'm getting this error:

PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -O3 
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'X86 DAG->DAG Instruction Selection' on function '@async_execute_fn'
#0 0x00007f3653e1842f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/lib/x86_64-linux-gnu/libLLVM-11.so.1+0xaa642f)
#1 0x00007f3653e16790 llvm::sys::RunSignalHandlers() (/usr/lib/x86_64-linux-gnu/libLLVM-11.so.1+0xaa4790)
#2 0x00007f3653e18905 (/usr/lib/x86_64-linux-gnu/libLLVM-11.so.1+0xaa6905)
#3 0x00007f365334a3c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x153c0)
#4 0x00007f365437c730 (/usr/lib/x86_64-linux-gnu/libLLVM-11.so.1+0x100a730)
#5 0x00007f3654385ee8 (/usr/lib/x86_64-linux-gnu/libLLVM-11.so.1+0x1013ee8)
#6 0x00007f3654387331 (/usr/lib/x86_64-linux-gnu/libLLVM-11.so.1+0x1015331)
#7 0x0000000002177e02 
{standard input}: Assembler messages:
{standard input}: Warning: end of file not at end of a line; newline inserted
{standard input}:140: Error: bad register name `%rd'
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
clang: error: no such file or directory: 'bin/async.o'

However, I can run it with mlir-cpu-runner without compiling, like this:

mlir-opt async.mlir \
    -gpu-kernel-outlining \
    -pass-pipeline='gpu.module(strip-debuginfo,convert-gpu-to-nvvm,gpu-to-cubin)' \
    -gpu-async-region \
    -gpu-to-llvm \
    -async-to-async-runtime \
    -async-runtime-ref-counting \
    -convert-async-to-llvm \
    -convert-std-to-llvm \
| mlir-cpu-runner \
    --shared-libs=/$HOME/opt/llvm/lib/libmlir_cuda_runtime.so \
    --shared-libs=/$HOME/opt/llvm/lib/libmlir_async_runtime.so \
    --shared-libs=/$HOME/opt/llvm/lib/libmlir_runner_utils.so \
    --entry-point-result=void -O0 

and it runs without any problem.

But how do I compile it to an executable that runs on the GPU?