Problem lowering nested async

I am having difficulty lowering nested async.execute ops, to the point that I wonder whether this feature works outside of toy cases.

Here is my input IR:

func private @foo()

func @bar() {
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %c4 = arith.constant 4 : index
  %0 = async.create_group %c4 : !async.group
  scf.for %arg3 = %c0 to %c4 step %c1 {
    %token = async.execute {
      call @foo() : () ->()
      %8 = async.create_group %c4 : !async.group
      scf.for %arg4 = %c0 to %c4 step %c1 {
        %token_1 = async.execute {
          call @foo() : () ->()
          async.yield
        }
        %9 = async.add_to_group %token_1, %8 : !async.token
      }
      async.await_all %8
      async.yield
    }
    %1 = async.add_to_group %token, %0 : !async.token
  }
  async.await_all %0
  return
}

Case 1

Trying to lower with mlir-opt -async-to-async-runtime fails with:

error: failed to legalize operation 'scf.for' that was explicitly marked illegal
      scf.for %arg4 = %c0 to %c4 step %c1 {
      ^

From inspecting the code, it looks like SCF ops are only dynamically legal when they are not nested inside an async.execute region.

Case 2

Unfortunately, lowering to CFG first to make the conversion happy (i.e. mlir-opt -convert-scf-to-std) triggers another issue:

error: 'async.execute' op expects region #0 to have 0 or 1 blocks
    %token = async.execute {

Taken together, these two constraints seem impossible to satisfy.

I would be interested in understanding the key reason why nesting makes the scf dialect illegal in Case 1.

What is the right course of action here?

Thanks!

@herhut @ezhulenev @mehdi_amini @benvanik

FWIW, locally making scf legal in async-to-async-runtime works for me and executes correctly.
I do not know the deeper implications at this time though.

Yes, it’s a known problem, but I forgot why exactly :) If I remember correctly, awaiting in the scf.for region is the problem, because coroutines require a very specific CFG structure, and you can’t set up a CFG inside the scf.for body.

It’s on my list of things to fix. I am thinking about adding lower-to-CFG patterns to the async-to-async-runtime pass, because for coroutines to work, SCF must go away.

You can set up a CFG inside the scf.for body if you wrap it in an scf.execute_region.
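
For illustration, here is a minimal, untested sketch of what that wrapping might look like. It only shows a multi-block region sitting inside an scf.for body; whether the async passes accept it is not something this thread confirms. The function name @wrapped and the block label ^bb1 are made up for the example, and the syntax follows the same MLIR vintage as the IR above (newer versions spell br as cf.br).

```mlir
func private @foo()

func @wrapped() {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  scf.for %i = %c0 to %c4 step %c1 {
    // Unlike the scf.for body itself, the scf.execute_region region
    // may contain more than one block, so it can hold a small CFG.
    scf.execute_region {
      call @foo() : () -> ()
      br ^bb1
    ^bb1:  // hypothetical second block, reached by the branch above
      call @foo() : () -> ()
      scf.yield
    }
  }
  return
}
```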