RFC: LLVM Coroutine Representation, Round 2

Gor_Nishanov1 · July 11, 2016, 1:47pm

Hi all:

Thank you very much for the feedback during the first round! Special thanks to
Eli, Sanjoy, Hal and Chandler for their detailed public and private comments.
The model is simpler now and the intrinsics have nice symmetry to them:
coro.begin + coro.end, coro.alloc + coro.free. Compared to the previous RFC,
coro.fork is gone, coro.init and coro.start are collapsed into one coro.begin,
coro.suspend is three-way as opposed to two-way. coro.elide and coro.delete
renamed coro.alloc and coro.free.

All the changes implemented and tested. Full document, (nicely formatted by
github is at:

https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Coroutines.rst

Below is a summary of a proposal to add coroutine support to LLVM.
The proposal is motivated primarily by the desire to support C++ Coroutines [1],
however the llvm representation is language neutral and can be used
to support coroutines in other languages as well.

Clang + llvm coroutines allows you to take this code:

  generator<int> range(int from, int to) {
     for(int i = from; i < to; ++n)
        co_yield i;
  }
  int main() {
     int sum = 0;
     for (auto v: range(1,100))
        sum += v;
     return sum;
  }

And translate it down to this:

  define i32 @main() #5 {
  entry:
    ret i32 4950
  }

You can also use coroutines in plain C, by calling the builtins mapping to the
intrinsics described by this proposal, so that your coroutine can look like:

#include "Inputs/coro.h"
void print(int v);

void* f(int n) {
CORO_BEGIN(malloc);

      while (n-- > 0) {
        print(n+1);
        CORO_SUSPEND();
      }

CORO_END(free);
}

    // CHECK-LABEL: @main
    int main() {
      void* coro = f(4);
      CORO_RESUME(coro);
      CORO_RESUME(coro);
      CORO_DESTROY(coro);
    }

github.com

GorNishanov/clang/blob/coro-rfc/test/Coroutines/coro.c

// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines -emit-llvm %s -o - -O3 | FileCheck %s
#include "Inputs/coro.h"
void print(int);

void* f() {
  CORO_BEGIN(malloc);

  for (int i = 0;; ++i) {
    print(i);
    CORO_SUSPEND();
  }

  CORO_END(free);
}

// CHECK-LABEL: @main
int main() {
  void* coro = f();
  CORO_RESUME(coro);
  CORO_RESUME(coro);

This file has been truncated. show original

github.com

GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coro.h

void free(void *ptr);
void *malloc(unsigned int);

#define CORO_SUSPEND_IMPL(IsFinal)                                             \
  switch (__builtin_coro_suspend(__builtin_coro_save(0), IsFinal)) {           \
  case 0:                                                                      \
    if (IsFinal)                                                               \
      __builtin_trap();                                                        \
    break;                                                                     \
  case 1:                                                                      \
    goto coro_Cleanup;                                                         \
  default:                                                                     \
    goto coro_Suspend;                                                         \
  }

#define CORO_SUSPEND() CORO_SUSPEND_IMPL(0)
#define CORO_FINAL_SUSPEND() CORO_SUSPEND_IMPL(1)

#define CORO_BEGIN(AllocFunc)                                                  \
  void *coro_hdl =                                                             \

This file has been truncated. show original

I prototyped llvm changes to support this proposal and extended clang coroutine
implementation [2] to emit proposed intrinsics to smoke test the proposed
design. See "More Attention Required" in the doc/Coroutines.rst [4]for details
on what additional work is required beyond cleanup and bug fixes.

I would like to continue the discussion on the overall design, representation
and a roadmap to start upstreaming the changes with the intention to continue
the development in tree incrementally.

Roadmap:

Alen_Skondro1 · July 20, 2016, 11:44am

Hi,

Hi all:

Thank you very much for the feedback during the first round! Special thanks to
Eli, Sanjoy, Hal and Chandler for their detailed public and private comments.
The model is simpler now and the intrinsics have nice symmetry to them:
coro.begin + coro.end, coro.alloc + coro.free. Compared to the previous RFC,
coro.fork is gone, coro.init and coro.start are collapsed into one coro.begin,
coro.suspend is three-way as opposed to two-way. coro.elide and coro.delete
renamed coro.alloc and coro.free.

All the changes implemented and tested. Full document, (nicely formatted by
github is at:

https://github.com/GorNishanov/llvm/blob/coro-rfc/docs/Coroutines.rst

Reading through that document, I have a couple of questions (maybe
overlooking or misunderstanding some important explanations):

- As the caller of a coroutine, how do I know whether a coroutine
  reached its cleanup point or not? This seems essential for
  implementing iterators and looping on them, yet @llvm.coro.done only
  works if a particular suspend point was designated as "final" (which
  doesn't sound possible to do in the general case, i.e. halting
  problem).

- Is the promise type restricted to atomic LLVM types, or can I use
@llvm.coro.promise.p0i8 (together with a bitcast) if the promise is an
arbitrary complex structure?

- Is the promise allocated/copied inside the coroutine frame? If not,
how does the sharing work even if the coroutine is passed to different
callers?

Regards

Antoine.

vadimcn · July 21, 2016, 4:58pm

cc llvm-dev

Gor_Nishanov1 · July 21, 2016, 6:43pm

Hi Vadim:

Does you design support resumption with parameter(s)? (such as Python's
generator.send(x)). I suppose the "promise" could be used for passing data
both ways,

Correct.

The docs/Coroutines.rst states that:

"A coroutine author or a frontend may designate a distinguished `alloca`
that can be used to communicate with the coroutine."

What kind of communication happens is determined by the source language
coroutine constructs. An example later in the section shows how, as a consumer
of a coroutine, you can read the data from the promise. But that communication
can go the other way if you, instead, will be storing into the promise in main,
and loading from it in the coroutine.

coro.promise intrinsic gives you the address of the promise associated with a
particular coroutine instance and you (as frontend writer, or coroutine library
designer is free to do whatever you want with it).

please mention this explicitly in the design doc.

Would you suggest changes to the exact wording to make it clearer? I put it up
for review at: ⚙ D22603 [coroutines] Part 1 of N: Documentation, so you can just mark up the
changes you would like to see.

Also, how is loading/storing to promise going to be lowered?

The coro.promise intrinsics just gives you an address of the coroutine promise.
You, as frontend / library writer, know what is the promise and you emit
appropriate load or stores instructions.

If you have a synchronous scenario and a promise is plain int, you would use
plain loads and stores. If your promise is some complicated data structure
which has atomics and you use it from different threads, you would use
appropriate loads, stores, fences and synchronization primitives.

But all this is out of scope for docs/Coroutines.rst . It is up to the frontend,
library writer to decide how it wants to use the promise. The only special
handling that LLVM does for the coroutine promise is:

1) places the promise at deterministic offset in the coroutine frame
2) allows you to get an address of the promise given a coroutine handle and
vice versa.

Cheers,
Gor

P.S. If you'd like to see a library implementation of a generator, please see:

Generator itself:

github.com

GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/generator.h

#include "coroutine.h"

template <typename _Ty> struct generator {
  struct promise_type {
    _Ty current_value;
    std::suspend_always yield_value(_Ty value) {
      this->current_value = value;
      return {};
    }
    std::suspend_always initial_suspend() { return {}; }
    std::suspend_always final_suspend() { return {}; }
    generator get_return_object() { return generator{this}; };
    void return_void() {}
  };

  struct iterator {
    std::coroutine_handle<promise_type> _Coro;
    bool _Done;

    iterator(std::coroutine_handle<promise_type> Coro, bool Done)

This file has been truncated. show original

coroutine_handle (a C++ level abstraction mapping to llvm intrinsics)

github.com

GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coroutine.h

#pragma once

namespace std {
template <typename R, typename...> struct coroutine_traits {
  using promise_type = typename R::promise_type;
};

template <typename Promise = void> struct coroutine_handle;

template <> struct coroutine_handle<void> {
  static coroutine_handle from_address(void *addr) noexcept {
    coroutine_handle me;
    me.ptr = addr;
    return me;
  }
  void operator()() { resume(); }
  void *address() const { return ptr; }
  void resume() const { __builtin_coro_resume(ptr); }
  void destroy() const { __builtin_coro_destroy(ptr); }
  bool done() const { return __builtin_coro_done(ptr); }

This file has been truncated. show original

and the use:

github.com

GorNishanov/clang/blob/coro-rfc/test/Coroutines/generator.cpp

// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines -emit-llvm %s -o - -std=c++14 -O3 | FileCheck %s
#include "Inputs/generator.h"

generator<int> fib(int n) {
  for (int i = 0; i < n; ++i)
    co_yield i;
}

extern "C" void yield(int);

// CHECK-LABEL: @main
int main() {
  for (auto v : fib(5))
    yield(v);
// CHECK: call void @yield(i32 0)
// CHECK: call void @yield(i32 1)
// CHECK: call void @yield(i32 2)
// CHECK: call void @yield(i32 3)
// CHECK: call void @yield(i32 4)
}

Well, if start looking at those tests, you may enjoy coroutines in C via macros,
lowered to the same coroutine intrinsics :-).

github.com

GorNishanov/clang/blob/coro-rfc/test/Coroutines/coro.c

// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fcoroutines -emit-llvm %s -o - -O3 | FileCheck %s
#include "Inputs/coro.h"
void print(int);

void* f() {
  CORO_BEGIN(malloc);

  for (int i = 0;; ++i) {
    print(i);
    CORO_SUSPEND();
  }

  CORO_END(free);
}

// CHECK-LABEL: @main
int main() {
  void* coro = f();
  CORO_RESUME(coro);
  CORO_RESUME(coro);

This file has been truncated. show original

and

github.com

GorNishanov/clang/blob/coro-rfc/test/Coroutines/Inputs/coro.h

void free(void *ptr);
void *malloc(unsigned int);

#define CORO_SUSPEND_IMPL(IsFinal)                                             \
  switch (__builtin_coro_suspend(__builtin_coro_save(0), IsFinal)) {           \
  case 0:                                                                      \
    if (IsFinal)                                                               \
      __builtin_trap();                                                        \
    break;                                                                     \
  case 1:                                                                      \
    goto coro_Cleanup;                                                         \
  default:                                                                     \
    goto coro_Suspend;                                                         \
  }

#define CORO_SUSPEND() CORO_SUSPEND_IMPL(0)
#define CORO_FINAL_SUSPEND() CORO_SUSPEND_IMPL(1)

#define CORO_BEGIN(AllocFunc)                                                  \
  void *coro_hdl =                                                             \

This file has been truncated. show original

vadimcn · July 21, 2016, 8:08pm

What I am getting at, is that ideally the parameters/return value of generator.send() should become parameters/return value of @f.Resume:
define @f.Resume(%f.Frame* %FramePtr, %Param);

They don’t need to be allocated in the coroutine frame at all. Do you think that LLVM will be able to perform such an optimization?

Gor_Nishanov1 · July 21, 2016, 8:39pm

Hi Vadim:

What I am getting at, is that ideally the parameters/return value of
generator.send() should become parameters/return value of @f.Resume:
define <return type> @f.Resume(%f.Frame* %FramePtr, <input type>
%Param);

Absolutely! I sketched out this approach and it looks like it can be build on
top of the current model. I have not designed nor implemented it yet and it is
not part of this RFC.

When I get to designing it, I can keep you in the loop if you are interested.

Gor

P.S.

BTW, you would need to have as many f.resume's as you have suspend points.
So that if you have a coroutine that looks like:

  task<void> f(AsyncStream& s) {
    int a = await s.ReadInt();
    float b = await s.ReadFloat();
  }

you would have:

void f.resume.1(%f.Frame*, i32 result);
void f.resume.2(%f.Frame*, float result);

Topic		Replies	Views
[RFC] LLVM Coroutines LLVM Dev List Archives	9	115	June 11, 2016
Coroutine LLVM IR Intrinsics LLVM Dev List Archives	0	70	November 14, 2018
RFC: Coroutine Optimization Passes LLVM Dev List Archives	8	93	July 18, 2016
[RFC] LLVM Coroutines LLVM Dev List Archives	19	92	June 16, 2016
Coroutine elision not applied LLVM Dev List Archives	0	82	February 4, 2019

Related Topics