[PATCH 1/3] Implement wait_group_events builtin v2

This is a simple default implementation that just calls barrier().

v2:
  - Only call barrier() once.
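
A minimal sketch of the idea, not the code from this patch. It is written as
plain host C so it compiles outside an OpenCL toolchain: the event_t typedef,
the fence flag values, the barrier() stub and its barrier_calls counter are
stand-ins assumed for illustration only. The point shown is that the events
are ignored and a single barrier() covering both local and global memory is
issued.

```c
#include <stddef.h>

/* Host-side stand-ins (assumptions, not the real OpenCL definitions). */
typedef size_t event_t;
#define CLK_LOCAL_MEM_FENCE  1u
#define CLK_GLOBAL_MEM_FENCE 2u

/* Counts barrier() calls so the sketch can be checked on the host. */
static int barrier_calls = 0;

static void barrier(unsigned flags) {
  (void)flags;
  ++barrier_calls;
}

/* Default implementation: ignore the events and synchronize the whole
 * work-group with one barrier that covers both address spaces. */
void wait_group_events(int num_events, event_t *event_list) {
  (void)num_events;
  (void)event_list;
  barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
}
```

Combining both fence flags in one call is what lets a single barrier() stand
in for separate local and global barriers.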

This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.
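
A rough sketch of what a synchronous copy with size_t counts could look like,
not the code from this patch. It is plain host C; the function name
copy_sync_sketch, the int element type and the event_t typedef are
hypothetical choices made for illustration. The copy finishes before the
function returns, so the returned event carries no pending work.

```c
#include <stddef.h>

/* Host-side stand-in for the OpenCL event type (assumption). */
typedef size_t event_t;

/* Copy num_elements items from src to dst immediately, then hand back
 * the caller's event unchanged. The element count and loop index are
 * size_t so large buffers are not truncated to int. */
event_t copy_sync_sketch(int *dst, const int *src, size_t num_elements,
                         event_t event) {
  for (size_t i = 0; i < num_elements; ++i) {
    dst[i] = src[i];
  }
  return event;
}
```

Because nothing is left in flight, a later wait on the returned event only
needs to synchronize the work-group.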

This is a simple implementation which just copies data synchronously.

v2:
  - Use size_t.