[AMDGPU][PATCH 0/3] barriers/memory-fences related additions

Hello LLVM developers,

This patch series defines the new intrinsics required by my libclc patch series,
which implements barriers/memory fences on AMDGPUs.
This series only provides stub implementations of those intrinsics.
For example, on Evergreen hardware, we would need to modify the surrounding
read/write operations (those placed before the barrier/fence or in the same
loop) so that they issue an ACK when completed.
For now, on both Evergreen and SI, everything is in fact lowered to a barrier
instruction.
I plan to add the necessary transformations in a follow-up patch series.

Tested on Evergreen (Cedar) only.

(These are my first patches to an open-source project, so don't hesitate to point
out my mistakes/errors/indentation problems, etc. :) )

Sincerely,
Damien Hilloulin.

[1/3] Addition of the new intrinsics in AMDGPUIntrinsics.td
     This patch adds the definitions of the intrinsics used for
     barrier/memory fence support.
[2/3] Stub implementations of the new intrinsics on Evergreen
     This patch adds stubs that provide a first implementation of the
     intrinsics for barriers and memory fences on EG. The barrier.nofence()
     intrinsic is the only one that is definitely implemented correctly
     by this patch. The barrier.local() intrinsic can perhaps be
     considered acceptable as is, since LDS memory is atomic. The other
     intrinsics need to use WAIT_ACK in some way, and the surrounding
     memory operations need to be modified to use ACK.
[3/3] Stub implementations of the new intrinsics on Southern_Islands
     This patch is a first implementation of the newly added
     intrinsics for barriers/memory fences on SI. To keep things as simple
     as possible, every intrinsic is lowered to a barrier with no fence.

  lib/Target/R600/AMDGPUIntrinsics.td | 11 +++++
  lib/Target/R600/EvergreenInstructions.td | 69

To point out the obvious, patches should always be accompanied by testcases that exercise the new functionality.

thanks,
adrian