[PATCH 1/2] gallium: Add dimension parameter to launch_grid

This is needed for OpenCL

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Make the function static.

This needs corresponding change in LLVM otherwise it breaks parameter passing

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
---
  include/llvm/IR/IntrinsicsR600.td | 2 ++
  lib/Target/R600/R600ISelLowering.cpp | 6 ++++--
  2 files changed, 6 insertions(+), 2 deletions(-)

Needs a test for the intrinsic

diff --git a/include/llvm/IR/IntrinsicsR600.td b/include/llvm/IR/IntrinsicsR600.td
index ba69eaa..37a9771 100644
--- a/include/llvm/IR/IntrinsicsR600.td
+++ b/include/llvm/IR/IntrinsicsR600.td
@@ -33,6 +33,8 @@ defm int_r600_read_tgid : R600ReadPreloadRegisterIntrinsic_xyz <
                                         "__builtin_r600_read_tgid">;
  defm int_r600_read_tidig : R600ReadPreloadRegisterIntrinsic_xyz <
                                         "__builtin_r600_read_tidig">;
+def int_r600_read_workdim : R600ReadPreloadRegisterIntrinsic <
+ "__builtin_r600_read_workdim">;
    } // End TargetPrefix = "r600"

We're trying to move the intrinsics to use the amdgpu name instead, although all the others use r600 now so it might be best to change them all at once.

  diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
index 4c603f8..1c59684 100644
--- a/lib/Target/R600/R600ISelLowering.cpp
+++ b/lib/Target/R600/R600ISelLowering.cpp
@@ -805,6 +805,8 @@ SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
        return LowerImplicitParameter(DAG, VT, DL, 7);
      case Intrinsic::r600_read_local_size_z:
        return LowerImplicitParameter(DAG, VT, DL, 8);
+ case Intrinsic::r600_read_workdim:
+ return LowerImplicitParameter(DAG, VT, DL, 9);
        case Intrinsic::r600_read_tgid_x:
        return CreateLiveInRegister(DAG, &AMDGPU::R600_TReg32RegClass,
@@ -1722,7 +1724,7 @@ SDValue R600TargetLowering::LowerFormalArguments(
      // being invalid. Somehow this seems to work with i64 arguments, but breaks
      // for <1 x i64>.
  - // The first 36 bytes of the input buffer contains information about
+ // The first 40 bytes of the input buffer contains information about
      // thread group and global sizes.

I think we should probably round this up to some larger number since it's likely more will need to be added in the future

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
---
generic/include/clc/workitem/get_work_dim.h | 1 +
r600/lib/SOURCES | 1 +
r600/lib/workitem/get_work_dim.ll | 6 ++++++
3 files changed, 8 insertions(+)
create mode 100644 generic/include/clc/workitem/get_work_dim.h
create mode 100644 r600/lib/workitem/get_work_dim.ll

diff --git a/generic/include/clc/workitem/get_work_dim.h b/generic/include/clc/workitem/get_work_dim.h
new file mode 100644
index 0000000..b4251e4
--- /dev/null
+++ b/generic/include/clc/workitem/get_work_dim.h
@@ -0,0 +1 @@
+_CLC_DECL uint get_num_groups();

surely this must be uint get_work_dim();

Jeroen

Maybe this should have range metadata attached now that it applies to calls?

Make the function static.

No need to cc llvm-commits on these mesa patches. Reviewers follow both
lists.

This needs corresponding change in LLVM otherwise it breaks parameter passing

CC: Tom Stellard <tom@stellard.net>
CC: Matt Arsenault <Matthew.Arsenault@amd.com>

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
---
src/gallium/drivers/r600/evergreen_compute.c | 26 +++++++++++++++++++++-----
src/gallium/drivers/r600/evergreen_compute.h | 1 -
2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/src/gallium/drivers/r600/evergreen_compute.c b/src/gallium/drivers/r600/evergreen_compute.c
index 3928676..150bc5c 100644
--- a/src/gallium/drivers/r600/evergreen_compute.c
+++ b/src/gallium/drivers/r600/evergreen_compute.c
@@ -266,24 +266,31 @@ static void evergreen_bind_compute_state(struct pipe_context *ctx_, void *state)
  * DWORDS 3-5: Number of global work items in each dimension (x,y,z)
  * DWORDS 6-8: Number of work items within each work group in each dimension
  * (x,y,z)
- * DWORDS 9+ : Kernel parameters
+ * DWORD 9 : work dimension (needs new enough llvm)
+ * DWORDS 10+: Kernel parameters

I would prefer to add new parameters after the kernel arguments, so we
don't need to break compatibility with LLVM every time we add a new
parameter.

-Tom

v2: Fix function declaration
    Add range metadata to r600 implementation

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

Jan Vesely <jan.vesely@rutgers.edu> writes:

This is needed for OpenCL

CC: Francisco Jerez <currojerez@riseup.net>
CC: Tom Stellard <tom@stellard.net>

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
---

I tried to find another way how to get the information, but in the end
there is no way to distinguish between <1,1,1> dim 1, and <1,1,1> dim 2 (or 3).
So passing the work_dim information is required.

this series fixes piglits cl-program-max-work-item-sizes, and get-work-dim.cl on my TURKS gpu.

My plan was to keep the pipe driver interface as simple as possible and
pass the dimension as an additional *kernel* parameter directly from the
clover front-end (along with other things that we currently don't handle
and we could abstract from the pipe driver, like the grid offset). I
don't see any reason to bother the driver with this as e.g. <1> and
<1,1,1> are equivalent grid definitions that will invariably result in
the same hardware setup.

P.S.: Sorry for taking so long to get back to you, been quite busy
during the last couple of weeks.

Jan Vesely <jan.vesely@rutgers.edu> writes:

> This is needed for OpenCL
>
> CC: Francisco Jerez <currojerez@riseup.net>
> CC: Tom Stellard <tom@stellard.net>
>
> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
> ---
>
> I tried to find another way how to get the information, but in the end
> there is no way to distinguish between <1,1,1> dim 1, and <1,1,1> dim 2 (or 3).
> So passing the work_dim information is required.
>
> this series fixes piglits cl-program-max-work-item-sizes, and get-work-dim.cl on my TURKS gpu.
>

My plan was to keep the pipe driver interface as simple as possible and
pass the dimension as an additional *kernel* parameter directly from the
clover front-end (along with other things that we currently don't handle
and we could abstract from the pipe driver, like the grid offset).

I'm currently looking into Tom's suggestion to pass these values after
the kernel args, appending workdim in clover should not be a big
problem.

  I don't see any reason to bother the driver with this as e.g. <1> and
<1,1,1> are equivalent grid definitions that will invariably result in
the same hardware setup.

My original idea was to get the last useful ( > 1) dimension, or return
1. However, the specs say that it should return the value passed to
clEnqueueNDRangeKernel, and that's what the existing piglits tests.

jan

P.S.: Sorry for taking so long to get back to you, been quite busy
during the last couple of weeks.

no problem, there's no rush. I need to prioritize university work these
weeks anyway

v2: Add SI lowering
    Add test

v3: Work dimensions after the kernel arguments.

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

+ case Intrinsic::r600_read_workdim: {
+ const size_t arg_size = DAG.getMachineFunction().getFunction()->arg_size();

arg_size() returns the number of arguments, not their actual size. You can't assume every argument is 4 bytes. There could be larger types, vectors, or structs

> + case Intrinsic::r600_read_workdim: {
> + const size_t arg_size = DAG.getMachineFunction().getFunction()->arg_size();
arg_size() returns the number of arguments, not their actual size. You
can't assume every argument is 4 bytes. There could be larger types,
vectors, or structs

ah, right. I guess I'll have to repeat most of the argument magic from
clover/llvm/invocation.cpp

> > + case Intrinsic::r600_read_workdim: {
> > + const size_t arg_size = DAG.getMachineFunction().getFunction()->arg_size();
> arg_size() returns the number of arguments, not their actual size. You
> can't assume every argument is 4 bytes. There could be larger types,
> vectors, or structs

ah, right. I guess I'll have to repeat most of the argument magic from
clover/llvm/invocation.cpp

The way I would do this is to add a field to AMDGPUMachineFunctionInfo
called something like ABIArgOffset, and then set in
SITargetLowering::LowerFormalArguments() where we compute the offset for
each argument something like:

   for (unsigned i = 0, e = Ins.size(), ArgIdx = 0; i != e; ++i) {

     const ISD::InputArg &Arg = Ins[i];
     if (Skipped & (1 << i)) {
       InVals.push_back(DAG.getUNDEF(Arg.VT));
       continue;
     }

     CCValAssign &VA = ArgLocs[ArgIdx++];
     EVT VT = VA.getLocVT();

     if (VA.isMemLoc()) {
       VT = Ins[i].VT;
       EVT MemVT = Splits[i].VT;
+ unsigned Offset = 36 + VA.getLocMemOffset();
       // The first 36 bytes of the input buffer contains information
       // about
       // thread group and global sizes.
       SDValue Arg = LowerParameter(DAG, VT, MemVT, DL, DAG.getRoot(),
+ Offset,
- 36 + VA.getLocMemOffset(),
                                    Ins[i].Flags.isSExt());
       InVals.push_back(Arg);
+ MFI->ABIArgOffset = Offset + MemVT.getSizeInBits() / 8
       continue;
     }

Then can you can use this computed offset when you lower get_dim.

-Tom

I think Jan posted this to the wrong list libclc instead of llvm-dev? Maybe post the updated patch to llvm-dev?

Jeroen

I think Jan posted this to the wrong list libclc instead of llvm-dev?
Maybe post the updated patch to llvm-dev?

you're right, sorry about that. I'll be more careful with v4.

jan

I think Jan posted this to the wrong list libclc instead of llvm-dev?
Maybe post the updated patch to llvm-dev?

you're right, sorry about that. I'll be more careful with v4.

No problem, it’s mostly that you get maximum exposure there,
although I guess that most people that care also read this list.

Jeroen