GFX1010 support

Hi,

I would like to ask about GFX1010 (Navi 10, RX5700 XT) support. I
understand it is not supported at the moment. What kind of work would
be needed to add it? I would like to help if I can, but I am not
sure how to proceed, or even whether I have the required background.

$ clinfo
...
fatal error: cannot open file '/usr/lib/clc/gfx1010-amdgcn-mesa-mesa3d.bc': No such file or directory
...

Cheers,
Filipe Laíns

Hi,

It might miss some optimizations, but adding the gfx1010 target to the CMake list of supported GPUs should give you a usable library.
You can test this by just creating a symlink named 'gfx1010-amdgcn-mesa-mesa3d.bc' in your current setup.
There are probably several GPU targets missing.
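
The symlink test could look roughly like the sketch below. The helper name `alias_clc_target` is made up for illustration, and which existing bitcode file to point at is an assumption: use whichever AMDGPU target your install already ships (the usage comment assumes a `tahiti` file exists).

```shell
# Hypothetical helper: make a new GPU name point at an existing libclc bitcode file.
# Usage: alias_clc_target <clc_dir> <existing_gpu> <new_gpu>
alias_clc_target() {
    clc_dir=$1
    existing=$2
    new=$3
    suffix=amdgcn-mesa-mesa3d.bc

    # Refuse to create a dangling link if the source bitcode is not there.
    if [ ! -e "$clc_dir/$existing-$suffix" ]; then
        echo "no bitcode for $existing in $clc_dir" >&2
        return 1
    fi

    # Relative link, so it keeps working if the directory is moved.
    ln -s "$existing-$suffix" "$clc_dir/$new-$suffix"
}

# Example (assumes a tahiti bitcode file exists; usually needs root):
# alias_clc_target /usr/lib/clc tahiti gfx1010
```

After that, running clinfo again should show whether the missing-file error is gone.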

Jan

Sorry Jan (and everyone),

I forgot to hit reply-all when I sent the attached earlier.

--Aaron

0001-libclc-Add-several-AMDGPU-subtargets.patch (1.84 KB)

Hi,

From: Aaron Watry <awatry@gmail.com>
Date: Mon, Jul 6, 2020 at 1:31 PM
Subject: Re: [Libclc-dev] GFX1010 support
To: Filipe Laíns <lains@archlinux.org>

Do you have the ability to build libclc from source from a checkout of
the LLVM repository?
The repository is here: https://github.com/llvm/llvm-project.git

You’d only need to build/install the libclc/ sub-project to test any
prospective patches (see attached), not the full LLVM project.

I’ve attached a basic patch that adds some AMDGPU subtargets that were
missing along with the LLVM version they were added in.
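
A rough sketch of a libclc-only build from that repository is shown here. The function name `build_libclc` and both directory names are assumptions for illustration, and the configure step assumes CMake can locate an installed LLVM/clang on its own; the install step is left out because it depends on your prefix.

```shell
# Hypothetical helper: fetch LLVM and build only the libclc sub-project.
# Nothing runs until the function is called.
build_libclc() {
    src_dir=${1:-llvm-project}      # checkout location (assumed name)
    build_dir=${2:-build-libclc}    # out-of-tree build directory (assumed name)

    # Shallow clone to save time; skipped if the checkout already exists.
    [ -d "$src_dir" ] || git clone --depth 1 https://github.com/llvm/llvm-project.git "$src_dir" || return 1

    # Configure and build libclc alone, not the full LLVM project.
    cmake -S "$src_dir/libclc" -B "$build_dir" || return 1
    cmake --build "$build_dir"
}

# Example:
# build_libclc llvm-project build-libclc
```

Any prospective patch would be applied inside the checkout before calling the function.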

Thanks! Can you group those if statements by LLVM version?
Keep the comments and 'set' statements, just avoid multiple ifs.
With that change:

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>

Jan

Hi Jan,

I enabled the gfx1010 target, and clinfo passes. I have played around with
some things. Do you have any suggestions for benchmarking the GPU, to see
whether the performance level is as expected?

Filipe Laíns

Hi,

I haven't run benchmarks much since there's still functionality
missing.
You can try 'clpeak' to get rough fp32/fp64 throughput numbers.
luxmark-3.1 should also work out of the box, but the 'sphere' scene
gives ~30% incorrect pixels on the HW that I have.
luxmark-4.x needs OpenCL 1.2, but works if the version is overridden using CLOVER_PLATFORM_VERSION_OVERRIDE, CLOVER_DEVICE_VERSION_OVERRIDE, and CLOVER_DEVICE_CLC_VERSION_OVERRIDE.
All three env vars take a string, so you need to set each of them to "1.2".
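
As a minimal sketch, the three overrides can be exported in the shell before starting the benchmark. Only the variable names and the "1.2" value come from the description above; how the benchmark itself is launched is left out because it depends on the install.

```shell
# Report OpenCL 1.2 for the platform, the device, and the CLC language version.
# Each variable takes a string, hence the quotes.
export CLOVER_PLATFORM_VERSION_OVERRIDE="1.2"
export CLOVER_DEVICE_VERSION_OVERRIDE="1.2"
export CLOVER_DEVICE_CLC_VERSION_OVERRIDE="1.2"

# Start the benchmark from this same shell so it inherits the overrides.
```

Running clinfo from the same shell is a quick way to check which version is now reported.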

There are other benchmarks that you can try, like JohnTheRipper or Rodinia. Note that anything that needs image support won't work.

Regards,
Jan

Thank you for the suggestions! I am away this week, so I don't have
access to the GPU. I'll give them a try when I get back :)

Filipe Laíns

A more substantial patch will be needed for wave32 support, which is supposed to be the default for compute.

-Matt

Can you elaborate? What changes will be visible on the IR level?

Jan

The implementation of most cross-lane/reduction operations needs to be swapped depending on the wave size (not sure if those are fully implemented already in libclc). Some intrinsics (like llvm.amdgcn.ballot) need to be mangled differently. Other cases require emitting different or longer code for wave32, to handle both halves of the wave, for example. ROCm-Device-Libs mostly handles the two wave sizes similarly to the other library control libraries, like libclc.

A few sample cases: https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/ockl/src/lane.cl
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/amd-stg-open/opencl/src/subgroup/subget.cl

-Matt

Thanks. Looks like we won't need it for clc 1.x, so it can be revisited later.
Jan