AMD and libclc (progress)

I compiled and installed the latest ‘libclc’ from the repository and successfully generated AMD GCN assembly from the GEMM kernel of the polybench-gpu benchmark (http://web.cse.ohio-state.edu/~pouchet/software/polybench/GPU/) by executing the following commands:

/opt/clang+llvm-3.7.1-x86_64-linux-gnu-ubuntu-15.10/bin/clang -Dcl_clang_storage_class_specifiers -isystem /opt/libclc/include -include clc/clc.h -target amdgcn -S -emit-llvm -xcl -o gemm.ll gemm.cl

/opt/clang-3.9/bin/llvm-link gemm.ll /opt/libclc/lib/clc/verde-amdgcn--.bc -o gemm.linked.bc

/opt/clang-3.9/bin/clang -target amdgcn gemm.linked.bc -S -o gemm.verde.s
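
The three steps above can be collected into one small shell function so they always run with consistent paths. This is only a sketch: the function names 'run' and 'build_kernel' and the DRY_RUN switch are illustrative additions, while the tool paths and flags are copied from the commands above.

```shell
#!/bin/sh
# Sketch: the three-step libclc pipeline above, wrapped in a function.
# Tool paths and flags are copied from the commands above; adjust as needed.
CLANG37=/opt/clang+llvm-3.7.1-x86_64-linux-gnu-ubuntu-15.10/bin/clang
CLANG39=/opt/clang-3.9/bin/clang
LLVM_LINK39=/opt/clang-3.9/bin/llvm-link
LIBCLC=/opt/libclc

# run CMD...: execute CMD, or only print it when DRY_RUN=1 (illustrative switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# build_kernel NAME: NAME.cl -> NAME.ll -> NAME.linked.bc -> NAME.verde.s
build_kernel() {
    name=$1
    # 1. OpenCL C -> LLVM IR (clang 3.7.1), pre-including the libclc header.
    run "$CLANG37" -Dcl_clang_storage_class_specifiers -isystem "$LIBCLC/include" \
        -include clc/clc.h -target amdgcn -S -emit-llvm -xcl \
        -o "$name.ll" "$name.cl" || return 1
    # 2. Link the kernel IR with the libclc builtins built for Verde.
    run "$LLVM_LINK39" "$name.ll" "$LIBCLC/lib/clc/verde-amdgcn--.bc" \
        -o "$name.linked.bc" || return 1
    # 3. Lower the linked bitcode to GCN assembly text (clang 3.9).
    run "$CLANG39" -target amdgcn "$name.linked.bc" -S -o "$name.verde.s"
}
```

After sourcing the script, 'DRY_RUN=1 build_kernel gemm' prints the three commands without running them, and 'build_kernel gemm' executes them in order, stopping at the first failure.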

Now what should I do with the assembly file (i.e., ‘gemm.verde.s’)?
Can I use an assembler such as GCNASM (https://github.com/balidani/gcnasm) or CLRadeonExtender (http://clrx.nativeboinc.org/) to generate a binary compatible with AMD Catalyst?

I tried both assemblers, and they both complained about many of the instructions generated by Clang.

Can you give me some pointers on how to progress?

Ricardo

Hi,

Clang uses an integrated assembler for the amdgcn target, so you can
compile directly to a binary with clang.
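
For instance, assuming the clang 3.9 install from the question, replacing -S with -c in the last step should make the integrated assembler write an ELF object directly instead of assembly text. The function name 'compile_object', the '.o' output name and the DRY_RUN switch are illustrative, and this does not change the Catalyst compatibility issue.

```shell
#!/bin/sh
# Sketch: let clang's integrated assembler emit a binary directly.
# Assumes the clang 3.9 install used above; -c replaces -S.
CLANG39=/opt/clang-3.9/bin/clang

# compile_object NAME: NAME.linked.bc -> NAME.o (object file for amdgcn)
compile_object() {
    set -- "$CLANG39" -target amdgcn -c "$1.linked.bc" -o "$1.o"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # only show the command (illustrative switch)
    else
        "$@"
    fi
}
```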

These binaries are not compatible with Catalyst.

If you have one of the GPU/CPU/motherboard combinations supported by the
Radeon Open Compute (ROC) driver (which can be found here:
http://gpuopen.com/compute-product/rocm/), then you can use the
ROC APIs to upload and execute code.

You can find driver installation instructions for ROC here:

There are a few examples here that demonstrate how to use the API
to upload code:

You can actually combine all the commands you used into a single
invocation, like this:

${CLANG} -x cl -Dcl_clang_storage_class_specifiers \
  -target amdgcn--amdhsa -mcpu=fiji -B \
  -Xclang -mlink-bitcode-file -Xclang $BITCODE_LIBRARY \
  -include $BITCODE_LIBARY_HEADER -o ${f}.co \
  ${CMAKE_CURRENT_SOURCE_DIR}/${f}.cl
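
The same invocation can be wrapped in a reusable function. Note that clang's -B option takes a directory argument (a search path for the tools clang runs), which seems to have been lost from the command above; rather than guess the original value, this sketch passes a hypothetical TOOL_DIR. TRIPLE and MCPU are illustrative overrides defaulting to the values above (amdgcn--amdhsa, fiji), and CLANG, BITCODE_LIBRARY and BITCODE_LIBARY_HEADER are expected to be set by the caller, as in the original command.

```shell
#!/bin/sh
# Sketch of the single clang invocation above as a function.
# CLANG, BITCODE_LIBRARY and BITCODE_LIBARY_HEADER come from the quoted
# command; TOOL_DIR (the -B argument), TRIPLE, MCPU and DRY_RUN are hypothetical.
# compile_co NAME SRC_DIR: SRC_DIR/NAME.cl -> NAME.co
compile_co() {
    set -- "$CLANG" -x cl -Dcl_clang_storage_class_specifiers \
        -target "${TRIPLE:-amdgcn--amdhsa}" -mcpu="${MCPU:-fiji}" \
        -B "$TOOL_DIR" \
        -Xclang -mlink-bitcode-file -Xclang "$BITCODE_LIBRARY" \
        -include "$BITCODE_LIBARY_HEADER" \
        -o "$1.co" "$2/$1.cl"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"   # only show the command
    else
        "$@"
    fi
}
```

For clover, the amdgcn-- triple mentioned below can be selected with 'TRIPLE=amdgcn-- compile_co gemm src'.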

Since you are using a Tonga, which isn't officially supported by ROC,
the only way you can execute the code is with the open-source
OpenCL implementation known as clover, which is part of Mesa.
For more details, see the GalliumCompute page.
Also, if you want to compile code with clang for clover, you
need to use the amdgcn-- triple instead of amdgcn--amdhsa.

However, the binary format accepted by this implementation is
a container that includes the code output by clang along with
some other information. If you do want to use clover, it will
be easier to just use the OpenCL APIs to compile the kernel
for you.

Note that clover is not a complete implementation, so there may be
things that don't work if you try it.

Let me know if you have any other questions.

-Tom

What about this?

https://github.com/RadeonOpenCompute/HCC-Native-GCN-ISA/wiki

What is the difference between this and the methods you pointed to?

I have access to an R9 Nano (same chip as the Fury X).

Thanks,

Ricardo

This is probably the easiest way to run code on the GPU. HCC is a modified
version of clang that supports the HCC language, a single-source,
C++-like language that lets you mix GPU and CPU code.

The only difference is how you compile code and interact with the ROC
runtime; HCC uses the ROC stack too.