OpenCL toolset (for AMD GPU)

Hi Tom,

Thanks for your kind response!

> clang -include /path/to/libclc/headers/clc.h -I /path/to/libclc/headers -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo $INPUT_FILE -o $OUTPUT_FILE

When I tried to build cos.cl testcase libclc/cos.cl at master · llvm-mirror/libclc · GitHub

$ clang -include clc/clc.h -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo cos.cl -o cos.out

cos.cl:1:27: error: pointer arguments to kernel functions must reside in '__global', '__constant' or
       '__local' address space
__kernel void foo(float4 *f) {
                           ^
1 error generated.

Then I added __local before float4 *f like this:

Index: cos.cl

Sorry for my wrong post!

Dear Tom,

hsa-runtime was not available? https://github.com/tstellarAMD/hsa-runtime/

Hi Tom,

I found the correct mailing list finally :slight_smile:

Hi Tom,

Thanks for your kind response!

clang -include /path/to/libclc/headers/clc.h -I /path/to/libclc/headers -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo $INPUT_FILE -o $OUTPUT_FILE

When I tried to build cos.cl testcase libclc/cos.cl at master · llvm-mirror/libclc · GitHub

$ clang -include clc/clc.h -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo cos.cl -o cos.out

cos.cl:1:27: error: pointer arguments to kernel functions must reside in '__global', '__constant' or
       '__local' address space
__kernel void foo(float4 *f) {
                           ^
1 error generated.

Then I added __local before float4 *f like this:

Index: cos.cl

--- cos.cl (revision 306265)
+++ cos.cl (working copy)
@@ -1,3 +1,3 @@
-__kernel void foo(float4 *f) {
+__kernel void foo(__local float4 *f) {
    *f = cos(*f);
  }

because [OpenCL] Improve address space diagnostics ⚙ D27671 [OpenCL] Improve address space diagnostics.

then rebuilt again,

cos.cl:1:15: error: unsupported call to function _Z3cosDv4_f
__kernel void foo(__local float4 *f) {
               ^
1 error generated.

why not found *cos* function? please give me some hint, thanks a lot!

Hi,

This is becuase you aren't linking the kernel with the libclc bitcode library,
try passing the path to the libclc library to clang using this option:
  -mlink-bitcode-file

-Tom

I should at first read http://lists.llvm.org/pipermail/libclc-dev/2016-May/002203.html

$ clang -x cl -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo -B -Xclang -mlink-bitcode-file -Xclang clc/amdgcn--amdhsa.bc -include clc/clc.h -o cos.co cos.cl

but,

clang-5.0: error: unknown argument: '-mlink-bitcode-file'

Allow linking multiple bitcode files https://reviews.llvm.org/D13913 so buildbot and other developers had already checked the CC1Options.td for 'mlink-bitcode-file' argument, but how to use it correctly, please give me some hint, thanks a lot!

PS: it works as Ricardo mentioned!

$ clang -Dcl_clang_storage_class_specifiers -isystem /usr/include -include clc/clc.h -target amdgcn--amdhsa -mcpu=carrizo -S -emit-llvm -xcl -o cos.ll cos.cl
$ llvm-link cos.ll /usr/lib/clc/amdgcn--amdhsa.bc -o cos.linked.bc
$ clang -target amdgcn--amdhsa -mcpu=carrizo cos.linked.bc -S -o cos.amdhsa.s

$ clang --version
iSoft clang version 5.0.0 (trunk 305877) (based on LLVM 5.0.0svn)
Target: x86_64-isoft-linux
Thread model: posix
InstalledDir: /bin

   Registered Targets:
     ...
     amdgcn - AMD GCN GPUs
     ...
     avr - Atmel AVR Microcontroller
     ...
     mips64el - Mips64el [experimental]
     ...
     nvptx64 - NVIDIA PTX 64-bit
     ...
     r600 - AMD GPUs HD2XXX-HD6XXX
     ...

Hi Tom,

I found it https://clang.llvm.org/docs/FAQ.html for clang/include/clang/Driver/CC1Options.td

Hi Tom,

I found the correct mailing list finally :slight_smile:

Hi Tom,

Thanks for your kind response!

clang -include /path/to/libclc/headers/clc.h -I /path/to/libclc/headers -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo $INPUT_FILE -o $OUTPUT_FILE

When I tried to build cos.cl testcase libclc/cos.cl at master · llvm-mirror/libclc · GitHub

$ clang -include clc/clc.h -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo cos.cl -o cos.out

cos.cl:1:27: error: pointer arguments to kernel functions must reside in '__global', '__constant' or
       '__local' address space
__kernel void foo(float4 *f) {
                           ^
1 error generated.

Then I added __local before float4 *f like this:

Index: cos.cl

--- cos.cl (revision 306265)
+++ cos.cl (working copy)
@@ -1,3 +1,3 @@
-__kernel void foo(float4 *f) {
+__kernel void foo(__local float4 *f) {
    *f = cos(*f);
  }

because [OpenCL] Improve address space diagnostics https://reviews.llvm.org/D27671

then rebuilt again,

cos.cl:1:15: error: unsupported call to function _Z3cosDv4_f
__kernel void foo(__local float4 *f) {
               ^
1 error generated.

why not found *cos* function? please give me some hint, thanks a lot!

Hi,

This is becuase you aren't linking the kernel with the libclc bitcode library,
try passing the path to the libclc library to clang using this option:
  -mlink-bitcode-file

-Tom

I should at first read http://lists.llvm.org/pipermail/libclc-dev/2016-May/002203.html

$ clang -x cl -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo -B -Xclang -mlink-bitcode-file -Xclang clc/amdgcn--amdhsa.bc -include clc/clc.h -o cos.co cos.cl

but,

clang-5.0: error: unknown argument: '-mlink-bitcode-file'

Allow linking multiple bitcode files https://reviews.llvm.org/D13913 so buildbot and other developers had already checked the CC1Options.td for 'mlink-bitcode-file' argument, but how to use it correctly, please give me some hint, thanks a lot!

$ clang -x cl -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo -Xclang -mlink-bitcode-file /usr/lib/clc/amdgcn--amdhsa.bc -include clc/clc.h -o cos.co cos.cl

error: error reading '/tmp/amdgcn--amdhsa-ab2e8b.o'
error: unable to open output file '': 'Permission denied'
2 errors generated.

it failed to generate amdgcn--amdhsa-ab2e8b.o for /usr/lib/clc/amdgcn--amdhsa.bc?

PS: it works as Ricardo mentioned!

$ clang -Dcl_clang_storage_class_specifiers -isystem /usr/include -include clc/clc.h -target amdgcn--amdhsa -mcpu=carrizo -S -emit-llvm -xcl -o cos.ll cos.cl
$ llvm-link cos.ll /usr/lib/clc/amdgcn--amdhsa.bc -o cos.linked.bc
$ clang -target amdgcn--amdhsa -mcpu=carrizo cos.linked.bc -S -o cos.amdhsa.s

works!

$ clang -x assembler -target amdgcn--amdhsa -mcpu=carrizo -c -o cos.o cos.amdhsa.s
$ clang -target amdgcn--amdhsa cos.o -o cos.co
$ llvm-readobj -sections -symbols -program-headers cos.co

File: cos.co
Format: ELF64-amdgpu-hsacobj
Arch: amdgcn
AddressSize: 64bit
LoadName:
Sections [
   Section {
     Index: 0
     Name: (0)
     Type: SHT_NULL (0x0)
     Flags [ (0x0)
     ]
     Address: 0x0
     Offset: 0x0
     Size: 0
     Link: 0
     Info: 0
     AddressAlignment: 0
     EntrySize: 0
   }
   Section {
     Index: 1
     Name: .note (1)
     Type: SHT_NOTE (0x7)
     Flags [ (0x2)
       SHF_ALLOC (0x2)
     ]
     Address: 0x200
     Offset: 0x200
     Size: 1136
     Link: 0
     Info: 0
     AddressAlignment: 4
     EntrySize: 0
   }
   Section {
     Index: 2
     Name: .dynsym (7)
     Type: SHT_DYNSYM (0xB)
     Flags [ (0x2)
       SHF_ALLOC (0x2)
     ]
     Address: 0x670
     Offset: 0x670
     Size: 48
     Link: 4
     Info: 1
     AddressAlignment: 8
     EntrySize: 24
   }
   Section {
     Index: 3
     Name: .hash (15)
     Type: SHT_HASH (0x5)
     Flags [ (0x2)
       SHF_ALLOC (0x2)
     ]
     Address: 0x6A0
     Offset: 0x6A0
     Size: 24
     Link: 2
     Info: 0
     AddressAlignment: 4
     EntrySize: 4
   }
   Section {
     Index: 4
     Name: .dynstr (21)
     Type: SHT_STRTAB (0x3)
     Flags [ (0x2)
       SHF_ALLOC (0x2)
     ]
     Address: 0x6B8
     Offset: 0x6B8
     Size: 5
     Link: 0
     Info: 0
     AddressAlignment: 1
     EntrySize: 0
   }
   Section {
     Index: 5
     Name: .text (29)
     Type: SHT_PROGBITS (0x1)
     Flags [ (0x6)
       SHF_ALLOC (0x2)
       SHF_EXECINSTR (0x4)
     ]
     Address: 0x1000
     Offset: 0x1000
     Size: 8132
     Link: 0
     Info: 0
     AddressAlignment: 256
     EntrySize: 0
   }
   Section {
     Index: 6
     Name: .dynamic (35)
     Type: SHT_DYNAMIC (0x6)
     Flags [ (0x3)
       SHF_ALLOC (0x2)
       SHF_WRITE (0x1)
     ]
     Address: 0x3000
     Offset: 0x3000
     Size: 96
     Link: 4
     Info: 0
     AddressAlignment: 8
     EntrySize: 16
   }
   Section {
     Index: 7
     Name: .AMDGPU.csdata (44)
     Type: SHT_PROGBITS (0x1)
     Flags [ (0x0)
     ]
     Address: 0x0
     Offset: 0x3060
     Size: 0
     Link: 0
     Info: 0
     AddressAlignment: 1
     EntrySize: 0
   }
   Section {
     Index: 8
     Name: .comment (59)
     Type: SHT_PROGBITS (0x1)
     Flags [ (0x30)
       SHF_MERGE (0x10)
       SHF_STRINGS (0x20)
     ]
     Address: 0x0
     Offset: 0x3060
     Size: 186
     Link: 0
     Info: 0
     AddressAlignment: 1
     EntrySize: 0
   }
   Section {
     Index: 9
     Name: .symtab (68)
     Type: SHT_SYMTAB (0x2)
     Flags [ (0x0)
     ]
     Address: 0x0
     Offset: 0x3120
     Size: 264
     Link: 11
     Info: 10
     AddressAlignment: 8
     EntrySize: 24
   }
   Section {
     Index: 10
     Name: .shstrtab (76)
     Type: SHT_STRTAB (0x3)
     Flags [ (0x0)
     ]
     Address: 0x0
     Offset: 0x3228
     Size: 94
     Link: 0
     Info: 0
     AddressAlignment: 1
     EntrySize: 0
   }
   Section {
     Index: 11
     Name: .strtab (86)
     Type: SHT_STRTAB (0x3)
     Flags [ (0x0)
     ]
     Address: 0x0
     Offset: 0x3286
     Size: 64
     Link: 0
     Info: 0
     AddressAlignment: 1
     EntrySize: 0
   }
]
Symbols [
   Symbol {
     Name: (0)
     Value: 0x0
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: Undefined (0x0)
   }
   Symbol {
     Name: BB0_11 (1)
     Value: 0x25D0
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: BB0_12 (8)
     Value: 0x2A74
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: BB0_2 (15)
     Value: 0x12E8
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: BB0_3 (21)
     Value: 0x177C
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: BB0_5 (27)
     Value: 0x191C
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: BB0_6 (33)
     Value: 0x1DC0
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: BB0_8 (39)
     Value: 0x1F94
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: BB0_9 (45)
     Value: 0x2430
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other: 0
     Section: .text (0x5)
   }
   Symbol {
     Name: _DYNAMIC (55)
     Value: 0x3000
     Size: 0
     Binding: Local (0x0)
     Type: None (0x0)
     Other [ (0x2)
       STV_HIDDEN (0x2)
     ]
     Section: .dynamic (0x6)
   }
   Symbol {
     Name: foo (51)
     Value: 0x1000
     Size: 8132
     Binding: Global (0x1)
     Type: AMDGPU_HSA_KERNEL (0xA)
     Other: 0
     Section: .text (0x5)
   }
]
ProgramHeaders [
   ProgramHeader {
     Type: PT_PHDR (0x6)
     Offset: 0x40
     VirtualAddress: 0x40
     PhysicalAddress: 0x40
     FileSize: 448
     MemSize: 448
     Flags [ (0x4)
       PF_R (0x4)
     ]
     Alignment: 8
   }
   ProgramHeader {
     Type: PT_LOAD (0x1)
     Offset: 0x0
     VirtualAddress: 0x0
     PhysicalAddress: 0x0
     FileSize: 1725
     MemSize: 1725
     Flags [ (0x4)
       PF_R (0x4)
     ]
     Alignment: 4096
   }
   ProgramHeader {
     Type: PT_LOAD (0x1)
     Offset: 0x1000
     VirtualAddress: 0x1000
     PhysicalAddress: 0x1000
     FileSize: 8132
     MemSize: 8132
     Flags [ (0x5)
       PF_R (0x4)
       PF_X (0x1)
     ]
     Alignment: 4096
   }
   ProgramHeader {
     Type: PT_LOAD (0x1)
     Offset: 0x3000
     VirtualAddress: 0x3000
     PhysicalAddress: 0x3000
     FileSize: 96
     MemSize: 96
     Flags [ (0x6)
       PF_R (0x4)
       PF_W (0x2)
     ]
     Alignment: 4096
   }
   ProgramHeader {
     Type: PT_DYNAMIC (0x2)
     Offset: 0x3000
     VirtualAddress: 0x3000
     PhysicalAddress: 0x3000
     FileSize: 96
     MemSize: 96
     Flags [ (0x6)
       PF_R (0x4)
       PF_W (0x2)
     ]
     Alignment: 8
   }
   ProgramHeader {
     Type: PT_GNU_RELRO (0x6474E552)
     Offset: 0x3000
     VirtualAddress: 0x3000
     PhysicalAddress: 0x3000
     FileSize: 96
     MemSize: 4096
     Flags [ (0x4)
       PF_R (0x4)
     ]
     Alignment: 1
   }
   ProgramHeader {
     Type: PT_GNU_STACK (0x6474E551)
     Offset: 0x0
     VirtualAddress: 0x0
     PhysicalAddress: 0x0
     FileSize: 0
     MemSize: 0
     Flags [ (0x6)
       PF_R (0x4)
       PF_W (0x2)
     ]
     Alignment: 0
   }
   ProgramHeader {
     Type: PT_NOTE (0x4)
     Offset: 0x200
     VirtualAddress: 0x200
     PhysicalAddress: 0x200
     FileSize: 1136
     MemSize: 1136
     Flags [ (0x4)
       PF_R (0x4)
     ]
     Alignment: 4
   }
]

Oclgrind is awesome! GitHub - jrprice/Oclgrind: An OpenCL device simulator and debugger Although I do not have AMDGPU target real device llvm/lib/Target/AMDGPU but the simulator helps me being familiar with Host to communicate (clCreateBuffer) with Device :slight_smile:

Hi Tom,

I found it https://clang.llvm.org/docs/FAQ.html for clang/include/clang/Driver/CC1Options.td

Hi Tom,

I found the correct mailing list finally :slight_smile:

Hi Tom,

Thanks for your kind response!

clang -include /path/to/libclc/headers/clc.h -I /path/to/libclc/headers -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo $INPUT_FILE -o $OUTPUT_FILE

When I tried to build cos.cl testcase https://github.com/llvm-mirror/libclc/blob/master/test/cos.cl

$ clang -include clc/clc.h -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo cos.cl -o cos.out

cos.cl:1:27: error: pointer arguments to kernel functions must reside in '__global', '__constant' or
       '__local' address space
__kernel void foo(float4 *f) {
                           ^
1 error generated.

Then I added __local before float4 *f like this:

Index: cos.cl

--- cos.cl (revision 306265)
+++ cos.cl (working copy)
@@ -1,3 +1,3 @@
-__kernel void foo(float4 *f) {
+__kernel void foo(__local float4 *f) {
    *f = cos(*f);
  }

because [OpenCL] Improve address space diagnostics ⚙ D27671 [OpenCL] Improve address space diagnostics.

then rebuilt again,

cos.cl:1:15: error: unsupported call to function _Z3cosDv4_f
__kernel void foo(__local float4 *f) {
               ^
1 error generated.

why not found *cos* function? please give me some hint, thanks a lot!

Hi,

This is becuase you aren't linking the kernel with the libclc bitcode library,
try passing the path to the libclc library to clang using this option:
  -mlink-bitcode-file

-Tom

I should at first read http://lists.llvm.org/pipermail/libclc-dev/2016-May/002203.html

$ clang -x cl -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo -B -Xclang -mlink-bitcode-file -Xclang clc/amdgcn--amdhsa.bc -include clc/clc.h -o cos.co cos.cl

but,

clang-5.0: error: unknown argument: '-mlink-bitcode-file'

Allow linking multiple bitcode files https://reviews.llvm.org/D13913 so buildbot and other developers had already checked the CC1Options.td for 'mlink-bitcode-file' argument, but how to use it correctly, please give me some hint, thanks a lot!

$ clang -x cl -Dcl_clang_storage_class_specifiers -target amdgcn--amdhsa -mcpu=carrizo -Xclang -mlink-bitcode-file /usr/lib/clc/amdgcn--amdhsa.bc -include clc/clc.h -o cos.co cos.cl

error: error reading '/tmp/amdgcn--amdhsa-ab2e8b.o'
error: unable to open output file '': 'Permission denied'
2 errors generated.

it failed to generate amdgcn--amdhsa-ab2e8b.o for /usr/lib/clc/amdgcn--amdhsa.bc?

PS: it works as Ricardo mentioned!

$ clang -Dcl_clang_storage_class_specifiers -isystem /usr/include -include clc/clc.h -target amdgcn--amdhsa -mcpu=carrizo -S -emit-llvm -xcl -o cos.ll cos.cl
$ llvm-link cos.ll /usr/lib/clc/amdgcn--amdhsa.bc -o cos.linked.bc
$ clang -target amdgcn--amdhsa -mcpu=carrizo cos.linked.bc -S -o cos.amdhsa.s

works!

$ clang -x assembler -target amdgcn--amdhsa -mcpu=carrizo -c -o cos.o cos.amdhsa.s
$ clang -target amdgcn--amdhsa cos.o -o cos.co
$ llvm-readobj -sections -symbols -program-headers cos.co

I will test clCreateProgramWithBinary(... "cos.co" ...)!