Hi everybody,
I am testing the PTX backend using the OpenCL NVIDIA SDK benchmarks.
Compiling the Histogram64.cl program, I get several backend errors.
I isolated one of them in the following kernel program:
__kernel void kernel_function(__global int *input) {
    __local char localArray[16];
    for (unsigned int index = 0; index < 16; ++index)
        localArray[index] = 0;
    input[0] = localArray[get_local_id(0)];
}
fatal error: error in backend: Cannot select:
0x5810cc0: i32,ch = load 0x57fa148, 0x5810ac0, 0x58105c0<LD1[%arrayidx1], sext from i8> [ID=9]
  0x5810ac0: i32 = add 0x58109c0, 0x5813640 [ORD=113] [ID=8]
    0x58109c0: i32 = PTXISD::COPY_ADDRESS 0x5813540 [ID=7]
      0x5813540: i32 = TargetGlobalAddress<[16 x i8] addrspace(4)* @kernel_function.localArray> 0 [ID=4]
    0x5813640: i32,ch = load 0x57fa148, 0x5810dc0, 0x58105c0<LD4[%retval.i]> [ORD=110] [ID=5]
      0x5810dc0: i32 = FrameIndex<0> [ORD=110] [ID=1]
      0x58105c0: i32 = undef [ORD=110] [ID=2]
  0x58105c0: i32 = undef [ORD=110] [ID=2]
The command I am using is:
clang kernels/fatal_error_test.cl -O0 -include ocldef.h -include builtin_functions_ptx.cl -D__x86_64__ -ccc-host-triple ptx32 -Xclang -target-feature -Xclang +ptx23 -Xclang -target-feature -Xclang +compute20
Any ideas?
Best regards
Alberto
Unfortunately, this sample will not work at this time. First, the backend does not support i8 types yet. Second, at higher optimization levels, LLVM turns this loop into a memset intrinsic, which is also not yet implemented. 
Hopefully I’ll get some time soon to work on these and other deficiencies. Patches are always welcome, too.
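To make the memset point concrete: at higher optimization levels, LLVM's loop idiom recognition can rewrite a loop that stores the same byte into consecutive elements as a single call to the llvm.memset intrinsic. The following is a minimal host-side C++ sketch (not OpenCL, and not the backend's code) showing that the kernel's zeroing loop and a memset call leave the array in the same state; the function names are hypothetical.

```cpp
#include <array>
#include <cstring>

// The kernel's zeroing loop over a 16-byte local array, written in plain C++.
std::array<char, 16> zero_with_loop() {
    std::array<char, 16> a;
    a.fill(1);  // start from non-zero contents so the zeroing is observable
    for (unsigned int index = 0; index < 16; ++index)
        a[index] = 0;
    return a;
}

// The same effect expressed as one memset call, roughly the form the
// optimizer produces when it recognizes the loop as a memset idiom.
std::array<char, 16> zero_with_memset() {
    std::array<char, 16> a;
    a.fill(1);
    std::memset(a.data(), 0, a.size());
    return a;
}
```

Because both forms are equivalent, the optimizer is free to emit the intrinsic, and the backend must then be able to lower it.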
Justin,
Add this to your TargetLowering constructor; it fixes the mem* issue:
maxStoresPerMemcpy = 4096;
maxStoresPerMemmove = 4096;
maxStoresPerMemset = 4096;
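Why raising these limits helps: when the size of a memset/memcpy/memmove is a compile-time constant, SelectionDAG tries to expand it into a sequence of ordinary stores, but only if the number of stores stays within the target's maxStoresPer* limit; otherwise the generic lowering falls back to a library call, which is presumably what the PTX backend could not handle. The C++ sketch below is a simplified model under that assumption, not LLVM's actual lowering logic: it greedily covers the buffer with the widest power-of-two store that fits and compares the store count with the limit. `count_stores` and `expands_inline` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption, not LLVM's real algorithm): cover `size`
// bytes greedily with the widest power-of-two store that fits, never wider
// than `max_store_bytes`, and count how many stores that takes.
unsigned count_stores(std::uint64_t size, unsigned max_store_bytes) {
    assert(max_store_bytes != 0 && (max_store_bytes & (max_store_bytes - 1)) == 0);
    unsigned stores = 0;
    unsigned width = max_store_bytes;
    while (size > 0) {
        while (width > size) width /= 2;  // shrink the store for the tail bytes
        size -= width;
        ++stores;
    }
    return stores;
}

// In this model, a constant-size memset is expanded inline only when the
// store count stays within the limit (the role of maxStoresPerMemset).
bool expands_inline(std::uint64_t size, unsigned max_store_bytes, unsigned max_stores) {
    return count_stores(size, max_store_bytes) <= max_stores;
}
```

With a limit of 4096, the 16-byte local array from the kernel (four 4-byte stores in this model) is always expanded inline, so no library call is needed.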
Thanks for this! I applied it in r144551. However, this particular kernel still fails to compile, because the backend does not yet support sign-extending loads from i8 to i32.
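For reference, the node that fails selection (`load ... sext from i8` producing an i32) asks for a one-byte load whose value is sign-extended to 32 bits, since `char` is signed here. The C++ sketch below only illustrates those semantics; it assumes little-endian 32-bit words, and both function names are hypothetical, not part of the PTX backend.

```cpp
#include <cstdint>

// Semantics of "sext from i8": widen a byte to 32 bits, copying its sign bit.
// Written without relying on implementation-defined signed shifts.
std::int32_t sext_i8_to_i32(std::uint8_t byte) {
    // Flipping the sign bit and subtracting 0x80 maps 0x00..0x7F to 0..127
    // and 0x80..0xFF to -128..-1.
    return static_cast<std::int32_t>(byte ^ 0x80u) - 0x80;
}

// Hypothetical emulation for a target with only 32-bit loads: load the
// containing little-endian word, extract the addressed byte, sign-extend it.
std::int32_t load_sext_i8(const std::uint32_t* words, std::uint32_t byte_index) {
    std::uint32_t word = words[byte_index / 4];
    std::uint8_t byte = static_cast<std::uint8_t>(word >> (8 * (byte_index % 4)));
    return sext_i8_to_i32(byte);
}
```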