PTX backend fatal error

Hi everybody,

I am testing the PTX backend using the OpenCL NVIDIA SDK benchmarks.
Compiling the Histogram64.cl program, I get several backend errors.

I isolated one of them in the following kernel program:

__kernel void kernel_function(__global int *input) {
    __local char localArray[16];
    for(unsigned int index = 0; index < 16; ++index)
      localArray[index] = 0;
    input[0] = localArray[get_local_id(0)];
}

fatal error: error in backend: Cannot select: 0x5810cc0: i32,ch = load 0x57fa148, 0x5810ac0, 0x58105c0<LD1[%arrayidx1], sext from i8> [ID=9]
  0x5810ac0: i32 = add 0x58109c0, 0x5813640 [ORD=113] [ID=8]
    0x58109c0: i32 = PTXISD::COPY_ADDRESS 0x5813540 [ID=7]
      0x5813540: i32 = TargetGlobalAddress<[16 x i8] addrspace(4)* @kernel_function.localArray> 0 [ID=4]
    0x5813640: i32,ch = load 0x57fa148, 0x5810dc0, 0x58105c0<LD4[%retval.i]> [ORD=110] [ID=5]
      0x5810dc0: i32 = FrameIndex<0> [ORD=110] [ID=1]
      0x58105c0: i32 = undef [ORD=110] [ID=2]
  0x58105c0: i32 = undef [ORD=110] [ID=2]

The command I am using is:

clang kernels/fatal_error_test.cl -O0 -include ocldef.h \
    -include builtin_functions_ptx.cl -D__x86_64__ \
    -ccc-host-triple ptx32 \
    -Xclang -target-feature -Xclang +ptx23 \
    -Xclang -target-feature -Xclang +compute20

Any ideas?

Best regards

Alberto

Unfortunately, this sample will not work at this time. First, the backend does not support i8 types yet. Second, at higher optimization levels, LLVM turns this loop into a call to the llvm.memset intrinsic, which the backend cannot lower yet either.
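
The memset rewrite comes from LLVM's loop-idiom recognition, which replaces a loop that stores the same byte value into consecutive addresses with a single memset. The minimal C sketch below only illustrates that equivalence: the function names are hypothetical, and the C library `memset` stands in for the `llvm.memset` intrinsic. It zeroes a 16-byte buffer both ways, mirroring the `localArray` loop in the kernel, and checks that the results match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The zeroing loop from the kernel, written as plain C. */
void zero_with_loop(char *buf, size_t n) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i)
        buf[i] = 0;
}

/* Roughly what loop-idiom recognition turns the loop into:
   a single memset call (llvm.memset in the IR). */
void zero_with_memset(char *buf, size_t n) {
    assert(buf != NULL);
    memset(buf, 0, n);
}

/* Returns 1 if both versions leave a 16-byte array that starts out
   filled with non-zero bytes in the same state, 0 otherwise. */
int loop_matches_memset(void) {
    char a[16], b[16];
    memset(a, 0x5A, sizeof a);
    memset(b, 0x5A, sizeof b);
    zero_with_loop(a, sizeof a);
    zero_with_memset(b, sizeof b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Because the two forms are interchangeable, the optimizer is free to pick the memset form, which is why the backend needs to handle it even though the kernel source never calls memset.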

Hopefully I’ll get some time soon to work on this, and other deficiencies. Patches are always welcome, too.

Justin,

Add this to your TargetLowering constructor; it fixes the mem* issue.

maxStoresPerMemcpy = 4096;
maxStoresPerMemmove = 4096;
maxStoresPerMemset = 4096;
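
A rough mental model of why raising these limits helps: when lowering a memset, memcpy or memmove of known size, SelectionDAG expands it inline into a sequence of ordinary stores only if the number of stores stays within the target's limit. Otherwise it emits a library call, which presumably has nowhere to go on PTX. The C sketch below is a simplified, hypothetical model of that decision. The function names, the greedy widest-store-first split and the fixed maximum store width are all assumptions for illustration; the real lowering also accounts for alignment and which types are legal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: number of stores needed to cover `size` bytes,
   using the widest store first and narrower ones for the remainder.
   max_width (in bytes) must be a power of two. */
size_t count_stores(size_t size, size_t max_width) {
    assert(max_width > 0 && (max_width & (max_width - 1)) == 0);
    size_t stores = 0;
    for (size_t width = max_width; width > 0; width /= 2) {
        stores += size / width;  /* full stores of this width */
        size %= width;           /* bytes left for narrower stores */
    }
    return stores;
}

/* Inline expansion is chosen only if the store count fits the limit
   (the role played by maxStoresPerMemset and friends); otherwise the
   lowering would need a library call instead. */
bool expand_inline(size_t size, size_t max_width, size_t max_stores) {
    return count_stores(size, max_width) <= max_stores;
}
```

In this model, zeroing the kernel's 16-byte `localArray` takes 4 stores if 4-byte stores are usable but 16 if only byte stores are, and a limit of 4096 keeps inline expansion for anything up to 16 KiB when 4-byte stores are available.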

Thanks for this! I applied it in r144551. However, this particular kernel still fails, because the backend does not yet support sign-extending loads from i8 to i32.
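
The node the selector rejects is `load ... <LD1[%arrayidx1], sext from i8>`: a one-byte load whose result is sign-extended to i32. The minimal C sketch below only illustrates what that operation computes, expressed as a zero-extended byte followed by 32-bit arithmetic. The function name is hypothetical, and this says nothing about how the PTX backend will eventually implement it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Semantics of an i8 -> i32 sign-extending load: read one byte and
   sign-extend it to 32 bits. The byte is first zero-extended; flipping
   bit 7 and subtracting 0x80 then propagates the sign without relying
   on implementation-defined shifts or narrowing conversions. */
int32_t sextload_i8(const uint8_t *p) {
    assert(p != NULL);
    uint32_t zext = *p;                     /* zero-extended byte, 0..255 */
    return (int32_t)(zext ^ 0x80u) - 0x80;  /* result in -128..127 */
}
```

For example, the byte 0xFF yields -1 and 0x7F yields 127.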