C-to-PTX compilation issues

Hi all,

I'm trying to compile some small ANSI C benchmarks to PTX assembly. For this purpose, I'm using the NVPTX backend, introduced in the LLVM 3.2 release.

It appears that certain LLVM constructs cannot be compiled to PTX. The problems mostly involve the handling of arrays. I also don't get any diagnostic output when these problems occur.

I use "llc" to compile regular .ll files to PTX with the "nvptx" backend. The llc options are as follows (omitting these options gives the same problems):

-nacdrvtest -asm-verbose -stats -print-before-all -print-after-all -time-passes

1. Only programs that make no use of arrays appear to always be processed correctly by llc (nvptx target). Some programs that use arrays make llc "hang" (it runs endlessly).

2. One small C function exposing this behavior is as follows:

// start here
int arrtest (int x, int y) {
   unsigned char b[3]={0x10,0x30,0x55};
   b[0] = (255 - b[x]) + b[y]; // doesn't work!
// b[0] = (255 - b[0]) + b[1]; // works
   b[1] = x - y;
   b[2] = x * y;
   return b[0];
}
// end here

3. A change that makes this function compilable (also visible in the comments)

Change:
b[0] = (255 - b[x]) + b[y];
to:
b[0] = (255 - b[0]) + b[1];

Similarly, even a single variable index (e.g. b[x]) doesn't work.

Please note that if variable indexing is not used, then PTX assembly is generated. The problem seems related to the support/handling of variable indexing.

However, the same code is always processed correctly by other target backends (x86, mips).

Is this behavior expected from the NVPTX backend?

I understand that PTX is primarily targeted by OpenCL; however, I would expect the backend to issue some warnings on unimplemented features and not just run endlessly with no messages whatsoever.

Best regards
Nikolaos Kavvadias

The issue you’re seeing is actually a problem with clang integration. Generally, clang does not understand all of the conventions required by the NVPTX back-end, and compilation from general C may not always work.

In this case, it’s the local array “b” that is the problem. Clang pulls this out into the global scope, but keeps it in address space 0. In the back-end, address space 0 is the PTX generic address space and global variables cannot use it. The index computations are not causing this issue. The following change would work:

// start here
__attribute__((address_space(3)))
static unsigned char b[3]={0x10,0x30,0x55};

int arrtest (int x, int y) {
  b[0] = (255 - b[x]) + b[y]; // now it works!
// b[0] = (255 - b[0]) + b[1]; // works
  b[1] = x - y;
  b[2] = x * y;
  return b[0];
}
// end here

If you had compiled LLVM in debug mode, you would have seen a failure about an unknown address space. I just committed a change in r174808 that emits a user-visible error for address space issues.

I understand that there is a lack of documentation on the proper use of this back-end, and this is something I am working on rectifying. GPU targets place additional restrictions and conventions on compilers, and they often require some level of custom handling. In this case, you need to be careful how you use address spaces. You’re not really doing anything wrong; clang was just never taught how to handle this case for NVPTX.

Thanks Justin for your prompt response.

The following change would work:

// start here
__attribute__((address_space(3)))
static unsigned char b[3]={0x10,0x30,0x55};

int arrtest (int x, int y) {
  b[0] = (255 - b[x]) + b[y]; // now it works!
// b[0] = (255 - b[0]) + b[1]; // works
  b[1] = x - y;
  b[2] = x * y;
  return b[0];
}
// end here

Thanks, I will try to apply this principle to my other small programs.

If you compile LLVM in debug mode, you would have seen a failure about an
unknown address space. I just committed a change in r174808 that emits a
user-visible error for address space issues.

Indeed, I compiled LLVM in Release mode, which is why llc appeared to hang instead of reporting a failure. I guess a Debug build is always handy.

Best regards,
Nikolaos Kavvadias