UNREACHABLE executed! error while trying to generate PTX

I am trying to generate PTX code for 'nbody' sample program's kernel
(nbody_kernel.cu) using clang/LLVM version 3.2. The nbody CUDA program is
available in Nvidia's SDK.

I am referring to https://github.com/jholewinski/llvm-ptx-samples project.

Following are my commands,

clang++ -O4 -S -I/usr/local/cuda/include -emit-llvm -target nvptx64
nbody_kernel.cu -o nbody_kernel.ll

opt -O3 -loop-unroll -unroll-allow-partial nbody_kernel.ll -o
nbody_kernel.ll

llc nbody_kernel.ll -o nbody_kernel.ptx

After execution of the last command(llc) I get a UNREACHABLE executed! error
with the following stack trace

[DEVICE-C++] nbody.kernel.cpp
unexpected address space
UNREACHABLE executed at
/home/pratnali/LLVM/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp:1317!
0 libLLVM-3.3svn.so 0x00007f3857bdf0cb
llvm::sys::PrintStackTrace(_IO_FILE*) + 43
1 libLLVM-3.3svn.so 0x00007f3857bde74a
2 libpthread.so.0 0x00007f3856c3c460
3 libc.so.6 0x00007f3855a90b15 gsignal + 53
4 libc.so.6 0x00007f3855a91f96 abort + 390
5 libLLVM-3.3svn.so 0x00007f3857bc30f7 llvm::llvm_unreachable_internal(char
const*, char const*, unsigned int) + 359
6 libLLVM-3.3svn.so 0x00007f385722967d
7 libLLVM-3.3svn.so 0x00007f385722b6d7
8 libLLVM-3.3svn.so 0x00007f3857341723
llvm::FPPassManager::doInitialization(llvm::Module&) + 99
9 libLLVM-3.3svn.so 0x00007f385734639d
llvm::MPPassManager::runOnModule(llvm::Module&) + 205
10 libLLVM-3.3svn.so 0x00007f3857349b7c
llvm::PassManagerImpl::run(llvm::Module&) + 268
11 llc 0x000000000040b534
12 llc 0x000000000040d131 main + 465
13 libc.so.6 0x00007f3855a7d4bd __libc_start_main + 253
14 llc 0x0000000000406e59
Stack dump:
0. Program arguments: llc nbody.kernel.ll -o nbody.kernel.ptx
make: *** [nbody.kernel.ptx] Aborted

I replaced the global indexes like for e.g threadIdx.x with
__builtin_ptx_read_tid_x() and others. There are no problems in generating
LLVM IR (i.e .ll). The error pops up while trying to generate PTX from the
IR using llc.

Any pointers on what might be going on here ? Will appreciate any help in
going forward

I have attached my program and observations in a README here.
LLVM_PTX_nbody.tar.gz
<http://llvm.1065342.n5.nabble.com/file/n56026/LLVM_PTX_nbody.tar.gz&gt;

You can easily reproduce the problem using this.

I noticed you’re using cuda_runtime.h in the source file. Where are you getting this file? From the CUDA toolkit?

Since the error is in the back-end, can you just post the .ll or .bc file you are trying to compile?

Please find the .ll attached below . Yes, I am using the cuda_runtime.h from
the toolkit.
nbody.kernel.ll
<http://llvm.1065342.n5.nabble.com/file/n56048/nbody.kernel.ll&gt;
- Uday

The problem you are seeing is because clang is putting the global variables in address space 0, which in NVPTX means the generic address space. PTX does not allow this, so the back-end should be printing an error for you. Are you using trunk or 3.2?

Generally, clang won’t be compatible with the CUDA Toolkit headers. If you want to use the constant modifier from CUDA in Clang, define it like so:

#define constant attribute((address_space(2)))

OK. That helps.
It does flash a warning though

[DEVICE-C++] nbody.kernel.cpp
nbody.kernel.cpp:29:9: warning: '__constant__' macro redefined
#define __constant__ __attribute__((address_space(2)))
        ^
/opt/cuda/include/host_defines.h:183:9: note: previous definition is here
#define __constant__ \
        ^
1 warning generated.

Another question is
What about extern __shared__ ?

I can see that the error goes away if I replace "extern __shared__ float4
sharedPos" with "__shared__ float4* sharedPos;". Do I have to dynamically
allocate the shared memory by specifying size in kernel Launch? If so, why
doesn't the second use of the same statement in another function cause the
error ?

I am using 3.2.

OK. That helps.
It does flash a warning though

[DEVICE-C++] nbody.kernel.cpp
nbody.kernel.cpp:29:9: warning: '__constant__' macro redefined
#define __constant__ __attribute__((address_space(2)))
        ^
/opt/cuda/include/host_defines.h:183:9: note: previous definition is here
#define __constant__ \
        ^
1 warning generated.

Another question is
What about extern __shared__ ?

I can see that the error goes away if I replace "extern __shared__ float4
sharedPos" with "__shared__ float4* sharedPos;". Do I have to dynamically
allocate the shared memory by specifying size in kernel Launch? If so, why
doesn't the second use of the same statement in another function cause the
error ?

I am using 3.2.

I would just do away with the toolkit headers. I may try to put together
some minimalistic headers for clang w/ nvptx at some point. Your best bet
is to just define what you need yourself for now.

__shared__ would be address space 3, so:

#define __shared__ __attribute__((address_space(3)))

Either using or * should work. Just be aware that you will need to
specify a shared size when you launch the kernel. You can get the address
space mapping from lib/Target/NVPTX/MCTargetDesc/NVPTXBaseInfo.h.

Thanks a lot Justin,

I will remove the toolkit header. Just one last question..(maybe :wink: ) If I
do away with toolkit headers it says unknown type name '__device__'. Does
this function qualifier have an alternative ? or I can just do away with ?

Not really. Clang does not have a way to annotate device vs. kernel functions in C/C++ mode. You’re probably better off trying to use OpenCL or CUDA mode in clang.

In the clang unit tests, there is a cuda.h header that provides very basic support for these keywords: tests/SemaCUDA/cuda.h

If you compile as CUDA (use .cu extension, or “-x cuda”) and use this header, you will have basic support. You can invoke clang with something like:

$ clang test1.cu -Xclang -fcuda-is-device -I …/src/clang/test/SemaCUDA -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S

… assuming your clang source directory is …/src/clang, you want 64-bit PTX, and your target SM is 2.0. Adjust accordingly.

Clang also knows how to map OpenCL to PTX, so you would do something like:

$ clang test1.cl -Xclang -triple -Xclang nvptx64 -Xclang -target-cpu -Xclang sm_20 -S

Well, I tried the command line given by you and I get the following error

clang++ nbody.kernel.cu -Xclang -fcuda-is-device
-I/home/upitamba/llvm-3.2.src/tools/clang/test/SemaCUDA/ -Xclang -triple
-Xclang nvptx64 -Xclang -target-cpu -Xclang sm_10 -S

fatal error: error in backend: Cannot select: 0x334a870: v4f32 =
NVPTXISD::MoveParam 0x334a770 [ORD=1] [ID=22]
  0x334a770: v4f32 = TargetExternalSymbol'.PARAM0' [ID=1]
In function: computeBodyAccel

Am I doing anything wrong here ?

Attached my new nbody.kernel.cu
<http://llvm.1065342.n5.nabble.com/file/n56141/nbody.kernel.cu&gt; .cu here

That particular error is fixed in trunk, but with your code I’m now hitting a new issue. I’ll get the fix in soon.

This file should compile fine now on trunk.