[NVPTX] For linkonce_odr NVPTX generates .weak, but even newest PTXAS can't handle it

Dear LLVM NVPTX maintainers,

Just to have the issue recorded (I don't know how important it is):

Clang emits linkonce_odr linkage for inline functions, NVPTX turns linkonce_odr into .weak (how that happens is a good question, by the way: I can't find anything related in the NVPTX asm printer; does it chain to some other printer?), and finally ptxas (both 4.2 and 5) fails to compile the result to cubin. Below is the test case:

cat test3.cu

inline __attribute__((device)) __attribute__((used)) void test()
{
  return;
}

clang -cc1 -emit-llvm -triple ptx64-unknown-unknown -fcuda-is-device test3.cu -o test3.ll
cat test3.ll
; ModuleID = 'test3.cu'
target datalayout = "e-p:64:64-i64:64:64-f64:64:64-n1:8:16:32:64"
target triple = "ptx64-unknown-unknown"

@llvm.used = appending global [1 x i8*] [i8* bitcast (void ()* @_Z4testv to i8*)], section "llvm.metadata"

define linkonce_odr ptx_device void @_Z4testv() nounwind inlinehint {
entry:
  ret void
}

llc -march=nvptx64 -mcpu=sm_20 test3.ll -o test3.ptx
cat test3.ptx

//
// Generated by LLVM NVPTX Back-End
//

.version 3.0
.target sm_20, texmode_independent
.address_size 64

.weak _Z4testv
.func _Z4testv(

) // @_Z4testv
{
.reg .pred %p<396>;
.reg .s16 %rc<396>;
.reg .s16 %rs<396>;
.reg .s32 %r<396>;
.reg .s64 %rl<396>;
.reg .f32 %f<396>;
.reg .f64 %fl<396>;

// BB#0: // %entry
ret;
}

  1. ptxas @ CUDA 4.2:

ptxas -arch=sm_20 -m64 test3.ptx -o -

ptxas test3.ptx, line 10; fatal : Parsing error near '.weak': syntax error
ptxas fatal : Ptx assembly aborted due to errors

  2. ptxas @ CUDA 5:

~/cuda/bin/ptxas -arch=sm_20 -m64 test3.ptx -o -

ptxas test3.ptx, line 10; error : Feature '.weak directive' requires PTX ISA .version 3.1 or later
ptxas test3.ptx, line 10; fatal : Parsing error near '_Z4testv': syntax error
ptxas fatal : Ptx assembly aborted due to errors

  3. ptxas @ CUDA 5, with .version changed to 3.1: still an error, because according to the PTX ISA 3.1 spec, .weak must be followed by .func:

~/cuda/bin/ptxas -arch=sm_20 -m64 test3.ptx -o -

ptxas test3.ptx, line 10; fatal : Parsing error near '_Z4testv': syntax error
ptxas fatal : Ptx assembly aborted due to errors

Best,

  • Dima.

Thanks for the report. Unfortunately, this does not appear to have a trivial fix. As you suspected, it is not the NVPTX back-end itself that emits the ".weak", but the default MCAsmStreamer implementation. Setting WeakDefDirective in the NVPTXMCAsmInfo class seems to trigger the emission of ".weak_directive", which doesn't help. Setting LinkOnceDirective fixes the ".weak" case, but there is code in LLVM that produces a special label when LinkOnceDirective is set, which again breaks the PTX assembler.

This is a case where we really need a custom MCAsmStreamer, but this will take a bit of time. Is this a blocker for you?

Hi Justin,

Thanks for the explanation!

Not really a blocker, just a bit of extra work: I'm going to convert /opt/cuda/nvvm/ci_include.h to LLVM IR in order to combine it with a non-C frontend language at the IR level, and this header contains inline functions.

  • D.

2012/6/13 Justin Holewinski <justin.holewinski@gmail.com>