[NVPTX] Backend failure in LegalizeDAG due to unimplemented expand in target lowering

Dear LLVM,

I'm trying to understand why the attached IR code works for the x86_64
target but fails for nvptx64 because of an unimplemented expand during
target lowering. Any ideas?
Just change the target triple to x86_64-unknown-unknown, and the same
IR code can be successfully codegen-ed for x86_64.

Thanks,
- Dima.

dmikushin@dmikushin-desktop:~/Desktop$ gdb ~/sandbox/bin/llc
GNU gdb (Ubuntu/Linaro 7.3-0ubuntu2) 7.3-2011.08
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/dmikushin/sandbox/bin/llc...done.
(gdb) r -march=nvptx64 test.ll
Starting program: /home/dmikushin/sandbox/bin/llc -march=nvptx64 test.ll
[Thread debugging using libthread_db enabled]
This action is not supported yet!
UNREACHABLE executed at
/home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1198!

Program received signal SIGABRT, Aborted.
0x00007ffff55ed3a5 in __GI_raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
  in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x00007ffff55ed3a5 in __GI_raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff55f0b0b in __GI_abort () at abort.c:92
#2 0x00007ffff70183b3 in llvm::llvm_unreachable_internal
(msg=0x7ffff75b4570 "This action is not supported yet!",
    file=0x7ffff75b4128
"/home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp",
line=1198) at /home/dmikushin/sandbox/src/llvm/lib/Support/ErrorHandling.cpp:98
#3 0x00007ffff6e9a612 in (anonymous
namespace)::SelectionDAGLegalize::LegalizeOp (this=0x7fffffffd300,
Node=0x722820)
    at /home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1198
#4 0x00007ffff6e911ca in (anonymous
namespace)::SelectionDAGLegalize::LegalizeDAG (this=0x7fffffffd300) at
/home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:227
#5 0x00007ffff6eb6092 in llvm::SelectionDAG::Legalize (this=0x697590)
at /home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:3689
#6 0x00007ffff6fb3862 in llvm::SelectionDAGISel::CodeGenAndEmitDAG
(this=0x697230) at
/home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:632
#7 0x00007ffff6fb2a84 in llvm::SelectionDAGISel::SelectBasicBlock
(this=0x697230, Begin=..., End=..., HadTailCall=@0x7fffffffd880)
    at /home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:499
#8 0x00007ffff6fb5792 in llvm::SelectionDAGISel::SelectAllBasicBlocks
(this=0x697230, Fn=...) at
/home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1160
#9 0x00007ffff6fb1ef5 in llvm::SelectionDAGISel::runOnMachineFunction
(this=0x697230, mf=...) at
/home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:367
#10 0x00007ffff68c91fd in llvm::MachineFunctionPass::runOnFunction
(this=0x697230, F=...) at
/home/dmikushin/sandbox/src/llvm/lib/CodeGen/MachineFunctionPass.cpp:33
#11 0x00007ffff6acb524 in llvm::FPPassManager::runOnFunction
(this=0x68ec40, F=...) at
/home/dmikushin/sandbox/src/llvm/lib/VMCore/PassManager.cpp:1478
#12 0x00007ffff6acb73f in llvm::FPPassManager::runOnModule
(this=0x68ec40, M=...) at
/home/dmikushin/sandbox/src/llvm/lib/VMCore/PassManager.cpp:1498
#13 0x00007ffff6acba82 in llvm::MPPassManager::runOnModule
(this=0x67e4c0, M=...) at
/home/dmikushin/sandbox/src/llvm/lib/VMCore/PassManager.cpp:1552
#14 0x00007ffff6acbfa5 in llvm::PassManagerImpl::run (this=0x63f090,
M=...) at /home/dmikushin/sandbox/src/llvm/lib/VMCore/PassManager.cpp:1635
#15 0x00007ffff6acc159 in llvm::PassManager::run (this=0x7fffffffdf00,
M=...) at /home/dmikushin/sandbox/src/llvm/lib/VMCore/PassManager.cpp:1664
#16 0x000000000040ec9c in main (argc=3, argv=0x7fffffffe148) at
/home/dmikushin/sandbox/src/llvm/tools/llc/llc.cpp:484
(gdb) f 4
#4 0x00007ffff6e911ca in (anonymous
namespace)::SelectionDAGLegalize::LegalizeDAG (this=0x7fffffffd300) at
/home/dmikushin/sandbox/src/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:227
227 LegalizeOp(N);
(gdb) p N->dump()
0x722820: ch = store 0x718f70, 0x722220, 0x714200,
0x717740<ST1[%483](align=8)> [ID=58]
$1 = void

test.ll (140 KB)

Hi again,

Kind people on #llvm helped me use bugpoint to reduce the
previously submitted test case. For the record, it can be done with the
following command:

$ bugpoint -llc-safe test.ll

The resulting IR is attached, and it crashes in the same way. Is
it valid code?

dmikushin@hp2:~/forge/kernelgen/branches/tests_lnt/behavior/sincos>
llc test.ll.1
This action is not supported yet!
UNREACHABLE executed at
/tmp/rpmbuild_debug/BUILD/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1194!
0 libLLVM-3.2svn.so 0x00007f395f147077
1 libLLVM-3.2svn.so 0x00007f395f14763d
2 libpthread.so.0 0x00007f395dee05d0
3 libc.so.6 0x00007f395d74b945 gsignal + 53
4 libc.so.6 0x00007f395d74cf21 abort + 385
5 libLLVM-3.2svn.so 0x00007f395f1305d9
llvm::report_fatal_error(llvm::Twine const&) + 0
6 libLLVM-3.2svn.so 0x00007f395efdb4d2
7 libLLVM-3.2svn.so 0x00007f395efdfc3b
8 libLLVM-3.2svn.so 0x00007f395efdfd2d llvm::SelectionDAG::Legalize() + 49
9 libLLVM-3.2svn.so 0x00007f395f0d0d76
llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 2532
10 libLLVM-3.2svn.so 0x00007f395f0d2ae6
llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::Instruction
const>, llvm::ilist_iterator<llvm::Instruction const>, bool&) + 228
11 libLLVM-3.2svn.so 0x00007f395f0d3524
llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) +
2620
12 libLLVM-3.2svn.so 0x00007f395f0d3ade
llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) +
896
13 libLLVM-3.2svn.so 0x00007f395ea033de
llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 82
14 libLLVM-3.2svn.so 0x00007f395ec3a38d
llvm::FPPassManager::runOnFunction(llvm::Function&) + 331
15 libLLVM-3.2svn.so 0x00007f395ec3a568
llvm::FPPassManager::runOnModule(llvm::Module&) + 86
16 libLLVM-3.2svn.so 0x00007f395ec3a061
llvm::MPPassManager::runOnModule(llvm::Module&) + 381
17 libLLVM-3.2svn.so 0x00007f395ec3b7df
llvm::PassManagerImpl::run(llvm::Module&) + 111
18 libLLVM-3.2svn.so 0x00007f395ec3b841
llvm::PassManager::run(llvm::Module&) + 33
19 llc 0x000000000040e086 main + 2835
20 libc.so.6 0x00007f395d737bc6 __libc_start_main + 230
21 llc 0x000000000040bdb9
Stack dump:
0. Program arguments: llc test.ll.1
1. Running pass 'Function Pass Manager' on module 'test.ll.1'.
2. Running pass 'NVPTX DAG->DAG Pattern Instruction Selection' on
function '@__kernelgen_main'
Aborted

test.ll.1 (3.22 KB)

Looks like a bug in the NVPTXISelLowering.cpp: it has
"setOperationAction(ISD::STORE, MVT::i1, Expand);", but the legalizer
doesn't know how to handle that.

-Eli
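
To make the failure mode above easier to picture, here is a toy model of a per-(operation, type) action table. It is only a sketch and not LLVM's actual code: `ToyLowering`, `legalizeStore`, and the `Op`, `VT`, and `Action` enums are hypothetical stand-ins, and the assumption is that the store path handles Legal, Custom, and Promote while any other action (such as Expand) falls through to the "not supported yet" case seen in the backtrace.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for opcodes, value types, and legalize actions.
enum class Op { Load, Store };
enum class VT { i1, i8, i32 };
enum class Action { Legal, Promote, Expand, Custom };

// Toy version of a target's action table; unset entries default to Legal.
class ToyLowering {
  std::map<std::pair<Op, VT>, Action> Actions;

public:
  void setOperationAction(Op O, VT T, Action A) { Actions[{O, T}] = A; }

  Action getOperationAction(Op O, VT T) const {
    auto It = Actions.find({O, T});
    return It == Actions.end() ? Action::Legal : It->second;
  }
};

// Toy store legalization: only three actions have a handler, so asking for
// Expand ends up in the default branch (assumed to mirror the crash above).
std::string legalizeStore(const ToyLowering &TLI, VT T) {
  switch (TLI.getOperationAction(Op::Store, T)) {
  case Action::Legal:
    return "keep store as is";
  case Action::Custom:
    return "call target hook";
  case Action::Promote:
    return "store as wider type";
  default:
    throw std::logic_error("This action is not supported yet!");
  }
}

// Small self-check: with Custom, the toy legalizer defers to the target.
void demoActionTable() {
  ToyLowering TLI;
  TLI.setOperationAction(Op::Store, VT::i1, Action::Custom);
  assert(legalizeStore(TLI, VT::i1) == "call target hook");
}
```

In this model, registering Expand for a store of i1 throws, while registering Custom hands the node to a target hook, which is the distinction the thread goes on to discuss.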

Thanks for the insight, Eli.

So instead of setOperationAction(ISD::STORE, MVT::i1, Expand); one
should probably do setOperationAction(ISD::STORE, MVT::i1, Custom);
and implement it in NVPTXTargetLowering::LowerOperation.

But this issue raises a good point about code efficiency: I suspect
such an expansion will be very ugly in terms of performance. We could
probably do much better if bool used i32 instead of i1. I don't know
how to do that, though. Is it possible?

Anyway, if this is a defect, then it's a blocker for us, and we'd much
appreciate a fix.

- D.

Hi Dmitry,

did you declare i1 to be an illegal type? If so, you shouldn't get any
stores of i1 at this stage (you may get trunc stores to i1, but that is
different).

Ciao, Duncan.

Hi Duncan,

> did you declare i1 to be an illegal type?

No. How?

Hi Dmitry,

I think it will be considered illegal if you don't add it to any
register class.

Ciao, Duncan.

Hi Duncan,

Sorry, I don't understand your point; could you please explain a little more?
Why should i1 be declared illegal? Operations on byte-wide types like
char or bool are perfectly legal according to the PTX spec:

"Registers may be typed (signed integer, unsigned integer, floating
point, predicate) or untyped. Register size is restricted; aside from
predicate registers which are 1-bit, scalar registers have a width of
8-, 16-, 32-, or 64-bits, and vector registers have a width of 16-,
32-, 64-, or 128-bits. The most common use of 8-bit registers is with
ld, st, and cvt instructions, or as elements of vector tuples."

Thanks,
- D.

Okay, a few issues here:

First, i1 is used in the NVPTX back-end to map to the predicate (.pred) type. We definitely do not want to declare this type as illegal. The real issue is lack of complete support for this type. The PTX language places restrictions on what can be done with .pred registers, and it looks like the failure is here:

kernelgen_hostcall.exit228: ; preds = %while.cond.i226
  store i1 false, i1 addrspace(1)* undef, align 8

Ignoring for a second that you’re storing to an undefined address (???), the back-end does not yet handle up-casting an i1 to an appropriate type for storage. The memory space is not bit-addressable, so a direct store of an i1 does not make sense. In the short term, I would recommend that you manually zext from/to i8 and load/store those.
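
As a rough source-level illustration of that short-term workaround, the sketch below keeps the flag in a full byte: it widens the 1-bit value before the store and narrows it again after the load. The helper names `storeFlag` and `loadFlag` are hypothetical, and this is plain C++ standing in for the zext/trunc pair one would emit in IR, not code from the NVPTX back-end.

```cpp
#include <cassert>
#include <cstdint>

// Widen the 1-bit flag to a full byte before it goes to memory
// (the counterpart of a zext from i1 to i8 followed by an i8 store).
inline void storeFlag(std::uint8_t *Slot, bool Flag) {
  *Slot = static_cast<std::uint8_t>(Flag ? 1 : 0);
}

// Read the byte back and keep only its low bit
// (the counterpart of an i8 load followed by a trunc to i1).
inline bool loadFlag(const std::uint8_t *Slot) {
  return (*Slot & 1u) != 0;
}

// Round-trip check: the byte in memory is always exactly 0 or 1.
void demoFlagRoundTrip() {
  std::uint8_t Slot = 0xFF;
  storeFlag(&Slot, false);
  assert(Slot == 0 && !loadFlag(&Slot));
  storeFlag(&Slot, true);
  assert(Slot == 1 && loadFlag(&Slot));
}
```

Because memory is byte-addressable, the byte is the smallest unit that can be stored directly; the flag only exists as a single bit while it is held in a register.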

Justin,

Thank you,

It is an undefined address (???) only in the reduced test case; it was defined in the original large one in the first message of this thread.

- Dima.

> Okay, few issues here:
>
> First, i1 is used in the NVPTX back-end to map to the predicate (.pred) type. We definitely do not want to declare this type as illegal. The real issue is lack of complete support for this type. The PTX language places restrictions on what can be done with .pred registers, and it looks like the failure is here:
>
> kernelgen_hostcall.exit228: ; preds = %while.cond.i226
>   store i1 false, i1 addrspace(1)* undef, align 8
>
> Ignoring for a second that you're storing to an undefined address (???), the back-end does not yet handle up-casting an i1 to an appropriate type for storage.

[Villmow, Micah] We've seen this too, from some weird OpenCL code; in our case it was the result of storing to a NULL pointer.

> The memory space is not bit-addressable, so a direct store of an i1 does not make sense. In the short term, I would recommend that you manually zext from/to i8 and load/store those.

2012/6/30 Duncan Sands <baldrick@free.fr>:

In our (NVIDIA’s) NVVM IR spec, we define i1 as having a memory size of 8 bits.

setOperationAction(ISD::LOAD, MVT::i1, Custom);
setOperationAction(ISD::STORE, MVT::i1, Custom);

is the right way to go.

Yuan

OK, thanks.

For our project I implemented a similar workaround: extend each i1
memory item to i8 and load/store i1 values as i8 with a type cast. Still,
the issue in NVPTX remains. I don't know whether NVIDIA or the community
considers fixing it a priority (or at least adding an NYI
assertion!). It seems to be considerably more complex than implementing
a custom lowering handler, which is why I'm not attempting it myself. So
for now I filed a bug, just for the record:
http://llvm.org/bugs/show_bug.cgi?id=13291

- Dima.

Thanks for posting the bug. You’re right that a bit of implementation effort will be required to fix this.