compiling CUDA w/ -fdebug-default-version=5 generates invalid PTX

Hi, Alexey!

I’ve ran into an odd case with debug info generation in NVPTX.

Reproduction:

Hi Artem, will check this tomorrow.

Best regards,
Alexey Bataev

Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DW_AT_call_site attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt).

Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it. You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop treating it as a call instruction in NVPTXInstrInfo.td. Can’t say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DW_AT_call_site attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt).

Thank you for looking into this.

Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it.

Would it help if we were to fold the whole call.uni into a single line? It appears that swarf injects the label after the call instruction line gets printed.

What is the label supposed to point at? At the call instruction itself? Or at the return point?

You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop treating it as a call instruction in NVPTXInstrInfo.td. Can’t say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

David suggested not allowing -fdebug-default-version=5 to override the DWARF version provided by NVPTX back-end. That would make sense, considering that we can’t handle the newer DWARF versions anyways.

Let me see if I can fix the lineinfo generation first.

–Artem

If it would be a single instruction - yes, it should fix it. Actually, it may insert the label before (for tail calls) and after a call. The labels just point on the start and the end of function call, i.e. it must be inserted before call (for tail calls) and right after call. Yes, that’s a good idea. If we need to change it we still can use LLVM option to override the DWARF version.

Hi Artem, here is what I found about this.

These labels are emitted only if DWARF 4 or 5 is used. They are required for emission of the DW_AT_call_site attribute. The info about callsites also emitted for lineinfo emission with debug info for profiling like in your example (-O1 -gmlt).

Thank you for looking into this.

Call/CallUni instruction is treated as a separate instruction by the debug info generator and it treats it as a separate call and emits labels for it.

Would it help if we were to fold the whole call.uni into a single line? It appears that swarf injects the label after the call instruction line gets printed.

What is the label supposed to point at? At the call instruction itself? Or at the return point?

I believe it’s immediately after the call.

Testing an x86 example:

$ cat test.c

void f1();

void f2();

void f3() {

f1();

f2();

}

$ clang-tot -gdwarf-5 test.c -c -O3 && llvm-dwarfdump-tot test.o | grep “DW_TAG|call” && llvm-objdump -d test.o

0x0000000c: DW_TAG_compile_unit

0x00000023: DW_TAG_subprogram

DW_AT**call**all_calls (true)

0x0000002e: **DW_TAG_call_**site

DW_AT**call**origin (0x0000003b)

DW_AT**call**return_pc (0x0000000000000008)

0x00000034: **DW_TAG_call_**site

DW_AT**call**origin (0x00000041)

DW_AT**call**tail_call (true)

DW_AT**call**pc (0x000000000000000b)

0x0000003b: DW_TAG_subprogram

0x0000003f: DW_TAG_unspecified_parameters

0x00000041: DW_TAG_subprogram

0x00000045: DW_TAG_unspecified_parameters

test.o: file format elf64-x86-64

Disassembly of section .text:

0000000000000000 :

0: 50 pushq %rax

1: 31 c0 xorl %eax, %eax

3: e8 00 00 00 00 callq 0x8 <f3+0x8>

8: 31 c0 xorl %eax, %eax

b: e9 00 00 00 00 jmp 0x10 <f3+0x10>

But the tail call uses the jump location, because it can’t do anything else.

You can try to mark the CallUni instructions as hasDelaySlot = 1 or stop treating it as a call instruction in NVPTXInstrInfo.td. Can’t say which one is better/correct. Looks like the representation for call/call.uni is not quite compatible with the debug info

David suggested not allowing -fdebug-default-version=5 to override the DWARF version provided by NVPTX back-end. That would make sense, considering that we can’t handle the newer DWARF versions anyways.

Let me see if I can fix the lineinfo generation first.

Yeah, -gdwarf-5 and -fdebug-default-version=5 -g should behave the same. So whatever it is that’s disabling/downgrading to DWARFv2 for NVPTX when the user uses -gdwarf-5 shoudl do the same for -fdebug-default-version=5