What should I do to enable DWARF on a new backend?

I am learning LLVM and trying to write a backend for a specific target architecture.

When I compile a hello-world with clang -g, the result of llvm-dwarfdump --verify is as follows.

I traced the process of llvm-dwarfdump with gdb and it looks like the .debug_info section isn’t generated correctly.

I’ve had a look at the documentation relating to LLVM and DWARF. It seems to be mostly about how to adapt DWARF for a new language on the front end.

I also googled a combination of keywords like LLVM and DWARF and didn’t find anyone with the same problem.

I grep’d the keywords under Target/X86 and Target/AArch64. It doesn’t look like there is any code related to the .debug_info section either.

I’m wondering if I need to write some additional code for a new backend to support DWARF and if so, is there a document that tells me what code I need to write? If not, where should I start to debug this problem?

The error messages from llvm-dwarfdump certainly sound like at least the header isn’t being generated correctly. It might help if you could post a hex dump of the .debug_info section.

Here is part of the result of llvm-objdump -s. I’m not familiar with ELF. Let me know if additional information is needed.

Contents of section .debug_info:
 0000 00000000 05000102 00000000 01000c00  ................
 0010 01000000 00000000 00020123 00000000  ...........#....
 0020 00000002 2d000000 000302a1 00033900  ....-.........9.
 0030 0000043d 00000005 00050306 01060408  ...=............
 0040 07070123 00000002 90230500 02510000  ...#.....#...Q..
 0050 00050605 0400

As a comparison, here are the results for X86_64.

Contents of section .debug_info:
 0000 51000000 05000108 00000000 01000c00  Q...............
 0010 01080000 00000000 00020125 00000008  ...........%....
 0020 00000002 2d000000 000302a1 00033900  ....-.........9.
 0030 0000043d 00000005 00050306 01060408  ...=............
 0040 07070125 00000001 56050002 50000000  ...%....V...P...
 0050 05060504 00

That first word (the first 32 bits) is supposed to be the unit length: the size in bytes of the compilation unit’s contribution to .debug_info, not counting the length field itself. As you can see, for x86_64 it’s 0x51, but for your target it is 0. That causes llvm-dwarfdump to misinterpret the entire section.
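
To make that concrete, here is a small Python sketch that decodes the first 12 bytes of each dump as a 32-bit DWARF v5 unit header. It assumes little-endian data and a single compilation unit per section; the helper name parse_unit_header is made up for this illustration and is not part of any LLVM tool.

```python
import struct

# Bytes copied from the llvm-objdump -s output posted in this thread.
NEW_TARGET = """
00000000 05000102 00000000 01000c00
01000000 00000000 00020123 00000000
00000002 2d000000 000302a1 00033900
0000043d 00000005 00050306 01060408
07070123 00000002 90230500 02510000
00050605 0400
"""
X86_64 = """
51000000 05000108 00000000 01000c00
01080000 00000000 00020125 00000008
00000002 2d000000 000302a1 00033900
0000043d 00000005 00050306 01060408
07070125 00000001 56050002 50000000
05060504 00
"""


def parse_unit_header(hex_dump):
    """Decode a 32-bit DWARF v5 unit header (illustrative helper)."""
    data = bytes.fromhex("".join(hex_dump.split()))
    # Field order: unit_length (4 bytes), version (2), unit_type (1),
    # address_size (1), debug_abbrev_offset (4). Little-endian is assumed.
    unit_length, version, unit_type, address_size, abbrev_offset = (
        struct.unpack_from("<IHBBI", data)
    )
    return {
        "unit_length": unit_length,
        "version": version,
        "unit_type": unit_type,
        "address_size": address_size,
        "debug_abbrev_offset": abbrev_offset,
        # Assumption: one unit per section, so the length should be the
        # section size minus the 4-byte length field itself.
        "expected_length": len(data) - 4,
    }


for name, dump in (("new target", NEW_TARGET), ("x86_64", X86_64)):
    print(name, parse_unit_header(dump))
```

For x86_64 the stored length (0x51) matches the expected value. For the new target the field is 0 where, under the same single-unit assumption, 0x52 would be expected.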

That length is typically computed by the assembler as a label difference, so my first guess is that the labels aren’t being emitted correctly. I suggest generating the assembly source (compile with -S) and comparing how that first word is computed for x86_64 versus your target. Possibly something in your target code isn’t being done in the correct order, or is missing entirely.

Here is the assembly for my target:

.Lcu_begin0:
	.long	.Ldebug_info_end0-.Ldebug_info_start0
.Ldebug_info_start0:
	.short	5
	.byte	1
	.byte	2
	.long	.debug_abbrev
	0x01	.byte	0
	.short	12
	.byte	1
	.long	.Lstr_offsets_base0
	.long	.Lline_table_start0
	.byte	2
	0x01	.long	.Lfunc_end0-.Lfunc_begin0
	.long	.Laddr_table_base0
	0x02	.long	45
	.byte	0
	.byte	3
	0x02	.byte	161
	0x00	0x03	.long	57
	0x04	.long	61
	.byte	5
	.byte	0
	0x05	.byte	3
	.byte	6
	.byte	1
	0x06	.byte	4
	.byte	8
	.byte	7
	0x07	0x01	.long	.Lfunc_end0-.Lfunc_begin0
	0x02	.byte	144
	0x23	.byte	5
	.byte	0
	.byte	2
	.long	81

	0x05	.byte	6
	.byte	5
	.byte	4
	.byte	0
.Ldebug_info_end0:
	.section	.debug_str_offsets,"",@progbits
	.long	32
	.short	5
	.short	0

And here is X86_64:

.Lcu_begin0:
	.long	.Ldebug_info_end0-.Ldebug_info_start0 # Length of Unit
.Ldebug_info_start0:
	.short	5                               # DWARF version number
	.byte	1                               # DWARF Unit Type
	.byte	8                               # Address Size (in bytes)
	.long	.debug_abbrev                   # Offset Into Abbrev. Section
	.byte	1                               # Abbrev [1] 0xc:0x49 DW_TAG_compile_unit
	.byte	0                               # DW_AT_producer
	.short	12                              # DW_AT_language
	.byte	1                               # DW_AT_name
	.long	.Lstr_offsets_base0             # DW_AT_str_offsets_base
	.long	.Lline_table_start0             # DW_AT_stmt_list
	.byte	2                               # DW_AT_comp_dir
	.byte	1                               # DW_AT_low_pc
	.long	.Lfunc_end0-.Lfunc_begin0       # DW_AT_high_pc
	.long	.Laddr_table_base0              # DW_AT_addr_base
	.byte	2                               # Abbrev [2] 0x23:0xa DW_TAG_variable
	.long	45                              # DW_AT_type
	.byte	0                               # DW_AT_decl_file
	.byte	3                               # DW_AT_decl_line
	.byte	2                               # DW_AT_location
	.byte	161
	.byte	0
	.byte	3                               # Abbrev [3] 0x2d:0xc DW_TAG_array_type
	.long	57                              # DW_AT_type
	.byte	4                               # Abbrev [4] 0x32:0x6 DW_TAG_subrange_type
	.long	61                              # DW_AT_type
	.byte	5                               # DW_AT_count
	.byte	0                               # End Of Children Mark
	.byte	5                               # Abbrev [5] 0x39:0x4 DW_TAG_base_type
	.byte	3                               # DW_AT_name
	.byte	6                               # DW_AT_encoding
	.byte	1                               # DW_AT_byte_size
	.byte	6                               # Abbrev [6] 0x3d:0x4 DW_TAG_base_type
	.byte	4                               # DW_AT_name
	.byte	8                               # DW_AT_byte_size
	.byte	7                               # DW_AT_encoding
	.byte	7                               # Abbrev [7] 0x41:0xf DW_TAG_subprogram
	.byte	1                               # DW_AT_low_pc
	.long	.Lfunc_end0-.Lfunc_begin0       # DW_AT_high_pc
	.byte	1                               # DW_AT_frame_base
	.byte	86
	.byte	5                               # DW_AT_name
	.byte	0                               # DW_AT_decl_file
	.byte	2                               # DW_AT_decl_line
	.long	80                              # DW_AT_type
                                        # DW_AT_external
	.byte	5                               # Abbrev [5] 0x50:0x4 DW_TAG_base_type
	.byte	6                               # DW_AT_name
	.byte	5                               # DW_AT_encoding
	.byte	4                               # DW_AT_byte_size
	.byte	0                               # End Of Children Mark
.Ldebug_info_end0:
	.section	.debug_str_offsets,"",@progbits
	.long	32                              # Length of String Offsets Set
	.short	5
	.short	0

Although the assembly for my target looks odd, the start and end labels for .debug_info appear to be emitted correctly, as does the unit length expression .Ldebug_info_end0-.Ldebug_info_start0. It looks like something goes wrong when the assembly is converted to binary.

Yes, the labels and label difference expression look correct in the assembly text. You probably do have some issue in your MC code that is causing the problem. I can’t really help you there.
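
As a sanity check on what the assembler should produce, the Python sketch below walks the x86_64 listing from this thread (comments stripped), adds up the sizes of the data directives, and evaluates the label difference by hand. It is only a toy model: the SIZES table and the label_offsets helper are invented for this illustration and handle nothing beyond .byte, .short and .long.

```python
# Sizes in bytes of the data directives that appear in the listing.
SIZES = {".byte": 1, ".short": 2, ".long": 4}

# The x86_64 .debug_info listing from this thread, comments removed.
LISTING = """
.Lcu_begin0:
    .long .Ldebug_info_end0-.Ldebug_info_start0
.Ldebug_info_start0:
    .short 5
    .byte 1
    .byte 8
    .long .debug_abbrev
    .byte 1
    .byte 0
    .short 12
    .byte 1
    .long .Lstr_offsets_base0
    .long .Lline_table_start0
    .byte 2
    .byte 1
    .long .Lfunc_end0-.Lfunc_begin0
    .long .Laddr_table_base0
    .byte 2
    .long 45
    .byte 0
    .byte 3
    .byte 2
    .byte 161
    .byte 0
    .byte 3
    .long 57
    .byte 4
    .long 61
    .byte 5
    .byte 0
    .byte 5
    .byte 3
    .byte 6
    .byte 1
    .byte 6
    .byte 4
    .byte 8
    .byte 7
    .byte 7
    .byte 1
    .long .Lfunc_end0-.Lfunc_begin0
    .byte 1
    .byte 86
    .byte 5
    .byte 0
    .byte 2
    .long 80
    .byte 5
    .byte 6
    .byte 5
    .byte 4
    .byte 0
.Ldebug_info_end0:
"""


def label_offsets(listing):
    """Map each label to its byte offset from the start of the listing."""
    offsets, position = {}, 0
    for line in listing.splitlines():
        text = line.strip()
        if not text:
            continue
        if text.endswith(":"):
            offsets[text[:-1]] = position      # a label marks the current offset
        else:
            position += SIZES[text.split()[0]]  # a directive advances the offset
    return offsets


offsets = label_offsets(LISTING)
unit_length = offsets[".Ldebug_info_end0"] - offsets[".Ldebug_info_start0"]
print(hex(unit_length))
```

The difference comes out as 0x51, which is the first word of the x86_64 hex dump. The same expression appears in the new target's assembly, so a zero in its object file points at the step that turns the expression into bytes rather than at the text itself.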

I have solved the problem, and here is my journey. In a nutshell, it came down to fixing problems in the hook functions my partner wrote when introducing the new backend.

First, in LLVMTargetMachine::initAsmInfo, I set Options.MCOptions.AsmVerbose to true. This prints comments in the assembly, which helps me make sure that the workflow up to assembly is correct. Here I found the first problem. My target is a VLIW architecture and needs a special assembly format to suit the downstream emulator. That format doesn’t take the .debug_* sections into account, so the resulting assembly is messy.

BTW, I can’t find any target-dependent code in the assembler that enables AsmVerbose, but it defaults to true on X86, and I don’t know why.

The second problem (perhaps) is with XXAsmInfo. My partner specified CodePointerSize as 2. However, for a 32-bit machine, this number should probably be 4. I suspect this will cause some problems with both assembly generation and parsing.

The third problem is that in XXAsmBackend::applyFixup, my partner provides an XXAsmBackend::adjustFixupValue, modelled on the AArch64 implementation. This function is supposed to return Value unchanged by default, but was incorrectly written to return 0. This means that every fixup value we computed was thrown away.
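
To illustrate why that matters, here is a rough Python model of the patching step. It is not LLVM code: the function names are hypothetical, "FK_Data_4" is used only as a label for a plain 4-byte data fixup, and it assumes the unit length reached the backend as such a fixup with a little-endian encoding.

```python
def adjust_fixup_value_buggy(kind, value):
    # The bug described above: every fixup value is replaced by 0.
    return 0


def adjust_fixup_value_fixed(kind, value):
    # Default case: kinds that need no target-specific encoding pass through.
    return value


def apply_fixup(data, offset, num_bytes, kind, value, adjust):
    """Patch `value` into `data` at `offset`, little-endian (hypothetical model)."""
    value = adjust(kind, value)
    patched = bytearray(data)
    for i in range(num_bytes):
        # OR each byte of the adjusted value into the placeholder bytes.
        patched[offset + i] |= (value >> (8 * i)) & 0xFF
    return bytes(patched)


# The 4-byte unit-length slot before any fixup has been applied.
placeholder = bytes(4)

buggy = apply_fixup(placeholder, 0, 4, "FK_Data_4", 0x51, adjust_fixup_value_buggy)
fixed = apply_fixup(placeholder, 0, 4, "FK_Data_4", 0x51, adjust_fixup_value_fixed)
print(buggy.hex())
print(fixed.hex())
```

With the buggy hook the slot stays 00000000, which is what the first word of my .debug_info dump looked like. With the pass-through default, a value of 0x51 is written as 51000000, as in the x86_64 dump.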

The last problem occurs with the linker. The hook XX::relocate provided by the new backend for relocation filters out the Rel.expr == RelExpr::R_NONE scenario because of a problem I have not yet identified. However, the relocation of the .debug_* sections arrives here via relocateNoSym, and Rel.expr must be RelExpr::R_NONE.

As I said at the beginning, I didn’t need to do anything special to support DWARF on the new backend; all the problems stemmed from the hooks written when the new backend was introduced. I hope this report will help anyone who encounters this problem later.