Introduction
LLVM supports multiple debug information formats (namely DWARF and CodeView) in different binary formats (e.g. ELF, PDB, Mach-O). Understanding the mappings between source code and debug information can be complex, and it is a problem we have commonly encountered when triaging debug information issues.
The output from tools such as llvm-dwarfdump, llvm-readobj or llvm-pdbutil closely mirrors the internal debug information format. In our experience, understanding that output requires good knowledge of those formats, which limits who can triage and address such issues quickly. Even for experts, triaging issues can take a lot of time and effort due to the inherent complexity.
llvm-dva
At Sony, we have been developing an LLVM-based debug information analysis tool called llvm-dva (short for LLVM debug information visual analyzer), designed to visualize these mappings. It is based entirely on the existing LLVM libraries for debug info parsing, target support, etc. At this stage we believe that it has proven its worth internally, to the point where we would like to propose upstreaming it as part of the mainline LLVM project, alongside existing tools such as llvm-dwarfdump.
llvm-dva is a command line tool that processes the debug info contained in a binary file and produces a format-agnostic “Logical View”: a high-level semantic representation of the debug info that is independent of the low-level format.
The logical view is composed of the traditional programming elements: scopes, types, symbols and lines. These elements can display additional information, such as the variable coverage factor, lexical block level, disassembly code, code ranges, etc.
The wide range of llvm-dva command line options makes it possible to create very rich logical views that include more low-level debug information: disassembly code associated with the debug lines, variable runtime locations and coverage, internal offsets of the elements within the binary file, etc.
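The variable coverage factor mentioned above can be illustrated with a small sketch.

For illustration only, we assume here that coverage is the fraction of the enclosing scope's address range for which the variable has a valid location. llvm-dva's exact definition may differ. `Range` and `coverageFactor` are hypothetical names and are not part of llvm-dva or LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// A half-open address range [Low, High).
using Range = std::pair<uint64_t, uint64_t>;

// Hypothetical helper (assumed definition of "coverage"): fraction of the
// scope's address range covered by the variable's location ranges.
// Ranges are clipped to the scope and merged so no byte is counted twice.
double coverageFactor(std::vector<Range> Locations, Range Scope) {
  uint64_t ScopeSize = Scope.second - Scope.first;
  if (ScopeSize == 0)
    return 0.0;
  std::sort(Locations.begin(), Locations.end());
  uint64_t Covered = 0, CurLow = 0, CurHigh = 0;
  bool Open = false;
  for (auto [Low, High] : Locations) {
    Low = std::max(Low, Scope.first);   // clip to the scope
    High = std::min(High, Scope.second);
    if (Low >= High)
      continue;                         // range lies outside the scope
    if (Open && Low <= CurHigh) {
      CurHigh = std::max(CurHigh, High); // overlaps/touches: extend
    } else {
      if (Open)
        Covered += CurHigh - CurLow;    // close the previous merged range
      CurLow = Low;
      CurHigh = High;
      Open = true;
    }
  }
  if (Open)
    Covered += CurHigh - CurLow;
  return static_cast<double>(Covered) / static_cast<double>(ScopeSize);
}
```

For example, a variable described by the location ranges [0,4) and [2,6) inside a 10-byte scope covers 6 bytes, which gives a factor of 0.6.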
With llvm-dva, we aim to address the following points:
- Which variables are dropped due to optimization?
- Why can't I stop at a particular line?
- Which lines are associated with a specific code range?
- Does the debug information represent the original source?
- What is the semantic difference between the debug info generated by different toolchain versions?
Printing Mode
In this mode, llvm-dva prints the logical view, or portions of it, using criteria patterns (including regular expressions) to select which kinds of logical elements are included in the output.
The example below is used to show the different outputs generated by llvm-dva. We compiled it for an x86 ELF target with a recent version of clang (-O0 -g):
1 using INTPTR = const int *;
2 int foo(INTPTR ParamPtr, unsigned ParamUnsigned, bool ParamBool) {
3   if (ParamBool) {
4     typedef int INTEGER;
5     const INTEGER CONSTANT = 7;
6     return CONSTANT;
7   }
8   return ParamUnsigned;
9 }
Print basic details
The following command prints basic details for all logical elements, sorted by their internal offset in the debug information, including each element's lexical level.
llvm-dva --attribute=level,format \
  --output-sort=offset \
  --print=scopes,symbols,types,lines,instructions \
  test-dwarf-clang.o
Each row represents an element that is present within the debug information. The first column represents the scope level, followed by the associated line number (if any), and finally the description of the element.
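To make the row layout concrete, here is a minimal C++ sketch. It stores logical elements as a tree and prints each one with the following fields:

- its lexical level, taken here as its depth in the tree;
- the line number, when present;
- the kind;
- the name;
- the type.

`Element`, `formatRow` and `printView` are hypothetical names, not llvm-dva's own classes. Attributes such as extern and not_inlined are omitted for brevity.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified logical element (not llvm-dva's actual class).
struct Element {
  std::string Kind;             // e.g. "Function", "Parameter", "Block"
  unsigned Line = 0;            // 0 means "no associated source line"
  std::string Name;             // empty for anonymous scopes such as {Block}
  std::string Type;             // empty if the element has no type
  std::vector<Element> Children;
};

// Format one row: [level] line {Kind} 'Name' -> 'Type'
std::string formatRow(const Element &E, unsigned Level) {
  char Buf[16];
  std::snprintf(Buf, sizeof(Buf), "[%03u]", Level);
  std::string Row = Buf;
  if (E.Line != 0)
    Row += " " + std::to_string(E.Line);
  Row += " {" + E.Kind + "}";
  if (!E.Name.empty())
    Row += " '" + E.Name + "'";
  if (!E.Type.empty())
    Row += " -> '" + E.Type + "'";
  return Row;
}

// Depth-first walk: each child is printed one lexical level deeper.
void printView(const Element &E, std::vector<std::string> &Out,
               unsigned Level = 0) {
  Out.push_back(formatRow(E, Level));
  for (const Element &Child : E.Children)
    printView(Child, Out, Level + 1);
}
```

Printing the function foo at level 2 yields rows such as "[002] 2 {Function} 'foo' -> 'int'" and "[003] {Block}". These correspond to the rows in the view above.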
Logical View:
[000] {File} 'test-dwarf-clang.o' -> elf64-x86-64
[001] {CompileUnit} 'test.cpp'
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
[003] 2 {Parameter} 'ParamBool' -> 'bool'
[003] {Block}
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
[004] 5 {Line}
[004] {Code} 'movl $0x7, -0x1c(%rbp)'
[004] 6 {Line}
[004] {Code} 'movl $0x7, -0x4(%rbp)'
[004] {Code} 'jmp 0x6'
[004] 8 {Line}
[004] {Code} 'movl -0x14(%rbp), %eax'
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
[003] 2 {Line}
[003] {Code} 'pushq %rbp'
[003] {Code} 'movq %rsp, %rbp'
[003] {Code} 'movb %dl, %al'
[003] {Code} 'movq %rdi, -0x10(%rbp)'
[003] {Code} 'movl %esi, -0x14(%rbp)'
[003] {Code} 'andb $0x1, %al'
[003] {Code} 'movb %al, -0x15(%rbp)'
[003] 3 {Line}
[003] {Code} 'testb $0x1, -0x15(%rbp)'
[003] {Code} 'je 0x13'
[003] 8 {Line}
[003] {Code} 'movl %eax, -0x4(%rbp)'
[003] 9 {Line}
[003] {Code} 'movl -0x4(%rbp), %eax'
[003] {Code} 'popq %rbp'
[003] {Code} 'retq'
[003] 9 {Line}
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'
On closer inspection, we can see what could be a potential debug issue:
[003] {Block}
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
The ‘INTEGER’ definition is at level [003], the same lexical level as the anonymous {Block} (the ‘true’ branch of the ‘if’ statement), which makes it a sibling of the block. In the original source code, however, the typedef statement is clearly inside that block, so the ‘INTEGER’ definition should be at level [004], inside the block.
Select logical elements
This feature allows selecting specific logical elements; the patterns used as criteria can include regular expressions. The output layout is controlled by the ‘--report’ option: a tabular report, a tree view showing the parent hierarchy of each logical element that matches the criteria, or just a summary with the number of occurrences.
The following command prints all instructions, symbols and types whose names or types contain ‘inte’ (ignoring case) or ‘movl’, using a list layout and including a summary with the number of matches.
llvm-dva --attribute=level \
  --select-nocase --select-regex --select=INTe --select=movl \
  --report=list \
  --print=symbols,types,instructions,summary \
  test-dwarf-clang.o
Logical View:
[000] {File} 'test-dwarf-clang.o'
[001] {CompileUnit} 'test.cpp'
[003] {Code} 'movl $0x7, -0x1c(%rbp)'
[003] {Code} 'movl $0x7, -0x4(%rbp)'
[003] {Code} 'movl %eax, -0x4(%rbp)'
[003] {Code} 'movl %esi, -0x14(%rbp)'
[003] {Code} 'movl -0x14(%rbp), %eax'
[003] {Code} 'movl -0x4(%rbp), %eax'
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
-----------------------------
Element Total Found
-----------------------------
Scopes 3 0
Symbols 4 1
Types 2 1
Lines 17 6
-----------------------------
Total 26 8
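The selection above can be approximated as follows. An element is selected when any pattern is found, ignoring case, anywhere in its name or type. The summary then counts the total and selected elements per category.

This is a sketch under those assumptions, and llvm-dva's exact matching rules may differ. `Row`, `makePatterns`, `isSelected` and `summarize` are illustrative names only.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened element: category plus the text that is matched.
struct Row {
  std::string Category; // "Scopes", "Symbols", "Types" or "Lines"
  std::string Name;
  std::string Type;
};

// Approximates --select-regex --select-nocase: compile each pattern
// as a case-insensitive ECMAScript regular expression.
std::vector<std::regex> makePatterns(const std::vector<std::string> &Sources) {
  std::vector<std::regex> Out;
  for (const std::string &S : Sources)
    Out.emplace_back(S, std::regex::ECMAScript | std::regex::icase);
  return Out;
}

// Selected if any pattern occurs anywhere in the name or the type.
bool isSelected(const Row &R, const std::vector<std::regex> &Patterns) {
  for (const std::regex &P : Patterns)
    if (std::regex_search(R.Name, P) || std::regex_search(R.Type, P))
      return true;
  return false;
}

// Per-category {Total, Found} counts, like the summary table.
std::map<std::string, std::pair<unsigned, unsigned>>
summarize(const std::vector<Row> &Rows, const std::vector<std::regex> &Patterns) {
  std::map<std::string, std::pair<unsigned, unsigned>> Summary;
  for (const Row &R : Rows) {
    auto &[Total, Found] = Summary[R.Category]; // value-initialized to {0, 0}
    ++Total;
    if (isSelected(R, Patterns))
      ++Found;
  }
  return Summary;
}
```

With the patterns INTe and movl, ‘CONSTANT’ is selected through its type ‘const INTEGER’, while ‘INTPTR’ is not. This matches the output above.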
Comparison mode
In this mode, llvm-dva compares logical views to produce a report with the logical elements that are missing or added. This is a very powerful aid in finding semantic differences in the debug information produced by different toolchain versions, or even by completely different toolchains (for example, a compiler producing DWARF can be directly compared against a completely different compiler that produces CodeView).
In fact, we found the debug information issue described above (the incorrect scope for ‘typedef int INTEGER’) by comparing against another compiler.
Using GCC to generate test-dwarf-gcc.o, we can apply a selection pattern in printing mode to obtain the following logical views:
llvm-dva --attribute=level \
  --select-regex --select-nocase --select=INTe \
  --report=list \
  --print=symbols,types \
  test-dwarf-clang.o test-dwarf-gcc.o
Logical View:
[000] {File} 'test-dwarf-clang.o'
[001] {CompileUnit} 'test.cpp'
[003] 4 {TypeAlias} 'INTEGER' -> 'int'
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
Logical View:
[000] {File} 'test-dwarf-gcc.o'
[001] {CompileUnit} 'test.cpp'
[004] 4 {TypeAlias} 'INTEGER' -> 'int'
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
The output shows that both objects contain the same elements. However, the ‘typedef INTEGER’ is located at a different scope level: the GCC-generated object shows [004], which is the correct value.
Note that there is no requirement for GCC to produce identical or similar DWARF to Clang for the comparison to work; we are only comparing the semantics. The same applies when comparing CodeView debug information generated by MSVC and Clang.
There are 2 comparison methods: logical view and logical elements.
Logical View
It compares the logical view as a whole unit; for a match, each compared logical element must have the same parents and children.
Using the llvm-dva comparison functionality, the same issue can be seen in a wider context that includes the logical view.
The output shows, in view form, the missing (-) and added (+) elements, which gives more context; the reference and target object files can also be swapped.
llvm-dva --attribute=level \
  --compare=types \
  --report=view \
  --print=symbols,types \
  test-dwarf-clang.o test-dwarf-gcc.o
Reference: 'test-dwarf-clang.o'
Target: 'test-dwarf-gcc.o'
Logical View:
[000] {File} 'test-dwarf-clang.o'
[001] {CompileUnit} 'test.cpp'
[002] 1 {TypeAlias} 'INTPTR' -> '* const int'
[002] 2 {Function} extern not_inlined 'foo' -> 'int'
[003] {Block}
[004] 5 {Variable} 'CONSTANT' -> 'const INTEGER'
+[004] 4 {TypeAlias} 'INTEGER' -> 'int'
[003] 2 {Parameter} 'ParamBool' -> 'bool'
[003] 2 {Parameter} 'ParamPtr' -> 'INTPTR'
[003] 2 {Parameter} 'ParamUnsigned' -> 'unsigned int'
-[003] 4 {TypeAlias} 'INTEGER' -> 'int'
The output shows the merged view path (reference and target) with the missing and added elements.
Logical Elements
It compares individual logical elements without considering whether their parents are the same. For both comparison methods, the equality criteria include the name, source code location, type and lexical scope level.
llvm-dva --attribute=level \
  --compare=types \
  --report=list \
  --print=symbols,types,summary \
  test-dwarf-clang.o test-dwarf-gcc.o
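A view comparison can be pictured as walking both trees in parallel, starting from a pair of matched parents. Children that have no counterpart under the matched parent are reported as missing (-) or added (+), and matched pairs are compared recursively.

This is a sketch, not llvm-dva's implementation. It assumes two siblings match when their kind, name, type and line agree, with the level implied by the depth at which they are compared. `Node`, `sameElement` and `compareViews` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified tree node for a logical view.
struct Node {
  std::string Kind, Name, Type;
  unsigned Line = 0;
  std::vector<Node> Children;
};

// Assumed matching rule for two siblings: same kind, name, type and line.
// The lexical level is implied by the depth at which the nodes are compared.
bool sameElement(const Node &A, const Node &B) {
  return std::tie(A.Kind, A.Name, A.Type, A.Line) ==
         std::tie(B.Kind, B.Name, B.Type, B.Line);
}

// Compare the children (at ChildLevel) of two already-matched parents and
// recurse into matched pairs. Unmatched reference children are missing (-);
// unmatched target children are added (+).
void compareViews(const Node &Ref, const Node &Tgt, unsigned ChildLevel,
                  std::vector<std::string> &Report) {
  std::vector<bool> Matched(Tgt.Children.size(), false);
  for (const Node &R : Ref.Children) {
    bool Found = false;
    for (std::size_t I = 0; I < Tgt.Children.size(); ++I) {
      if (!Matched[I] && sameElement(R, Tgt.Children[I])) {
        Matched[I] = true;
        Found = true;
        compareViews(R, Tgt.Children[I], ChildLevel + 1, Report);
        break;
      }
    }
    if (!Found)
      Report.push_back("-[" + std::to_string(ChildLevel) + "] " + R.Name);
  }
  for (std::size_t I = 0; I < Tgt.Children.size(); ++I)
    if (!Matched[I])
      Report.push_back("+[" + std::to_string(ChildLevel) + "] " +
                       Tgt.Children[I].Name);
}
```

Comparing the Clang and GCC trees for foo with this sketch produces two entries. "+[4] INTEGER" comes from inside the block, and "-[3] INTEGER" comes from directly under foo. This mirrors the view above.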
Reference: 'test-dwarf-clang.o'
Target: 'test-dwarf-gcc.o'
(1) Missing Types:
-[003] 4 {TypeAlias} 'INTEGER' -> 'int'
(1) Added Types:
+[004] 4 {TypeAlias} 'INTEGER' -> 'int'
----------------------------------------
Element Expected Missing Added
----------------------------------------
Scopes 4 0 0
Symbols 0 0 0
Types 2 1 1
Lines 0 0 0
----------------------------------------
Total 6 1 1
Changing the Reference and Target order:
llvm-dva --attribute=level \
  --compare=types \
  --report=list \
  --print=symbols,types,summary \
  test-dwarf-gcc.o test-dwarf-clang.o
Reference: 'test-dwarf-gcc.o'
Target: 'test-dwarf-clang.o'
(1) Missing Types:
-[004] 4 {TypeAlias} 'INTEGER' -> 'int'
(1) Added Types:
+[003] 4 {TypeAlias} 'INTEGER' -> 'int'
----------------------------------------
Element Expected Missing Added
----------------------------------------
Scopes 4 0 0
Symbols 0 0 0
Types 2 1 1
Lines 0 0 0
----------------------------------------
Total 6 1 1
As the Reference and Target are switched, the Added Types from the first case are now listed as Missing Types, and vice versa.
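Element comparison ignores the parent hierarchy, so it can be sketched as two set differences over an equality key.

This sketch assumes the key is exactly the criteria listed above: name, line, type and level. It stores elements in a std::set, so duplicate keys are collapsed. `ElementKey`, `Diff` and `compareElements` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Assumed equality key: name, source line, type and lexical level.
struct ElementKey {
  std::string Name;
  unsigned Line;
  std::string Type;
  unsigned Level;
  bool operator<(const ElementKey &O) const {
    return std::tie(Name, Line, Type, Level) <
           std::tie(O.Name, O.Line, O.Type, O.Level);
  }
};

struct Diff {
  std::vector<ElementKey> Missing; // in Reference but not in Target
  std::vector<ElementKey> Added;   // in Target but not in Reference
};

Diff compareElements(const std::set<ElementKey> &Reference,
                     const std::set<ElementKey> &Target) {
  Diff D;
  std::set_difference(Reference.begin(), Reference.end(), Target.begin(),
                      Target.end(), std::back_inserter(D.Missing));
  std::set_difference(Target.begin(), Target.end(), Reference.begin(),
                      Reference.end(), std::back_inserter(D.Added));
  return D;
}
```

Because the two differences are symmetric, swapping the arguments simply swaps Missing and Added. This is the behaviour shown by the two reports above.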
LLVM issues
LLVM debug information issues found using llvm-dva, in both the DWARF and CodeView formats:
- PR43205 - COFF Debug info shows variable at the wrong lexical scope
- PR43250 - COFF Debug info missing nested enumeration
- PR44229 - Debug information shows incorrect lexical scope for typedef
- PR45706 - [CodeView] Omitted class member function declaration for lambda
- PR45739 - Missing LF_NESTTYPE with nested templates (structure/class)
- PR45776 - [Codeview] The ‘this’ symbol is not marked as compiler generated
- PR45811 - [DWARF] Does not generate nested enumerations
- PR45867 - [DWARF] Template instance name not consistent with CodeView
Conclusion
We have created the following series of incremental patches, each depending on the previous ones and each with very specific functionality:
- Interval tree
- Driver and documentation
- Logical elements
- Locations and ranges
- Select elements
- Warning and internal options
- Compare elements
- ELF Reader
- CodeView Reader
Special thanks to Paul Robinson for his invaluable help by suggesting improvements and reviewing the patches and tool documentation.
Original RFC
https://lists.llvm.org/pipermail/llvm-dev/2020-August/144174.html
Comments and feedback welcome.
Thanks,
Carlos