Description
Use the variable location information from the debug info to annotate LLDB’s disassembler (and register read) output with the location and lifetime of source variables. The rich disassembler output should be exposed as structured data and made available through LLDB’s scripting API so that more tooling can be built on top of it. In a terminal, LLDB should render the annotations as text.
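To make "structured data" a little more concrete, here is a minimal sketch of what one annotation record and its two renderings (text for the terminal, JSON for tooling) might look like. Everything in it is hypothetical: `VariableAnnotation`, `render_text` and `render_json` are illustrative names, not part of LLDB's scripting API, and the real design is up to the project.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VariableAnnotation:
    """Hypothetical record: where one source variable lives at one instruction."""
    address: int    # load address of the annotated instruction
    variable: str   # source-level variable name
    location: str   # e.g. a register name such as "r15", or a constant such as "0"


def render_text(annotations):
    """Terminal rendering: 'name=location' pairs appended as a comment."""
    return "; " + ", ".join(f"{a.variable}={a.location}" for a in annotations)


def render_json(annotations):
    """Structured rendering that other tools could consume."""
    return json.dumps([asdict(a) for a in annotations])


# One record for the variable `i` at the current PC in the example below.
example = [VariableAnnotation(0x100000f80, "i", "r15")]
print(render_text(example))
print(render_json(example))
```

Keeping the record separate from its rendering is the point of the sketch: the same data can feed both the terminal output and scripts.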
Expected outcomes
For example, we could augment the disassembly for the following function
frame #0: 0x0000000100000f80 a.out`main(argc=1, argv=0x00007ff7bfeff1d8) at demo.c:4:10 [opt]
1 void puts(const char*);
2 int main(int argc, char **argv) {
3 for (int i = 0; i < argc; ++i)
→ 4 puts(argv[i]);
5 return 0;
6 }
(lldb) disassemble
a.out`main:
...
0x100000f71 <+17>: movl %edi, %r14d
0x100000f74 <+20>: xorl %r15d, %r15d
0x100000f77 <+23>: nopw (%rax,%rax)
→ 0x100000f80 <+32>: movq (%rbx,%r15,8), %rdi
0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts
0x100000f89 <+41>: incq %r15
0x100000f8c <+44>: cmpq %r15, %r14
0x100000f8f <+47>: jne 0x100000f80 ; <+32> at demo.c:4:10
0x100000f91 <+49>: addq $0x8, %rsp
0x100000f95 <+53>: popq %rbx
...
using the debug information that LLDB also has access to (observe how the source variable i is in r15 from [0x100000f77+slide))
$ dwarfdump demo.dSYM --name i
demo.dSYM/Contents/Resources/DWARF/demo: file format Mach-O 64-bit x86-64
0x00000076: DW_TAG_variable
DW_AT_location (0x00000098:
[0x0000000100000f60, 0x0000000100000f77): DW_OP_consts +0, DW_OP_stack_value
[0x0000000100000f77, 0x0000000100000f91): DW_OP_reg15 R15)
DW_AT_name ("i")
DW_AT_decl_file ("/tmp/t.c")
DW_AT_decl_line (3)
DW_AT_type (0x000000b2 "int")
to produce output like this, where we annotate when a variable is live and what its location is:
(lldb) disassemble
a.out`main:
... ; i=0
0x100000f74 <+20>: xorl %r15d, %r15d ; i=r15
0x100000f77 <+23>: nopw (%rax,%rax) ; |
→ 0x100000f80 <+32>: movq (%rbx,%r15,8), %rdi ; |
0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts ; |
0x100000f89 <+41>: incq %r15 ; |
0x100000f8c <+44>: cmpq %r15, %r14 ; |
0x100000f8f <+47>: jne 0x100000f80 ; <+32> at t.c:4:10 ; |
0x100000f91 <+49>: addq $0x8, %rsp ; i=undef
0x100000f95 <+53>: popq %rbx
The goal would be to produce output like this for a subset of unambiguous cases, for example, variables that are constant or fully in registers.
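The core lookup behind such annotations can be sketched in a few lines. The following is an illustrative model only, not LLDB code: `LocationRange`, `location_at` and `annotate` are hypothetical names, and the ranges are transcribed by hand from the dwarfdump output above (the slide is ignored). Each instruction address is matched against the half-open ranges of the location list, and a label is emitted whenever the location changes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LocationRange:
    """One location-list entry: the variable is at `location` for pc in [start, end)."""
    start: int
    end: int        # exclusive
    location: str   # human-readable location, e.g. "r15" or the constant "0"


# Location list of `i`, transcribed from the dwarfdump output above.
I_LOCATIONS = [
    LocationRange(0x100000f60, 0x100000f77, "0"),    # DW_OP_consts +0, DW_OP_stack_value
    LocationRange(0x100000f77, 0x100000f91, "r15"),  # DW_OP_reg15
]

# Instruction addresses from the disassembly above (<+17> through <+53>).
PCS = [0x100000f71, 0x100000f74, 0x100000f77, 0x100000f80, 0x100000f84,
       0x100000f89, 0x100000f8c, 0x100000f8f, 0x100000f91, 0x100000f95]


def location_at(ranges: List[LocationRange], pc: int) -> Optional[str]:
    """Return the variable's location at `pc`, or None if no range covers it."""
    for r in ranges:
        if r.start <= pc < r.end:
            return r.location
    return None


def annotate(name: str, ranges: List[LocationRange], pcs: List[int]) -> List[str]:
    """Return one annotation per address: a label on change, '|' while unchanged."""
    out = []
    prev = None
    for pc in pcs:
        loc = location_at(ranges, pc)
        if loc == prev:
            out.append("|" if loc is not None else "")
        elif loc is None:
            out.append(f"{name}=undef")
        else:
            out.append(f"{name}={loc}")
        prev = loc
    return out


for pc, note in zip(PCS, annotate("i", I_LOCATIONS, PCS)):
    print(f"{pc:#x} ; {note}")
```

Because this sketch follows the location list literally, it labels `i=r15` at 0x100000f77, one instruction later than the mock-up above, which shows the label on the `xorl` at 0x100000f74. Variables whose locations are more complex than a constant or a single register are outside what this model handles.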
Confirmed mentors and their contacts
- @adrian.prantl (primary contact)
- @JDevlieghere
Required / desired skills
Required:
- Good understanding of C++
- Familiarity with using a debugger on the terminal
- Familiarity with all the concepts mentioned in the example above
- Good understanding of at least one assembler dialect for machine code (x86_64 or AArch64)
Desired:
- Compiler knowledge including data flow and control flow analysis is a plus.
- Being able to navigate debug information (DWARF) is a plus.
Size of the project
medium (~175h)
An easy, medium or hard rating if possible
hard