Rich Disassembler for LLDB

Description

Use the variable location information from the debug info to annotate LLDB’s disassembler (and register read) output with the location and lifetime of source variables. The rich disassembler output should be exposed as structured data and made available through LLDB’s scripting API so more tooling could be built on top of this. In a terminal, LLDB should render the annotations as text.

Expected outcomes

For example, we could augment the disassembly for the following function

frame #0: 0x0000000100000f80 a.out`main(argc=1, argv=0x00007ff7bfeff1d8) at demo.c:4:10 [opt]
  1   void puts(const char*);
  2   int main(int argc, char **argv) {
  3    for (int i = 0; i < argc; ++i)
→ 4      puts(argv[i]);
  5    return 0;
  6   }
(lldb) disassemble
a.out`main:
...
  0x100000f71 <+17>: movl  %edi, %r14d
  0x100000f74 <+20>: xorl  %r15d, %r15d
  0x100000f77 <+23>: nopw  (%rax,%rax)
→  0x100000f80 <+32>: movq  (%rbx,%r15,8), %rdi
  0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts
  0x100000f89 <+41>: incq  %r15
  0x100000f8c <+44>: cmpq  %r15, %r14
  0x100000f8f <+47>: jne 0x100000f80 ; <+32> at demo.c:4:10
  0x100000f91 <+49>: addq  $0x8, %rsp
  0x100000f95 <+53>: popq  %rbx
...

using the debug information that LLDB also has access to (observe how the source variable i is in r15 over the range [0x100000f77, 0x100000f91), plus the load slide)

$ dwarfdump demo.dSYM --name  i 
demo.dSYM/Contents/Resources/DWARF/demo: file format Mach-O 64-bit x86-64
0x00000076: DW_TAG_variable
 DW_AT_location (0x00000098: 
 [0x0000000100000f60, 0x0000000100000f77): DW_OP_consts +0, DW_OP_stack_value
 [0x0000000100000f77, 0x0000000100000f91): DW_OP_reg15 R15)
 DW_AT_name ("i")
 DW_AT_decl_file ("/tmp/t.c")
 DW_AT_decl_line (3)
 DW_AT_type (0x000000b2 "int")

to produce output like this, where we annotate when a variable is live and what its location is:

(lldb) disassemble
a.out`main:
...                                                               ; i=0
  0x100000f74 <+20>: xorl  %r15d, %r15d                           ; i=r15
  0x100000f77 <+23>: nopw  (%rax,%rax)                            ; |
→  0x100000f80 <+32>: movq  (%rbx,%r15,8), %rdi                   ; |
  0x100000f84 <+36>: callq 0x100000f9e ; symbol stub for: puts    ; |
  0x100000f89 <+41>: incq  %r15                                   ; |
  0x100000f8c <+44>: cmpq  %r15, %r14                             ; |
  0x100000f8f <+47>: jne 0x100000f80 ; <+32> at t.c:4:10          ; |
  0x100000f91 <+49>: addq  $0x8, %rsp                             ; i=undef
  0x100000f95 <+53>: popq  %rbx

The goal would be to produce output like this for a subset of unambiguous cases, for example, variables that are constant or fully in registers.
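
To make the idea concrete, here is a minimal Python sketch of the annotation step for the example above. It is not LLDB code: LocationRange, location_at and annotate are hypothetical names, and the location list is copied by hand from the dwarfdump output rather than read from debug info. It only covers the easy case where each range maps to a single constant or register.

```python
from dataclasses import dataclass


@dataclass
class LocationRange:
    """One location-list entry: the variable is at `where` over [start, end)."""
    start: int
    end: int
    where: str  # "0" for a constant value, "r15" for a register


# Location list for `i`, transcribed by hand from the dwarfdump output above.
I_LOCATIONS = [
    LocationRange(0x100000F60, 0x100000F77, "0"),
    LocationRange(0x100000F77, 0x100000F91, "r15"),
]

# Instruction addresses taken from the disassembly listing above.
ADDRESSES = [
    0x100000F71, 0x100000F74, 0x100000F77, 0x100000F80, 0x100000F84,
    0x100000F89, 0x100000F8C, 0x100000F8F, 0x100000F91, 0x100000F95,
]


def location_at(ranges, address):
    """Return the location text covering `address`, or None if none does."""
    for r in ranges:
        if r.start <= address < r.end:
            return r.where
    return None


def annotate(name, ranges, addresses):
    """Return (address, note) pairs, one per instruction address.

    A note names the location when it changes, is "|" while the location
    stays the same, "<name>=undef" when the variable stops being covered,
    and "" while the variable is not covered at all.
    """
    notes = []
    previous = None
    for address in addresses:
        current = location_at(ranges, address)
        if current == previous:
            note = "|" if current is not None else ""
        elif current is None:
            note = f"{name}=undef"
        else:
            note = f"{name}={current}"
        notes.append((address, note))
        previous = current
    return notes


if __name__ == "__main__":
    for address, note in annotate("i", I_LOCATIONS, ADDRESSES):
        print(f"{address:#x}  ; {note}")
```

Following the dwarfdump ranges literally, this gives i=0 at <+17>, i=r15 at <+23>, and i=undef at <+49>, with | on the lines in between. A real implementation would also have to deal with the load slide, overlapping variables, and locations that are neither a constant nor a single register.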

Confirmed mentors and their contacts

Required / desired skills

Required:

  • Good understanding of C++
  • Familiarity with using a debugger on the terminal
  • Familiarity with all the concepts mentioned in the example above
  • Good understanding of at least one assembly dialect (x86_64 or AArch64)

Desired:

  • Compiler knowledge including data flow and control flow analysis is a plus.
  • Being able to navigate debug information (DWARF) is a plus.

Size of the project.

medium (~175h)

An easy, medium or hard rating if possible

hard

Dear @adrian.prantl ,

My name is Sahil Patidar, and I am excited about the opportunity to contribute to the LLVM project. I have a strong background in C++ and am familiar with using LLDB, as well as having knowledge of x86 architecture. My experience includes successfully raising Pull Requests (PRs) in the LLVM project, which has furthered my understanding of the codebase and its development process.

If there are specific challenges or areas of focus you recommend, please let me know; I want to ensure I align my efforts with the project’s goals.

Thank you for your time and consideration. I look forward to the possibility of working together.

Best regards,
Sahil Patidar

Hi there, I’m new to GSoC and would like to work on this project. I’m not sure how the mentoring works: do we discuss the problem together on Discourse now, or only after we get accepted? I’m a junior at Berkeley and am taking a class in compilers this semester. We’re building Chocopy, a statically typed Python compiler that emits RISC-V assembly code. I’m also following the LLVM Kaleidoscope tutorial right now and have some experience contributing to a C++ repository, CXXGRAPH. Thank you!

The formal mentorship part of GSoC starts once a candidate and project are accepted. That said, the Discourse community is definitely the right place to ask questions about the inner workings of LLVM and LLDB at any time!

I believe this refers to a method call like function.GetRichDisassemblyInfo(), but I couldn’t find any reference to SBDisassembler in LLDB. I assume this means that the functionality still needs to be implemented. Could you please provide more explanation or guidance on this?

The API would have to be designed (in coordination with the LLDB community) as part of the project. Right now the API is very basic (see “LLDB: lldb::SBFrame Class Reference”) and merely returns a string. We would probably want to add a richer API that, for example, returns SBStructuredData with a to-be-designed scheme instead.
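
As a rough illustration of what structured output could look like, the Python sketch below walks a function's instructions through SB API calls that exist today (SBFunction.GetInstructions, SBInstruction.GetAddress, GetMnemonic, GetOperands, GetComment) and packs them into plain dictionaries. The helper names instruction_records and records_as_json, the dictionary layout, and the empty "variables" slot are made up for illustration; they are not an agreed-upon scheme and not an existing LLDB API.

```python
import json


def instruction_records(function, target):
    """Build one plain dict per instruction of `function`.

    `function` is expected to behave like lldb.SBFunction and `target` like
    lldb.SBTarget. The dict layout is a hypothetical placeholder, not an
    LLDB-defined scheme.
    """
    records = []
    for inst in function.GetInstructions(target):
        records.append({
            "address": inst.GetAddress().GetLoadAddress(target),
            "mnemonic": inst.GetMnemonic(target),
            "operands": inst.GetOperands(target),
            "comment": inst.GetComment(target),
            "variables": [],  # hypothetical slot for variable annotations
        })
    return records


def records_as_json(function, target):
    """Serialize the records so that other tools can consume them."""
    return json.dumps(instruction_records(function, target), indent=2)
```

Inside LLDB's interactive script interpreter, something like records_as_json(lldb.frame.GetFunction(), lldb.target) should work for a stopped process, assuming the frame has function-level debug info. The module does not import lldb itself, so it can also be exercised with stand-in objects.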

Seems pretty manageable to me.

I’m a junior, and I’ve worked with x86-64 and AArch64 throughout college; I’m also comfortable using a debugger from the command line. I studied compiler design last year and know control flow analysis and data flow analysis. I haven’t looked through the LLVM codebase yet, so getting familiar with it would be a good first step. My initial plan is to collect static analysis information and reconstruct it together with the DWARF debugging information.

A point of reference could also be this existing llvm-objdump feature --debug-vars (the output is wide, make sure to scroll to the right).

$ cat /tmp/test.c
int foo(int a, int b) {
  return a * b;
}

$ ./bin/llvm-objdump -d /tmp/test.o --debug-vars

/tmp/test.o:    file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <foo>:
                                                                            ┠─ a = <unknown op DW_OP_fbreg (145)>
                                                                            ┃ ┠─ b = <unknown op DW_OP_fbreg (145)>
       0: d10043ff      sub     sp, sp, #0x10                               ┃ ┃
       4: b9000fe0      str     w0, [sp, #0xc]                              ┃ ┃
       8: b9000be1      str     w1, [sp, #0x8]                              ┃ ┃
       c: b9400fe1      ldr     w1, [sp, #0xc]                              ┃ ┃
      10: b9400be0      ldr     w0, [sp, #0x8]                              ┃ ┃
      14: 1b007c20      mul     w0, w1, w0                                  ┃ ┃
      18: 910043ff      add     sp, sp, #0x10                               ┃ ┃
      1c: d65f03c0      ret                                                 ┻ ┻

Good point, David. If we can find a way to factor out some common infrastructure that would be a plus, but it’s not a requirement. You’d need to expose a surprising amount of LLDB-specific functionality (variable lookup, ranges, DWARF expression evaluation, …) in a common interface at the LLVM level and that may not end up being a net win. I’d be happy to be proven wrong though!
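
For a sense of what the simplest slice of “DWARF expression evaluation” involves, here is a small Python sketch that recognizes only the two shapes seen in the dwarfdump output earlier in the thread: a lone DW_OP_regN, and DW_OP_consts followed by DW_OP_stack_value. The opcode values come from the DWARF specification; classify and read_sleb128 are hypothetical helper names, and this is not how LLDB's own expression machinery is structured. Everything else, including the DW_OP_fbreg case from the llvm-objdump output, is reported as unsupported.

```python
DW_OP_CONSTS = 0x11       # followed by a signed LEB128 constant
DW_OP_REG0 = 0x50         # DW_OP_reg0 .. DW_OP_reg31 are 0x50 .. 0x6f
DW_OP_REG31 = 0x6F
DW_OP_STACK_VALUE = 0x9F  # the computed value is the variable's value


def read_sleb128(data, offset):
    """Decode one signed LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            if byte & 0x40:  # sign bit of the final 7-bit group
                result -= 1 << shift
            return result, offset


def classify(expr):
    """Classify a DWARF expression given as bytes.

    Returns ("register", n) for a lone DW_OP_regN, ("constant", v) for
    DW_OP_consts v followed by DW_OP_stack_value, and ("unsupported", None)
    for anything else, including truncated input.
    """
    if len(expr) == 1 and DW_OP_REG0 <= expr[0] <= DW_OP_REG31:
        return ("register", expr[0] - DW_OP_REG0)
    if expr and expr[0] == DW_OP_CONSTS:
        try:
            value, end = read_sleb128(expr, 1)
        except IndexError:
            return ("unsupported", None)
        if expr[end:] == bytes([DW_OP_STACK_VALUE]):
            return ("constant", value)
    return ("unsupported", None)
```

With these definitions, the single byte 0x5f (DW_OP_reg15) classifies as register 15, and the bytes 0x11 0x00 0x9f classify as the constant 0, matching the two location-list entries for i. Mapping a DWARF register number to a register name is target-specific and is left out here.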

Can this be a community effort and not exactly something required to be done by a single student? From what I can see now, the project is interesting but there is no single assignee to it.

Unfortunately we received fewer GSoC slots this year than we had project proposals for, and this project did not make it into the final selection. If anyone wants to work on this on their own time, we’d be happy to provide some guidance.

@ToolmanP Are you interested? We could work together on this.

Hi @adrian.prantl,

I hope this message finds you well.

I would like to work on this project, but I will need guidance, as I am new to the project and still a student. I would appreciate it if you could provide guidance and milestones throughout the project, ideally at GSoC level, but if your time doesn’t permit, I’ll try not to take up too much of it.

As a start, could you please give me a qualification task, something beginner-friendly, to get familiar with the “lldb” code base and the process for sending patches? I searched for issues with the “lldb” and “good first issue” labels but couldn’t find anything suitable for me.

After completing the qualification task, I think I will be more ready to discuss the project tasks and milestones in detail.

Thanks,
AbdAlRahman Gad

Always happy to welcome more people in the community!
The first task would be to build and successfully run the entire LLDB testsuite in your environment :)

A potentially good beginner task would be more patches along the lines of “Change GetChildCompilerTypeAtIndex to return Expected (NFC)” by adrian-prantl (Pull Request #92979, llvm/llvm-project).
We have lots of APIs in LLDB that return a result and take a Status object by reference, and it would be highly beneficial to convert these over to llvm::Expected, which forces us to check the error at the call site. These are pretty straightforward refactorings, but they need some thought about how to appropriately handle the errors in each case. For example, ValueObject::AddressOf() has only a handful of uses, so it could be a good first patch to discuss.

Thanks for your time!
I’ve just built the project and successfully run the LLDB testsuite (it took longer than expected). I’ll start working on the task and try to send a patch ASAP.