Documentation

Hello everybody,

I’m an undergraduate computer science student doing some research on debuggers. I’ve read J.B. Rosenberg’s “How Debuggers Work” and the GDB Internals manual, and I went through the documentation on LLDB’s website; now I’m craving more. I’d greatly appreciate it if you could point me to other relevant documentation or anything else you’d suggest as reading.

Best regards
Andreas

Glad to hear this!

The one other bit of source material you might want to dig into is the DWARF standard. Figuring out in some detail how debug information is represented will give you a sense of some of the complexities on the symbol side of the debugger. Plus, if you start poking around in actual debuggers, you're going to need to know this. For everything but the Microsoft tools, DWARF is the de facto standard, so getting familiar with how it works will be a big help.

Other than that, probably the best thing to do is to start reading the source code. There's only so much theoretical work you can do, then you just have to get down to the details...

The trick is to find some clue that you can follow into the daunting mass of code. One way would be to pick some task (setting breakpoints, handling shared library loads, calling functions in the inferior, etc.) and follow how it is done: walk through the code in the debugger to get a general outline, then read around for more details. Another really good way to get started is to think, as a user of the debugger, about what features you'd like to see added or fixed, and then use that as your lead into the code, figuring out how to fix or add whatever you've decided to focus on.

Jim

Andreas,

I am glad you are interested in finding out more. Unfortunately there aren't a ton of other documents you can read. The one thing I can do is point out what LLDB does differently from other typical debuggers.

1 - Types are converted from debug info back into correct clang types.

Most debuggers tend to make up their own internal type representation that is geared more toward how the information is represented in the debug info and toward how the debugger's expression parser will want to use that type information.

LLDB converts DWARF back into real clang types. It currently makes an AST context per module (executable, shared library, or other loadable code container) and lazily populates the AST as needed by expressions.

2 - We use the compiler as our expression parser

Because we convert all of our types into clang types, we can use clang as our expression parser. LLDB just pretends to be a precompiled header when the compiler is parsing an expression, and we can answer all of the precompiled header queries for information based on the current execution context (which frame we have selected in which thread of a specific process). The other benefit of this approach is that if clang adds C++0x support, we already have support for it in our expression parser just by updating to the latest version of clang. It also allows us to support any feature currently supported by clang. Other debuggers must add new runtime features or modify their expression parsers each time a new language feature is added. This also allows us to have expression local variables:

(lldb) expression
for (int i=0; i<10; i++)
  (int)printf("i = %i\n", i);

i = 0
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
i = 7
i = 8
i = 9
(lldb)

Note that "i" here is an expression local. If you had code like:

  int main ()
  {
      int i=0;
-> return 0;
  }

And you were stopped at the "return 0;" statement, and evaluated the above multi-line expression, it is as if you added a new lexical block scope:

  int main ()
  {
      int i=0;
      {
        for (int i=0; i<10; i++)
          (int)printf("i = %i\n", i);
      }
-> return 0;
  }

3 - The expression parser can JIT the code you have for expressions

We use clang to JIT the code for expressions and run it directly in the process you are debugging. We also let clang handle all of the ABI issues when it comes to calling functions, which is typically a place where debuggers mess up. For example, if you have an expression like:

(lldb) expr c4 = complex_add (c1, c2) + c3

We actually JIT up a function that takes a single "data *" as a parameter (which is easy for debuggers to figure out how to call) and we define data as:

struct data
{
   complex c1; // variable used as first arg to complex_add() function
   complex c2; // variable used as second arg to complex_add() function
   complex c3; // variable to add to result of complex_add() function
   complex result; // result of the expression (which will be "c4")
};

Then we JIT up a function

void
$___lldb_expr (data *data_ptr)
{
    // The result is handed back through the struct, so nothing is returned.
    data_ptr->result = complex_add (data_ptr->c1, data_ptr->c2) + data_ptr->c3;
}

Why is this important? Because now none of our debugger plug-ins need to know how and where to put arguments to functions. We let clang handle the current ABI issues and let it place the variables in registers or on the stack as needed, including dealing with the return type from functions. This keeps the debugger out of the business of having to know the ABI for the current target (a big source of bugs in debugger expression parsers).
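To make the single-pointer calling convention concrete, here is a minimal, self-contained C++ sketch. It is an illustration only, not LLDB's generated code: complex_t, expr_args, lldb_expr_sketch and evaluate_sketch are hypothetical names, and std::complex<double> is assumed as a stand-in for the "complex" type in the example above.

```cpp
#include <complex>

// Assumption: std::complex<double> stands in for the program's "complex" type.
using complex_t = std::complex<double>;

// Hypothetical function living in the program being debugged.
complex_t complex_add(complex_t a, complex_t b) { return a + b; }

// Every input and the result travel through one struct.
struct expr_args {
    complex_t c1;      // first argument to complex_add()
    complex_t c2;      // second argument to complex_add()
    complex_t c3;      // value added to the result of complex_add()
    complex_t result;  // result of the whole expression
};

// The wrapper takes a single pointer. How c1 and c2 actually reach
// complex_add() (registers, stack, hidden return slot) is decided by the
// compiler that builds this function, not by the caller.
extern "C" void lldb_expr_sketch(expr_args *args)
{
    args->result = complex_add(args->c1, args->c2) + args->c3;
}

// Hypothetical "debugger side": fill in the struct, make the one simple call,
// and read the result back out of the struct.
complex_t evaluate_sketch(complex_t c1, complex_t c2, complex_t c3)
{
    expr_args args{c1, c2, c3, {}};
    lldb_expr_sketch(&args);
    return args.result;
}
```

The only call the caller has to get right is "pass one pointer", which is the same on essentially every ABI.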

4 - JIT'ed code can be used for more complete expression validation

We write our own helper functions that we JIT up and copy into the process we are debugging. We can post-process the Intermediate Representation (IR) we get after compiling an expression and insert extra checks into your expressions. So for an expression like:

(lldb) expr 2 + pt_ptr->x + pt2_ptr->y

We can actually rewrite this expression to use our "void *pointer_validation(void *)" function so we would actually run:

2 + pointer_validation (pt_ptr)->x + pointer_validation (pt2_ptr)->y

And if either "pt_ptr" or "pt2_ptr" is invalid, we can stop the expression early and let the user know that a pointer was invalid. This can help to detect issues, especially when a bad pointer points to memory just before valid memory and the field access would actually land you back in valid memory.
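A rough C++ sketch of the rewritten expression follows. It is not LLDB's implementation: the point struct and evaluate_checked are hypothetical, the helper is written as a template (rather than the "void *" signature above) so that the "->" accesses compile in plain C++, and the validity test is assumed to be a simple null check, whereas a real check would have to consult the memory of the process being debugged.

```cpp
#include <stdexcept>

// Hypothetical type for the pointers used in the expression.
struct point {
    int x;
    int y;
};

// Sketch of a validation helper. Assumption: "invalid" means null here.
// On failure it throws, which aborts the expression before any field is read.
template <typename T>
T *pointer_validation(T *p)
{
    if (p == nullptr)
        throw std::invalid_argument("invalid pointer in expression");
    return p;
}

// The expression "2 + pt_ptr->x + pt2_ptr->y" after every pointer
// dereference has been wrapped in the validation helper.
int evaluate_checked(point *pt_ptr, point *pt2_ptr)
{
    return 2 + pointer_validation(pt_ptr)->x + pointer_validation(pt2_ptr)->y;
}
```

With two valid pointers the expression evaluates normally; with a bad one the helper stops evaluation and the failure can be reported instead of returning a value read from the wrong memory.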

5 - LLDB parses debug information lazily.

Debuggers take different approaches to parsing debug info. GDB tends to parse everything a compile unit at a time. LLDB parses only what it needs, as it needs it. If you only touch one function in a compile unit with 100 functions, we will parse only that function and the types it needs. This can help reduce the memory footprint.
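The parse-on-demand idea can be sketched in a few lines of C++. This is a toy model under stated assumptions, not LLDB's data structures: LazyCompileUnit is a hypothetical class, the "raw" debug info is just a map of strings, and "parsing" is simulated by prefixing the raw text.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for one compile unit's debug information.
class LazyCompileUnit {
public:
    explicit LazyCompileUnit(std::map<std::string, std::string> raw)
        : raw_(std::move(raw)) {}

    // Parse a function the first time it is requested and cache the result.
    // Functions that are never requested are never parsed.
    // Throws std::out_of_range if the name is not in the raw debug info.
    const std::string &function_info(const std::string &name)
    {
        auto it = parsed_.find(name);
        if (it == parsed_.end()) {
            ++parse_count_;  // count real parses only, not cache hits
            it = parsed_.emplace(name, "parsed:" + raw_.at(name)).first;
        }
        return it->second;
    }

    std::size_t parse_count() const { return parse_count_; }

private:
    std::map<std::string, std::string> raw_;     // unparsed entries
    std::map<std::string, std::string> parsed_;  // filled in on demand
    std::size_t parse_count_ = 0;
};
```

Asking for one function out of many leaves the others untouched, and asking for the same function twice parses it only once.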

6 - LLDB can run multiple debug sessions simultaneously:

(lldb) target create /tmp/server.exe
(lldb) breakpoint set --name main
(lldb) run
Process 1000 launched: '/tmp/server.exe' (x86_64)
...
(lldb) target create /tmp/client.exe
(lldb) breakpoint set --name main
(lldb) run
Process 1001 launched: '/tmp/client.exe' (x86_64)
...
(lldb) target list
Current targets:
  target #0: /tmp/server.exe ( arch=x86_64-apple-darwin, platform=localhost, pid=1000, state=stopped )
* target #1: /tmp/client.exe ( arch=x86_64-apple-darwin, platform=localhost, pid=1001, state=stopped )
(lldb) target select 0
(lldb) run
(lldb) target select 1
(lldb) run

LLDB can also run binaries for different architectures from the same debugger, so in a single session you could debug a local server and a client for a different architecture running on a remote machine.

7 - LLDB is built around plug-ins

This means that no matter what you are debugging, you always have access to the plug-ins for other architectures. So you can use any of the supported disassemblers from any target. Below we create an x86_64 target and debug it, and then disassemble memory in that x86_64 process using the ARM disassembler:

(lldb) target create /tmp/arm-compiler-on-x86_64
(lldb) breakpoint set --name main
(lldb) run
Process 1000 launched: '/tmp/arm-compiler-on-x86_64' (x86_64)
(lldb) disassemble --arch armv7 --count 32 0x12020300

GDB only has the disassemblers for the architecture it was built for inside of it, so it cannot cross-disassemble.
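As a loose illustration of why a plug-in design makes this possible, here is a small C++ sketch of a registry keyed by architecture name. Everything in it is hypothetical (DisassemblerPlugin, DisassemblerRegistry, and the idea that a plug-in maps one 32-bit word to text); it does not reflect LLDB's actual plug-in interfaces.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical plug-in signature: turn one raw instruction word into text.
using DisassemblerPlugin = std::function<std::string(std::uint32_t)>;

class DisassemblerRegistry {
public:
    // Register (or replace) the plug-in for an architecture name.
    void add(const std::string &arch, DisassemblerPlugin plugin)
    {
        plugins_[arch] = std::move(plugin);
    }

    // Look the plug-in up by the requested architecture, independently of
    // the architecture of whatever target is currently being debugged.
    std::string disassemble(const std::string &arch, std::uint32_t word) const
    {
        auto it = plugins_.find(arch);
        if (it == plugins_.end())
            throw std::out_of_range("no disassembler plug-in for " + arch);
        return it->second(word);
    }

private:
    std::map<std::string, DisassemblerPlugin> plugins_;
};
```

Because the lookup is by name rather than by the current target, every registered disassembler is reachable from any debug session.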

There are many more architectural differences, but I believe I have outlined the biggest ones above.

Greg Clayton