Some Basic LLDB Usage Questions

I am in the middle of a project where I have built a domain-specific-language (DSL) front-end for trading simulation that uses LLVM to JIT the DSL code and link it into precompiled LLVM bitcode that comes from a C++ simulation engine compiled with clang++.

I need to implement some sort of debugger to let users of the trading simulation environment set breakpoints, see local variables defined in the DSL, global variables etc. It needs to be real lightweight and simple as the users are not experienced programmers. They are generally just traders with some basic knowledge of trading algorithms. I'm going to provide a Cocoa-based GUI and want to use the LLDB API to get the appropriate data for the users.

It seems to me after a perusal of the lldb C++ APIs that it is likely that lldb will suit my purposes. There are a couple of important open issues:

1) What does the SB stand for? I'm guessing that there was once a code name for the project of SourceBug? Am I close?

2) Can I easily debug JIT'd code with LLDB?

3) Most of the time, users will run unoptimized code while debugging their algorithms and then they'll want fully optimized code while running large tests that might take several hours or days. In the case of the unoptimized quick test, the turnaround is what is important: how quickly can I get a test started? This seems like it will depend on how quickly the debugger can start up and load the symbols. Are there any benchmarks of startup times for reasonably sized projects? Say for clang?

4) I know that lldb just made its public debut, so the documentation is a bit sparse. While I'm working on my project, I will be learning how to use lldb without the benefit of the documentation to come. I'm willing to help out on documentation if that makes sense but I don't want to duplicate ongoing work. Does my use of LLDB seem common enough that a tutorial would be helpful once I've learned how to make it work?

5) I need to segregate out much of the debugging information to hide variables and class members from the C++ code that aren't visible to the DSL, I also want to hide a lot of technical detail. Will this be possible? It seems like it ought to be fairly easy but I might be missing something.

Thanks in advance,

Curtis

1) What does the SB stand for? I'm guessing that there was once a code name for the project of SourceBug? Am I close?

No, it stands for Script Bridging. The "SB" prefix will soon be coming off the class names (with no file renaming), as these classes are the public interface to LLDB. It was initially added just to distinguish the public class names (lldb::SBTarget) from the private ones (lldb_private::Target). I do plan on getting rid of it soon.

2) Can I easily debug JIT'd code with LLDB?

Are you running JIT'd code within another process that can be debugged (i.e. the simulation binary)? If so, you might want to have LLVM generate a full-blown dylib, not just a JIT'd chunk of code, and load the dylib using the standard shared library load/unload calls. Why? This gets you around the fact that JIT'd code isn't recognized by the dynamic linker on most systems. /usr/lib/dyld on Mac OS X has some provisions for JIT'd code, but nothing that gives debuggers visibility into that code. So going the dylib route has a few benefits:
- the code will work with any debugger (gdb and lldb) in a standard way
- dylibs can have debug information generated in the standard way
- no need to manage memory chunks for JIT'd code
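
As a rough illustration of the dylib route, here is a minimal sketch of loading a generated library through the standard dlopen/dlsym calls. The names load_strategy, unload_strategy, LoadedStrategy and StrategyEntry, and the assumed int() entry-point signature, are hypothetical and not part of LLVM or LLDB; how the dylib itself gets produced is left out.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical entry-point type for a compiled DSL strategy (an assumption).
using StrategyEntry = int (*)();

struct LoadedStrategy {
    void *handle = nullptr;        // dlopen handle, null on failure
    StrategyEntry entry = nullptr; // resolved entry point, null on failure
    std::string error;             // dlerror() text when something failed
};

// Load the dylib with the normal dynamic loader and look up one symbol.
LoadedStrategy load_strategy(const char *dylib_path, const char *symbol) {
    LoadedStrategy result;
    result.handle = dlopen(dylib_path, RTLD_NOW | RTLD_LOCAL);
    if (result.handle == nullptr) {
        const char *msg = dlerror();
        result.error = msg ? msg : "dlopen failed";
        return result;
    }
    void *address = dlsym(result.handle, symbol);
    if (address == nullptr) {
        const char *msg = dlerror();
        result.error = msg ? msg : "symbol not found";
        dlclose(result.handle);
        result.handle = nullptr;
        return result;
    }
    result.entry = reinterpret_cast<StrategyEntry>(address);
    return result;
}

// Unload the library again, e.g. before loading a recompiled version.
void unload_strategy(LoadedStrategy &strategy) {
    if (strategy.handle != nullptr) {
        dlclose(strategy.handle);
        strategy.handle = nullptr;
        strategy.entry = nullptr;
    }
}
```

Because the library goes through the regular loader, a debugger attached to the process sees it like any other shared library. The simulation would call something like load_strategy("libstrategy.dylib", "run_strategy"), where both names are placeholders.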

3) Most of the time, users will run unoptimized code while debugging their algorithms and then they'll want fully optimized code while running large tests that might take several hours or days. In the case of the unoptimized quick test, the turnaround is what is important: how quickly can I get a test started? This seems like it will depend on how quickly the debugger can start up and load the symbols. Are there any benchmarks of startup times for reasonably sized projects? Say for clang?

Startup times are VERY quick (orders of magnitude faster than gdb), _but_ it also depends on what you ask LLDB to do, and how targeted those requests are. GDB does an initial index of all shared libraries regardless of whether you ever ask it to do anything with those indexes. LLDB will generate the indexes lazily. Also, when asking to set a breakpoint, if you know which shared library the breakpoint is in, LLDB can quickly index that shared library only (and not all shared libraries). So it really does depend on what you ask LLDB to do. For example:

(lldb) breakpoint set --name main

will cause all shared libraries to be indexed (once only, of course). If you have a lot of debug information then the indexing can take some time. The main reason for this is that we don't trust the ".debug_pubnames" section, as it is useless for debuggers. Why? .debug_pubnames is an accelerator table that lists only functions that are externally visible in a program. This means all static functions and data, and any functions that are hidden in a shared library, won't be in the list. When setting a breakpoint at a function by name, users don't really care whether a function is externally visible. GDB happily uses the pre-generated index and can be quicker than LLDB when parsing the initial information.

If you know where your breakpoint is, you can specify the shared library:

(lldb) breakpoint set --shlib a.out --name main

So overall the startup speed depends on how many breakpoints you want to set and how targeted you can be when setting those breakpoints.
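
To make the difference between the two breakpoint commands concrete, here is a toy model of lazy per-library indexing. It is only a sketch of the behavior described above, not LLDB's actual implementation; the Module and Target classes and their methods are invented for illustration and are unrelated to the real lldb_private classes of the same names.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy stand-in for one shared library and the symbol names it contains.
class Module {
public:
    Module(std::string name, std::vector<std::string> symbols)
        : name_(std::move(name)), symbols_(std::move(symbols)) {}

    const std::string &name() const { return name_; }
    bool indexed() const { return indexed_; }

    // Builds the name index on first use only, then answers from it.
    bool contains(const std::string &symbol) {
        if (!indexed_) {
            for (std::size_t i = 0; i < symbols_.size(); ++i)
                index_[symbols_[i]] = i;
            indexed_ = true;
        }
        return index_.count(symbol) != 0;
    }

private:
    std::string name_;
    std::vector<std::string> symbols_;
    std::unordered_map<std::string, std::size_t> index_;
    bool indexed_ = false;
};

class Target {
public:
    void add(Module module) { modules_.push_back(std::move(module)); }

    // Like "breakpoint set --name N": every library has to be searched.
    std::vector<std::string> find_by_name(const std::string &symbol) {
        std::vector<std::string> hits;
        for (Module &m : modules_)
            if (m.contains(symbol)) hits.push_back(m.name());
        return hits;
    }

    // Like "breakpoint set --shlib S --name N": only S gets indexed.
    std::vector<std::string> find_in_module(const std::string &shlib,
                                            const std::string &symbol) {
        std::vector<std::string> hits;
        for (Module &m : modules_)
            if (m.name() == shlib && m.contains(symbol))
                hits.push_back(m.name());
        return hits;
    }

    // Number of libraries whose index has been built so far.
    std::size_t indexed_count() const {
        std::size_t count = 0;
        for (const Module &m : modules_)
            if (m.indexed()) ++count;
        return count;
    }

private:
    std::vector<Module> modules_;
};
```

In this model, a lookup restricted to one library leaves every other library unindexed, while a lookup by name alone touches all of them once.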

4) I know that lldb just made its public debut, so the documentation is a bit sparse. While I'm working on my project, I will be learning how to use lldb without the benefit of the documentation to come. I'm willing to help out on documentation if that makes sense but I don't want to duplicate ongoing work. Does my use of LLDB seem common enough that a tutorial would be helpful once I've learned how to make it work?

Many of the header files are documented with header doc comments that Doxygen can parse:

http://www.stack.nl/~dimitri/doxygen/

This will give you documentation for some of the classes.

Anything that you can write up and send along to this list can be incorporated into the website for everyone else to use in the future, so feel free to send along anything you come up with. And feel free to ask as many questions as you need to when you don't understand something on this mailing list.

5) I need to segregate out much of the debugging information to hide variables and class members from the C++ code that aren't visible to the DSL, I also want to hide a lot of technical detail. Will this be possible? It seems like it ought to be fairly easy but I might be missing something.

This is easy enough on Mac OS X. Just don't generate debug info for what you want to be hidden.

Greg Clayton