Is there a just-my-code like debugging mode for LLDB?

Hi lldb-dev,

TL;DR: Have there been any efforts to introduce something like “Just My Code” debugging in LLDB? Debugging on Android would really benefit from this.

Details:

Native Android apps typically have a single .so file from the user, but load ~150 system libraries.

When attaching LLDB remotely to an Android app, a significant amount of time is spent on loading modules for those system libraries, even with a warm LLDB cache that contains a copy of all these libraries.

With a cold LLDB cache, things are much worse, because LLDB copies all those libraries from the device back to the host to populate its cache. While one might think this happens only once per user, it recurs often on Android: there are too many libraries to copy, which makes each copy very slow; there is a new Android release every year; and users typically use multiple devices (e.g., x86 and x86_64 emulators, arm32 and arm64 devices) and multiple hosts (work, home, laptop/desktop), so they hit this issue repeatedly.

If we can eliminate the latency of loading these modules, we can deliver a much faster debugging startup time. In essence, this can be considered as a form of Just My Code debugging.

Prototype and Experiments

I built a simple prototype that loads only a single user module and avoids loading the ~150 system modules entirely. I ran it on my Windows host against an Android emulator to measure the end-to-end latency of “Connect + Attach + Resume + Hit 1st breakpoint immediately”.

  • For warm LLDB cache:
    • Without just-my-code: 23 seconds
    • With just-my-code: 14 seconds
  • For cold LLDB cache:
    • Without just-my-code: 120 seconds
    • With just-my-code: 16 seconds
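To put the measurements above in relative terms, here is a small sketch that computes the time saved, the percentage reduction, and the speedup factor for each cache state. The numbers are copied from the list; the function name `summarize` is only illustrative.

```python
# End-to-end latencies in seconds, copied from the measurements above.
measurements = {
    "warm": {"without": 23, "with": 14},
    "cold": {"without": 120, "with": 16},
}


def summarize(without_s, with_s):
    """Return (seconds saved, percent reduction, speedup factor)."""
    saved = without_s - with_s
    return saved, 100.0 * saved / without_s, without_s / with_s


for cache, t in measurements.items():
    saved, pct, factor = summarize(t["without"], t["with"])
    print(f"{cache} cache: saves {saved} s ({pct:.0f}% less, {factor:.1f}x faster)")
```

The warm-cache case improves by roughly 39%, while the cold-cache case improves by roughly 87%, which suggests most of the cold-cache cost is in copying libraries rather than in loading them.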

I want to solicit some feedback and gather thoughts around this idea. It would be great if there are any existing alternatives in LLDB to achieve my goal; otherwise, I can implement this in LLDB, and I’d appreciate any advice on how to implement such a feature.

Thanks.
-Emre

On iOS, every time a device that is provisioned for debugging is plugged in, the device management stack checks to see if it knows the OS on the device and if not copies the libraries from the system to the host and puts them in a location that lldb can find. That shouldn’t be a big job if the throughput to the device is decent. Originally this took a couple minutes to process on iOS. That was annoying, but except for folks working at Apple who had to update their devices every day, it was never a burning issue because you always knew when it was going to happen (Xcode gave you a nice progress bar, etc.). Note, internal folks did complain enough that we eventually got around to looking at why it was so slow and found that almost all of that time was spent taking the iOS “shared cache” - which is how the libraries exist on the device - and expanding it into shared libraries. This was being done single-threaded, and just doing this concurrently got the time down to 10 or 20 seconds. Given you only do this once per OS update on your device, this doesn’t seem to bother people anymore.

Once the shared libraries from the device are available on the lldb host, startup times for running an app to first breakpoint are nowhere near 23 seconds. Since you were quoting times for a simulator, I tried debugging an iOS game app that loads 330 shared libraries at startup. Launching an app from a fresh lldb (from hitting Run in Xcode to hitting a breakpoint in applicationDidFinishLaunching, fetching all the stacks of all the threads and displaying the locals for the current frame as well as calling a bunch of functions in the expression parser to get Queue information for all the threads) took 4-5 seconds. And the warm launch was just a second or two.

So I’m surprised that it takes this long to load on Android. Before we go complicating how lldb handles symbols, it might be worth first figuring out what lldb is doing differently on Android that is causing it to be an order of magnitude slower?

Note, if you are reading the binaries out of memory from the device, and don’t have local symbols, things go much more slowly. gdb-remote is NOT a high bandwidth protocol, and fetching all the symbols through a series of memory reads is pretty slow. lldb does have a setting to control what you do with binaries that don’t exist on the host (target.memory-module-load-level) that controls this behavior. But it just deals with what we do and don’t read and makes no attempt to ameliorate the fallout from having a reduced view of the symbols in the program.

We did add a “debug just my code” mode to gdb back in the day, when we were supporting gdb here at Apple. Basically just a load-level for symbols for libraries whose path matches some pattern. gdb was quite slow to process libraries at that point, and this did speed loading up substantially. It wasn’t that hard to implement, but it had a bunch of fallout. Mainly because even though people think they would like to only debug their own code, they actually venture into system code pretty regularly…

For instance, if you don’t have symbols for libraries, the backtracing becomes unreliable. We had to add code to force load libraries when they show up in backtraces to get reliable unwinding, which generally meant you had to restart the unwind when you found an unloaded library.

People also commonly want to set breakpoints on system functions, which you can’t do if you haven’t read the symbols. I don’t know about Android but on iOS and macOS there are common symbolic breakpoints that people set, to catch error conditions and the like. To work around this we added code so that if you specified a shared library when you set a breakpoint we would read in that shared library’s symbols, but it was hard to get people to use this.

People also very commonly call system libraries in expressions for a whole variety of reasons. There’s no way to express to the expression parser that it should try to load symbols from libraries (and which ones) when it encounters an identifier it can’t find. You’d probably need to do that.

There were other tweaks we had to add to gdb to make this work nicely, but that was a long time ago and I can’t remember them right now…

The other problem with this approach is that it often just takes a bunch of work that happens predictably when the user starts the debugger, and instead makes it happen at some time later, and if it isn’t clear to the user what is triggering this slowdown, that is a much worse experience.

Anyway, I don’t see why startup should be taking so long for Android. It would be better to make sure we can’t improve whatever is causing these delays before we start complicating lldb with this sort of progressive loading of library symbols.

Jim

Hi Emre,

I have to say I'm pretty sceptical about this approach, for all the
reasons that Jim already mentioned. Making it so that lldb pretends the
other shared libraries don't exist (which is what I assume your
prototype does) is fairly easy, but it does create a very long tail of
"feature X does not work in this mode" problems (my app crashed in a
third-party library -- maybe because I passed it a wrong argument -- but
I cannot even unwind into my code to see what I passed).

It's fixing these problems that will make the solution very complicated.

So one idea is to improve PlatformAndroid to use “adb” to copy all system libraries over and pre-cache them in one go instead of letting it happen one by one.

Android is also very inefficient in loading shared libraries. It will load them one by one and each one involves 2 stops since the breakpoint we set gets hit once before the library is loaded and once again when it has been loaded. Each stop for shared libraries takes a few hundred milliseconds.
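As a rough illustration of how these stops add up, the sketch below multiplies libraries by stops by time per stop. The 150-library count comes from Emre's message; the 200 ms and 300 ms values are assumptions standing in for "a few hundred milliseconds", and the total only applies to libraries that are loaded while the debugger is watching (not ones already loaded before an attach).

```python
def stop_overhead_ms(num_libraries, stops_per_library=2, ms_per_stop=300):
    """Total milliseconds spent in library-load stops."""
    return num_libraries * stops_per_library * ms_per_stop


# Assumed per-stop costs; the thread only says "a few hundred milliseconds".
for ms_per_stop in (200, 300):
    total_ms = stop_overhead_ms(150, ms_per_stop=ms_per_stop)
    print(f"{ms_per_stop} ms per stop -> {total_ms / 1000:.0f} s for 150 libraries")
```

Under these assumptions the stops alone would cost on the order of a minute if every library were loaded one by one under the debugger.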

So it might be nice to have PlatformAndroid grab all system libraries and populate the cache for a device in one large command and see if that improves things. To test this you can just download all system libraries to a single directory manually and then set some settings:

(lldb) settings set target.exec-search-paths /path/to/cache
(lldb) settings set target.debug-file-search-paths /path/to/cache

So time the “adb” command that wildcard copies all libraries over, then set the settings, then run your debug sessions and see how much this helps. That will give you a good idea of whether sequentially grabbing each library is the cost.
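One way to time the bulk copy is a small wrapper like the sketch below. The device path /system/lib64 and the host directory ./lldb-cache are assumptions; adjust them for your device and setup. Nothing runs on import, so you can try the timer with the harmless `PYTHON_NOOP` command before pointing it at a device.

```python
import subprocess
import sys
import time

# Assumption: 64-bit system libraries live under /system/lib64 on the device,
# and ./lldb-cache is a placeholder for the host-side cache directory.
ADB_PULL = ["adb", "pull", "/system/lib64", "./lldb-cache"]

# Harmless command for trying the timer without a device attached.
PYTHON_NOOP = [sys.executable, "-c", "pass"]


def time_command(argv):
    """Run argv to completion and return (exit code, elapsed seconds)."""
    start = time.monotonic()
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.monotonic() - start
```

Calling `time_command(ADB_PULL)` with a device connected gives the bulk-copy time to compare against a cold-cache attach; the directory passed to adb is the one to use for the two search-path settings above.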

If the cost is in parsing these libraries, we can look at parallelizing the loading of all the device shared libraries first (prior to debugging) and then launching when everything is pre-loaded.
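The general shape of "do the per-library work concurrently, before debugging starts" can be sketched as follows. This is not LLDB's module loading: `is_elf` is a hypothetical stand-in that only checks the 4-byte ELF magic, and the cache directory layout is an assumption. It only shows fanning per-library work out over a worker pool.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Stand-in for real parsing: does the file start with the ELF magic?"""
    with open(path, "rb") as f:
        return f.read(4) == ELF_MAGIC


def inspect_all(cache_dir, max_workers=8):
    """Check every .so under cache_dir concurrently; returns {relative path: is_elf}."""
    root = Path(cache_dir)
    paths = sorted(root.rglob("*.so"))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(is_elf, paths)
        return {str(p.relative_to(root)): ok for p, ok in zip(paths, results)}
```

Threads are enough here because the stand-in work is file I/O; real symbol parsing is CPU-bound and would need native threads or separate processes to benefit.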

Greg

Adding Antonio Afonso since he did some work on speeding things up at Facebook for Android.

Emre: what version of LLDB are you using? Top of tree? One from a package distro?

One reason not to load only your own libraries is that backtraces will be truncated for any stack frames that go through the system libraries. These tend to be in the stack traces a lot as we deal with Android all the time at Facebook…

Greg

Thanks for all the feedback and ideas.

After Jim’s comment about iOS performance, I decided to dig deeper to figure out why it’s much slower to attach on Android compared to iOS. I identified an IPv6/IPv4 issue with adb, and the simple fix (llvm.org/D79757) brought warm cache attach time down from 23 seconds to ~9 seconds, and cold cache attach time from 120 seconds down to 16-20 seconds!! These are much better numbers, but based on the numbers from Jim, I should probably continue hunting for more Android-specific issues.

One thing I want to try is “settings set plugin.process.gdb-remote.use-libraries-svr4 true”. I measured that it takes ~600ms to read the list of shared libs without this flag, and this is done 3 times from startup until we hit the first breakpoint (load all system libraries at attach time, then load base.odex, then load user-lib.so). So, I expect this to shave up to ~1.8 seconds. I don’t have any numbers on this yet (does anyone know how much improvement I can expect with this?)

About my “don’t-load-system-modules” idea (that I mistakenly called “my-code-only” which is apparently something different): Pavel’s concerns about not loading modules sound serious, especially about impacting stability, and I’d like to avoid causing trouble there. So, instead of that, I will try Greg’s idea of pre-caching libraries. I’ll start with some measurements. I’m working from top-of-tree (I’m trying to build LLDB with libxml2 now).

Isn't that the default? The reason this setting was added was so we
could test the !svr code path without forcibly disabling xml support
(and possibly workaround any svr issues). However, having it on as a
default definitely makes sense?

pl

The svr4 support seems to be off by default: https://github.com/llvm/llvm-project/blob/2974b3c566d68f1d7c907f891137cf0292dd35aa/lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemoteProperties.td#L14

It would definitely make sense to turn it on by default.

- J.

Done (deea174ee5).

pl