Module Cache improvements - RFC

Hello all,

We are running into limitations of the current module download/caching
system. A simple android application can link to about 46 megabytes
worth of modules, and downloading that with our current transfer rates
takes about 25 seconds. Much of the data we download this way is never
actually accessed, and yet we download everything immediately upon
starting the debug session, which makes the first session extremely
laggy.

We could speed up a lot by only downloading the portions of the module
that we really need (in my case this turns out to be about 8
megabytes). Also, further speedups could be made by increasing the
throughput of the gdb-remote protocol used for downloading these files
by using pipelining.

I made a proof-of-concept hack of these things, put it into lldb and
I was able to get the time for the startup-attach-detach-exit cycle
down to 5.4 seconds (for comparison, the current time for the cycle is
about 3.6 seconds with a hot module cache, and 28(!) seconds with an
empty cache).

Now, I would like to implement these things properly in lldb, so this
is a request for comments on my plan. What I would like to do is:
- Replace ModuleCache with a SectionCache (actually, more like a cache
of arbitrary file chunks). When the cache gets a request for a file
that is not in the cache already, it returns a special kind of
Module, whose fragments will be downloaded as we try to
access them. These fragments will be cached on disk, so that
subsequent requests for the file do not need to re-download them. We
can also have the option to short-circuit this logic and download the
whole file immediately (e.g., when the file is small, or we have a
super-fast way of obtaining the whole file via rsync, etc...)
- Add pipelining support to GDBRemoteCommunicationClient for
communicating with the platform. This actually does not require any
changes to the wire protocol. The only change is in adding the ability
to send an additional request to the server while waiting for the
response to the previous one. Since the protocol is request-response
based and we are communicating over a reliable transport stream, each
response can be correctly matched to a request even though we have
multiple packets in flight. Any packets which need to maintain more
complex state (like downloading a single entity using continuation
packets) can still lock the stream to get exclusive access, but I am
not sure if we actually even have any such packets in the platform
flavour of the protocol.
- Parallelize the downloading of multiple files, utilizing
request pipelining. Currently we get the biggest delay when first
attaching to a process (we download file headers and some basic
informative sections) and when we try to set the first symbol-level
breakpoint (we download symbol tables and string sections). Both of
these actions operate on all modules in bulk, which makes them easy
parallelization targets. This will provide a big speed boost, as we
will be eliminating communication latency. Furthermore, in case of
lots of files, we will be overlapping file download (io) with parsing
(cpu), for an even bigger boost.
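To make the first point concrete, here is a minimal Python sketch of the chunk-cache idea (not lldb code; the SectionCache name, the 64 KB chunk size, and the fetch_chunk hook are all hypothetical). Only the chunks actually touched by a read are downloaded, and they are persisted to disk so later sessions can reuse them:

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # hypothetical chunk granularity


class SectionCache:
    """Disk-backed cache of arbitrary file chunks, fetched on demand."""

    def __init__(self, cache_dir, fetch_chunk):
        # fetch_chunk(path, offset, size) -> bytes is the remote
        # download hook (e.g. backed by the gdb-remote platform
        # connection); it is a placeholder here.
        self.cache_dir = cache_dir
        self.fetch_chunk = fetch_chunk
        os.makedirs(cache_dir, exist_ok=True)

    def _chunk_path(self, file_path, index):
        # Key cache entries by a hash of the remote path plus the
        # chunk index.
        key = hashlib.sha1(file_path.encode()).hexdigest()
        return os.path.join(self.cache_dir, "%s.%d" % (key, index))

    def read(self, file_path, offset, size):
        """Read bytes, downloading and caching only the chunks touched."""
        result = b""
        end = offset + size
        index = offset // CHUNK_SIZE
        while index * CHUNK_SIZE < end:
            path = self._chunk_path(file_path, index)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    chunk = f.read()
            else:
                chunk = self.fetch_chunk(
                    file_path, index * CHUNK_SIZE, CHUNK_SIZE)
                with open(path, "wb") as f:
                    f.write(chunk)
            result += chunk
            index += 1
        # Trim to the exact byte range requested.
        start = offset - (offset // CHUNK_SIZE) * CHUNK_SIZE
        return result[start:start + size]
```

The "short-circuit" option mentioned above would simply bypass read() and populate every chunk of a file in one go.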
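The pipelining point relies on one observation: since the protocol is strictly request-response over a reliable, ordered transport, the next response on the wire always belongs to the oldest outstanding request. A toy sketch (the class and the send/recv hooks are hypothetical, not the GDBRemoteCommunicationClient API):

```python
from collections import deque


class PipelinedClient:
    """Sketch: overlap requests by matching responses in FIFO order."""

    def __init__(self, send, recv):
        # send(packet) writes one packet to the stream;
        # recv() blocks until one response arrives.
        self.send = send
        self.recv = recv
        self.pending = deque()  # requests awaiting their responses

    def send_request(self, packet):
        # Fire off the request without waiting for earlier replies,
        # keeping multiple packets in flight.
        self.send(packet)
        self.pending.append(packet)

    def read_response(self):
        # FIFO matching: the next response always answers the oldest
        # outstanding request, so no wire-protocol change is needed.
        request = self.pending.popleft()
        return request, self.recv()
```

A multi-packet exchange that must not be interleaved (e.g. continuation packets) would take an exclusive lock on the stream, as described above.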
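The bulk actions in the third point could then be driven by a simple worker pool, so download latency overlaps across modules and parsing can proceed while other files are still in flight (a sketch; fetch_module stands in for whatever per-module download-and-parse step lldb would run):

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(modules, fetch_module, max_workers=8):
    """Run fetch_module over all modules in parallel, preserving order.

    With pipelined requests underneath, the wire latency of one
    module's download overlaps with the others instead of adding up.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_module, modules))
```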

What do you think?

cheers,
pl

Feel free to implement this in PlatformAndroid and allow others to opt into this. I wouldn't want this by default on any of the Apple platforms; with MachO we have our entire image mapped into memory, and we have other tricks for getting the information quicker.

So I would leave the module cache there and not change it, but feel free to add the section cache as needed. Maybe if this goes really well and it can be arbitrarily used on any files types (MachO, ELF, COFF, etc) and it just works seamlessly, we can expand who uses it.

In Xcode, the first time we connect to a device we haven't seen, we take the time to download all of the system libraries. Why is the 28 seconds considered prohibitive for the first time you connect? The data stays cached even after you quit and restart LLDB or your IDE, right?

Greg Clayton

Can’t you just cache the modules locally on the disk, so that you only take that 26 second hit the first time you try to download that module, and then it indexes it by some sort of hash. Then instead of just downloading it, you check the local cache first and only download if it’s not there.

If you already do all this, then disregard.

I believe this is already done.

I am guessing the main issue is that the first time you debug on a device, you end up with a 30 second delay with no feedback as to what is going on. So you say "launch" and then 35 seconds later you hit your breakpoint at main. In Xcode we solve this by downloading all of the files when we attach to a device for the first time, and we show progress as we download all shared libraries. Sounds like it would be good for Android Studio to do the same thing?

Greg

Yes we already have a disk cache on the host. I agree with you that waiting 30s at the first startup shouldn’t be an issue in general (Pavel isn’t sharing my opinion). The only catch is that in case of iOS there are only a few different builds released so if you downloaded the modules once then I think you won’t have to download them the next time when you try to use a different device. In case of Android we have to download the symbols from each device you are using and at that point 30s might be an issue (I still don’t think it is).

For progress purposes in Android Studio we listen on eBroadcastBitModulesLoaded coming from the target so we can report about every loaded SO.

With my app developer hat on, if some program makes me wait 30s for
something then I won't like that program.

I agree, but if the first time you hook your phone up Android Studio pops up a dialog box saying "This is the first time you have connected this device, hold on while I cache the shared libraries for this device..." then it wouldn't be too bad. It is primarily the fact that the 30 seconds is happening without feedback during first launch or attach. Also, you can probably use something faster than the lldb-platform to download all of the files. In Xcode, we download all symbols into the users home directory in a known location:

~/Library/Developer/Xcode/iOS DeviceSupport

This folder contains the exact OS version and a build number:

(lldb) platform select remote-ios
  Platform: remote-ios
Connected: no
SDK Roots: [ 0] "~/Library/Developer/Xcode/iOS DeviceSupport/9.0 (WWWWW)"
SDK Roots: [ 1] "~/Library/Developer/Xcode/iOS DeviceSupport/9.1 (XXXXX)"
SDK Roots: [ 2] "~/Library/Developer/Xcode/iOS DeviceSupport/9.2 (YYYYY)"

Where WWWWW, XXXXX, YYYYY are build numbers. We know we can look in these folders for any files that are from the device. They get populated and these SDK directories get searched by LLDB's PlatformRemoteiOS so they get found (we don't use the file cache that the PlatformAndroid currently uses).

So with a little work, I would add some functionality to your Android Studio: have something that knows how to copy files from the device as quickly as possible (using lldb-platform is sloooowww, and that is the way it is currently done, I believe) into some such directory, all while showing a progress dialog to the user on first device connect, and then debugging will always be quick. And you can probably make it quicker than 30 seconds.

Greg Clayton

I completely agree with you that we shouldn’t change LLDB too much just to speed up the startup time at the first use.

For Android we already have a host-side disk cache in place, similar to what you described for iOS, and we are already using ADB (an Android-specific interface) to download the files from the device, but unfortunately its speed is only ~4-5MB/s on most devices.

Thanks for the feedback, and sorry about the slow response.

After some internal discussions, it looks like we are not going to go
forward with this approach at the moment.

cheers,
pl