Right now, progress reporting lets us set a title and add some detail as we increment the progress. In upstream, the only progress reports that take advantage of the detail functionality seem to be in llvm-project/lldb/source/Plugins/ExpressionParser/Clang/ClangModulesDeclVendor.cpp. All other progress reports set the title one time ("Parsing symbol table for <path>" or "Manually indexing DWARF for <path>"), so the title strings are all different. That makes it hard to group things together in a UI, where progress reports can start and finish very quickly, and it makes the display noisy.
For symbol table parsing, we make a single progress object whose title is always "Parsing symbol table for <path>", where <path> changes for each module. We don’t have centralized code that parses all symbol tables. We could do this in the symbol preload code if we wanted to, but we don’t right now. If preloading is off, a symbol table can be parsed anywhere in the code at any time, depending on which APIs trigger it to be loaded. That makes it hard to create a single progress object for all symbol table parsing. So the question is how to let clients that want to receive progress notifications do something better for display in a UI.
Right now, if a command triggers all symbol tables to be loaded, the loads usually happen serially: we get a progress start notification for a module, then a progress finished notification for that same module, and then the same pair for every other module. Something like "breakpoint set --name foo" can trigger an iteration over all modules, and any module whose symbol table has not been parsed yet will have it parsed on demand. But a command like "breakpoint set --module a.out --name main" will cause only one symbol table to be parsed.
This is very similar to DWARF indexing, which tends to take a lot of time on platforms that don’t have really good accelerator tables, which means any non-Darwin target (Linux, Android, etc.).
If we don’t want to update how progress reports are created and reported, one idea is to add the notion of whether a progress is considered “aggregate”. If this is true, the idea is that user interfaces can decide not to show these reports. This method has been submitted as a PR: [lldb][progress] Add discrete boolean flag to progress reports (Pull Request #69516, llvm/llvm-project). My reservation with this approach is that it currently marks almost every existing progress as aggregate, which might cause user interfaces to decide not to show this progress. In my experience on Linux, many of these progress notifications are vital to knowing why the debug session is taking a long time to start up. Symbol table parsing takes some time, but manually indexing DWARF takes a large amount of time (up to 10 to 20 minutes if you have 70 GB of unindexed debug info). Without progress updates in the UI, the debugger seems to be hung and doing nothing, which causes many users to kill the debugger and try again.
One idea is to add a category string to progress notifications. For symbol table parsing, this would look something like:
category = "Symbol table parsing"
title = "Parsing symbol table for /tmp/a.out"
detail = ""
For manually indexing DWARF, it would be:
category = "Manually indexing DWARF"
title = "Manually indexing DWARF for /tmp/a.out"
detail = ""
If a user interface gets too many notifications, the UI could create an umbrella progress that uses category as the progress title and shows a continuous spinning progress indicator. Even after one symbol table progress finishes, a small timeout can keep the umbrella progress alive while it waits for more progress notifications with the same category value. If another one comes in, continue and renew the timeout; otherwise close the progress.
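As a rough sketch of the umbrella idea above, a UI could keep a small amount of state per category. Everything here is hypothetical: `UmbrellaTracker`, `OnStart`, `OnEnd` and `IsVisible` are names made up for illustration and are not part of LLDB or any IDE, and the caller passes in the current time so the timeout logic is easy to reason about.

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical UI-side helper (not an LLDB API): keeps one umbrella spinner
// per category alive until a grace period passes with no new activity.
class UmbrellaTracker {
public:
  using Clock = std::chrono::steady_clock;
  using TimePoint = Clock::time_point;

  explicit UmbrellaTracker(std::chrono::milliseconds grace) : m_grace(grace) {}

  // A progress report in `category` started.
  void OnStart(const std::string &category) { ++m_groups[category].active; }

  // A progress report in `category` finished at time `now`.
  void OnEnd(const std::string &category, TimePoint now) {
    Group &group = m_groups[category];
    if (group.active > 0)
      --group.active;
    group.last_end = now; // Renew the timeout.
  }

  // True while the umbrella spinner for `category` should stay on screen.
  bool IsVisible(const std::string &category, TimePoint now) const {
    auto it = m_groups.find(category);
    if (it == m_groups.end())
      return false; // Nothing was ever reported for this category.
    const Group &group = it->second;
    if (group.active > 0)
      return true; // Work is still running.
    return now - group.last_end < m_grace; // Still inside the grace period.
  }

private:
  struct Group {
    unsigned active = 0;   // Reports started but not yet finished.
    TimePoint last_end{};  // When the most recent report finished.
  };

  std::chrono::milliseconds m_grace;
  std::map<std::string, Group> m_groups;
};
```

With a one second grace period, a run of back-to-back "Symbol table parsing" reports would show up as one spinner that goes away about a second after the last report ends.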
The code for progress reports that don’t update a count looks like:
Progress progress(llvm::formatv("Parsing symbol table for {0}", file_name));
This will create a RAII C++ object where on construction, it will report a progress with a value of 0 to indicate the progress started event, and on destruction, will report a progress with a value of UINT64_MAX to indicate completion.
For progress reporting where you know you have N units of work to do, the code looks like:
const uint64_t total_progress = ...;
Progress progress(
    llvm::formatv("Manually indexing DWARF for {0}", module_desc.GetData()),
    total_progress);
for (...) {
  do_somework();
  progress.Increment();
}
This allows progress objects to keep people up to date with the progress (or lack thereof). When the object is constructed, it reports a progress with a value of 0 to indicate the progress started event, and on destruction it reports a progress with a value of total_progress to indicate completion. During the expensive work, it updates the progress as work gets done.
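To make the start, increment and completion behavior concrete, here is a minimal stand-in for that RAII pattern. It is not LLDB's `Progress` class: `SketchProgress` and its `Callback` type are assumptions for illustration, and a plain callback replaces LLDB's real event delivery.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Illustrative stand-in for an RAII progress object (not LLDB's Progress).
class SketchProgress {
public:
  // Hypothetical sink for progress events: title, completed units, total.
  using Callback = std::function<void(const std::string &title,
                                      uint64_t completed, uint64_t total)>;

  // A total of UINT64_MAX means the amount of work is not known up front.
  SketchProgress(std::string title, Callback callback,
                 uint64_t total = UINT64_MAX)
      : m_title(std::move(title)), m_callback(std::move(callback)),
        m_total(total) {
    Report(); // completed == 0 is the "progress started" event.
  }

  ~SketchProgress() {
    // Report completion unless Increment() already reached the total.
    if (m_completed < m_total) {
      m_completed = m_total;
      Report();
    }
  }

  SketchProgress(const SketchProgress &) = delete;
  SketchProgress &operator=(const SketchProgress &) = delete;

  // Record `amount` finished units of work, clamped to the total.
  void Increment(uint64_t amount = 1) {
    if (amount == 0 || m_completed >= m_total)
      return;
    const uint64_t remaining = m_total - m_completed;
    m_completed = amount > remaining ? m_total : m_completed + amount;
    Report();
  }

private:
  void Report() const {
    if (m_callback)
      m_callback(m_title, m_completed, m_total);
  }

  std::string m_title;
  Callback m_callback;
  uint64_t m_total;
  uint64_t m_completed = 0;
};
```

In this sketch the destructor skips the final report when the last `Increment()` call already delivered the total, so a listener sees exactly one completion event either way.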
The main thing we are trying to solve is how to manage the way progress gets reported so that everyone is happy.
If we have a category for each progress object, it could help an IDE do something better: it would still receive multiple progress start and end notifications, but it could now group them together. The code would look like:
Progress progress(
    /*category=*/"Parsing symbol tables",
    /*title=*/"/tmp/a.out");
And
const uint64_t total_progress = ...;
Progress progress(
    /*category=*/"Manually indexing DWARF",
    /*title=*/"/tmp/a.out",
    /*total=*/total_progress);
for (...) {
  do_somework();
  progress.Increment();
}
The IDE now has the ability to notice it is getting many progress start and stop notifications from a given category and then put them into a group.
Another idea is to enforce the category approach shown above and then change the way progress notifications are delivered to the IDE. Code can be added to automatically aggregate notifications by category, even if we have internal notifications like:
progress started for {category="Parsing symbol tables", title="/tmp/a.out"}
progress ended for {category="Parsing symbol tables", title="/tmp/a.out"}
progress started for {category="Parsing symbol tables", title="/usr/lib/libc.so"}
progress ended for {category="Parsing symbol tables", title="/usr/lib/libc.so"}
We could always notify the IDE with the same API, but LLDB would manage the notifications so that the IDE receives notifications like:
progress started for {title="Parsing symbol tables"}
progress update for {title="Parsing symbol tables", detail = "/tmp/a.out"}
progress update for {title="Parsing symbol tables", detail = "/usr/lib/libc.so"}
// wait for a small timeout of maybe 1 second and if no other progress for symbol tables comes in...
progress ended for {title="Parsing symbol tables"}
This leaves the public API untouched and reuses the existing detail functionality, but adds new code to LLDB to manage how progress updates are delivered. We could also add code to throttle notifications so that progress items that update too quickly, like in the DWARF case, only send one per second. For example, we can make a fake progress that increments far too quickly and causes a flurry of notifications with sample code like:
Progress progress(
    /*category=*/"Testing spammy progress",
    /*title=*/"task a",
    /*total=*/100000);
for (int i = 0; i < 100000; ++i) {
  usleep(1000);
  progress.Increment();
}
This would send a ton of progress updates, roughly one per millisecond, so code in LLDB that throttles progress updates to go out only once per second would help cut down on spammy progress reports. When multiple progress events with valid totals are running in different threads, we would need to make sure the totals are combined when doing the reporting.
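One way the throttling could work is sketched below, under some assumptions: `ProgressThrottle` and `ShouldReport` are hypothetical names, the caller supplies the current time, and the start and completion events always pass through so a listener never misses the beginning or end of a progress. It handles a single progress only and does not attempt the cross-thread combining of totals mentioned above.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical throttle (not an LLDB API): forwards start and completion
// events immediately and rate-limits the intermediate updates.
class ProgressThrottle {
public:
  using Clock = std::chrono::steady_clock;

  explicit ProgressThrottle(Clock::duration min_interval)
      : m_min_interval(min_interval) {}

  // Returns true if the update observed at `now` should be forwarded.
  bool ShouldReport(uint64_t completed, uint64_t total,
                    Clock::time_point now) {
    const bool is_boundary = completed == 0 || completed >= total;
    if (!is_boundary && m_has_reported &&
        now - m_last_report < m_min_interval)
      return false; // Too soon after the previous forwarded update.
    m_last_report = now;
    m_has_reported = true;
    return true;
  }

private:
  Clock::duration m_min_interval;
  Clock::time_point m_last_report{};
  bool m_has_reported = false;
};
```

Applied to the spammy loop above with a one second interval, this would forward on the order of a hundred updates instead of one hundred thousand.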
This solution would let us avoid changing the public API and could improve things for everyone.