RFC: LLDB Telemetry/metrics

jh7370 · April 11, 2024, 9:17am

As I’ve mentioned in the PR (and possibly elsewhere), I think it’s wrong for this to be tied specifically to LLDB. Apart from anything else, there’s a risk of tight coupling between the telemetry code and LLDB itself, making it much harder down the line for other tools to be abel to use it. Splitting the implementation from the usage will help avoid that coupling and will lead to a cleaner design and implementation overall, I think.

I hope it’s clear that telemetry isn’t suitable for performance profiling: while certainly it makes sense to record how long things take to get a general overview, LLDB developers should be using proper performance analysis tools to actually profile the tool (we can’t profile every function call using telemetry: the overhead would be tremendous). Telemetry can be used to identify how people use the debugger and therefore help LLDB developers to reproduce similar conditions for profiling purposes.

We definitely want something like this, otherwise how will we test it in the source tree?

What sort of additional data would this custom subclass be able to collect? Would it be possible for a downstream tool to have additional telemetry entry types, for example?

It sounds to me like this is mixing two orthoganol concepts, and is an example of tightly coupling the core telemetry behaviour with tool-specific stuff. If we were to want telemetry in another tool, the “session” concept will be generic (i.e. when does a program run, how long for etc), whereas debugger-specific stuff is obviously tied to LLDB. Related side question: within a single run of LLDB, can it launch, close, relaunch etc one or more programs? There may be some terminology confusion if so, because a session could also mean referring to one of these program launches until conclusion of its debugging.

Assuming for a moment you aren’t looking to gather telemetry off main branch LLVM, it might make more sense to use the LLVM release number. I could also seee a case for having both. Note that the git hash doesn’t indicate the age of the tool, so it is harder to look for older/newer usages only, for example using just that info.

What is the purpose of gathering these three fields? I can definitely see a use case for being able to count distinct users, but I can’t think of any use case where the actual username is needed. Usernames won’t be unique either, since two people on different machines could have the same username, for example. If you want something that uniquely identifies users, you should use a combination of computer + user id in some form.

It sounds to me like you have four separate things being discussed here:

An entry that records the total time of the run of LLDB, including its start time and duration.
An entry that records the “startup process” phase.
An entry that records the “shutdown process” phase.
The error status when LLDB shuts down.

I don’t think you need a separate timestamp for “shuts down”, because a duration from the start to end of the program will give you this.
Each of the first three of these are essentially just “timestamp + duration” entries with an associated identifier . The error status then becomes orthogonal to that entry type and should be in a separate entry of some kind.

[quote=“oontvoo, post:14, topic:64588”]
Usages of LLDB command line vs via DAP (deducible based on the lldb_path (argv[0]) )
[/quote]If it’s deducible within LLDB, this feels like it should be calculated within LLDB and then a true/false “usedViaDAP” bit of data or similar should be the thing that ends up in the final output

Why is this useful?

Would you explain this one in more detail please?

Similar to above comments, these are just start time + durations again, similar to the startup/shutdown phases above. It’s probably useful in some way to link it to the other target info, perhaps we could have some form of optional “related entry” links between certain entries? In our internal telemetry codebase, we have a concept of shared information between our telemetry events called “requirements” - perhaps this data could be similar?

If the sole aim of having both of these is to measure a time period, could you instead have a duration entry to record the command duration?

If the base class can be extended by downstream vendors, I’m not sure I see the purpose of this?

If writing to a stream or file, the telemetry data needs some sort of structure. However, it probably should be up to the vendor to decide what structure that should be, for example JSON or YAML. OTOH, other “destinations” might not be directly a stream at all. For example, if my company’s downstream were to make use of this telemetry, it’d be likely that they’d want to convert from upstream telemetry entries to our downstream telemetry events so that it all plugged into our existing system. The “destination” in this case would therefore be some other concrete C++ implementation, which did the converting.

I don’t think this needs explicitly stating anywhere: it should be up to the vendor to decide whether a separate thread or not is appropriate, and that will depend on various factors such as existing thread usage, overhead of spinning up a new thread versus doing it in the main thread etc.

Topic		Replies	Views
Inquiry for performance monitors LLDB	19	129	June 19, 2017
lldb-gdbserver work LLDB	6	111	December 4, 2013
[RFC] More complete multiprocess support in LLDB LLDB	11	1244	September 2, 2022
Question about remote debugging protocol setup LLDB	3	119	April 11, 2014
Review of API and remote packets LLDB	12	94	June 16, 2016

RFC: LLDB Telemetry/metrics

Related topics