[RFC] Debug info coverage tool

FTR this measurement is not especially difficult to implement, given unoptimized and optimized builds for the same source. I had tooling like this at a previous job for a different compiler. We were looking specifically at lines with is_stmt set, because that compiler did a decent job of picking is_stmt locations; LLVM’s heuristic is lame, so just identifying the set of source locations regardless of is_stmt is probably the way to go. But again, it’s just collecting data from two .debug_line sections and diffing them; it’s not hard. For all I know, llvm-debuginfo-analyzer already has a mode like that.
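The diff itself really is just set arithmetic. Here is a minimal sketch, assuming the sets of (file, line) source locations have already been extracted from each object's .debug_line section (e.g. by parsing `llvm-dwarfdump --debug-line` output); the literal sets below are hypothetical placeholders, not real data.

```python
# Diff the source locations present in an O0 line table against those
# present in the optimized line table. The sets here are hypothetical;
# in practice they would be harvested from the two .debug_line sections.

o0_locs = {("foo.c", 10), ("foo.c", 11), ("foo.c", 12), ("foo.c", 14)}
opt_locs = {("foo.c", 10), ("foo.c", 12)}

missing = o0_locs - opt_locs                    # locations dropped by optimization
coverage = len(o0_locs & opt_locs) / len(o0_locs)

print(f"line-table coverage: {coverage:.0%}")   # 50% for this toy data
for f, line in sorted(missing):
    print(f"missing: {f}:{line}")
```

The only real work is the extraction step; once you have the two location sets, the report is immediate.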

I will take exception to the unstated premise that it is a reasonable goal to get to 100% parity with unoptimized statement coverage. I believe that is unlikely, and that there is an “unachievable” part of the graph, just as you discovered for variables. I admit I don’t have hard data, but if you think about code-deleting optimizations such as CSE, dead-store elimination, and unreachable-code removal, there will be no instructions in the final object file for those source statements/expressions. Given that the DWARF line table is a mapping from instructions to source locations, if there are no instructions there can be no mapping.

Given that an optimized line table might (continuing your example) have, say, 70% line-table coverage (that is, maybe 30% of original source lines do not exist in even the most perfect optimized line table), a variable-coverage metric that excluded those non-existent lines from its calculations would clearly still have high value. A metric that counted those non-existent lines as “missing” from variable coverage would be less valuable.

So, in order to separate out the concerns about line coverage from concerns about variable coverage, I’d want a variable-coverage metric to compare covered lines as they exist in the compiled object file, rather than to some idealized object where all lines seen at O0 are still present in the optimized object, because that idealized object may well not be theoretically achievable. This does mean that improvements to line coverage might alter the variable-coverage metric, but it should not be difficult for users of the metrics to understand that.
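To make the two denominators concrete, here is a toy sketch. All sets are hypothetical: one denominator is every source line in the O0 line table for a variable's scope (the idealized baseline), the other is only the lines that survive in the optimized object.

```python
# Contrast the two possible denominators for a variable-coverage metric.
# All line numbers are hypothetical placeholders.

o0_lines = {10, 11, 12, 13, 14}   # idealized baseline: lines in the O0 line table
opt_lines = {10, 12, 14}          # lines actually present in the optimized object
covered = {10, 14}                # lines where the variable has a DWARF location

vs_idealized = len(covered) / len(o0_lines)   # mixes line loss into the metric
vs_object = len(covered) / len(opt_lines)     # measures variable coverage alone

print(f"vs. idealized O0 lines: {vs_idealized:.0%}")
print(f"vs. lines in object:    {vs_object:.0%}")
```

In this toy data the first metric charges the variable for lines 11 and 13, which no optimized line table could ever contain; the second does not, which is the separation of concerns argued for above.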

Right. There are several ways to get that. The method I was envisioning, which I’m sure you know or can easily guess but I’ll write it down anyway, requires the unoptimized object file (which IME is often available anyway), and can be refined a bit with access to the source. (That is: Given a variable, you find its containing lexical scope, which has associated PC ranges, which you can look up in the line table; this gets you the set of source locations that in the opinion of the compiler are the ones that generate instructions within that lexical block. This is why I keep harping on PC ranges. It’s possible to reduce that set in some cases to avoid the “unreachable” part of the graph posted by @jryans , either by looking at the source and trying to suss out the source location of the first definition, or approximating that by using the variable’s declaration source point from the debug info.)
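The lexical-scope lookup can be sketched as follows. This is a simplified model, not a real DWARF reader: the line table is a sorted list of (address, file, line) rows, each covering addresses up to the next row (real line tables also carry end_sequence rows, which are ignored here), and all addresses and ranges are hypothetical.

```python
# Push a lexical scope's PC ranges through a (modeled) line table to get
# the set of source locations attributable to that scope.

line_table = [                 # (start_address, file, line), sorted by address
    (0x1000, "foo.c", 10),
    (0x1008, "foo.c", 11),
    (0x1010, "foo.c", 12),
    (0x1020, "foo.c", 14),
]

def locations_for_ranges(ranges, table):
    """Return the (file, line) set whose rows overlap any half-open
    [lo, hi) PC range of the scope."""
    locs = set()
    for i, (start, f, line) in enumerate(table):
        end = table[i + 1][0] if i + 1 < len(table) else float("inf")
        # Row covers [start, end); keep it if any scope range overlaps it.
        if any(lo < end and start < hi for lo, hi in ranges):
            locs.add((f, line))
    return locs

scope_ranges = [(0x1004, 0x1014)]     # hypothetical lexical block's PC range
print(sorted(locations_for_ranges(scope_ranges, line_table)))
```

The output set is exactly “the source locations that, in the opinion of the compiler, generate instructions within that lexical block”; shrinking it further (e.g. to drop lines before the variable's first definition) is the refinement step mentioned above.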

I like the idea of using the unoptimized line table to derive the baseline set of source locations, because it is obviously comparable to the set of source locations that you can derive from the optimized object file’s debug info. (Comparable in the sense that comments/blanks/noise tokens are automatically excluded, and that both sets are exactly the sets that someone running a debugger would actually be able to use; that is, those sets are the most relevant to the developer.)

You can use other approximations of these sets, but the farther you get from what the debugger uses, the fuzzier and less obviously useful the metric becomes. If the unoptimized debug info isn’t available, and we have some other approximation to take its place, we need to acknowledge the fuzziness and explicitly justify it.

Yes, I agree this should be straightforward to calculate. I attempted to run llvm-debuginfo-analyzer --compare=lines --print=lines, which looked like it might do something like this, but it runs at 100% CPU for many minutes without printing anything… There may be some latent performance issues in that implementation, as it should be pretty quick to calculate this diff. One way or another, it makes sense to me to have a (fast) line table diffing report as part of this work, ideally located in the same tool as the other metrics discussed here for ease of use. (There may indeed be code that can be leveraged from llvm-debuginfo-analyzer.)

Aha, you caught us! :wink: I left this unstated before as I didn’t want to take up too much discussion time with this detail, but since you’ve mentioned it… Stephen and I take perhaps a more idealistic view than most here: we believe it should be possible to reach parity with unoptimised debug info (modulo reachability), and we describe this vision further in an upcoming Onward! 2024 paper. Now, as you’ve highlighted, that’s not actually possible for certain optimisations today, especially those deleting code, since the line table has nothing to map when there are no machine instructions for an optimised-away region of source code. It would require various debug info extensions, like GCC’s location views and others not yet designed to fully achieve this.

Rather than debating which vision of the debugging “illusion” is correct (really they both have merit), I think for the purposes of this RFC, we can separate out this concern with additional metrics as you suggest.

Thanks for highlighting this point! Yes, I agree that we should help our investigator even further by additionally offering a line-based, first-definition-aware variable coverage metric that checks only the optimised debug info, as that allows you to examine variable coverage in isolation, just as our earlier additional line table metric examines line table coverage in isolation. I still believe our original metric (let’s think of this as a “combined” metric) is useful to get an overall summary view.

I’ll write a separate post soon to summarise the discussion so far, with particular focus on the additional metrics that should be included as part of this work.

Thanks Paul. Yes, I think at this point we are violently agreeing! Indeed, pushing a variable’s PC ranges through the line table is what we propose to do instead of looking at source directly, at a cost of trusting the compiler somewhat. We only ever need to push them through the matching line table.

Of course, exactly how to get the line table we use in this way is something we have been discussing elsewhere in the thread.

We are planning on posting a “version 2” of the RFC in a new thread, probably next week, that rolls up various comments and will hopefully be both clearer and more tuned to what people actually want to have. Thanks very much for all the comments here! (Or if a separate thread for v2 would be bad etiquette, do shout! We can put it here instead.)

I’ve seen it done both ways, but keeping it all in a single thread makes the most sense to me, as long as you properly update the top post so that new readers don’t get the wrong impression from the outdated first post.

We ended up creating a separate v2 RFC thread to continue the discussion. Please take a look at RFC v2 over there and add any feedback you may have in the new thread.