Summary
We (J. Ryan Stinnett and Stephen Kell at King’s College London, with support from the Sony / SN Systems team) propose to contribute a new tool for measuring how well local variables are covered by debug info (e.g. DWARF), which improves on previous coverage approaches. The initial version proposed here would focus on DWARF, but support for other debug info formats could be added in future work.
Coverage approach
Existing tools
Existing tools that compute some form of debug info coverage include llvm-dwarfdump and debuginfo-quality. The approach used in these tools has a few problems.
Coverage is measured in terms of instruction bytes. Instruction bytes are problematic for debug info coverage in several ways. The emitted instructions vary across compilers and compiler options, so coverage values are not easily comparable. Optimisations that significantly change the number of bytes (adding them, e.g. by unrolling loops, or removing them) can distort coverage results. Additionally, debugging users most often step by source lines, so a bytes-based coverage metric is a poor match for spotting issues that affect users.
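To make the byte-count distortion concrete, here is a minimal Python sketch over made-up instruction layouts (not real compiler output). Build B stands in for a compilation that unrolls a loop. The variable's location is equally good from a user's point of view in both builds, yet the bytes-based figure moves while the line-based one does not. The helper names `byte_coverage` and `line_coverage` are hypothetical.

```python
# Hypothetical instruction layouts, NOT real compiler output: each entry is
# (start_address, size_in_bytes, source_line). Both builds come from the same
# 4-line function; build B unrolls the loop on line 3 four times.
build_a = [(0, 4, 1), (4, 4, 2), (8, 4, 3), (12, 4, 4)]
build_b = [(0, 4, 1), (4, 4, 2), (8, 4, 3), (12, 4, 3), (16, 4, 3), (20, 4, 3), (24, 4, 4)]

# Start addresses of instructions where the variable has a location:
# in both builds, everything except the code for line 4.
covered_a = {0, 4, 8}
covered_b = {0, 4, 8, 12, 16, 20}


def byte_coverage(instrs, covered_addrs):
    """Fraction of instruction bytes at which the variable has a location."""
    total = sum(size for _, size, _ in instrs)
    covered = sum(size for addr, size, _ in instrs if addr in covered_addrs)
    return covered / total


def line_coverage(instrs, covered_addrs):
    """Fraction of source lines with at least one covered instruction."""
    all_lines = {line for _, _, line in instrs}
    covered_lines = {line for addr, _, line in instrs if addr in covered_addrs}
    return len(covered_lines) / len(all_lines)


print(byte_coverage(build_a, covered_a), byte_coverage(build_b, covered_b))  # 0.75 vs ~0.857
print(line_coverage(build_a, covered_a), line_coverage(build_b, covered_b))  # 0.75 vs 0.75
```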
Full coverage is defined as the entire parent scope (block / function) for all variables. To understand the issue of using the entire parent scope as the coverage target, imagine a variable which is first defined (not declared, but first assigned / written to) halfway down a function. Optimising compilers won’t emit debug coverage for that variable until after it is first defined. Such a variable would never be covered for the entire parent scope, even by an “ideal” optimising compiler, and thus 100% coverage under such a metric is unattainable for these variables. This makes it hard to discern whether less-than-perfect coverage can be improved. It also accidentally biases towards unoptimised compilations (where variables are placed on the stack for their whole lifetime).
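A tiny sketch of the same scenario, assuming a hypothetical 10-line scope whose variable is first assigned on line 6, shows the ceiling: even a best-case optimised build only reaches 50% under a scope-based metric, while an unoptimised build that keeps the variable on the stack scores 100%. The `scope_coverage` helper is a hypothetical name.

```python
def scope_coverage(scope_lines, covered_lines):
    """Scope-based metric: fraction of the whole enclosing scope that is covered."""
    return len(scope_lines & covered_lines) / len(scope_lines)


# Hypothetical function body spanning lines 1-10, in which `x` is first
# assigned (not just declared) on line 6.
scope = set(range(1, 11))
first_def_line = 6

# Best case for an optimising compiler: x has a location from its first
# definition to the end of the scope, and no earlier.
ideal_optimised = {line for line in scope if line >= first_def_line}

# Unoptimised build: x lives in a stack slot for the whole scope.
unoptimised = set(scope)

print(scope_coverage(scope, ideal_optimised))  # 0.5
print(scope_coverage(scope, unoptimised))      # 1.0
```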
Our approach
To remedy these issues, our approach makes several adjustments:
- Measure coverage in terms of source lines
- For each variable, calculate its defined regions and only expect coverage in those lines
By measuring coverage in source lines instead of bytes, the measurement is comparable across compilations and better aligned with the typical debugging user experience. By including in the baseline only those lines where the variable being examined is defined, 100% coverage becomes attainable for all variables.
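A minimal sketch of the per-variable metric, assuming each variable's defined lines and covered lines have already been computed as sets of line numbers. `VariableLines` and the helpers are hypothetical names, and the pooled aggregation across variables in `aggregate_coverage` is one possible choice rather than something this RFC specifies.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VariableLines:
    name: str
    defined: set[int]  # baseline lines where the variable holds a defined value
    covered: set[int]  # lines where the optimised debug info gives it a location


def variable_coverage(v: VariableLines) -> float | None:
    """Per-variable coverage: only lines in the defined region count."""
    if not v.defined:
        return None  # nothing is expected, so there is no meaningful ratio
    return len(v.covered & v.defined) / len(v.defined)


def aggregate_coverage(variables: list[VariableLines]) -> float | None:
    """Assumed aggregation: pool the expected and covered lines of all variables."""
    expected = sum(len(v.defined) for v in variables)
    covered = sum(len(v.covered & v.defined) for v in variables)
    return covered / expected if expected else None


# Hypothetical variables: `x` is first defined on line 6 of a 10-line scope;
# `y` has a location on line 9, outside its defined region, which is ignored.
x = VariableLines("x", defined={6, 7, 8, 9, 10}, covered={6, 7, 8, 9, 10})
y = VariableLines("y", defined={2, 3, 4}, covered={2, 4, 9})

print(variable_coverage(x))        # 1.0
print(variable_coverage(y))        # 0.666...
print(aggregate_coverage([x, y]))  # 0.875
```

Under this metric, `x` reaches 100% even though it is only covered for half of its enclosing scope.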
By combining these adjustments, our approach offers an accurate and achievable coverage metric. Variable storage (stack vs. register) also no longer affects whether full coverage is attainable. Previous metrics accidentally favoured on-stack locals, because these tend to have location ranges covering the whole scope, whereas register-allocated variables usually have locations only in their defined ranges.
Further detail available
We have previously shared our debug info coverage approach via a EuroLLVM 2024 talk and a CC 2024 paper. The talk and paper contain a more detailed story along with experimental evaluations from our research prototype, which used a static analysis approach specific to C language programs. This proposal takes a different approach by using language-agnostic data sources (as one might expect for LLVM tools).
Use cases
There are quite a few potential use cases for this debug info coverage data, including:
- Tracking over time (as in the LLVM nightly tester)
- Pre-merge comparison (similar to the LLVM compile-time tracker)
- Some kind of coverage view in Compiler Explorer
- Integration tests
We suspect there are other potential applications as well. (Let us know if you think of any!)
Data sources
To compute our metric, there are 3 major sources of data needed:
- Debug info (e.g. DWARF) to be analysed
- Source lines that should be covered
- First definition point(s) for each source variable
As a data source for the source lines to be covered, we intend to use the DWARF line table from an unoptimised compilation. This also ensures our baseline only counts lines with meaningful computation (e.g. it skips blank lines, comments, etc.), as we assume the unoptimised line table only includes the lines we actually care about. For our initial version, we will get first definition data from a liveness analysis of source variables in unoptimised LLVM IR. It would be ideal if DWARF also contained variable first definition point(s) (or liveness generally), but that is not the case today.
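First-definition data could be derived with a dataflow pass. The sketch below is a simplified stand-in for the liveness analysis mentioned above. It works on a hypothetical CFG written as Python dicts rather than real LLVM IR, treats parameters as written in the entry block, uses a "may be defined" (union) merge at join points, which is an assumption of this sketch, and ignores last uses entirely. The `defined_lines` helper is a hypothetical name.

```python
from collections import deque


def defined_lines(blocks, succs):
    """
    Simplified forward "may be defined" analysis over a CFG.

    blocks: block name -> list of (source_line, variables_written_by_that_instruction)
    succs:  block name -> list of successor block names
    Returns: variable name -> set of source lines at which the variable may
             already hold a defined value (including the line that defines it).
    """
    preds = {b: [] for b in blocks}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    def entry_state(b, out):
        # Union over predecessors: defined along at least one incoming path.
        return set().union(*(out[p] for p in preds[b]))

    out = {b: set() for b in blocks}
    worklist = deque(blocks)
    while worklist:  # iterate until no block's output set changes
        b = worklist.popleft()
        state = entry_state(b, out)
        for _, writes in blocks[b]:
            state |= writes
        if state != out[b]:
            out[b] = state
            worklist.extend(succs.get(b, []))

    # Final pass: record, per variable, every line reached after a definition.
    result = {}
    for b, instrs in blocks.items():
        state = entry_state(b, out)
        for line, writes in instrs:
            state |= writes
            for var in state:
                result.setdefault(var, set()).add(line)
    return result


# Hypothetical CFG for:
#   1  int f(int n) {
#   2    int x;
#   3    if (n > 0)
#   4      x = n;
#   5    else
#   6      x = -n;
#   7    int y = x * 2;
#   8    return y;
#   9  }
blocks = {
    "entry": [(1, {"n"}), (3, set())],
    "then": [(4, {"x"})],
    "else": [(6, {"x"})],
    "join": [(7, {"y"}), (8, set())],
}
succs = {"entry": ["then", "else"], "then": ["join"], "else": ["join"]}

result = defined_lines(blocks, succs)
print(sorted(result["x"]))  # [4, 6, 7, 8]
print(sorted(result["y"]))  # [7, 8]
```

Note that line 2 (the bare declaration) never enters the defined region of `x`, which is exactly the distinction between declaration and first definition described earlier.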
Alternative data sources
We also considered a source-language static analysis pass to find the baseline source lines and first definition points, and in fact our research prototype used this approach for C language programs. However, this would mean writing a static analysis for every potential source language, which would make it much harder to bring this tool to each new source language. We believe our source-language-agnostic design above is a better fit as an LLVM tool.
Tool home
We propose to add a new llvm-debuginfo-coverage tool to compute this. We know we’ll need to take in multiple build outputs (not just debug info from a single compilation), which makes our tool a bit different from existing LLVM tools like llvm-dwarfdump. It will also give us a bit more freedom to experiment with coverage output formats for people and tools without worrying about expectations users may have of an existing tool.
Alternative tool homes
Of existing LLVM tools, we also considered llvm-dwarfdump, as that’s the closest existing tool, especially with its --statistics mode. llvm-dwarfdump is mainly thought of as a DWARF pretty printer, taking only the file(s) to be analysed. We would need to take in a few additional inputs, which may be awkward to add to the llvm-dwarfdump CLI. Additionally, our coverage approach is not DWARF-specific. Although we only plan to support DWARF initially, the coverage tool could be expanded to support other formats in the future. For these reasons, we believe a new tool is a better fit.
Workflow
The initial version of the tool will consume DWARF and LLVM IR from an unoptimised build as the baseline, along with the DWARF from the optimised build being analysed. We acknowledge it’s a bit awkward to wrangle build systems to produce all of these (particularly the LLVM IR, which may necessitate build wrappers like wllvm), but we believe this is acceptable for an initial version of the tool. Future work (more detail below) could add variable liveness to DWARF, which would remove the need for the LLVM IR input and simplify usage of the tool. The primary use case of this coverage tool is imagined to be occasionally-run automated jobs, so hopefully scripting together those inputs is not too onerous. We’ll include examples of this build wrangling both in the tool documentation and in the tool’s own tests.
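The core join between the optimised build and the baseline can be sketched as follows, assuming the line table rows and the variable's location ranges have already been decoded from DWARF into plain tuples (for example with an existing DWARF parsing library). `covered_lines`, `baseline_lines`, all addresses, and the "any overlapping byte covers the line" policy are assumptions for illustration, not a description of the planned implementation.

```python
def covered_lines(opt_rows, location_ranges):
    """
    opt_rows: one line-table sequence from the optimised build, as
              (address, line) rows in address order; the last row is the
              end_sequence row and only supplies the end address.
    location_ranges: half-open [low, high) address ranges where the variable
              has a location in the optimised build.
    Returns the source lines with at least one covered instruction byte.
    """
    lines = set()
    for (start, line), (end, _) in zip(opt_rows, opt_rows[1:]):
        if line == 0:  # line 0 means "no source line" in DWARF
            continue
        if any(start < high and low < end for low, high in location_ranges):
            lines.add(line)
    return lines


def baseline_lines(unopt_rows):
    """Lines in the unoptimised line table, skipping line 0 and the end_sequence row."""
    return {line for _, line in unopt_rows[:-1] if line != 0}


# Hypothetical decoded inputs for a single variable `x`.
opt_rows = [(0x1000, 3), (0x1008, 4), (0x1010, 0), (0x1014, 7), (0x1020, 8), (0x1030, 8)]
x_ranges = [(0x1008, 0x1018)]
unopt_rows = [(0x0, 1), (0x10, 3), (0x18, 4), (0x20, 6), (0x28, 7), (0x30, 8), (0x40, 8)]
x_defined = {4, 6, 7, 8}  # e.g. from an analysis of the unoptimised LLVM IR

covered = covered_lines(opt_rows, x_ranges)      # {4, 7}
expected = x_defined & baseline_lines(unopt_rows)
print(len(covered & expected) / len(expected))   # 0.5
```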
Future improvements
We can imagine lots of ways to improve this for the future, even though they are not part of our initial plan.
Add variable liveness to DWARF
It would be ideal for this coverage tool, as well as other analysis tools, if DWARF described the first-defined and last-used points for source variables. Future work could explore a DWARF extension to capture this during compilation, and then adjust the coverage tool to make use of it. This would simplify the coverage tool both internally and at time of use, as we’d no longer need to examine LLVM IR. Beyond the coverage use case here, debuggers could warn about use of uninitialised values, tracers could warn about printing bogus data, etc. This would obviously require its own RFC and communication with the DWARF committee if it were pursued.
Investigate finer-grained coverage
It would be nice to increase coverage precision by going beyond line granularity in some way. This would be particularly helpful for language features like loop headers, which are made up of several expressions that might all occupy a single source line but which execute at different times in the running program. It may also be helpful for other constructs like function calls with computations in their arguments and similar expressions which do not have a source line all to themselves.
It’s not immediately obvious how best to go beyond lines when using the DWARF line table as-is, since an instruction is mapped to a single source position, not a source region (with start and end) as you’d have in a source language AST. While you could perhaps extrapolate a region by joining adjacent line table rows (and stopping when you see the end_sequence flag), it is not clear if such data would be reliable, as any gap in the line table would silently inflate the preceding region. Future work could explore ways of improving precision here.
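The following sketch shows the row-joining idea and its fragility, assuming rows have already been decoded to `(address, line, column, end_sequence)` tuples. The loop header, its column numbers, and the `extrapolated_regions` helper are hypothetical, and dropping empty or backwards regions is an assumed heuristic.

```python
def extrapolated_regions(rows):
    """
    rows: line-table rows in address order as (address, line, column, end_sequence).
    Guesses a source region for each row by extending its position up to the
    position of the next row in the same sequence.
    Returns a list of ((start_line, start_col), (end_line, end_col)) pairs.
    """
    regions = []
    for cur, nxt in zip(rows, rows[1:]):
        if cur[3]:  # cur closes a sequence, so nxt belongs to a different one
            continue
        start, end = (cur[1], cur[2]), (nxt[1], nxt[2])
        if end > start:  # drop empty or backwards "regions" (tuples compare line first)
            regions.append((start, end))
    return regions


# Hypothetical rows for `  for (i = 0; i < n; i++)` on line 5, with the body on line 6.
rows_full = [
    (0x0, 5, 8, False),   # i = 0
    (0x4, 5, 15, False),  # i < n
    (0x8, 6, 5, False),   # loop body
    (0xC, 5, 22, False),  # i++
    (0x10, 5, 22, True),  # end_sequence
]
# The same code, but with no row for the condition (a line-table gap).
rows_gap = [row for row in rows_full if row[2] != 15]

print(extrapolated_regions(rows_full)[0])  # ((5, 8), (5, 15)): roughly `i = 0; `
print(extrapolated_regions(rows_gap)[0])   # ((5, 8), (6, 5)): also swallows the condition
```

Even without the gap, the condition's guessed region runs on into line 6, because the next row in address order belongs to the loop body. This is the kind of unreliability that makes plain row-joining hard to trust.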
There’s also a separate dimension to consider: whether the whole variable is covered vs. only some fraction of the bits it contains. Our initial version assumes any coverage of a variable covers the whole variable, but future work may wish to be more precise here.
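To illustrate the difference, here is a sketch that assumes a variable's location has already been decoded into (bit offset, bit size) pieces per line, such as might be derived from DWARF `DW_OP_piece` / `DW_OP_bit_piece` expressions. The helpers and the per-line averaging are hypothetical; `any_piece_coverage` mirrors the simplification described above for the initial version.

```python
def bit_coverage(var_size_bits, pieces_by_line, defined):
    """
    Average fraction of the variable's bits that have a location, over its defined lines.
    pieces_by_line: line -> list of (bit_offset, bit_size) pieces with a location.
    """
    if not defined:
        return None
    total = 0.0
    for line in defined:
        bits = set()
        for offset, size in pieces_by_line.get(line, []):
            bits.update(range(offset, offset + size))  # a set tolerates overlapping pieces
        total += len(bits) / var_size_bits
    return total / len(defined)


def any_piece_coverage(pieces_by_line, defined):
    """Initial-version simplification: any piece counts as covering the whole variable."""
    if not defined:
        return None
    return sum(1 for line in defined if pieces_by_line.get(line)) / len(defined)


# Hypothetical 64-bit struct { int a; int b; } defined on lines 10-12.
pieces = {
    10: [(0, 64)],  # the whole struct has a location
    11: [(0, 32)],  # only `a` (bits 0-31) has a location
    12: [],         # no location at all
}
defined = {10, 11, 12}

print(any_piece_coverage(pieces, defined))  # 0.666...
print(bit_coverage(64, pieces, defined))    # 0.5
```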
Support debug info formats beyond DWARF
As already mentioned, we plan to support only DWARF debug info for our initial work, but there’s nothing about the approach that is specific to DWARF. It would be great to see this approach applied to other debug info formats in a single tool.
Acknowledgements
Thanks to everyone who has provided feedback on this along the way. Adrian Prantl gave quite helpful advice in discussions at EuroLLVM. The Sony / SN Systems team is assisting us with this effort and reviewed an earlier draft of this RFC.
