Hi all, I want to collect branch coverage after optimization.
More specifically, if a function or branch is deleted by a certain optimization, I want that change to show up in the coverage report. In other words, I want to collect the coverage of the optimized program.
For example,
void checkAndSet(int *arr, int n, int threshold) {
  for (int i = 0; i < n; ++i) {
    if (arr[i] < threshold) {
      arr[i] = threshold;
    }
  }
}
With -O3, this program is optimized into something like the following:
void checkAndSet(int* arr, int size, int threshold) {
  if (size > 0) {
    for (int i = 0; i < size; i += 2) {
      if (arr[i] < threshold) {
        arr[i] = threshold;
      }
      if (i + 1 < size && arr[i + 1] < threshold) {
        arr[i + 1] = threshold;
      }
      ...
In the coverage report for -O3, I want to see the four branches of the optimized program.
However, I found that LLVM's coverage report does not change with optimization; it is always the same.
Is it possible and practical to collect the coverage of branches after optimization?
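For context, here is a minimal sketch of the kind of source-based workflow involved (file names and exact flags are illustrative, not necessarily what I ran):

// cov_example.c -- self-contained version of the example above.
// Assumed source-based coverage workflow (names and paths are placeholders):
//   clang -O3 -fprofile-instr-generate -fcoverage-mapping cov_example.c -o cov_example
//   ./cov_example
//   llvm-profdata merge default.profraw -o cov_example.profdata
//   llvm-cov show ./cov_example -instr-profile=cov_example.profdata --show-branches=count
void checkAndSet(int *arr, int n, int threshold) {
  for (int i = 0; i < n; ++i) {
    if (arr[i] < threshold) {
      arr[i] = threshold;
    }
  }
}

int main(void) {
  int data[4] = {1, 5, 3, 9};
  checkAndSet(data, 4, 4); // exercises both outcomes of the source-level branch
  return 0;
}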
Hi YangChenyuan – I’m not sure this is a practical metric, at least for Source-based Code Coverage, and it’s likely that the branch instrumentation itself will prevent its removal by optimization. Ultimately, the goal of Branch Coverage, as a specific part of Source-based Code Coverage, is to show you the coverage of source code. As such, it is always tied back to source code with an accurate report of a condition’s evaluation, regardless of how well the generated code is optimized (though it will tell you if a condition is constant-folded and removed).
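To illustrate that last point with a hypothetical snippet (not from the original example): a condition the compiler can fold to a constant still appears in the source-based report, flagged as folded, rather than silently disappearing.

enum { kBufSize = 64 };

int clampIndex(int i) {
  // This condition is a compile-time constant, so the generated code has no
  // branch here; the source-based report still lists the branch, marked as
  // folded, instead of dropping it.
  if (kBufSize > 128) {
    return i % 128;
  }
  return i % kBufSize;
}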
It sounds like you need a lower-level form of binary-level coverage to measure this, which I’m not sure is supported by LLVM – maybe by BOLT?
I did not land D104060 (Machine IR Profile). Instead, as discussed in that RFC you found, I opted to expand the existing IRPGO.
By default, IRPGO injects instrumentation fairly early in the optimization pipeline, before inlining. We also have context-sensitive IRPGO (using -fcs-profile-generate), which injects instrumentation after inlining.
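A tiny hypothetical example of why the injection point matters (illustrative code, not from any particular test):

static int helper(int x) {
  // With default IRPGO, counters are placed in this single copy of helper(),
  // so executions from both callers land in the same counters.
  return x > 0 ? x : -x;
}

int callerA(int x) { return helper(x) + 1; } // likely inlined at -O2/-O3
int callerB(int x) { return helper(x) * 2; } // likely inlined at -O2/-O3

// After inlining, callerA and callerB each contain their own copy of the
// branch from helper(); instrumentation injected after inlining
// (context-sensitive IRPGO) can count those copies separately.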
Thank you for all the detailed explanations! My interest in understanding coverage post-optimization stems from the need to ensure that our test cases comprehensively cover all branches in the optimized object code. This concern arises because optimization can alter the branch structure of the code, potentially introducing additional branches that differ from the original source code.
Hi @ellishg, could we use -fcs-profile-generate to instrument the code and then collect the coverage? Based on my understanding, it is designed to collect profiling information to optimize the code.
Yes, -fcs-profile-generate and -fprofile-generate (IRPGO) are designed to generate profiles for optimizing binaries (PGO). We can use them to collect some code coverage, but there are caveats. Unlike front-end instrumentation (-fprofile-instr-generate), which can report line coverage, IRPGO can only reliably report file coverage, function coverage, and basic block coverage (and right now there isn't a simple way to map this back to source). This is because IRPGO injects instrumentation later, during optimization, so some source lines could be dead-stripped before we get a chance to instrument them.
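To make that caveat concrete, here is a hypothetical snippet (slowPath is a made-up helper):

enum { kDebug = 0 };     // compile-time constant

int slowPath(int x);     // hypothetical helper, assumed defined elsewhere

int process(int x) {
  if (kDebug) {          // provably false: early IR passes can delete this block,
    x = slowPath(x);     // so IRPGO never gets to place a counter for it
  }
  return x + 1;          // front-end instrumentation still maps these lines to source
}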
That being said, we find IRPGO coverage can be useful, thanks to the fact that its overhead is extremely low compared to front-end instrumentation. In particular, it can be used as a lightweight dead function detector. Check out https://www.youtube.com/watch?v=NuXk1V19pew and https://www.youtube.com/watch?v=vFWwJrOiVMM for more details.
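As a trivial, hypothetical illustration of the dead-function use case:

int usedOnHotPath(int x) { return x + 1; }

int neverCalled(int x) {      // entry counter stays zero in every merged profile,
  return x * 42;              // so this function shows up as likely dead
}

int main(void) {
  return usedOnHotPath(1);    // only this function ever runs
}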
Thank you very much for your prompt and detailed answer! It really helps me understand this better.
I have two further questions about IRPGO:
1. Can it be considered instrumentation at the binary level? From the videos, it seems that -fprofile-generate instruments the binary itself.
2. Are there any tools that could help read or visualize the .profdata files generated by -fcs-profile-generate and -fprofile-generate? Sorry, I am not familiar with how .profdata is used.
IRPGO instrumentation adds counters along edges between blocks, and we use a minimum spanning tree (MST) of the CFG to add only the minimal set of counters. These counters are inserted in LLVM IR, not at the MIR level. Because of this, there may be some discrepancies between the counters and the final basic blocks for a particular architecture, since it's possible to add blocks or mutate the CFG of a MachineFunction. Typically it's a very tight mapping, but we can't guarantee it's 1:1 in all cases.
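As a rough sketch of the edge-counter idea (purely illustrative; which edges actually receive counters depends on the spanning tree the pass computes from estimated edge weights):

int absClamp(int x, int hi) {
  int r;
  if (x < 0) {    // diamond CFG: entry -> {then, else} -> merge
    r = -x;       // a counter might be placed on one of these two edges...
  } else {
    r = x;        // ...while the other edge's count is inferred from
  }               // entry_count - instrumented_edge_count, so it needs no counter
  if (r > hi) {
    r = hi;
  }
  return r;
}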
llvm-cov itself has some support for displaying profile counts, via --show-line-counts, which sounds pretty close to what you’re after.
It should also be possible to print the CFG in Graphviz format after profile weights have been added. This isn't the same as the binary, but it is as close to a direct visualization as we have. To make it easy to understand, though, you may have to customize the Graphviz printing to include branch weights and color things yourself. If you're doing that, you could probably do something similar for MIR, though I don't recall how much existing support there is for visualizing MIR functions.
Lastly, there is some related work in -fbasic-block-sections=labels, which adds an ELF section called .llvm_bb_addr_map. I think it's heavily used by san-cov and in fuzzing contexts, but it's completely generic, so maybe you can use that to correlate IR counters to blocks and visualize it. It's probably also worth looking into san-cov itself, as I think they may have some visualization tooling, but I'm not familiar enough with it to know for sure. You can find more info on it here: SanitizerCoverage — Clang 18.0.0git documentation
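If you want to experiment with that section, a minimal sketch might look like this (the flags and the llvm-readobj invocation are my best recollection; verify them against your LLVM version):

// bbmap_example.c -- trivial unit to emit the .llvm_bb_addr_map section.
// Assumed workflow (treat the exact flags as assumptions):
//   clang -O3 -fbasic-block-sections=labels bbmap_example.c -o bbmap_example
//   llvm-readobj --bb-addr-map bbmap_example
int gate(int x, int y) {
  if (x < y) {     // each final machine basic block gets an entry in the map
    return x;
  }
  return y;
}

int main(void) { return gate(1, 2); }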