Processing some coverage data against its binaries early, but combining it later for reporting

I have a Rust project containing build-time code (proc macros) and run-time code (the actual program). I want to measure test coverage for the whole project, but in separate build and test environments, and I want to do this automatically in CI. I’d like to avoid copying the entire set of build artifacts from job to job just to generate the coverage report at the end of the process.

To do my split build/test coverage, I currently have two jobs:

  1. Build:

    • Build the code with coverage instrumentation enabled.
    • Run the proc macro tests, with LLVM_PROFILE_FILE="$COVERAGE_DIR/build-%p-%m.profraw" so that they write raw profiles. $COVERAGE_DIR is just a directory under the project, like coverage/, but resolved to an absolute path so that tests running in a different working directory still put their stats in the same place.
  2. Test and report:

    • Test the runtime code (unit tests and integration tests), with LLVM_PROFILE_FILE="$COVERAGE_DIR/run-%p-%m.profraw".
    • Combine that with the coverage data from step 1 (with llvm-profdata merge).
    • Generate a report from the merged data ([horrible find command to list the interesting artifacts] -exec printf -- '-object\0%s\0' {} \; | xargs -0 llvm-cov show -Xdemangler=rustfilt -instr-profile=merged.profdata [reporting options])
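The two jobs can be sketched roughly as the shell functions below. This is a minimal sketch under assumptions, not the actual CI configuration: the crate name my_macros, the default coverage/ directory, running the transferred test binaries directly, and passing report objects as arguments (in place of the find command, which is left as-is above) are all hypothetical. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Rough sketch of the two CI jobs. "my_macros" and the default coverage/
# directory are placeholders. Set DRY_RUN=1 to print commands instead of
# executing them.
set -euo pipefail

# Execute a command, or just print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Absolute path, so tests with a different working directory write here too.
COVERAGE_DIR="${COVERAGE_DIR:-$PWD/coverage}"
export RUSTFLAGS="-C instrument-coverage"

# Job 1: build everything instrumented, then run only the proc macro tests.
build_job() {
  run cargo test --workspace --no-run
  run env LLVM_PROFILE_FILE="$COVERAGE_DIR/build-%p-%m.profraw" \
    cargo test -p my_macros
}

# Job 2a: run each transferred test binary directly. In the profile file
# pattern, %p expands to the process ID and %m to a binary signature.
run_tests() {
  local bin
  for bin in "$@"; do
    run env LLVM_PROFILE_FILE="$COVERAGE_DIR/run-%p-%m.profraw" "$bin"
  done
}

# Job 2b: merge build-time and run-time raw profiles into one indexed profile.
merge_profiles() {
  run llvm-profdata merge -sparse -o merged.profdata "$COVERAGE_DIR"/*.profraw
}

# Job 2c: report. Each object whose coverage should appear is passed with
# -object; this is the step that needs the instrumented binaries on hand.
report() {
  local objects=() obj
  for obj in "$@"; do
    objects+=(-object "$obj")
  done
  run llvm-cov show -Xdemangler=rustfilt -instr-profile=merged.profdata \
    "${objects[@]}"
}
```

The array in report builds the same alternating "-object path" argument list that the printf/xargs pipeline produces, without needing NUL separators.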

Running the tests in step 2 needs only the test binaries built in step 1, but generating the report requires a much larger set of build artifacts.

As a result, I have to copy the entire build output from step 1 to step 2: not just the build artifacts for my project, but those of all its dependencies as well, even though I don’t need them. This adds up to gigabytes of data passed around as CI artifacts.

What I’d really like is a way to “export” coverage data before report generation but after processing against the instrumented binaries. So instead of:

STEP 1 (build)                   STEP 2 (test and report)

instrumented binaries ──(~GB)──🢖 instrumented binaries ─╮
build coverage data   ──(~kB)──🢖 build coverage data   ─┼─🢖 coverage report
                                 runtime coverage data ─╯

…I could do

STEP 1 (build)                          STEP 2 (test and report)

instrumented binaries ─╮
build coverage data   ─┴─🢖 ??? ─(~kB)──🢖 ???                        ─╮
instrumented test binaries ─────(~MB)──🢖 instrumented test binaries ─┼─🢖 report
                                         runtime coverage data      ─╯

…where ??? stands in for some llvm-* command I don’t know about that combines the coverage mapping information embedded in the binaries with the coverage stats into some intermediate form.

I don’t think llvm-profdata merge does this: its output still needs the binaries to be turned into a report.

I don’t think llvm-cov export does this: its output can’t be consumed by llvm-cov show.

I want this so that I can save time and storage on CI pipelines, so if there’s some other viable solution, I’d love to know about it. Maybe it’s possible to identify only those build artifacts with coverage instrumentation that relate to my project’s source tree, and just transfer those? I don’t know.
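One way to approximate "only the artifacts with coverage instrumentation" is to look for the coverage mapping section name in each build output. The function below is a hypothetical helper (the name covmap_objects is made up), and the plain string search is a heuristic rather than a real parse of the section table; a tool such as readelf -S would be more precise. The executable-bit check is a further assumption that test binaries and proc macro .so files are +x while archives such as .rlib are not.

```shell
#!/usr/bin/env bash
# Heuristic filter: print each argument that is a regular, executable file
# whose bytes contain "__llvm_covmap", the ELF section name used for LLVM
# coverage mapping data.
set -euo pipefail

covmap_objects() {
  local f
  for f in "$@"; do
    # Executable bit: keeps linked binaries and shared objects, and skips
    # archives like .rlib, which are usually not +x.
    if [ -f "$f" ] && [ -x "$f" ] && grep -qaF -e '__llvm_covmap' -- "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, covmap_objects target/debug/deps/* would list candidate files to transfer between jobs. This selects linked outputs rather than filtering by source tree, so whether that subset alone is enough for llvm-cov show in a given project still needs checking; dependency sources can be dropped from the report itself with llvm-cov's -ignore-filename-regex option.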

Environment details:

  • Rust: 1.71.1
  • LLVM: 16.0.5-rust-1.71.1-stable
    • installed via rustup with rustup component add llvm-tools-preview
  • OS (desktop): Ubuntu 23.04
  • OS (CI image): Docker container based on rust:1.71.1-slim
  • Code hosting/CI: GitLab 16.4.0