How to reduce .profraw file size

I’m trying to get code coverage to work on our project using clang-cl. Unfortunately, for legacy reasons, the amount of code that is pulled into the test executables is quite large.
Compiling with only -fprofile-instr-generate -fcoverage-mapping gives us a profraw file of 4GB after the unit test has run.
I did find the -fprofile-list=cov.list option and have already managed to prune a lot of symbols. Currently the file is down to 2GB (taking around 10 seconds to write).

With the following, I am able to get some idea of what is in the file:

llvm-profdata.exe show --all-functions .\default.profraw -o default.profraw.allfunc
undname.exe default.profraw.allfunc > default.profraw.symbols

The allfunc file contains entries like:

  ??$?0AEA_K@?$_Optional_destruct_base@_K$00@std@@QEAA@Uin_place_t@1@AEA_K@Z:
    Hash: 0x0000000000000000
    Counters: 1
    Function count: 1023

or, demangled:

  public: __cdecl std::_Optional_destruct_base<unsigned __int64,1>::_Optional_destruct_base<unsigned __int64,1><unsigned __int64 & __ptr64>(struct std::in_place_t,unsigned __int64 & __ptr64) __ptr64:
    Hash: 0x0000000000000000
    Counters: 1
    Function count: 1023

This is quite unfortunate as we are not interested in the code coverage of the standard library.

Our current cov.list file looks like this:

[clang]
source:*/VS2022/*=forbid
default=allow

I’ve tried to exclude using function, though it expects a mangled name. My understanding is that I can use *@std@@* to filter based on the namespace, though it also excludes the part of our code that has any function taking an argument defined in the standard library. As such, this ain’t a viable option.

I’ve already looked into the code (llvm-project/clang/lib/CodeGen/CodeGenModule.cpp at 7a28a5b3fee6c78ad59af79a3d03c00db153c49f · llvm/llvm-project · GitHub), though it ain’t obvious what ->getName() returns. For functions, I believe this is only the mangled name, and for sources I believe this is the absolute path, based on my experimentation.

Do you know of any way to reduce the .profraw size by excluding the standard library?

Interesting. In the Itanium C++ ABI, the <prefix> (class, namespace, etc.) comes before the unqualified name:

    <nested-name> ::= N [<CV-qualifiers>] [<ref-qualifier>] <prefix> <unqualified-name> E
                  ::= N [<CV-qualifiers>] [<ref-qualifier>] <template-prefix> <template-args> E

So skipping names that match _ZNSt* is sufficient to skip functions instantiated in the STL headers. The MSVC mangling scheme places the qualification after the unqualified name, so there is no such easy trick.

% cat a.list
[clang]
function:_ZNSt*=skip
% cat b.cc
#include <iostream>
#include <vector>
using namespace std;

void foo(const vector<int> &x) { } // should be instrumented

int main(){
  vector<int> x;
  x.resize(5); // skip
  foo(x);
  cout << x[0];
}
% clang++ -fprofile-instr-generate -fcoverage-mapping b.cc -fprofile-list=a.list
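
The difference between the two mangling schemes can be checked with a few names and plain shell globbing. This is only a sketch: the matches helper is hypothetical, shell case patterns merely approximate the matching that -fprofile-list performs, and the mangled names for foo and resize are what I would expect from libstdc++ on x86-64 and from x64 MSVC, not output captured from a real build. The MSVC library name is the one from the question.

```shell
#!/bin/sh
# Hypothetical helper: succeed if mangled name $1 matches glob pattern $2.
# $2 is deliberately unquoted so that '*' acts as a wildcard.
matches() {
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# void foo(const std::vector<int>&): user code that should stay instrumented.
itanium_user='_Z3fooRKSt6vectorIiSaIiEE'
msvc_user='?foo@@YAXAEBV?$vector@HV?$allocator@H@std@@@std@@@Z'

# std::vector<int>::resize(unsigned long): library code (libstdc++ mangling).
itanium_std='_ZNSt6vectorIiSaIiEE6resizeEm'
# The std::_Optional_destruct_base constructor from the question.
msvc_std='??$?0AEA_K@?$_Optional_destruct_base@_K$00@std@@QEAA@Uin_place_t@1@AEA_K@Z'

# Itanium: the namespace is a prefix, so _ZNSt* only hits the library function.
for name in "$itanium_user" "$itanium_std"; do
  if matches "$name" '_ZNSt*'; then echo "skip $name"; else echo "keep $name"; fi
done

# MSVC: the qualification trails the name, so *@std@@* also hits foo,
# because its parameter type mentions std.
for name in "$msvc_user" "$msvc_std"; do
  if matches "$name" '*@std@@*'; then echo "skip $name"; else echo "keep $name"; fi
done
```

With the Itanium names only resize is skipped, while with the MSVC names both are skipped, which is the over-matching described in the question.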

I think you can index it directly as shown in Source-based Code Coverage — Clang 19.0.0git documentation.

Note that the indexed profdata is mergeable according to the last note of the section.
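
For reference, indexing a raw profile is a single llvm-profdata merge call, as in the Clang coverage documentation. A minimal sketch, assuming llvm-profdata is on PATH; the index_profile wrapper and the output file name are hypothetical, only the merge invocation itself comes from the documentation:

```shell
#!/bin/sh
# Hypothetical wrapper: index raw profile $1 into indexed profile $2.
index_profile() {
  raw=$1
  out=$2
  # Fail early if the raw profile does not exist.
  [ -f "$raw" ] || { echo "missing raw profile: $raw" >&2; return 1; }
  # Fail early if the LLVM tools are not installed.
  command -v llvm-profdata >/dev/null 2>&1 || {
    echo "llvm-profdata not found on PATH" >&2
    return 1
  }
  # -sparse omits functions whose counters are all zero from the output.
  llvm-profdata merge -sparse "$raw" -o "$out"
}
```

It would be called as index_profile default.profraw default.profdata after the test run. Note that this only shrinks what is kept afterwards; the raw file is still written in full first.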

I understand that these can be indexed, which reduces the 2GB to only 70MB. Although this fixes the memory issue (which is not that big of a problem), it doesn’t fix the performance impact. That impact is completely linked to the fwrite of the 2GB. Hence I’m looking to reduce the data at its source.

What I don’t understand is why my source filter doesn’t solve the problem, as all std:: symbols are defined in the MSVC standard library. By excluding the whole VS installation, one would expect those symbols to be removed, yet that doesn’t happen.

For functions, I believe this is only the mangled name, and for sources I believe this is the absolute path, based on my experimentation.

You can use -ffile-compilation-dir=. or -fcoverage-compilation-dir=. to reduce the size of the absolute source file paths (-ffile-compilation-dir= expands to -fcoverage-compilation-dir= and -fdebug-compilation-dir=).

That impact is completely linked to fwrite of the 2GB.

There’s also binary profile correlation, which reduces the .profraw file size by keeping some metadata in the original instrumented binary and retrieving it from the binary later when merging into indexed profile data: [RFC] Add binary profile correlation to not load profile metadata sections into memory at runtime

If you don’t care about the exact execution counts, single byte counters would also help: [RFC] Single Byte Counters for Source-based Code Coverage
