question about -coverage


I have a few questions about coverage.

Is there any user-facing documentation for clang’s “-coverage” flag?
The coverage instrumentation seems to run before asan, so if asan is also enabled,
asan will instrument the accesses to @__llvm_gcov_ctr.
This is undesirable, so we'd like to skip these accesses.
It looks like the GEPs into @__llvm_gcov_ctr have special metadata attached:

%2 = getelementptr inbounds [4 x i64]* @__llvm_gcov_ctr, i64 0, i64 %1

%3 = load i64* %2, align 8
%4 = add i64 %3, 1
store i64 %4, i64* %2, align 8

!1 = metadata !{…; [ DW_TAG_compile_unit ] … /home/kcc/tmp/] [DW_LANG_C_plus_plus]

Can we rely on having this metadata attached to @__llvm_gcov_ctr?

Should we attach some metadata to the actual accesses as well, or simply find the corresponding GEP?
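
For context, one way asan could recognize these accesses without extra metadata is to walk the address operand back to the underlying global; a pseudocode sketch against the LLVM C++ API (the name check is an assumption on my part, not something the gcov pass guarantees):

```
// Pseudocode: is this load/store address a gcov counter slot?
static bool isGcovCounterAccess(Value *Addr) {
  Addr = Addr->stripPointerCasts();
  if (auto *GEP = dyn_cast<GEPOperator>(Addr))
    Addr = GEP->getPointerOperand()->stripPointerCasts();
  if (auto *GV = dyn_cast<GlobalVariable>(Addr))
    return GV->getName().startswith("__llvm_gcov");  // assumed prefix
  return false;
}
```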

Finally, does anyone have performance numbers for coverage?
As of today it seems completely thread-hostile, since __llvm_gcov_ctr is not thread-local.
A simple stress test shows a ~50x slowdown with coverage enabled:

% cat ~/tmp/
#include <pthread.h>
__thread int x;
void foo() { x++; }

void *Thread(void *) {
  for (int i = 0; i < 100000000; i++)
    foo();
  return 0;
}

int main() {
  static const int kNumThreads = 16;
  pthread_t t[kNumThreads];
  for (int i = 0; i < kNumThreads; i++)
    pthread_create(&t[i], 0, Thread, 0);
  for (int i = 0; i < kNumThreads; i++)
    pthread_join(t[i], 0);
  return 0;
}

% clang -O2 ~/tmp/ -lpthread ; time ./a.out
TIME: real: 0.284; user: 3.560; system: 0.000
% clang -O2 ~/tmp/ -lpthread -coverage ; time ./a.out
TIME: real: 13.327; user: 174.510; system: 0.000

Any principled objections against making __llvm_gcov_ctr thread-local, perhaps under a flag?

If anyone is curious, my intent is to enable running coverage and asan in one process.



Another question is about the performance of coverage’s at-exit actions (dumping coverage data on disk).
I’ve built chromium’s base_unittests with -fprofile-arcs -ftest-coverage and the coverage’s at-exit hook takes 22 seconds,
which is 44x more than I am willing to pay.
Most of the time is spent here:

#0 0x00007ffff3b034cd in msync () at …/sysdeps/unix/syscall-template.S:82
#1 0x0000000003a8c818 in llvm_gcda_end_file ()
#2 0x0000000003a8c914 in llvm_writeout_files ()
#3 0x00007ffff2f5e901 in __run_exit_handlers

The test depends on ~700 source files, so the profiling library calls msync ~700 times.
Full chromium depends on ~12000 source files, so we'll be dumping coverage data for ~5 minutes this way.

I understand that we have to support the lcov/gcov format (broken in many ways), which may be the reason it is slow.

But I really need something much faster (and maybe simpler).

Is anyone planning any work on coverage in the coming months?

If not, we'll probably cook something simple and gcov-independent.


The instrumentation that I have proposed (on cfe-dev) for PGO is also intended to provide the necessary info for code coverage. I have not yet measured the performance of the code to write out the data, but it ought to be quite a bit faster than what we have now.