Getting basic block using -fsanitize-coverage=inline-bool-flag

Hello!

I have a project to which I am adding code coverage. Unfortunately, the default --coverage option is not a good fit, as it outputs a lot of information per run (and slows binary execution due to atomic operations), so I decided to switch to SanitizerCoverage.

This sanitizer has an option inline-bool-flag which builds a bool array of visited entities (currently I’m using basic block coverage).
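
For reference, a minimal sketch of how that array reaches user code. The callback name and signature are the ones documented for SanitizerCoverage; the globals and the count_visited helper are hypothetical names used only for illustration, and the sketch assumes a single instrumented module.

```cpp
#include <cassert>
#include <cstddef>

// Flag range of the (single) instrumented module, captured by the callback.
static bool *g_flags_begin = nullptr;
static bool *g_flags_end = nullptr;

// Called at startup by a module built with
// -fsanitize-coverage=inline-bool-flag; [start, end) is its flag array.
extern "C" void __sanitizer_cov_bool_flag_init(bool *start, bool *end) {
  assert(start <= end);
  g_flags_begin = start;
  g_flags_end = end;
}

// Hypothetical helper: number of instrumented entities visited so far.
std::size_t count_visited() {
  std::size_t visited = 0;
  for (const bool *p = g_flags_begin; p != g_flags_end; ++p)
    if (*p) ++visited;
  return visited;
}
```

The array itself only says that cell i was hit; nothing in it identifies which block cell i belongs to.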

However, I did not find a good way to use this information.

If I process the .gcno files generated with -ftest-coverage, I can’t get the index of a basic block in the binary section that is passed to the sanitizer callback. Thus I’m unable to determine which basic block (or which function/translation unit) a given cell corresponds to.

Assuming the basic blocks in the .gcno files are present in the same order as they are in the function, I would only need the counters section offset per function, but this information is also missing.

If I read the code correctly, there is no information about where each individual function’s cells start within the section.

I came up with the following algorithm:

  1. Use the pc-table in addition to inline-bool-flag.
  2. Use an internal symbolizer in the binary for each pc address to determine the function and the translation unit it belongs to.
  3. Form an array of basic blocks belonging to a function.
  4. Merge this information with .gcno files (again, assuming the order is preserved).
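
Steps 1 and 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: a single instrumented module, a PC table whose entries are (PC, flags) pairs index-aligned with the bool array, and bit 0 of the flags word marking a function entry block, as the SanitizerCoverage documentation describes. FunctionCoverage and group_by_function are hypothetical names. Step 2 (symbolization) and step 4 (merging with .gcno) are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record: one function and the visited state of its blocks.
struct FunctionCoverage {
  std::uintptr_t entry_pc;
  std::vector<bool> blocks;  // in PC-table order
};

static const bool *g_flags = nullptr;
static std::size_t g_num_flags = 0;
static const std::uintptr_t *g_pcs = nullptr;  // pairs: {PC, PCFlags}
static std::size_t g_num_pcs = 0;

extern "C" void __sanitizer_cov_bool_flag_init(bool *start, bool *end) {
  g_flags = start;
  g_num_flags = static_cast<std::size_t>(end - start);
}

extern "C" void __sanitizer_cov_pcs_init(const std::uintptr_t *pcs_beg,
                                         const std::uintptr_t *pcs_end) {
  g_pcs = pcs_beg;
  g_num_pcs = static_cast<std::size_t>(pcs_end - pcs_beg) / 2;
}

// Groups flag cells by function, starting a new group at each entry block.
// Assumes the blocks of one function are contiguous in the table.
std::vector<FunctionCoverage> group_by_function() {
  assert(g_num_pcs == g_num_flags);
  std::vector<FunctionCoverage> result;
  for (std::size_t i = 0; i < g_num_pcs; ++i) {
    const std::uintptr_t pc = g_pcs[2 * i];
    const bool is_function_entry = (g_pcs[2 * i + 1] & 1) != 0;
    if (is_function_entry || result.empty())
      result.push_back(FunctionCoverage{pc, {}});
    result.back().blocks.push_back(g_flags[i]);
  }
  return result;
}
```

The entry PC of each group is what would then be handed to a symbolizer, so only one lookup per function is needed rather than one per block.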

I believe this algorithm is highly suboptimal and does a lot of work at runtime, so I’m asking for advice: is there an easier way to get source coverage info from the inline-bool-flag array?

The only idea that came to me is to write another transform pass based on the current SanitizerCoverage pass.
In this pass the compiler would emit, for each basic block, a tuple (translation unit, function name, line set, index of the basic block in the resulting inline-bool-flag section) into some file.

  1. Use the pc-table in addition to inline-bool-flag.

This is correct – pc-table is necessary to identify instrumented functions’/blocks’ addresses.

I no longer recall exactly how it was done, but with enough build-time information (and access to the source code) one could symbolize the binary offline and map pc addresses to source code (although in my case this assumed a static binary; it wouldn’t work otherwise). This way the runtime cost of producing the debug symbols could be avoided. I don’t recall any custom transform pass for that, though.

The gcov-style --coverage only uses atomic increments with -fsanitize=thread. For a CFG with V vertices and E edges, it needs to instrument E-V+1 edges (since Clang 12.0.0). The instrumentation pass is inserted very early in the pipeline and therefore retains more of the source information.
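
To make the E-V+1 figure concrete, here is a small sketch that picks a spanning tree of a CFG with union-find and reports the edges left outside it. It only illustrates the counting argument; it is not the code of the gcov instrumentation pass, and the function names are hypothetical. It assumes the CFG is connected when edge direction is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;

// Union-find lookup with path halving.
static std::size_t find_root(std::vector<std::size_t> &parent, std::size_t x) {
  while (parent[x] != x) {
    parent[x] = parent[parent[x]];
    x = parent[x];
  }
  return x;
}

// Returns the edges that fall outside a spanning tree of the CFG.
// For a connected CFG there are exactly E - V + 1 of them.
std::vector<Edge> edges_needing_counters(std::size_t num_blocks,
                                         const std::vector<Edge> &edges) {
  std::vector<std::size_t> parent(num_blocks);
  std::iota(parent.begin(), parent.end(), std::size_t{0});
  std::vector<Edge> result;
  for (const Edge &e : edges) {
    assert(e.first < num_blocks && e.second < num_blocks);
    const std::size_t a = find_root(parent, e.first);
    const std::size_t b = find_root(parent, e.second);
    if (a == b)
      result.push_back(e);  // closes a cycle: this edge gets a counter
    else
      parent[a] = b;        // joins two components: tree edge, no counter
  }
  return result;
}
```

For an if/else diamond (blocks 0..3 with edges 0->1, 0->2, 1->3, 2->3) this gives V = 4, E = 4 and a single edge outside the tree, matching E-V+1 = 1.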

SanitizerCoverage is a simple code coverage instrumentation which does not use debug info. The instrumentation pass is inserted very late in the pipeline, after heavy optimizations. It uses a simple heuristic to reduce the number of basic blocks which need to be instrumented. For -fsanitize-coverage=edge, the instrumentation may be larger. Its func and bb modes do not have direct matches. As its documentation says, it is probably not well suited for visualization.

Do you find that gcov is significantly slower than SanitizerCoverage? If that is the case, the likely reason is that SanitizerCoverage instruments the code only after many optimizations have already run.

If you still want to do that, consider using the ELF symbol table (.symtab) with the PC table. It’s much faster than running symbolization one by one.
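
A rough sketch of that lookup, assuming the function symbols have already been extracted from .symtab by some other means (reading the ELF file is not shown) and that the PCs and symbol addresses are in the same address space, e.g. a static non-PIE binary. The Symbol struct and both functions are hypothetical names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one function symbol taken from .symtab.
struct Symbol {
  std::uint64_t addr;
  std::uint64_t size;
  std::string name;
};

static bool by_address(const Symbol &a, const Symbol &b) {
  return a.addr < b.addr;
}

// Sort once so that every PC lookup becomes a binary search.
void sort_symbols(std::vector<Symbol> &symbols) {
  std::sort(symbols.begin(), symbols.end(), by_address);
}

// Returns the symbol whose [addr, addr + size) range contains pc,
// or nullptr if there is none. Requires symbols sorted by address.
const Symbol *find_symbol(const std::vector<Symbol> &symbols,
                          std::uint64_t pc) {
  assert(std::is_sorted(symbols.begin(), symbols.end(), by_address));
  auto it = std::upper_bound(
      symbols.begin(), symbols.end(), pc,
      [](std::uint64_t value, const Symbol &s) { return value < s.addr; });
  if (it == symbols.begin())
    return nullptr;  // pc lies below the first symbol
  --it;              // last symbol starting at or before pc
  return (pc - it->addr < it->size) ? &*it : nullptr;
}
```

Each PC from the table then costs one O(log n) search, and the resulting function name can be used as the key when matching against other per-function data.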