RFC: Source-based MC/DC Code Coverage

Last January, I upstreamed support for Source-based Branch Coverage (Revision https://reviews.llvm.org/D84467). I also spoke about this work at the 2020 LLVM Developers’ Meeting. It was a lot of fun to work on!

As I mentioned then, I have had an interest in extending this support to enable Modified Condition/Decision Coverage (MC/DC), and I’d like to start working on that very soon. Based on what I learned from others, there is a lot of interest in supporting MC/DC with LLVM. MC/DC is a requirement for functional safety critical development, in particular, in the embedded space. More about MC/DC: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage

Background

The key thing added by MC/DC (from the DO-178C standard) is to show that, “Each condition in a decision is shown to independently affect the outcome of a decision.” A condition is shown to affect a decision’s outcome independently “by varying just that condition while holding fixed all other possible conditions”. An addendum to the DO-178C standard definition clarifies, “or varying just that condition while holding fixed all other possible conditions that could affect the outcome”, which allows us to ignore unevaluatable conditions in languages that have short-circuit behavior on logical operations && and || (like C/C++). This is what I plan to implement.

For a Boolean logical expression consisting of one or more logical operators, the individual condition True/False path of execution constitutes a ‘test vector’ yielding a decision outcome. For example, for “a && b”, there are three possible test vectors (‘-‘ signifies an unevaluatable ‘don’t-care’ condition due to short-circuit semantics).

a b Result

1: F, -, F {F,-,F}

2: T, F, F {T,F,F}

3: T, T, T {T,T,T}

MC/DC effectively requires N+1 test vectors to verify (where N is the number of conditions). In the example above, all of the test vectors must be executed in order to show that both conditions, a and b, independently affect the outcome. More precisely, two test vectors are required for each condition to show MC/DC, forming an “independence pair” for each condition (in the above example, ‘a’ can be shown via test vectors 1 and 3, and ‘b’ can be shown via test vectors 2 and 3). The goal for LLVM is to enable llvm-cov to determine the executed test vectors and analyze them for each condition/decision outcome to prove that each condition independently affects the outcome. The overall metric is simply the number of executed “independence pairs” vs. total expected for each condition.

I’m just sketching out the design. Similar to Branch Coverage, there are three primary areas that I am proposing to extend:

FYI for the llvm-dev list, I have decided to change my originally proposed design after some email exchanges with Vedant Kumar that will prevent the exponential counter explosion I identified in #2.b. of the original proposal. Instead, I’d like to implement a log-based solution that tracks executed test vectors using an instrumented, variable-length bitmap, one per Boolean expression. The log-based approach scales much better, seems like a cleaner design, and is closer to what other vendors do to implement MC/DC.

Here is an outline of the new design proposal overall: