Alan Phipps @evodius96 has introduced MC/DC Code Coverage in Clang/LLVM. Thanks for the great work.
It has a limitation of “The maximum number of conditions is 6”. I have designed to work for more than 6, possibly hundreds of conditions. The limitation will come from the number of possible test vectors.
Design abstract
Clang emits executed test vectors as bitmap. It is smaller than the combination of all combinations. For example:
((n1 && n4) || (n3 && n5) || (n2 && n6))
In the current implementation, this has 6 conditions and uses 64 (1 << 6) bits of bitmap for possible combinations as executed test vectors. The number of possibly executed test vector is 15 in this case. The bitmap will be sparse (15/64) even if all possible combinations are executed.
I will introduce the design to emit possible test vectors. Of course in this case, the length of bitmap is 15.
Design involved
The decision graph represents the expression above. Each quadrangle node is a condition. Each circular node is the final decision.
w
in each decision is the number of possible paths. The sum of each w
is 15. in
in each node is the number of incomings. Each w
can be calculated with the sum of incoming w
(s).
idx
in each edge is assigned based on the number of incomings and each w
in precedent nodes.
n3
(w=[n1:w:1, n4:w:1]
)- 1st edge (
n1->n3
)idx
is0
.- The next number of
idx
is1
by addingn1:w
.
- 2nd edge (
n4->n3
)idx
is1
.- The next number of
idx
is2
by addingn4:w
. It should be equivalent ton3:w:2
.
- 1st edge (
n2
(w=[n3:w:2, n5:w:2]
)- 1st edge (
n3->n2
)idx
is0
.- The next number of
idx
is2
by addingn3:w
.
- 2nd edge (
n5->n2
)idx
is2
.- The next number of
idx
is4
by addingn5:w
. It should be equivalent ton2:w:4
.
- 1st edge (
Each idx
in decisions should be assigned identically. I am sorting nodes ORDER BY w DESC, NodeID
.
- F2:idx=0
- F6:idx=4
- T6:idx=8
- T5:idx=12
- T4:idx=14
Indexes should be deterministic between Clang and llvm-cov
.
Clang CodeGen
The initial value of the accumulator %acc
is 0
. the accumulator is increased conditionally by its condition.
%idx = select %cond,TrueIdx,FalseIdx
%acc = add %acc,%idx
br %cond,%TrueLabel,%FalseLabel
The final value of %acc
is the ID of the test vector.
LLVMCoverage (llvm-cov
)
The ID of the test vector can be calculated with idx
in each node in CoverageMapping.cpp:buildTestVector()
.
Other minor implementations
I will propose Bitmap
as not byte-aligned but packed. I have committed the preparation.
In the case above, final decisions with w=1
is identical and will be equivalent of the branch count of the final condition. We could reduce Bitmap little a bit. This is effective for expressions with a sequence of a single operator, like (a0 && a1 && ... && a999)
.
Format compatibility
This design requires the change of file format. I wonder what could be compatible.
llvm-cov
(*.profdata
) – Easy to read and merge the previous format.- Clang’s emission
%.o
%.s
%.bc
%.ll
– I guess they would require the previous version (18.x) of clang toolchain. Maybe unsupported?
Efforts
I have tried this to the LLVM tree. A few expressions are tough. Fine almost!
- clang-tidy/MisleadingIdentifier.cpp was impossible due to billions of test vectors!
- LLVMAnalysis::ConstantFolding.cpp has 45931 of test vectors.
- X86OptimizeLEAs.cpp has 6561 of test vectors.
Thank you.