TL;DR: The implementation of check::BranchCondition
is buggy and it’s barely used, so perhaps it should be completely removed from the analyzer.
Part 1: Incorrect implementation
Recently, while I was working on heuristics for weak loop assumptions I looked into the implementation details of check::BranchCondition
and I realized that it is broken.
The relevant part of the logic is the body of ExprEngine::processBranch
:
ExplodedNodeSet CheckersOutSet;
getCheckerManager().runCheckersForBranchCondition(Condition, CheckersOutSet,
Pred, *this);
// We generated only sinks.
if (CheckersOutSet.empty())
return;
BranchNodeBuilder builder(CheckersOutSet, Dst, BldCtx, DstT, DstF);
for (ExplodedNode *PredN : CheckersOutSet) {
if (PredN->isSink())
continue;
ProgramStateRef PrevState = PredN->getState();
ProgramStateRef StTrue, StFalse;
if (const auto KnownCondValueAssumption = assumeCondition(Condition, PredN))
std::tie(StTrue, StFalse) = *KnownCondValueAssumption;
else {
assert(!isa<ObjCForCollectionStmt>(Condition));
builder.generateNode(PrevState, true, PredN);
builder.generateNode(PrevState, false, PredN);
continue;
}
if (StTrue && StFalse)
assert(!isa<ObjCForCollectionStmt>(Condition));
// Process the true branch.
if (builder.isFeasible(true)) {
if (StTrue)
builder.generateNode(StTrue, true, PredN);
else
builder.markInfeasible(true);
}
// Process the false branch.
if (builder.isFeasible(false)) {
if (StFalse)
builder.generateNode(StFalse, false, PredN);
else
builder.markInfeasible(false);
}
}
This (slightly awkward) code performs the following steps:
- It runs the
check::BranchCondition
checkers and collects theExplodedNode
s produced by them inCheckerOutSet
. - It initializes a
BranchNodeBuilder builder
which will build the “after the branching point” nodes (and will place them in eitherDstT
orDstF
). - It loops over the nodes within
CheckersOutSet
with the loop variablePredN
. - For each node
PredN
it splits the state based on the value of the loop condition and creates successor nodes for the outcomes that are feasible. - If the one of the outcomes was infeasible (e.g. the loop condition cannot be false), we call
builder.markInfeasible()
which will prevent the production of additional nodes with that outcome. - But these “infeasible” marks are preserved for the next iteration of the
for
loop, and suppress the generation of nodes with that outcome – even if that outcome would be feasible when starting from the new, differentPredN
!
I was surprised to see why this incorrect logic was not causing any errors, so I added an assertion which revealed that the LIT test suite doesn’t contain any test code where CheckersOutSet
has more than one element, i.e. check::BranchCondition
checkers never split the state.
Part 2: Barely used
Then I looked up the checkers that use this callback and it turned out that there are only three of them (not counting CheckerDocumentation.cpp
):
UndefBranchChecker.cpp
is a very straightforward checker which just says “Branch condition evaluates to a garbage value” when it’s appropriate.
- This is obviously an useful report, but I don’t see why we have five or so specific
UndefinedSomethingResult
checkers instead of a single checker that reports every occurrence when an undefined value is used in some way.
TraversalChecker.cpp
is apparently a debug dumper utility which was added in 2012 to “Add a test that we are, in fact, doing a DFS on the ExplodedGraph”.
- Its
BranchCondition
callback is only used by the single test filetraversal-algorithm.mm
which could be easily ported to some other callback.
TestAfterDivZeroChecker.cpp
is an old alpha checker from 2014, which reports code paths where we use a value as a denominator and then later check whether that value is zero – and for some reason this logic is limited to cases when the “check whether that value is zero” comparison happens in a branch condition.
- In this checker replacing
BranchCondition
withPreStmt
would be an obvious improvement. - However, this checker also has a fundamental issue, so we’re contemplating its complete removal, or, perhaps its rewrite in the dataflow framework. (This is offtopic, so I’m describing it at the end of this post.)
Part 3: The way forward
I can imagine several very different ways to resolve this situation:
- (A) Write a more complicated, but correct implementation and keep
check::BranchCondition
to make it available for further checker development.
My personal impression is that this callback is not very useful, and I don’t think that we should maintain callbacks without compelling use cases. - (B) Replace the buggy
for
loop with an assertion that there was no state split and document the currently upheld invariant thatcheck::BranchCondition
must not split the state.
A lazy solution, can be easily implemented. - (C) The same as (B), but migrate away the non-essential checkers and document that only
UndefBranchChecker
should usecheck::BranchCondition
.
Cleans up the situation as much as possible without touching the Undef checker family. - (D) Completely eliminate
check::BranchCondition
and replace the overlappingUndefinedSomethingChecker
s with an unifiedUndefinedValueChecker
which warns on all code that reads undefined values.
Radical solution that also simplifies an unrelated part of the codebase. - (E) Do nothing.
Even lazier than (B), this would produce weird incorrect behavior when somebody introduces acheck::BranchCondition
callback that splits the state. (However, apparently this is not a popular callback, so if we’re lucky it will stay practically unused forever )
@steakhal @NoQ @Xazax-hun @Szelethus What do you think about this callback? How should we approach this technical debt?
Appendix: the fatal flaw of TestAfterDivZeroChecker
This checker reports situations where a value is used as a denominator and then later there is at least one code path where it’s compared against zero.
However, the logically correct thing would be reporting situations where a value is used as a denominator and then later on every possible code path it’s compared against zero.
For example, consider the following function:
int foo(int a, int b) {
int c = a / b;
if (c > 10)
b = get_new_denominator(a, b);
// some other logic...
if (b == 0)
return c;
return a / b;
}
Here the check for b == 0
is an idiomatic way to handle cases where the opaque function get_new_denominator
returns 0 – but on the single execution path where c > 0
is assumed to be false this triggers TestAfterDivZeroChecker
.
This is a fundamental limitation of the path-sensitive framework of the static analyzer, and although we could theoretically find some hacky solutions, I think it would be better to cover this kind of coding error in the dataflow framework where it naturally belongs.