I’d like to ask for some help understanding how bug post-processing could fit into the analyzer’s pipeline.
My current rough understanding is that an ExplodedGraph is built for each top-level function (including the other functions reachable from it). During analysis, reports (grouped into equivalence classes) are collected in the ExprEngine’s BugReporter’s EQClassesVector and flushed once construction is finished. Flushing includes selecting a ‘representative’ report from each class to generate a PathDiagnostic from, a choice that is influenced by visitors and such. After some post-processing (pruning, filtering), the diagnostics to display are collected into the PathDiagnosticConsumer’s Diags set and are actually printed in some form at the end. As I see it now, this whole process is then repeated for the next untouched top-level function in the CallGraph.
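To check my understanding against something concrete, here is a toy model of that flow. To be clear, nothing below is actual clang code; every type and function is a stand-in I made up, and only the overall shape (graph per function → equivalence classes → representative report → diagnostic → consumer) mirrors the real pipeline in lib/StaticAnalyzer/Core:

```cpp
#include <string>
#include <vector>

// Stand-ins for the real analyzer entities (made up for illustration).
struct BugReport { std::string Desc; bool Invalid = false; };
struct EquivalenceClass { std::vector<BugReport> Reports; };
struct PathDiagnostic { std::string Text; };
struct ExplodedGraph { /* nodes, states, constraints, ... */ };

ExplodedGraph buildGraph(const std::string &TopLevelFn) { return {}; }

std::vector<EquivalenceClass> collectReports(const ExplodedGraph &G) {
  return {}; // in reality, filled in during graph construction
}

// Plays the role of FindReportInEquivalenceClass(): pick one report,
// skipping candidates that visitors have marked invalid.
const BugReport *pickRepresentative(const EquivalenceClass &EQ) {
  for (const BugReport &R : EQ.Reports)
    if (!R.Invalid)
      return &R;
  return nullptr;
}

PathDiagnostic makeDiagnostic(const BugReport &R) { return {R.Desc}; }

int main() {
  std::vector<std::string> TopLevelFns = {"f", "g"}; // from the CallGraph
  std::vector<PathDiagnostic> Diags; // the consumer's Diags, roughly
  for (const std::string &Fn : TopLevelFns) {
    ExplodedGraph G = buildGraph(Fn); // one graph per top-level function
    for (const EquivalenceClass &EQ : collectReports(G)) // the "flush"
      if (const BugReport *Rep = pickRepresentative(EQ))
        Diags.push_back(makeDiagnostic(*Rep)); // pruning/filtering elided
  }
  // ...the consumer prints Diags in some form at the end.
}
```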
I was thinking about whether the whole ExplodedGraph needs to be re-built if we want to post-process bugs that occurred in its realm. After looking at the EGs of some code snippets analyzed with the default solver and then with Z3, it seems to me that not only the constraints but also the environment and store contents change, so one guess would be yes. Then again, completely re-building EGs might not be far, in terms of slow-down, from analyzing with Z3 from the beginning, depending on what portion of the code is buggy. So one could instead work with the information already present in the graph (this might require telling the analyzer core not to throw away the stuff it does not understand when the false-positive refutation option is turned on, if such a thing happens?) and re-evaluate it, perhaps around the time we call FindReportInEquivalenceClass().
But I might be on a completely wrong track, so I’d love to hear your thoughts about it.
> I was thinking about whether the whole ExplodedGraph needs to be re-built if we want to post-process bugs that occurred in its realm.
Generally, all of the post-processing is done in `BugReporterVisitors.cpp`, with the `trackNullOrUndefValue` function being the general entry point for many checkers. A bug reporter visitor gets access to the last node in the exploded graph, but it can very easily retrace the past by going up the chain of predecessors.
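I haven’t compiled this, but a skeleton would look roughly like so (a sketch only; the class name is made up, and the exact signatures should be checked against `BugReporterVisitors.h`):

```cpp
#include "clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h"
#include <memory>

using namespace clang;
using namespace ento;

// Hypothetical visitor: VisitNode is called for each node as the bug
// reporter walks the error path backwards, so "retracing the past"
// largely comes for free (Succ->getFirstPred() walks further back).
class MyRefutationVisitor final
    : public BugReporterVisitorImpl<MyRefutationVisitor> {
public:
  void Profile(llvm::FoldingSetNodeID &ID) const override {
    static int Tag = 0;
    ID.AddPointer(&Tag);
  }

  std::shared_ptr<PathDiagnosticPiece>
  VisitNode(const ExplodedNode *Succ, const ExplodedNode *Pred,
            BugReporterContext &BRC, BugReport &BR) override;
};
```

Attaching it should be as simple as calling `BR.addVisitor(llvm::make_unique<MyRefutationVisitor>());` when the checker creates the report.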
> After looking at the EGs of some code snippets analyzed with the default solver and then with Z3, it seems to me that not only the constraints but also the environment and store contents change, so one guess would be yes.
In general, I think it should be possible to get a good increase in precision just by re-analyzing the existing exploded graph. From my understanding, many constraints are different because the constraint manager tries to be as efficient as possible, and destroys all constraints which cannot be processed by the current solver. If a further increase in precision is needed, it should not be very difficult to change that destroying logic to keep those constraints as well.
> Then again, completely re-building EGs might not be far, in terms of slow-down, from analyzing with Z3 from the beginning, depending on what portion of the code is buggy.
Yes, that’s why post-processing is done.
> So one could instead work with the information already present in the graph (this might require telling the analyzer core not to throw away the stuff it does not understand when the false-positive refutation option is turned on, if such a thing happens?) and re-evaluate it, perhaps around the time we call FindReportInEquivalenceClass().
As mentioned before, it’s better to do this re-evaluation in your own visitor inside `BugReporterVisitors.cpp`. Visitors get the `BugReport` object and can call `BugReport::markInvalid` if invalidation is needed.
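Continuing the hypothetical visitor from the sketch above, the invalidation could look roughly like this (`pathIsInfeasible` is a placeholder for whatever refutation query ends up being run):

```cpp
// Placeholder for the actual refutation query against the more
// precise solver.
bool pathIsInfeasible(const ExplodedNode *N);

std::shared_ptr<PathDiagnosticPiece>
MyRefutationVisitor::VisitNode(const ExplodedNode *Succ,
                               const ExplodedNode *Pred,
                               BugReporterContext &BRC, BugReport &BR) {
  if (pathIsInfeasible(Succ))
    // Invalidated reports are skipped when the representative report
    // of an equivalence class is selected, so the bug is dropped.
    BR.markInvalid(/*Tag=*/"RefutationFailed", /*Data=*/nullptr);
  return nullptr; // no extra note to attach to the path
}
```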
Coming back to the destroying logic: I suspect that simply not destroying constraints on symbols that we cannot handle would already be a slight increase in precision, because at least we could take the same path when the same symbol we cannot handle is encountered twice. E.g. (with ^ read as exponentiation):
```c
if (x^3 + y^3 == z^3) {
  // Assume either branch, marking the symbol (($x)^3 + ($y)^3 == ($z)^3)
  // as [0, 0] or [INT_MIN, -1] U [1, INT_MAX] respectively.
}
if (x^3 + y^3 == z^3) {
  // Take the same branch that was taken on the previous if.
}
```
Which leads me to believe that, regardless of the solver, keeping those constraints around is a good thing. I guess they are only removed for performance reasons, not because keeping them was causing false positives. And I’m in favor of limiting symbol complexity to avoid performance problems: this way we’d handle all the obvious cases, but if the arithmetic is ridiculously complex, the user would be more likely to forgive us.
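Regarding the complexity limit, the infrastructure for measuring it is already there: `SymExpr::computeComplexity()` counts the nodes of a symbolic expression. A cutoff could then be as simple as the following sketch (the threshold and the helper are made up):

```cpp
#include "clang/StaticAnalyzer/Core/PathSensitive/SymExpr.h"

using namespace clang;
using namespace ento;

// Made-up threshold; would need tuning against real code bases.
static const unsigned MaxKeptSymbolComplexity = 25;

// Hypothetical guard: keep a constraint on a symbol the solver cannot
// reason about only while the symbol stays reasonably small.
static bool shouldKeepOpaqueConstraint(const SymExpr *Sym) {
  return Sym->computeComplexity() <= MaxKeptSymbolComplexity;
}
```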