Wow, that's a lot of stuff!
Thanks a lot for all your encouragement!
=== Regarding inter-unit analysis and purposes: ===
> Is this within a translation unit? Are you planning to make this
> method usable across translation units?
> This sounds interesting! If I understand correctly, your goal was to
> improve analyzer performance by avoiding inlining? Could we also use
> this high-level approach to enable cross-translation unit analysis?
> Is the participation needed for serializing out state if we want to
> make these summaries persistent?
We do eventually aim for cross-translation-unit analysis as well; however, that is a somewhat separate story, and I guess Alexey has already covered it. Apart from that, we also want to improve the time performance of path-sensitive checks within a single translation unit: this matters when the analyzer is integrated into the build systems of large projects, where we wish to reduce analysis overhead.
We have not yet tried any sort of serialization of summaries. Once we do, then yes, most likely we would need extra checker-side support to serialize their data.
=== Regarding the checker-side support: ===
> What makes the help from the checkers inevitable here?
> What kind of participation is needed from a checker author to
> participate in this design? Since it is based on the ExplodedGraph,
> it would seem their participation would be automatic.
> Can you tell us more about these delayed checks? I’m particularly
> wondering about type-state-like checks that don’t issue warnings on
> unknown values.
>
> For example, suppose you had a checker like SimpleStreamChecker that
> issues a warning on a double close. If a symbol ref for a file
> descriptor is marked as closed in its state then the checker issues a
> warning, but if the symbol ref is unknown, it just transitions it to
> the closed type state.
>
> So if you had:
>
> FILE *f = ...
> fclose(f)
> myFunction(f);
>
> where
>
> void myFunction(FILE *p) {
> fclose(p)
> }
>
> Then with inlining you would get a diagnostic about a double close at
> fclose(p). But with a summary-based approach like you suggest the
> unmodified SimpleStreamChecker wouldn’t issue a warning. How would
> you recommend changing such a checker to handle summaries? Are there
> common patterns here that we could expose to checker writers?
Precisely. The code sample provided by Devin Coughlin, together with the "opposite" sample:
FILE *f = ...;
myFunction(f);
fclose(f);
is the simplest motivational example showing that checker-side support is inevitable. There is no defect in myFunction() itself; however, it changes the GDM (which is visible only to checkers) in a way that causes the warning.
We do plan to write a "How to add summary IPA support to your checker in 24 hours" how-to.
The modification our approach proposes to let SimpleStreamChecker conduct its double-close checks with summaries is as follows:
1. Store a list of all open/close operations in the program state as a separate list-type trait, recording node pointers, pointer symbols, and operation types.
2. In the summary-apply callback, iterate over this list found in the summary leaf, actualize the symbols, and re-model the operations on top of the caller state:
2a. Check whether the operation is valid, and report a double close if not. Supply the stored node to the bug reporter so that it can glue the report path together in Devin Coughlin's example and attach the report to the correct node; storing nodes is also useful for implementing bug visitors.
2b. Mark the relevant symbols as open or closed in the caller context. This lets us handle the "opposite" example, which throws the report in the caller context after the callee's changes to the GDM.
2c. Repeat these changes in the caller's operation list, so that deeper call stacks are supported. If a bug report is constructed through a deep call stack, the checker needs to provide the BugReporter with a stack of nodes.
Roughly, that is it. Most checkers that work with the GDM in a similar manner would require very similar facilities; in fact, this approach worked for ~6 checkers we tried it on, some of which were pretty weird (e.g. statistical analysis of the whole graph). We measure productivity on only two of them, because the other checkers are too quiet.
It's not as scary as it sounds: probably ~20 lines of code per checker with a small number of GDM entries.
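To make step 2 concrete, here is a minimal self-contained sketch of the replay logic (a toy model of our own, not the actual Clang API; all type and function names here are hypothetical): the summary carries an ordered list of the callee's stream operations, and applying the summary replays them on top of the caller's per-symbol stream state.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy model of the checker's summary data (hypothetical names).
enum class StreamOp { Open, Close };
enum class StreamState { Opened, Closed };

struct RecordedOp {
  std::string Sym; // actualized stream symbol
  StreamOp Op;     // what the callee did to it
  // The real checker would also store the ExplodedNode* for bug reporting.
};

// Replays the recorded operations on the caller-side state map (step 2b),
// returning the symbols on which a double close is detected (step 2a).
std::vector<std::string>
applySummary(std::map<std::string, StreamState> &CallerState,
             const std::vector<RecordedOp> &Ops) {
  std::vector<std::string> DoubleCloses;
  for (const RecordedOp &R : Ops) {
    if (R.Op == StreamOp::Close) {
      auto It = CallerState.find(R.Sym);
      if (It != CallerState.end() && It->second == StreamState::Closed)
        DoubleCloses.push_back(R.Sym); // report against the stored node
      CallerState[R.Sym] = StreamState::Closed;
    } else {
      CallerState[R.Sym] = StreamState::Opened;
    }
  }
  return DoubleCloses;
}
```

In Devin Coughlin's example the caller state already marks 'f' closed, so replaying the callee's close yields the report; in the "opposite" example the replay marks the symbol closed in the caller, and the later fclose(f) is caught by the checker's ordinary logic.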
On avoiding checker modifications: it might be possible to simplify this procedure significantly, but only by throwing away the idea of the opaque *Generic* data map in the state and replacing it with a limited set of structures, so that most of the actualization could be handled by the analyzer engine. This approach seems worse: it greatly decreases flexibility and limits the number of checkers we can implement without adding stuff into the core, so we quickly realized it's not really worth trying.
So yes, I think modifying checkers is inevitable. They need to have their own summary of the branch and need to know how to apply it.
Todo for myself: we haven't yet had a chance to think deeply about leak-related checks, e.g. the lifetime of symbols created during actualization, how symbols leak through or inside summaries, or perhaps avoiding actualization of symbols that would die instantly.
=== Regarding the ranges and reachability: ===
> I think reachability is a very important question. How do you
> associate the symbols, the constraints and the branches?
> In general, it is great, to have a summary based approach to
> interprocedural analysis, but could you list the high level goals of
> your approach and what improvements are you expecting?
> One consequence of analyzing the function at the top level is that
> the summary produced makes no assumptions about context. This means
> the summary is always applicable (i.e., the precondition holds) but
> also that applying it may yield many more paths than are actually
> feasible.
Yeah, so here is what we do: we consider each non-sink[1] leaf in the ExplodedGraph of the summary separately. In this leaf, we take all the information stored by RangeConstraintManager: ranges on all symbols, including ranges on dead symbols. These are *the* pre-conditions[2] for the branch. Unless we have sinks, the disjunction of these pre-conditions over all branches is 'true'[3].
For every branch, since RangeConstraintManager maps symbols to ranges, we actualize these symbols according to the context. After this, we look for ranges on the same symbols in the caller context. If any of the actualized ranges contradict (do not overlap) the caller's ranges, we throw away the whole branch. Hence, we avoid applying impossible branches. This is done before any other work on the branch (e.g. checker or GDM calls), so it saves time.
It should be possible to optimize this procedure by joining branches with completely equal range information[4], though we haven't coded this yet.
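As an illustration, the branch-discarding step can be sketched like this (a toy model with our own types, not the analyzer's RangeSet machinery; it assumes a single contiguous range per symbol):

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>

// Toy range constraint: an inclusive interval on a symbol's value.
struct Range { long Lo, Hi; };

// Intersect two ranges; an empty intersection means a contradiction.
std::optional<Range> intersect(Range A, Range B) {
  long Lo = std::max(A.Lo, B.Lo), Hi = std::min(A.Hi, B.Hi);
  if (Lo > Hi)
    return std::nullopt;
  return Range{Lo, Hi};
}

// Apply a branch's (already actualized) pre-conditions to the caller's
// constraints. Returns the constrained caller state, or nullopt when any
// symbol's ranges do not overlap, i.e. the whole branch is infeasible.
std::optional<std::map<std::string, Range>>
applyBranch(std::map<std::string, Range> Caller,
            const std::map<std::string, Range> &BranchPrecond) {
  for (const auto &[Sym, R] : BranchPrecond) {
    auto It = Caller.find(Sym);
    if (It == Caller.end()) {
      Caller[Sym] = R; // symbol not constrained before: add the pair
      continue;
    }
    auto Meet = intersect(It->second, R);
    if (!Meet)
      return std::nullopt; // contradiction: discard the branch
    It->second = *Meet;    // replace the range with the intersection
  }
  return Caller;
}
```

The same shape also describes what we do on step (c.ii) below: either add a new (symbol, range) pair or replace the range with the computed intersection.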
This procedure interacts with delayed-checks in a fancy manner, namely it opens up warnings like:
void bar(int *b) {
  foo(b); // [1] calling foo()
}
void foo(int *a) {
  *a = 0;
  if (a == 0) {
    // [2] warn: null-dereference
  }
}
during summary-apply, because constraints on 'a' would be present in the summary's input constraints. It may or may not be worth suppressing such warnings, since they can be really unobvious; alternatively, we could modify the bug reporter to make them more obvious by drawing later paths through the callee.
/* [1] Just realized that omitting sink branches is not really a sound decision. Other checkers may have delayed checks inside these branches, which would cause a warning before the sink, and we lose that warning with our approach. */
/* [2] It is possible that checkers add their own checker-specific pre-conditions, but we haven't yet encountered any sensible practical use-cases. */
/* [3] Note that if the summary is incomplete (e.g. we exceeded max-nodes or block-count for the summary), then we fall back to conservative evaluation, so all branches are either complete or sinks by the time we actually apply any summary. */
/* [4] This would most likely (though not always) mean that they differ only in their GDM information, e.g. the result of a state split done by checkers. */
=== Regarding performance: ===
> You can also measure the number of basic blocks visited if you want
> to compare coverage. Take a look at debug.Stats checker and
> -analyzer-stats option. These might give you a better understanding
> of what causes the slowdown.
That's a great idea; we didn't try that yet. We did try to estimate node counts including the "summarized" nodes, and got some funny numbers (e.g. the summary-based run analyzes ~20 times more nodes per second).
> We use various heuristics for balancing precision and performance for
> inlining. We do not actually inline all functions. This is why you
> see more reports with your summaries. Theoretically, inlining should
> always be more precise than any summary. There are several parameters
> controlling this apart from max-nodes:
> - We do not inline functions that are too large -analyzer-config
> max-inlinable-size (50 basic blocks).
> - We do not inline large functions(min-cfg-size-treat-functions-
> as-large) too many times (max-times-inline-large).
We hacked directly into inlineCall(), so I guess we should be IPA-calling the same set of functions. However, we haven't yet worked on the BifurcateCall mechanism.
> You mentioned in a follow-up email that you see a ~5x increase in
> memory usage with your prototype. Do you have a sense of where this cost
> comes from? Is it keeping the summaries themselves?
Partially from storing summaries, and partially from the fact that we actually do a lot more analysis: with the same max-nodes, we go through a lot more paths and create heavier nodes with more stuff in their states. So it's not really correct to compare analysis runs for summary and inlining with the same max-nodes: by design, the summary run would be slower and heavier in that case, while finding deeper defects.
> You can possibly trim the summary down substantially by doing a BFS
> from the exit/error nodes of the graph to the entrance nodes. We use
> this trick when displaying trimmed graphs in GraphViz to show a path
> to an error node. That could result in a massive savings.
Yeah, that's one more possible memory-usage optimization, thanks for the idea! Other things we tried include removing some summary-irrelevant program state traits in checkEndAnalysis, reclaiming exploded graph nodes more aggressively than usual once summary construction is over, and re-using memory allocators across analyses (most importantly the exploded node allocator; otherwise freeing nodes doesn't save any memory).
=== Miscellaneous: ===
> > Step (c.ii): Actualize constraint keys, intersect "new" and "old"
> > ranges, throw away the branch if intersection is empty, otherwise
> > constrain the actual symbol with the intersected range. As we
> > mentioned on step (a), this requires a separate pass through reaped
> > symbols which are stored separately.
> Does this modify the summary? Or do you make a copy?
We create a new ProgramState based on the original caller-context state, with new range constraints for the symbols: we either add new (symbol, range) pairs if the symbol was not yet constrained, or replace the range with the computed intersection.
> > Step (c.iv): Path-sensitive bug reports are constructed by
> > frankensteining the report path from different pieces of different
> > exploded graphs, be it a report that originates from the callee (and
> > hence thrown on step (c.iii) from eval::SummaryApply), or an
> > intermediate call that exited before the bug (which may be
> > interesting for the bug reporter visitors). The only trouble we
> > currently have is the "Assuming..." pieces, which may be redundant,
> > as no assumption was made, but rather the context determined the
> > branch.
> In some cases it seems like it ought to be possible to query the
> state at “assuming…” to determine whether the context always implies
> that the assumption holds and elide the piece if not needed. Does the
> context always determine that the branch? Similarly, can there be
> cases where a path is dead (the branch could not be taken based on
> the actual) but the actualized post-state does not have enough
> information about to yield a contradiction? If I understand
> correctly, this is why you keep the ranges of dead symbols around?
> Would you also need need to keep checker state for these symbols
> around so they can detect contradictions when applying the summary?
Yeah, this problem can be solved that way! We just haven't yet taken up this task. However, storing ranges for dead symbols is inevitable anyway, as there may be a lot of significant pre-conditions on them (e.g. the function relies on the value of a global variable and then overwrites it; the value symbol of this variable dies, but we still need to discard branches if we know something about this symbol in the caller context).
> > - Only default implementations of Store Manager and Constraint
> > Manager are supported;
> By this you mean the RegionStoreManager and RangeConstraintManager,
> right?
Yes, right. We haven't yet thought deeply about how exactly to abstract away from the fact that RangeConstraintManager uses symbol-range pairs and RegionStoreManager uses base-offset-value triples, i.e. what sort of virtual functions should be defined in the abstract ConstraintManager and StoreManager classes in order to stay as flexible as possible.
> > Step (c.v): Actualize the binding keys and values, overwrite
> > existing values. Invalidation, as in writing some SymbolConjured to
> > some memory space, is handled similarly. before all bindings. We do
> > this last, because it would affect actualization (eg.
> > SymbolRegionValue should actualize itself to the value of the
> > region before the call, not after the call; quite a sensitive point
> > here, but seems to work this way).
> Given the region store’s differing representations of concrete and
> symbolic offsets, this sounds like it could be quite tricky. For
> example, does this work for parameter-dependent offsets in the
> summary post-state where in the summary the parameter is symbolic but
> at the call site the parameter is concrete? More generally, I’d love
> to hear more about how you apply summaries to the region store, how
> you identify the frame and the footprint, how you unify symbolic
> variables, etc.
Unfortunately, we have not yet handled symbolic offsets in the store properly, and some parts of this make a nice open problem (it's also surprising that we haven't run into many issues because of it).
For concrete offsets, we had to extend StoreManager::BindingsHandler to provide complete information about the binding keys, not just the base region. Then we actualize the base of the binding. Then we split the actualized binding base into its own base and offset, because a base region may actualize into a non-base region. Then we *add* the offset of the actualized base to the original offset of the binding, obtaining the complete offset of the actualized binding. Finally, we actualize the stored value and bind it to the actualized binding key, possibly replacing an existing binding.
We also skip bindings to stack locals, and we distinguish between direct and default bindings. Another important thing to note is that while we actualize the bindings, we use the original store (without any new bindings) as the reference.
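The concrete-offset rebasing described above can be sketched as follows (toy types of our own; 'BaseMap' stands in for the actualization of base regions, with each result already split into base plus byte offset):

```cpp
#include <map>
#include <string>

// Toy binding key: (base region, concrete byte offset).
struct Key {
  std::string Base;
  long Offset;
};

// Actualize a summary-side binding key: look up what the summary's base
// region actualizes to in the caller context, and add that region's own
// offset to the binding's offset, since a base region may actualize into
// a non-base region.
Key actualizeKey(const Key &SummaryKey,
                 const std::map<std::string, Key> &BaseMap) {
  auto It = BaseMap.find(SummaryKey.Base);
  if (It == BaseMap.end())
    return SummaryKey; // base does not actualize; keep the key as-is
  const Key &Actual = It->second;
  return Key{Actual.Base, Actual.Offset + SummaryKey.Offset};
}
```

For example, if the symbolic base of a parameter actualizes to offset 8 within a caller-side array, a summary binding at offset 4 from that base lands at offset 12 within the array.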
For symbolic offsets in the summary, I guess we should do a similar thing. Take the binding key, which consists of a base region and a sub-region representing a symbolic offset. Discard the base region, take only the offset region, and actualize it. Then split the actualized region into base and offset, each of which can be either concrete or symbolic. The actualization procedure automatically replaces a symbolic index inside an element region with a concrete index whenever the symbolic index is constrained to a constant or actualizes into a constant. Additionally, if we have (a+i)+j, where 'a' is an array, '+i' is done in the caller context, and '+j' is done in the callee context, the actualization procedure converts it automatically into a+(i+j) (it detects such cases when actualizing the super-region of an ElementRegion yields another ElementRegion), which makes it easy to add symbolic offsets where necessary. Then actualize the value and bind it to the actual key.
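The index-folding case can be sketched with a tiny toy model (hypothetical types; symbolic indices are kept as expression strings just to show the shape of the transformation):

```cpp
#include <string>

// Toy element region: a base array plus a symbolic index expression.
struct Elem {
  std::string Array;
  std::string Index;
};

// When actualizing the super-region of an element region yields another
// element region over the same array, fold the two indices instead of
// nesting regions: (a + i) + j  ==>  a + (i + j).
Elem foldIndices(const Elem &ActualSuper, const std::string &CalleeIndex) {
  return Elem{ActualSuper.Array, ActualSuper.Index + "+" + CalleeIndex};
}
```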
The real problem arises when we have a concrete offset in the summary, but the actualized base has a symbolic offset. Then we somehow need to add an integer to a region, and I have no idea how to do that without adding some new kind of key to the region store.
At one point we tried to solve the actualization of store keys by restoring the original sub-region from the base and the offset. This almost worked, but it was extremely hacky and inevitably unreliable, so we discarded this approach and went for simpler conversions of store keys.
So this is still a point for discussion.
=== Other stuff I forgot to mention earlier: ===
We had to modify the SVal hierarchy in order to make actualization work. Namely, we introduced an auxiliary atomic symbol kind, called SymbolRegionAddress, which represents the unknown address of a non-symbolic region. Why do we need it? Consider an example:
void foo(int *ptr) {
  if (!ptr) {}
}
void bar() {
  int a = 1;
  foo(&a);
}
The summary of the function foo() contains ranges on reg_$0&lt;ptr&gt;. When we actualize this range, we find that we have no good symbol kind to represent the result and carry the range, so we call the resulting symbol "addr_$0<a>". With this, by looking at the kind of the symbol we receive, we can discard the "ptr == 0" branch of the summary, even though we never obtain "&a" as the value of the condition (only symbols are stored in RangeConstraintManager). Additionally, such a symbol is extremely convenient when we think of actualization as a recursive procedure: if we want to actualize &SymRegion{reg_$0<ptr>} to &a, which is the right thing to do, then to what do we recursively actualize reg_$0<ptr>? We also implemented the same hack for label addresses, creating SymbolLabelAddress. This funny implementation detail is probably worth a separate discussion and a separate commit; if I think deeper, I can probably even find inlining-related test cases to support this feature.
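The branch-discarding effect of the new symbol kind can be illustrated with a small sketch (toy types of our own; the point is simply that an address-of-a-concrete-region symbol is known to be non-null):

```cpp
// Toy model: a symbol is either the value stored in a symbolic region
// (SymbolRegionValue-like, may be anything, including null) or the address
// of a concrete region (SymbolRegionAddress-like, never null).
enum class SymKind { RegionValue, RegionAddress };

// Inclusive range constraint attached to the symbol by a summary branch.
struct Range {
  long Lo, Hi;
};

// A summary branch that constrains an address-kind symbol to exactly 0
// (the "ptr == 0" branch in the example above) is infeasible: addresses
// of concrete locals are never null, so the branch is discarded.
bool branchFeasible(SymKind K, Range R) {
  if (K == SymKind::RegionAddress && R.Lo == 0 && R.Hi == 0)
    return false;
  return true;
}
```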