[analyzer] Summary IPA thoughts

Dear All,

We down there (mostly me and Alexey Sidorin, whom you might have also
seen here in the lists), after some digging into the analyzer code,
have coded a working proof-of-concept draft of a summary-based
inter-procedural analysis implementation for the static analyzer. The
code is still too rough and dirty to publish as a review, however, as
tl;dr of the current status, we reached the point where:

  (good.1) The implementation doesn't crash on quite large test
codebases, eg. Android;
  (good.2) It also doesn't seem to cause any noticeable increase in the
false positive rate;
  (good.3) Path-sensitive bug reports are constructed more or less correctly.

Additionally, the implementation is not really extremely complicated
or hard-wired together, so many things can be re-discussed and
re-thought.

The bad parts are still many, but hopefully some of them will get
sorted out in time:

  (bad.1) Doesn't yet offer as much performance gain, in
bugs-per-second, as we initially expected;
  (bad.2) Requires significant support on checker side, unlike
inlining, and most official checkers were not yet updated, instead we
were tweaking our own checkers, yet the method seems universal enough;
  (bad.3) Could use some refactoring and also needs a rebase onto the
latest clang before we publish, because we diverged quite a bit;
  (bad.4) Certain features are not yet supported:
    - Recursion is not yet modeled correctly;
    - Taint analysis is not yet supported;
    - Only default implementations of Store Manager and Constraint
Manager are supported;
    - Objective-C is not yet supported (unfortunately, none of us
understands it well enough).

Now, the design we implemented as a quick proof-of-concept is roughly
as follows:
  (a) Analyze the function without any context, as top-level;
  (b) Save its ExplodedGraph, which *is* *the* summary of the function;
  (c) When modeling the call, take the final states in leaf nodes of
its graph, and
    (c.i) Rename the symbolic values inside them according to the context;
    (c.ii) Merge the renamed range constraints into the current state;
    (c.iii) Merge the GDM into the current state, with inevitable help
from checkers;
    (c.iv) Throw bug reports if a bug was revealed while applying the summary;
    (c.v) Merge the renamed store bindings into the current state,
including invalidation markers.

Step (a): We conduct a normal symbolic execution, as usual. When we
encounter a call, we pause the current analysis until we collect the
summary of the callee function. If we encounter recursion, we fall
back to conservative evaluation. If normal analysis exceeds the node
limit, we discard the summary and evaluate the function conservatively.
Additionally, we add a few program state traits: most importantly,
when SymbolReaper destroys a symbol, we need to keep its range in a
separate trait, because it's a valuable piece of info on function
branch reachability conditions, which we want to have reflected in the
final node.

Step (b): We simply don't delete the ExprEngine object, so that all
entities, such as States and SVals, have no stale pointers to various
managers and contexts inside them. Storing the whole graph, rather
than only leaf states, has a single purpose of reconstructing the
path-sensitive bug report; it's not useful for anything else. Of
course, it still offers a lot of room for memory optimizations (clean
unnecessary nodes, unnecessary program state traits, and that sort of
stuff, or even delete the graph and re-construct it only if we need to
actually throw the report).

Step (c): In ExplodedGraph of the caller, we create a node with a
special program point called CallSummaryPreApply. Out of this node, we
create new nodes for each reachable branch of the summary, that is,
for every leaf node of the summary graph which is reachable in current
context. Reachability is determined on step (c.ii). After a few
technical intermediate transitions, we reach another node whose
program point is CallSummaryPostApply.

Step (c.i): So far, we did well without complete alpha-renaming
support. Instead, all we needed was to recursively scan through the
structure of the SVal and replace context-dependent stuff stored
deeply inside them with actual stuff, eg. SymbolRegionValue of a
formal parameter with actual argument value. Hence, to avoid confusion
with full-featured alpha-renaming, we called this "actualization";
it'd take extra work to see if we can actually generalize it to create
a good full-featured alpha-renaming mechanism (replace arbitrary
values with arbitrary values inside the whole states). This procedure
is one of the most complicated parts of our work, as it needs to
support the whole SVal hierarchy, and its implementation is also
already quite isolated. For SymbolMetadata, we ask checkers to perform
actualization through a callback, which seems reasonable. For
SymbolConjured, we fake the block counts in order to avoid accidental
collisions with existing symbols. Also, actualization isn't really a
separate step in the process; it's performed regularly for all
symbolic values on all steps while applying a summary.
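To illustrate the idea (this is not the analyzer's actual API), here is a minimal sketch of such a structural substitution, with a toy value tree standing in for the SVal hierarchy; all names here are invented for the example:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for the SVal hierarchy: a value is either a named symbol
// (e.g. the SymbolRegionValue of a formal parameter) or an operation
// over sub-values.
struct Val {
  std::string Sym;                 // non-empty for leaf symbols
  std::string Op;                  // e.g. "+" for derived symbols
  std::shared_ptr<Val> LHS, RHS;
};

using Binding = std::map<std::string, std::shared_ptr<Val>>;

// "Actualization": walk the structure and splice in the caller's actual
// values wherever a context-dependent symbol occurs, leaving everything
// else intact. The real implementation must cover every SVal kind.
std::shared_ptr<Val> actualize(const std::shared_ptr<Val> &V,
                               const Binding &Actuals) {
  if (!V)
    return nullptr;
  if (!V->Sym.empty()) {
    auto It = Actuals.find(V->Sym);
    return It != Actuals.end() ? It->second : V;
  }
  auto R = std::make_shared<Val>(*V);
  R->LHS = actualize(V->LHS, Actuals);
  R->RHS = actualize(V->RHS, Actuals);
  return R;
}
```

The point of the sketch is only that the substitution recurses through derived values while untouched symbols survive unchanged.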

Step (c.ii): Actualize constraint keys, intersect "new" and "old"
ranges, throw away the branch if intersection is empty, otherwise
constrain the actual symbol with the intersected range. As we
mentioned on step (a), this requires a separate pass through reaped
symbols which are stored separately.
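As a hedged illustration of step (c.ii), assuming a single closed interval stands in for a constraint (the real RangeConstraintManager tracks sets of ranges), the core operation might look like:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Illustrative only: one closed interval stands in for a range set.
struct Range {
  int64_t From, To; // inclusive bounds
};

// Intersect the actualized callee constraint ("new") with the caller's
// existing constraint on the same symbol ("old"). An empty intersection
// means this summary branch is unreachable in the current context.
std::optional<Range> intersect(Range Old, Range New) {
  Range R{std::max(Old.From, New.From), std::min(Old.To, New.To)};
  if (R.From > R.To)
    return std::nullopt; // infeasible branch: throw it away
  return R;
}
```

A non-empty result is what the actual symbol gets constrained with; a nullopt result is what discards the branch.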

Step (c.iii): Call the checker callback, "eval::SummaryApply", in
which checkers receive the final state in callee and the current state
in caller, and figure out how to move significant stuff from the
former to the latter. Checkers may add transitions or split states in
this phase. Most of the time, they do three things:
  - Carry information that changed in the callee over to the caller;
  - Check stuff about which they were uncertain during callee
analysis; maybe they'll be sure now that they have a context (delayed
checks);
  - Carry delayed checks up to the caller if they could not be
resolved anyway.
Note that delayed checks would need to be manually stored in the state
so that checkers could notice them during apply. So overall, if the
checker has ~5 or so GDM traits, implementing a summary-apply
procedure may be pretty complicated.

Step (c.iv): Path-sensitive bug reports are constructed by
frankensteining the report path from different pieces of different
exploded graphs, be it a report that originates from the callee (and
hence thrown on step (c.iii) from eval::SummaryApply), or an
intermediate call that exited before the bug (which may be interesting
for the bug reporter visitors). The only trouble we currently have is
the "Assuming..." pieces, which may be redundant, as no assumption was
made, but rather the context determined the branch.

Step (c.v): Actualize the binding keys and values, overwrite existing
values. Invalidation, as in writing some SymbolConjured to some memory
space, is handled similarly, before all bindings. We do this last,
because it would affect actualization (eg. SymbolRegionValue should
actualize itself to the value of the region before the call, not after
the call; quite a sensitive point here, but it seems to work this way).
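A minimal sketch of why this ordering matters, with a toy string-keyed store standing in for the region store and an invented "init:" tag standing in for a SymbolRegionValue; none of this is the analyzer's API:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy store: region name -> value. Illustrative only.
using Store = std::map<std::string, std::string>;

// Actualize a callee binding value: a symbol standing for "the initial
// value of region R" must resolve against the store *before* the call.
std::string actualize(const std::string &Val, const Store &PreCallStore) {
  const std::string Tag = "init:"; // invented marker for this sketch
  if (Val.rfind(Tag, 0) == 0) {
    auto It = PreCallStore.find(Val.substr(Tag.size()));
    if (It != PreCallStore.end())
      return It->second;
  }
  return Val;
}

// Merge: first actualize every callee binding against the unmodified
// pre-call store, and only then overwrite the caller's bindings.
Store mergeBindings(Store Caller, const Store &CalleeBindings) {
  Store Actualized;
  for (const auto &KV : CalleeBindings)
    Actualized[KV.first] = actualize(KV.second, Caller);
  for (const auto &KV : Actualized)
    Caller[KV.first] = KV.second; // writes happen after actualization
  return Caller;
}
```

If the writes were applied first, a callee that swaps two regions would read the already-overwritten values and produce the wrong result.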

So our current plan, in case nobody minds, would be:
  (1) Implement an SValVisitor, which may be useful for other
applications as well, as a separate patch;
  (2) Try to implement a full-featured alpha-renaming support, which
we almost have, on top of SValVisitor, probably add some useful
features to the constraint manager, as a separate patch;
  (3) Try to port the rest of the summary code, with heavy discussions
on fixing existing problems, some of which were listed above;
  (4) Dunno, maybe something else shows up, probably more ways to
split the patch into smaller pieces.

We'd be extremely grateful for [and promise to listen carefully to]
any feedback!

Thanks in advance, Artem [and Alexey].

Awesome!

So this implementation doesn't give the performance gain you were expecting, but what kind of performance gain does it give?
Do you get roughly the same or better quality of analysis? I would measure this in terms of number of bugs from before vs. number of bugs after, but maybe there is a better metric for quality of analysis.
How is memory usage affected? Based on your design, I suspect that you are using more memory, as you aren't throwing out engines and graphs after each function, but I don't know that for sure.

Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

So this implementation doesn't give the performance gain
you were expecting, but what kind of performance gain
does it give?

It varies: some things are better, some worse; not exactly a dramatic improvement (yet, hopefully), especially considering the complexity trade-off. We hope it gets better as we keep debugging and/or porting it further.

While good metrics are actually hard to find, the following has been observed in our test runs:

   -1- We can obtain ~15% more bug reports with similar analyzer options (eg. -analyzer-config max-nodes), but it takes about twice the time and about five times the memory.
   -2- We can find a similar number of bugs 2-3 times faster, with similar memory limitations, by reducing the max-nodes option; in this case, ~15% of the bugs do not overlap.
   -3- The true positive rate remains similar to inlining; the newly obtained reports are not predominantly incorrect.

In fact, we should be consuming more resources with similar max-nodes in our approach, because summary nodes are heavier and harder to obtain, and, yeah, storing exploded graphs doesn't help either, so it's reasonable to reduce this parameter by default. So, eg., "-2-" was obtained when running summary with max-nodes=16000 and inlining with max-nodes=128000 (the default is 150000). However, by reducing max-nodes, we also limit our ability to analyse functions with only a few IPA calls; and when we try to avoid this, we inevitably lose the memory improvement. So it's hard to obtain a truly accurate comparison.

Our typical test run assumes analysis of the whole Android open-source project repo, and gathers around ~2000 warnings of two proof-of-concept checkers (these checkers usually don't find critical bugs, but they're noisy enough for statistical speculations); we only had time to evaluate a hundred of them by hand for each checker, more-or-less uniformly.

Hi!

Great!

Dear All,

We down there (mostly me and Alexey Sidorin, whom you might have also
seen here in the lists), after some digging into the analyzer code,
have coded a working proof-of-concept draft of a summary-based
inter-procedural analysis implementation for the static analyzer.

This sounds interesting! If I understand correctly, your goal was to improve analyzer performance by avoiding inlining? Could we also use this high-level approach to enable cross-translation unit analysis?

The
code is still too rough and dirty to publish as a review, however, as
tl;dr of the current status, we reached the point where:

(good.1) The implementation doesn't crash on quite large test
codebases, eg. Android;
(good.2) It also doesn't seem to cause any noticeable increase in the
false positive rate;

This is interesting and suggests that levels of context sensitivity greater than 1 (as afforded by inlining) might not really add much in terms of reducing false positives.

(good.3) Path-sensitive bug reports are constructed more or less correctly.

Additionally, the implementation is not really extremely complicated
or hard-wired together, so many things can be re-discussed and
re-thought.

The bad parts are still many, but hopefully some of them will get
sorted out in time:

(bad.1) Doesn't yet offer as much performance gain, in
bugs-per-second, as we initially expected;
(bad.2) Requires significant support on checker side, unlike
inlining, and most official checkers were not yet updated, instead we
were tweaking our own checkers, yet the method seems universal enough;
(bad.3) Could use some refactoring and also needs a rebase onto the
latest clang before we publish, because we diverged quite a bit;
(bad.4) Certain features are not yet supported:
   - Recursion is not yet modeled correctly;
   - Taint analysis is not yet supported;
   - Only default implementations of Store Manager and Constraint
Manager are supported;

By this you mean the RegionStoreManager and RangeConstraintManager, right?

   - Objective-C is not yet supported (unfortunately, none of us
understands it well enough).

Now, the design we implemented as a quick proof-of-concept is roughly
as follows:
(a) Analyze the function without any context, as top-level;
(b) Save its ExplodedGraph, which *is* *the* summary of the function;
(c) When modeling the call, take the final states in leaf nodes of
its graph, and
   (c.i) Rename the symbolic values inside them according to the context;
   (c.ii) Merge the renamed range constraints into the current state;
   (c.iii) Merge the GDM into the current state, with inevitable help
from checkers;
   (c.iv) Throw bug reports if a bug was revealed while applying the summary;
   (c.v) Merge the renamed store bindings into the current state,
including invalidation markers.

Step (a): We conduct a normal symbolic execution, as usual. When we
encounter a call, we pause the current analysis until we collect the
summary of the callee function. If we encounter recursion, we fall
back to conservative evaluation. If normal analysis exceeds the node
limit, we discard the summary and evaluate the function conservatively.
Additionally, we add a few program state traits: most importantly,
when SymbolReaper destroys a symbol, we need to keep its range in a
separate trait, because it's a valuable piece of info on function
branch reachability conditions, which we want to have reflected in the
final node.

Step (b): We simply don't delete the ExprEngine object, so that all
entities, such as States and SVals, have no stale pointers to various
managers and contexts inside them. Storing the whole graph, rather
than only leaf states, has a single purpose of reconstructing the
path-sensitive bug report; it's not useful for anything else. Of
course, it still offers a lot of room for memory optimizations (clean
unnecessary nodes, unnecessary program state traits, and that sort of
stuff, or even delete the graph and re-construct it only if we need to
actually throw the report).

You mentioned in a follow-up email that you see a ~5x increase in memory usage with your prototype. Do you have a sense of where this cost comes from? Is it keeping the summaries themselves?

Step (c): In ExplodedGraph of the caller, we create a node with a
special program point called CallSummaryPreApply. Out of this node, we
create new nodes for each reachable branch of the summary, that is,
for every leaf node of the summary graph which is reachable in current
context. Reachability is determined on step (c.ii). After a few
technical intermediate transitions, we reach another node whose
program point is CallSummaryPostApply.

Step (c.i): So far, we did well without complete alpha-renaming
support. Instead, all we needed was to recursively scan through the
structure of the SVal and replace context-dependent stuff stored
deeply inside them with actual stuff, eg. SymbolRegionValue of a
formal parameter with actual argument value. Hence, to avoid confusion
with full-featured alpha-renaming, we called this "actualization";
it'd take extra work to see if we can actually generalize it to create
a good full-featured alpha-renaming mechanism (replace arbitrary
values with arbitrary values inside the whole states). This procedure
is one of the most complicated parts of our work, as it needs to
support the whole SVal hierarchy, and its implementation is also
already quite isolated. For SymbolMetadata, we ask checkers to perform
actualization through a callback, which seems reasonable. For
SymbolConjured, we fake the block counts in order to avoid accidental
collisions with existing symbols. Also, actualization isn't really a
separate step in the process; it's performed regularly for all
symbolic values on all steps while applying a summary.

Step (c.ii): Actualize constraint keys, intersect "new" and "old"
ranges, throw away the branch if intersection is empty, otherwise
constrain the actual symbol with the intersected range. As we
mentioned on step (a), this requires a separate pass through reaped
symbols which are stored separately.

Step (c.iii): Call the checker callback, "eval::SummaryApply", in
which checkers receive the final state in callee and the current state
in caller, and figure out how to move significant stuff from the
former to the latter. Checkers may add transitions or split states in
this phase. Most of the time, they do three things:
  - Carry information that changed in the callee over to the caller;
  - Check stuff about which they were uncertain during callee
analysis; maybe they'll be sure now that they have a context (delayed
checks);
  - Carry delayed checks up to the caller if they could not be
resolved anyway.
Note that delayed checks would need to be manually stored in the state
so that checkers could notice them during apply. So overall, if the
checker has ~5 or so GDM traits, implementing a summary-apply
procedure may be pretty complicated.

Can you tell us more about these delayed checks? I’m particularly wondering about type-state-like checks that don’t issue warnings on unknown values.

For example, suppose you had a checker like SimpleStreamChecker that issues a warning on a double close. If a symbol ref for a file descriptor is marked as closed in its state then the checker issues a warning, but if the symbol ref is unknown, it just transitions it to the closed type state.

So if you had:

  FILE *f = ...;
  fclose(f);
  myFunction(f);

where

  void myFunction(FILE *p) {
    fclose(p);
  }

Then with inlining you would get a diagnostic about a double close at fclose(p). But with a summary-based approach like the one you suggest, the unmodified SimpleStreamChecker wouldn’t issue a warning. How would you recommend changing such a checker to handle summaries? Are there common patterns here that we could expose to checker writers?

Step (c.iv): Path-sensitive bug reports are constructed by
frankensteining the report path from different pieces of different
exploded graphs, be it a report that originates from the callee (and
hence thrown on step (c.iii) from eval::SummaryApply), or an
intermediate call that exited before the bug (which may be interesting
for the bug reporter visitors). The only trouble we currently have is
the "Assuming..." pieces, which may be redundant, as no assumption was
made, but rather the context determined the branch.

In some cases it seems like it ought to be possible to query the state at an “assuming…” piece to determine whether the context always implies that the assumption holds, and elide the piece if it is not needed. Does the context always determine the branch? Similarly, can there be cases where a path is dead (the branch could not be taken based on the actual) but the actualized post-state does not have enough information to yield a contradiction? If I understand correctly, this is why you keep the ranges of dead symbols around? Would you also need to keep checker state for these symbols around so they can detect contradictions when applying the summary?

Step (c.v): Actualize the binding keys and values, overwrite existing
values. Invalidation, as in writing some SymbolConjured to some memory
space, is handled similarly, before all bindings. We do this last,
because it would affect actualization (eg. SymbolRegionValue should
actualize itself to the value of the region before the call, not after
the call; quite a sensitive point here, but it seems to work this way).

Given the region store’s differing representations of concrete and symbolic offsets, this sounds like it could be quite tricky. For example, does this work for parameter-dependent offsets in the summary post-state where in the summary the parameter is symbolic but at the call site the parameter is concrete? More generally, I’d love to hear more about how you apply summaries to the region store, how you identify the frame and the footprint, how you unify symbolic variables, etc.

Devin

So this implementation doesn't give the performance gain
you were expecting, but what kind of performance gain
does it give?

It varies: some things are better, some worse; not exactly a dramatic improvement (yet, hopefully), especially considering the complexity trade-off. We hope it gets better as we keep debugging and/or porting it further.

While good metrics are actually hard to find, the following has been observed in our test runs:

-1- We can obtain ~15% more bug reports with similar analyzer options (eg. -analyzer-config max-nodes), but it takes about twice the time and about five times the memory.
-2- We can find a similar number of bugs 2-3 times faster, with similar memory limitations, by reducing the max-nodes option; in this case, ~15% of the bugs do not overlap.
-3- The true positive rate remains similar to inlining; the newly obtained reports are not predominantly incorrect.

In fact, we should be consuming more resources with similar max-nodes in our approach, because summary nodes are heavier and harder to obtain, and, yeah, storing exploded graphs doesn't help either, so it's reasonable to reduce this parameter by default. So, eg., "-2-" was obtained when running summary with max-nodes=16000 and inlining with max-nodes=128000 (the default is 150000). However, by reducing max-nodes, we also limit our ability to analyse functions with only a few IPA calls; and when we try to avoid this, we inevitably lose the memory improvement. So it's hard to obtain a truly accurate comparison.

Our typical test run assumes analysis of the whole Android open-source project repo, and gathers around ~2000 warnings of two proof-of-concept checkers (these checkers usually don't find critical bugs, but they're noisy enough for statistical speculations); we only had time to evaluate a hundred of them by hand for each checker, more-or-less uniformly.

You can also measure the number of basic blocks visited if you want to compare coverage. Take a look at the debug.Stats checker and the -analyzer-stats option. These might give you a better understanding of what causes the slowdown.

We use various heuristics for balancing precision and performance for inlining. We do not actually inline all functions. This is why you see more reports with your summaries. Theoretically, inlining should always be more precise than any summary. There are several parameters controlling this apart from max-nodes:
- We do not inline functions that are too large: -analyzer-config max-inlinable-size (50 basic blocks).
- We do not inline large functions (min-cfg-size-treat-functions-as-large) too many times (max-times-inline-large).

Anna.

One consequence of analyzing the function at the top level is that the summary produced makes no assumptions about context. This means the summary is always applicable (i.e., the precondition holds) but also that applying it may yield many more paths than are actually feasible. That is, viewing the summary as an implication, it looks like:

    true => PostState1 OR PostState2 OR PostState3 OR … OR PostStateN

If I understand your approach correctly, you mitigate this to some extent by keeping extra range information about dead symbols in the PostState so you can check for contradictions in the actualized PostState.

Did you consider the “functional” (in the sense of Sharir and Pnueli 81) approach that discovers and maintains tables of input-output summaries (i.e., pre-states and post-states) for functions?

This would allow for more precise summaries. With this approach, you would apply the post-states at a call site when its pre-state yields a hit in the table; if not, you would essentially inline the function and then populate the table with the pre- and post-states from running the inlined function.

This results in summaries that, when viewed as an implication, look like:

    (PreState1 => PostState1_1 OR PostState1_2 OR ... OR PostState1_N)
  AND
    (PreState2 => PostState2_1 OR PostState2_2 OR ... OR PostState2_M)
  AND
   …

The precondition used to compute these summaries can be generalized in a variety of ways to increase the chance of a hit — with the most-general precondition being ‘true’.

One nice thing about this approach is that it can be viewed as a performance optimization for inlining, where summaries “cache” the (approximate) result of inlining a function. With this view, you can trade off the memory requirements of summaries in this cache with running time from avoiding inlining. (On the other hand, because the pre-states for these summaries are informed by analysis state at call sites, this approach is harder to use for cross-translation-unit summarization.)
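The caching view described above could be sketched roughly as follows; all types and names here are invented stand-ins, and the inlining fallback is faked for the sake of the example:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy cache for the "functional" approach: summaries are keyed by a
// (generalized) pre-state; a miss falls back to inlining and then
// populates the table.
using PreState = std::string;                // stand-in for an abstract state
using PostStates = std::vector<std::string>; // disjunction of post-states

struct SummaryCache {
  std::map<PreState, PostStates> Table;
  int InlineCount = 0; // how often we had to fall back to inlining

  // Hypothetical stand-in for actually inlining the callee.
  PostStates inlineAndAnalyze(const PreState &Pre) {
    ++InlineCount;
    return {Pre + "/post1", Pre + "/post2"};
  }

  PostStates apply(const PreState &Pre) {
    auto It = Table.find(Pre);
    if (It != Table.end())
      return It->second; // cache hit: reuse the prior inlining result
    PostStates Posts = inlineAndAnalyze(Pre);
    Table.emplace(Pre, Posts);
    return Posts;
  }
};
```

The trade-off noted above shows up directly: the table spends memory to avoid repeating the inlining work.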

Devin

I’ve had a similar idea to summaries like this for a long time, and I’m glad to see it get pushed forward.

You can possibly trim the summary down substantially by doing a BFS from the exit/error nodes of the graph to the entrance nodes. We use this trick when displaying trimmed graphs in GraphViz to show a path to an error node. That could result in a massive savings.
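A small sketch of that trimming trick, assuming a toy graph with explicit predecessor lists (as the ExplodedGraph has); the names are illustrative:

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <vector>

// Hypothetical graph shape; the real ExplodedGraph keeps predecessor
// lists, which is what makes this backward walk cheap.
struct Node {
  std::vector<int> Preds;
};

// Walk backwards from the interesting sinks (exit/error nodes); any
// node never reached this way cannot appear on a reported path and can
// be dropped from the stored summary.
std::set<int> reachableBackwards(const std::vector<Node> &G,
                                 const std::vector<int> &Sinks) {
  std::set<int> Seen(Sinks.begin(), Sinks.end());
  std::queue<int> Work;
  for (int S : Sinks)
    Work.push(S);
  while (!Work.empty()) {
    int N = Work.front();
    Work.pop();
    for (int P : G[N].Preds)
      if (Seen.insert(P).second)
        Work.push(P);
  }
  return Seen;
}
```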

What kind of participation is needed from a checker author in this design? Since it is based on the ExplodedGraph, it would seem their participation would be automatic. Is the participation needed for serializing out state if we want to make these summaries persistent?

Hello Devin,

Our approach is a bit similar, since we have pre-conditions (symbolic constraints) and post-conditions (information stored in the Store and in the GDM). But we don't compute the summary directly. We did so for some time, but later it became unnecessary. Instead, we use the final nodes of the ExplodedGraph, which keep the final states at the end of function execution.

In fact, the branches in the exploded graph are equivalence classes that together form 'true', so we just extract information from the final states while applying a summary. Our approach in general is similar to Godefroid's "Compositional dynamic test generation" paper (we have reinvented some of his ideas), with extensions like data structure support and the ability to handle different kinds of checks.

You can possibly trim the summary down substantially by doing a BFS from the exit/error nodes of the graph to the entrance nodes. We use this trick when displaying trimmed graphs in GraphViz to show a path to an error node. That could result in a massive savings.

Hello Ted,

Currently we use trimming, but in another way. We need to keep all the branches of the ExplodedGraph, since they form a summary. Instead, after analysis is finished, we reclaim graph nodes that are not Summary[Pre/Post]Apply nodes, do not have a tag, are not branching nodes, and are not root/end nodes (shouldCollectNode() was modified a bit to support this "aggressive" mode). This reduces the node count in the resulting graph by a factor of 2-3. However, there is still room for memory optimization - for example, removing GDM entries from states that are used neither in the summary nor in report construction.

Is this within a translation unit? Are you planning to make this method
usable across translation units?

You force us to disclose ;)
Yes, we have a working proof-of-concept of cross-translation-unit analysis (aka XTU). It is based on ASTImporter (which was fully implemented) and some infrastructure to collect function information. But that's another big story.

  // BTW, is there a nice way to write ASTImporter unit-tests?

So, our approach is applicable to XTU too, since all translation units become one. Moreover, it is independent (well, almost independent) from the inlining approach, and the IPA mode may be configured easily via -analyzer-config. To enable summary-based IPA, we just switch "-analyzer-config ipa=summary" and don't touch defaults. This should also make upstreaming easier in the future, since it doesn't influence default behaviour.

Wow, that's a lot of stuff!
Thanks a lot for all your encouragement!

=== Regarding inter-unit and purposes: ===

> Is this within a translation unit? Are you planning to make this
> method usable across translation units?

> This sounds interesting! If I understand correctly, your goal was to
> improve analyzer performance by avoiding inlining? Could we also use
> this high-level approach to enable cross-translation unit analysis?

> Is the participation needed for serializing out state if we want to
> make these summaries persistent?

We do eventually aim for cross-translation-unit analysis as well; however, it's quite a bit of a separate story. Oh, I guess Alexey has already covered that. Apart from this, of course, we also desire to improve the time-performance of path-sensitive checks inside a single translation unit, because it seems to make some sense when the analyzer is integrated into build systems of large projects, and such, so we wish to reduce analysis overhead in such use-cases.

We did not yet try out any sort of serialization of summaries. Once we do, then, yeah, we'd most likely need extra checker-side support to serialize checker data.

=== Regarding the checker-side support: ===

> What makes the help from the checkers inevitable here?

> What kind of participation is needed from a checker author to
> participate in this design? Since it is based on the ExplodedGraph,
> it would seem their participation would be automatic.

> Can you tell us more about these delayed checks? I’m particularly
> wondering about type-state-like checks that don’t issue warnings on
> unknown values.
>
> For example, suppose you had a checker like SimpleStreamChecker that
> issues a warning on a double close. If a symbol ref for a file
> descriptor is marked as closed in its state then the checker issues a
> warning, but if the symbol ref is unknown, it just transitions it to
> the closed type state.
>
> So it you had:
>
> FILE *f = ...
> fclose(f)
> myFunction(f);
>
> where
>
> void myFunction(FILE *p) {
> fclose(p)
> }
>
> Then with inlining you would get a diagnostic about a double close at
> fclose(p). But with a summary-based approach like you suggest the
> unmodified SimpleStreamChecker wouldn’t issue a warning. How would
> you recommend changing such a checker to handle summaries? Are there
> common patterns here that we could expose to checker writers?

Precisely, the code sample provided by Devin Coughlin, together with the "opposite" sample:
   FILE *f = ...;
   myFunction(f);
   fclose(f);
, is the simplest motivational example that shows that checker-side support is inevitable. There's no defect in myFunction() itself; however, it changes the GDM (which is visible to checkers only) to cause the warning.

We did plan to make some "How to add summary IPA support to your checker in 24 hours" how-to :)

The modification needed/proposed-by-our-approach to let SimpleStreamChecker conduct its double-close checks with summary is as follows:

   1. Store a list of all open/close operations in the program state as a separate list-type trait, with node pointers, pointer symbols, and operation types.
   2. In summary-apply callback, iterate over this list found in the summary leaf, actualize symbols, and re-model the operations on top of the caller state:
     2a. See if the operation is valid, and report a double close if not; supply the stored node to the bug reporter so that it can glue the report path together in Devin Coughlin's example and throw the report against a correct node; storing nodes is also useful for implementing bug visitors.
     2b. Mark the relevant symbols as open or closed in the caller context. This would let us handle the "opposite" example, which would throw the report in the caller context after changes in GDM caused by the callee.
     2c. Repeat these changes in the caller's operation list, so that deeper call stacks would be supported. If bug report is constructed through a deep call stack, the checker would need to provide BugReporter with a stack of nodes.

Well, roughly these things. Most checkers that work with the GDM in a similar manner would require very similar facilities; in fact, this approach worked for ~6 checkers we tried it on, some of which were pretty weird (eg. statistical analysis of the whole graph). (We measure productivity on only two of them, because the other checkers are too quiet.)

It's not as scary as it sounds: probably ~20 lines of code per checker for a checker with a small number of GDM entries.
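To make the shape of this concrete, here is a minimal toy sketch of steps 1-2 (plain C++, not the actual checker API; all type and function names are illustrative): a summary leaf carries a list of recorded open/close operations, and summary-apply actualizes the symbols and replays the operations against the caller's per-symbol state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model (NOT the real checker API; all names here are illustrative):
// each summary leaf carries the checker's list of stream operations; on
// summary-apply we actualize the symbols and re-model the operations on
// top of the caller's state.
enum class OpKind { Open, Close };
struct StreamOp { OpKind Kind; std::string Sym; };  // Sym: callee-context symbol
enum class StreamState { Opened, Closed };

// Re-model the callee's recorded operations in the caller context.
// Returns the name of the first double-closed symbol, or "" if none.
std::string applySummaryOps(
    const std::vector<StreamOp> &SummaryOps,
    const std::map<std::string, std::string> &Actualize, // callee sym -> caller sym
    std::map<std::string, StreamState> &CallerState) {
  for (const StreamOp &Op : SummaryOps) {
    auto It = Actualize.find(Op.Sym);
    std::string Sym = (It != Actualize.end()) ? It->second : Op.Sym;
    if (Op.Kind == OpKind::Close) {
      auto St = CallerState.find(Sym);
      if (St != CallerState.end() && St->second == StreamState::Closed)
        return Sym;                           // step 2a: double close detected here
      CallerState[Sym] = StreamState::Closed; // step 2b: update caller-side state
    } else {
      CallerState[Sym] = StreamState::Opened;
    }
  }
  return "";
}
```

In Devin Coughlin's example, the caller state already marks 'f' closed, so replaying the summary's close of 'p' (actualized to 'f') trips the check; in the "opposite" example, the replay marks the symbol closed in the caller state, so the later fclose(f) in the caller trips it.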

On how to avoid modifying checkers: it might be possible to simplify this procedure significantly, but only by throwing away the idea of the opaque *Generic* data map in the state and replacing it with a limited set of structures, so that most of the actualization could be handled by the analyzer engine. This approach seems worse, as it greatly decreases flexibility and limits the number of checkers we can implement without adding stuff into the core, so we quickly realized that it's not really worth trying.

So i think yeah, modifying checkers is inevitable. They need to have their own summary of the branch and need to know how to apply it.

Todo for myself: we didn't yet have a chance to think deeply about leak-related checks, eg. the lifetime of symbols created during actualization, how symbols leak through/inside summaries, or how to avoid actualizing symbols that would die instantly.

=== Regarding the ranges and reachability: ===

> I think reachability is a very important question. How do you
> associate the symbols, the constraints and the branches?

> In general, it is great, to have a summary based approach to
> interprocedural analysis, but could you list the high level goals of
> your approach and what improvements are you expecting?

> One consequence of analyzing the function at the top level is that
> the summary produced makes no assumptions about context. This means
> the summary is always applicable (i.e., the precondition holds) but
> also that applying it may yield many more paths than are actually
> feasible.

Yeah, so here's what we do: we consider each non-sink[1] leaf in the ExplodedGraph of the summary separately. In this leaf, we take all the info stored by RangeConstraintManager - ranges on all symbols, including ranges on dead symbols. These are *the* pre-conditions[2] for the branch. Unless we have sinks, the disjunction of these pre-conditions over all branches is `true'[3].

For every branch, since RangeConstraintManager maps symbols to ranges, we actualize these symbols according to the context. After this, we search for ranges on the same symbols in the caller context. If any of the actualized summary ranges contradicts (does not overlap) the corresponding caller range, we throw away the whole branch. Hence, we avoid applying impossible branches. This is done before any other work on the branch (eg. checker or GDM calls), so it saves time.

It should be possible to optimize this procedure by joining branches with completely equal range information[4], though we haven't coded this yet.
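For illustration, the branch-pruning step can be sketched like this (a toy model with a single [lo, hi] interval per symbol; the real RangeConstraintManager keeps sets of ranges, and the symbols here are assumed to be already actualized):

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Toy model of branch pruning: one [Lo, Hi] interval per symbol.
struct Range { long Lo, Hi; };

// Returns false if the branch must be discarded (some actualized summary
// constraint does not overlap the caller's constraint on the same symbol);
// otherwise narrows the caller's ranges to the intersections.
bool applyBranchConstraints(const std::map<std::string, Range> &SummaryRanges,
                            std::map<std::string, Range> &CallerRanges) {
  std::map<std::string, Range> Result = CallerRanges;
  for (const auto &KV : SummaryRanges) {
    auto It = Result.find(KV.first);
    if (It == Result.end()) {               // caller knows nothing: adopt as-is
      Result[KV.first] = KV.second;
      continue;
    }
    Range R{std::max(It->second.Lo, KV.second.Lo),
            std::min(It->second.Hi, KV.second.Hi)};
    if (R.Lo > R.Hi)
      return false;                         // contradiction: prune the whole branch
    It->second = R;                         // constrain symbol to the intersection
  }
  CallerRanges = Result;                    // commit only if the branch is feasible
  return true;
}
```

The point of committing only at the end is that a discarded branch must leave the caller state untouched, so the (cheap) range check can run before any checker or GDM work.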

This procedure interacts with delayed-checks in a fancy manner, namely it opens up warnings like:
   void bar(int *b) {
     foo(b); // [1] calling foo()
   }
   void foo(int *a) {
     *a = 0;
     if (a == 0) {
       // [2] warn: null-dereference
     }
   }
during summary-apply, because constraints on 'a' would be present in the summary's input constraints. It might or might not be worth suppressing such warnings, because they may be really unobvious, or modifying the bug reporter to make them more obvious by drawing later paths through the callee.

/* [1] Just realized that omitting sink-branches is not really a sound decision. Other checkers may have delayed checks inside these branches, which would cause a warning before the sink, which we lose with our approach. */

/* [2] It is possible that checkers add their own checker-specific pre-conditions, but we haven't yet encountered any sensible practical use-cases. */

/* [3] Note that if the summary is incomplete (eg. we exceeded max-nodes or block-count for the summary), then we go for conservative evaluation, so all branches are either complete or sinks when we actually apply any summary. */

/* [4] This would most likely (though not always) mean that they only differ with their GDM information, eg. result of state-split done by checkers. */

=== Regarding performance: ===

> You can also measure the number of basic blocks visited if you want
> to compare coverage. Take a look at debug.Stats checker and
> -analyzer-stats option. These might give you a better understanding
> of what causes the slowdown.

That's a great idea; we didn't try that yet. We tried to estimate the node count including the "summarized" nodes, and got some funny numbers (eg. the summary approach analyzes ~20 times more nodes per second).

> We use various heuristics for balancing precision and performance for
> inlining. We do not actually inline all functions. This is why you
> see more reports with your summaries. Theoretically, inlining should
> always be more precise than any summary. There are several parameters
> controlling this apart from max-nodes:
> - We do not inline functions that are too large -analyzer-config
> max-inlinable-size (50 basic blocks).
> - We do not inline large functions(min-cfg-size-treat-functions-
> as-large) too many times (max-times-inline-large).

We hacked directly into inlineCall(), so we should be ipa-calling the same set of functions, i guess. However, we haven't yet worked on the BifurcateCall mechanism.

> You mentioned in a follow-up email that you see a ~5x increase in
> memory usage with your prototype. Do you have a sense of where this cost
> comes from? Is it keeping the summaries themselves?

Partially from storing summaries, and partially from the fact that we actually do a lot more analysis: with the same max-nodes, we go through a lot more paths and create heavier nodes with more stuff in their states. So it's not really correct to compare analysis runs for summary and inlining with the same max-nodes: by design, the summary run would be slower and heavier in that case, while finding deeper defects.

> You can possibly trim the summary down substantially by doing a BFS
> from the exit/error nodes of the graph to the entrance nodes. We use
> this trick when displaying trimmed graphs in GraphViz to show a path
> to an error node. That could result in a massive savings.

Yeah, that's one more possible memory usage optimization, thanks for the idea! Other things that we tried include removing some summary-irrelevant program state traits in checkEndAnalysis, reclaiming exploded graph nodes more aggressively than usual when summary construction is over, re-using memory allocators across analyses (most importantly the exploded node allocator, otherwise freeing nodes doesn't save any memory) etc.

=== Miscellaneous: ===

> > Step (c.ii): Actualize constraint keys, intersect "new" and "old"
> > ranges, throw away the branch if intersection is empty, otherwise
> > constrain the actual symbol with the intersected range. As we
> > mentioned on step (a), this requires a separate pass through reaped
> > symbols which are stored separately.
> Does this modify the summary? Or do you make a copy?

We create a new ProgramState that is based on the original caller-context state, with new range constraints for the symbols (either add new (symbol, range) pairs, if the symbol was not yet constrained, or replace the range, if the intersection was computed).

> > Step (c.iv): Path-sensitive bug reports are constructed by
> > frankensteining the report path from different pieces of different
> > exploded graphs, be it a report that originates from the callee (and
> > hence thrown on step (c.iii) from eval::SummaryApply), or an
> > intermediate call that exited before the bug (which may be
> > interesting for the bug reporter visitors). The only trouble we
> > currently have is the "Assuming..." pieces, which may be redundant,
> > as no assumption was made, but rather the context determined the
> > branch.
> In some cases it seems like it ought to be possible to query the
> state at “assuming…” to determine whether the context always implies
> that the assumption holds and elide the piece if not needed. Does the
> context always determine that the branch? Similarly, can there be
> cases where a path is dead (the branch could not be taken based on
> the actual) but the actualized post-state does not have enough
> information about to yield a contradiction? If I understand
> correctly, this is why you keep the ranges of dead symbols around?
> Would you also need need to keep checker state for these symbols
> around so they can detect contradictions when applying the summary?

Yeah, this problem can be solved this way! We just haven't yet taken up this task. However, storing ranges for dead symbols is inevitable anyway, as there may be a lot of significant pre-conditions (eg. the function relies on the value of a global variable and then overwrites it; the value symbol of this variable dies, but we still need to discard branches if we know something about this symbol in the caller context).

> > - Only default implementations of Store Manager and Constraint
> > Manager are supported;
> By this you mean the RegionStoreManager and RangeConstraintManager,
> right?

Yes, right; we haven't yet thought deeply about how exactly to abstract away from the fact that RangeConstraintManager uses symbol-range pairs and RegionStoreManager uses base-offset-value triples, or about what sort of virtual functions should be defined in the abstract ConstraintManager and StoreManager classes in order to stay as flexible as possible.

> > Step (c.v): Actualize the binding keys and values, overwrite
> > existing values. Invalidation, as in writing some SymbolConjured to
> some memory space, is handled similarly, before all bindings. We do
> > this last, because it would affect actualization (eg.
> > SymbolRegionValue should actualize itself to the value of the
> > region before the call, not after the call; quite a sensitive point
> > here, but seems to work this way).
> Given the region store’s differing representations of concrete and
> symbolic offsets, this sounds like it could be quite tricky. For
> example, does this work for parameter-dependent offsets in the
> summary post-state where in the summary the parameter is symbolic but
> at the call site the parameter is concrete? More generally, I’d love
> to hear more about how you apply summaries to the region store, how
> you identify the frame and the footprint, how you unify symbolic
> variables, etc.

Unfortunately, we do not yet handle symbolic offsets in the store properly, and some parts of this actually pose a nice problem (it's also surprising that we haven't yet run into too many issues because of this).

For concrete offsets, we had to extend StoreManager::BindingsHandler to provide complete information about the binding keys, not just the base. Then we actualize the base of the binding. Then we split the actualized binding base into its own base and offset, because a base-region may actualize into a non-base-region. Then we *add* the offset of the actualized base to the original offset of the binding, obtaining the complete offset of the actualized binding. Finally, we actualize the store value and bind it to the actualized binding key, possibly replacing an existing binding.

We also skip bindings to stack locals, and we discriminate between direct and default bindings. Another important thing to notice is that when we actualize all bindings, we use the original store (without any new bindings) as the reference.
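The concrete-offset key actualization described above boils down to this kind of arithmetic (a toy model; these are not the real RegionStore binding keys, just illustrative base-plus-offset pairs):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy binding key (the real RegionStore uses (base region, offset, kind)
// triples): a base-region name plus a concrete byte offset.
struct BindingKey { std::string Base; long Offset; };

// Actualizing a callee base region may yield a *non-base* caller region,
// i.e. a caller base plus its own offset; the actualized binding offset
// is the sum of the two offsets.
BindingKey actualizeKey(const BindingKey &SummaryKey,
                        const std::map<std::string, BindingKey> &ActualizeBase) {
  auto It = ActualizeBase.find(SummaryKey.Base);
  if (It == ActualizeBase.end())
    return SummaryKey;  // eg. a global region: actualizes to itself
  // Split the actualized base into its own base and offset, then add offsets.
  return BindingKey{It->second.Base, It->second.Offset + SummaryKey.Offset};
}
```

For instance, if the summary binds at offset 4 inside SymRegion{p}, and SymRegion{p} actualizes to a sub-region of 's' at offset 8, the actualized binding lands at (s, 12).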

For symbolic offsets in the summary, i guess we should do a similar thing. Take the binding key, which is the base region and its sub-region representing a symbolic offset. Then discard the base region, take only the offset region, and actualize it. Then split the actualized region into a base and an offset, each of which can be either concrete or symbolic. The actualization procedure would automatically replace a symbolic index inside any element region with a concrete index if the symbolic index is constrained to a constant or actualized into a constant. Additionally, if we have (a+i)+j, where 'a' is an array, '+i' is done in the caller context, and '+j' is done in the callee context, then the actualization procedure would convert it automatically into 'a+(i+j)' (it detects such cases when it tries to actualize the super-region of an ElementRegion and obtains another ElementRegion as an answer), which allows easily adding symbolic offsets where necessary. Then actualize the value and bind it to the actual key.
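The (a+i)+j folding mentioned above, for the case where both indices actualize to concrete values, is just index addition over the same base (a toy model, assuming illustrative types):

```cpp
#include <cassert>

// Toy model of the index-folding case: for (a+i)+j, where '+i' was added
// in the caller context and '+j' in the callee context, actualizing the
// super-region of the callee's ElementRegion yields another ElementRegion
// over the same base, and the two indices fold into a+(i+j).
struct ElementRegion { const char *Base; long Index; };

// CallerElem is the actualized super-region (a+i); CalleeIndex is j.
ElementRegion foldElementRegions(ElementRegion CallerElem, long CalleeIndex) {
  return ElementRegion{CallerElem.Base, CallerElem.Index + CalleeIndex};
}
```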

Well, the real problem arises when we have a concrete offset in the summary, but the actualized base has a symbolic offset. Then we somehow need to add an integer to a region, and i've no idea how to do that without adding some new kind of key into the region store.

At one point we tried to solve the actualization of store keys by restoring the original sub-region from the base and the offset. This almost worked, but was extremely hacky and inevitably still unreliable, so we discarded this approach and went for simpler conversions of store keys.

So this is still a point for discussion.

=== Other stuff i forgot to mention earlier: ===

We had to modify the SVal hierarchy in order to make actualization work. Namely, we introduced an auxiliary atomic symbol kind, called SymbolRegionAddress, which represents the unknown address of a non-symbolic region. Why do we need that? Consider an example:
   void foo(int *ptr) {
     if (!ptr) {}
   }
   void bar() {
     int a = 1;
     foo(&a);
   }
The summary of the function foo() contains ranges on reg_$0<ptr>. When we actualize this range, we figure out that we have no good symbol kind to represent the result, to carry the range, etc. So we call the resulting symbol "addr_$0<a>". With this, by looking at the kind of the symbol we receive, we can discard the "ptr == 0" branch of the summary, even though we never obtain "&a" as the value of the condition (only symbols are stored in RangeConstraintManager). Additionally, such a symbol is extremely convenient when we think of actualization as a recursive procedure: if we want to actualize &SymRegion{reg_$0<ptr>} to &a, which is the right thing to do, then to what do we recursively actualize reg_$0<ptr>? We also implemented the same hack for label addresses, creating SymbolLabelAddress. Well, this funny implementation detail is probably worth another discussion and a separate commit; if i think about it more, i can probably even find inlining-related test cases to support this feature.

Hmm, probably this problem can be solved by always storing the region through which the binding happened, even for concrete offsets. This would mean storing an extra pointer for every binding.

There's one more thing we don't yet handle during summary-apply in the store, namely the case when two different base regions actualize into the same region, eg:

   int a;
   void foo(int *x, int *y) {
     *x = 0;
     *y = 1;
   }
   void bar() {
     foo(&a, &a);
     clang_analyzer_eval(a); // expected-warning{{TRUE}}
   }

- in this case, with summary IPA, we are not yet sure whether 'a' is going to be 0 or 1.

This problem can also be solved by making the bindings heavier, eg. by introducing a timestamp for every binding and sorting by timestamps before applying the summary.
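A minimal sketch of the timestamp idea (toy types, not the real binding map): sort the summary's bindings by the moment they were made, then apply them in order, so the later write deterministically wins even when two callee regions actualize into the same caller region.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy sketch of timestamped bindings: each binding recorded in the summary
// carries the moment it was made, so that when two callee regions actualize
// into the same caller region, the *later* write wins deterministically.
struct TimedBinding { unsigned Timestamp; std::string Region; int Value; };

// Apply the summary's bindings in timestamp order; later bindings to the
// same (actualized) region overwrite earlier ones.
int finalValue(std::vector<TimedBinding> Bindings, const std::string &Region) {
  std::stable_sort(Bindings.begin(), Bindings.end(),
                   [](const TimedBinding &A, const TimedBinding &B) {
                     return A.Timestamp < B.Timestamp;
                   });
  int Result = 0;
  for (const TimedBinding &B : Bindings)
    if (B.Region == Region)
      Result = B.Value;  // last (latest-timestamp) write wins
  return Result;
}
```

In the foo(&a, &a) example, *x = 0 (earlier timestamp) and *y = 1 (later timestamp) both actualize to bindings on 'a'; the ordering guarantees the result is 1 rather than depending on traversal order.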

Therefore, the most sensible way i see to solve these two issues is to make the bindings 2x heavier (now we have two pointers; we'd add a pointer and an integer; probably we can change the concrete offset to 32-bit and let the timestamp be 32-bit as well, and then it'd be only 1.5x heavier on 64-bit systems).

Not sure how easy it is to solve this without actually forking the whole RegionStore (we do not want that, because the current RegionStore has a lot of fancy stuff, such as constructing bindings that are not yet stored in the binding map, which would be great to re-use; hmm, why are they in the store manager anyway?).

Dear All,

We've published a snapshot of one of our development branches, which contains the Summary IPA prototype preview. As we mentioned, there are a lot of problems with it, including:

   - it's branched from clang-3.4;
   - it includes quite a bit of unrelated changes, including our inter-unit draft, in a single blob;

so even though it's in no way ready for any sort of review, i guess it could help a bit as a temporary reference in discussing issues.

We deeply apologize for the mess; of course the code will be changed significantly if/once we decide to put it up for actual review for acceptance into the base clang repo, but at least we got it somehow working and got a chance to make it properly :)

The github branch with a single blob for everything is at:

   https://github.com/haoNoQ/clang/tree/summary-ipa-draft

=== Places to look at, with necessary precautions :) ===

   * ExprEngine::inlineCall()
https://github.com/haoNoQ/clang/blob/summary-ipa-draft/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp#L468

Most of the calling bureaucracy is coded there, such as storing summaries, creating auxiliary ExprEngines for constructing ExplodedGraphs for summaries, etc.

Hi Artem and Alexey,

I have not looked at the implementation yet but here are some higher-level comments about the approach, based on a discussion I had with Ted and Devin.

We do believe that summary based IPA is required in order to scale the analyzer to work across translation unit boundaries (whole project analysis), so it is great to see progress made in this direction.

Here are the strong goals that we believe in:

  • Suitable for XTU analysis.

  • Checkers should be very easy to write. There are many more checker developers (than analyzer core developers), so we should make checker development as easy as possible.

Hello Anna,

Thanks a lot for another detailed reply!

It took a while for us to think through the idea of "common-law" summaries, as opposed to our current "civil-law" summaries, and it still sounds like a probably promising but clearly very complicated approach, into which we have not yet put enough thought.

Yeah, the problem of understanding what is essential in the caller state, and what parts of it can be simplified out as irrelevant, seems the most complicated so far. It is true that using the whole state as a pre-condition would result in too many summaries. The very nature of symbolic execution assumes that there are only a few classes of interesting program states.

It seems complicated to use a branch of inlined code as a summary for future reference, because alpha-renaming becomes much more complex (it would need to understand the context-less meaning of an in-context symbolic value, and only then transfer it into another context).

Instead, we'd rather consider creating an auxiliary ExprEngine for summary collection, as we do now, and putting the current caller state into it "for reference": before taking a branch, ensure that we do not contradict the "reference state" after renaming.

Then, yeah, we try to figure out how to handle differences in the store, eg. between
     (x,0,direct): 0 S32b
     ...
     (x,0,direct): 2015 S32b
     ...
without creating a separate summary branch for each case. Even if the execution path seems to be determined only by the fact that, say, reg_$0<x> is in range [0, 10], we cannot be sure that it is absolutely irrelevant for the checker whether reg_$0<x> is 2 S32b or 3 S32b, until we ask the checker. (For instance, reg_$0<x> may be some sort of enum or bitfield passed into an API call modelled by this checker inside the summary graph, and the checker leaves its internal mark in the GDM regarding its value, so as to take a completely different path later; but the engine may accidentally consider the difference between 2 S32b and 3 S32b irrelevant, because all the if's took the same branch so far, and hence ignore important information and cause false positives.)

Finally, i do not instantly see how the "problem of unknowns" is solved with this approach (as far as i understand, the point was about non-concrete values rather than about UnknownVal specifically). In a particular "precedent" context for which the summary is constructed, even if it is not the "True" state, some values may still be unknown. And we'd still need to record and replay our decisions based on those values when we apply the summary branch. Alternatively, we'd need to make "not knowing" some things a part of the pre-condition, but then the checkers still need to explain which particular things they should "not know". That is, contexts may not only be contradictory; they may also be more specific or less specific, and this also needs to be taken into account, as in these cases they are significantly different even if comparable (nested).

Skype call minutes - participants: Anna, Devin, Alexey, and me:

- The idea of context-insensitive summaries (vs. context-sensitive summaries) is still controversial and non-mainstream in academic works, and causes too much intrusion into the checker API. The context-sensitive approach to summaries (through abstraction of an inlined analysis graph and re-application in another context, with a lattice of contexts ordered by known-ness, and heuristics to determine whether the existing known-ness is fine enough or needs to be improved by constructing a summary for a more specific context) requires a lot of work, which we would probably be unable to perform instantly, but the idea is clearly worth nurturing.

- In particular, there is still uncertainty regarding heap shape renaming correctness and invalidation support in the context-insensitive approach we propose.

- Alpha-renaming is worth it regardless of summaries, and we're preparing the necessary patches; there are still minor technical issues to be discussed. It can be implemented as a combination of constraint-solver-side equality checks for symbols and whole-state renaming passes, whichever turns out to be more performant.

- All of the above would be simplified greatly by the implementation of a "Smart" Data Map, as opposed to the current Generic Data Map: such a map would automate certain operations that currently require checker-side support. In fact, otherwise alpha-renaming itself, when implemented with whole-state renaming passes, would require checker-side support for renaming GDM structures: the core needs to have access to checker structures in the program state. Designing a declarative language for coding typestate-analysis checkers would probably be a part of this project.

- Incremental development of any approved approaches "under a flag" is welcome, i guess that's the way we're going to proceed.

I hope i didn't forget anything important.

Hi!

Do you have any updates? I checked the repository, and there are python scripts indicating cross-TU analysis support (and there was also some ASTImporter work). I was wondering what the state of this is. What is the approach you are using? Does that approach work when the unified AST might not fit into memory?

Best Regards,

Gábor

Hello,

Yeah, there's one particular known issue with aliasing, which i mentioned earlier (http://lists.llvm.org/pipermail/cfe-dev/2015-October/045704.html):

    int a;
    void foo(int *x, int *y) {
      *x = 0;
      *y = 1;
    }
    void bar() {
      foo(&a, &a);
      clang_analyzer_eval(a); // expected-warning{{TRUE}}
    }

Here we may randomly get TRUE or FALSE - kind of like a race condition.

We haven't fixed it yet; it can be fixed by adding timestamps to store bindings, i.e. recording which bind happened earlier.

With timestamps, i think the store would be fine.

On the other hand, i don't think there are other places that explicitly rely on assuming that different symbols represent *different* values. It's only true for the store, which expects different base-regions not to overlap, even when the regions are symbolic. Say, the constraint manager, upon encountering "if (a == b)", should split the state (we did enable SymSymExpr for this to work), rather than always produce false.

And then actualization can take care of the rest. Say, a checker that encountered "fclose(f1); fclose(f2);" and later, when the summary is applied, found out that f1 and f2 are the same descriptor would detect the problem upon actualization of the respective symbols to the same symbol. Or the store would correctly merge bindings to different sub-regions of aliased regions even if they were not aliased outside the context.

So, apart from the issues that boil down to the example above, i'm not aware of other aliasing problems yet.