EQTDDataStructures omits obvious, direct callee from DSCallGraph

I am using EQTDDataStructures (from the poolalloc project) to resolve indirect function calls to over-approximated sets of possible callees. Unfortunately I find that it yields incorrect results even on a very simple test input. My LLVM and poolalloc sources are Subversion trunk checkouts, no more than a day older than the current trunk head. My test input is the following C source, compiled to bitcode using Clang:

  void foo();

  void test()
  {
    foo();
  }

It should be rather obvious that each of the two call sites in test() must have exactly one callee: foo(). However, DSCallGraph reports *zero* possible callees at the first call site. (It does correctly report foo() as the one possible callee at the second call site.)

Why is the analysis getting this wrong? Of course I am trying to build up to more complex, indirect calls. But right now this very simple case has me stuck.

Attached below is the complete source code for my pass which prints out the possible callees at each call site. The important logic is at the end, in ShowCallGraph::runOnBasicBlock(). It seems simple enough. Am I doing something wrong here?

Thanks,
Ben

ShowCallGraph.cpp (1.37 KB)

I am using EQTDDataStructures (from the poolalloc project) to resolve
indirect function calls to over-approximated sets of possible callees.

If I remember correctly, it only tries to resolve indirect calls. The
analysis doesn't track direct calls because you can do it just as well
yourself.

Andrew

Andrew Lenharth wrote:

If I remember correctly, it only tries to resolve indirect calls. The
analysis doesn't track direct calls because you can do it just as well
yourself.

DSCallGraph::callee_begin() and DSCallGraph::callee_end() cooperate to
iterate over an empty set (EmptyActual) for any call site not found in
the ActualCallees map. So if direct calls are not tracked, then I would
expect the callee iterators for *both* direct calls to yield this empty
set. Getting the empty set for one direct call but a correct singleton
set for the other direct call is a troubling inconsistency that leaves
me unsure which results I can trust and which I cannot.

Hi Ben,

I just tested the example you gave, and get the same results--one set
is empty, the other contains the expected single function.

As Andrew mentioned, DSA handles indirect and direct callsites
differently, and for direct callsites it's expected that the user
simply looks at the function itself to determine what is called.

In this example, we only track one callsite in test() since as far as
alias analysis goes, the effects of both callsites are same, and we
don't need to consider both callsites to build our analysis. This kind
of optimization is extremely useful in making DSA run on larger codes.

That said, we probably could more cleanly export this information to
the user, but for now just handle direct calls yourself. Sorry for
the confusion :).

Hope this helps!

~Will

Will Dietz wrote:

DSA handles indirect and direct callsites differently, and for direct callsites it's expected that the user
simply looks at the function itself to determine what is called.

In this example, we only track one callsite in test() since as far as
alias analysis goes, the effects of both callsites are same

OK, that definitely makes sense, including why one direct call site gives the expected singleton set while the other gives the empty set. Thank you (and Andrew!) for the clarification.

That said, we probably could more cleanly export this information to
the user, but for now just handle direct calls yourself.

I'd humbly suggest that one of the following changes be applied:

(1) Change DSCallGraph::callee_begin() to recognize when its argument is a direct call site, and return an iterator over a singleton set in that situation. This has the advantage of subsuming the direct-call check that many clients of this class might otherwise need to perform themselves. As an alternative, lazily add direct call sites (and their singleton callee sets) to ActualCallees on demand whenever such a site is passed as an argument to DSCallGraph::callee_begin().

(2) Change DSCallGraph::callee_begin() to fail an assertion when its argument is a direct call site. At least this provides some safety net to help clients (like me!) who do not realize that DSCallGraph does not claim to track direct calls reliably.

(3) If fix (1) is not used, then document this direct-calls-are-not-tracked-reliably restriction *somewhere*. I'm not sure where exactly, since "poolalloc/dsa-manual/manual.tex" does not mention DSCallGraph at all. At least it could be put into the "DSCallGraph.h" header as a comment somewhere.

Regards,
Ben