Missing edges in analysis call graph

Hi all,

I’ve been working with the static analyzer and I’m having issues with the call graph that it produces. This is the same call graph that is output by the debug.DumpCallGraph and debug.ViewCallGraph checkers. I’m fairly new to clang, so I would like some feedback on this. Maybe I’ve missed the mark entirely or there is a better way to fix it. Thanks in advance, the static analyzer is awesome.

With clang version 3.6.0 (trunk 222230), running -analyzer-checker=debug.DumpCallGraph for two equivalent pieces of code produces different results:

First sample

Thanks Daniel. This looks like a bug.

Anna: I think this might be your area of the analyzer. Can you review Daniel's patch?

Thanks Daniel. This looks like a bug.

Anna: I think this might be your area of the analyzer. Can you review Daniel’s patch?

Hi all,

I’ve been working with the static analyzer and I’m having issues with the call graph that it produces. This is the same call graph that is output by the debug.DumpCallGraph and debug.ViewCallGraph checkers. I’m fairly new to clang, so I would like some feedback on this. Maybe I’ve missed the mark entirely or there is a better way to fix it. Thanks in advance, the static analyzer is awesome.

With clang version 3.6.0 (trunk 222230), running -analyzer-checker=debug.DumpCallGraph for two equivalent pieces of code produces different results:

First sample

void a() {}
void b() { a(); }

— Call graph Dump —
Function: < root > calls: a b
Function: b calls: a
Function: a calls:

Second sample

void a();
void b() { a(); }
void a() {}

— Call graph Dump —
Function: < root > calls: b a
Function: a calls:
Function: b calls:

Shouldn’t the call graph in the second code sample also have an edge from b to a?

Thanks for finding this!

I think this happens because the check at /lib/Analysis/CallGraph.cpp:119 throws out Decls based on isThisDeclarationADefinition(). The comment says that this is used to discard template definitions, but the check is too broad.

I suspect the comment talks about the second check (isDependentContext). I think what we wanted is to include all functions for which we have definitions(bodies). The analyzer is currently the only user of this and it is not interested in calls for which we do not have definitions. (This could be changed if we have more users.)

In the case above, the definition for ‘a()’ is visible in the translation unit, so we should include it. I believe with your patch, the node will get added, but the definition used will not have the body attached to it, so the analyzer will not process the body.

How about changing the logic in the CallGraph to check if there is a redeclaration with the body and insert that one instead of the one referenced by the call? (This could also be handled on the analyzer side, but I think all the CallGraph users might prefer the declarations that have the bodies attached when those are available.) I would try to use hasBody(), followed by getBody(). What do you think?

/// getBody - Retrieve the body (definition) of the function. The
/// function body might be in any of the (re-)declarations of this
/// function. The variant that accepts a FunctionDecl pointer will
/// set that function declaration to the actual declaration
/// containing the body (if there is one).
/// NOTE: For checking if there is a body, use hasBody() instead, to avoid
/// unnecessary AST de-serialization of the body.
Stmt *getBody(const FunctionDecl *&Definition) const;

Thanks for your reply Anna. I finally got a chance to play around with this again today.

I’m thinking something along the lines of:

  • // If this is a function declaration without a body,
  • // then substitute the declaration that does have a body.
  • // (unchanged if no such definition exists)
  • FunctionDecl *SwapForDefinition(FunctionDecl *FD) {
  • FunctionDecl *Definition = FD;
  • if (! FD->doesThisDeclarationHaveABody())
  • for (auto I : FD->redecls())
  • if (I->doesThisDeclarationHaveABody())
  • Definition = I;

This is a good point. We might not be gaining anything by replacing the decl with the one that has a body since getBody is guaranteed to search the redeclarations regardless.

How about just removing both calls to isThisDeclarationADefinition() (from includeInGraph) and replacing the call to getBody with hasBody (to avoid the unnecessary AST de-serialization of the body)? That should work and would simplify the function greatly. What do you think?

I am not sure why some of the checks in your first patch were necessary (like this one).

  • // Skip definitions with previous declarations
  • if (FD->getPreviousDecl())
  • return false;

I believe with your patch, the node will get added, but the definition

used will not have the body attached to it, so the analyzer will not
process the body.

Hmm, doesn't the analyzer itself use getBody() to access the body, thereby
walking the list of redeclarations? It seems like it is still processing
the body of the function, but maybe I don't know what to look for.

This is a good point. We might not be gaining anything by replacing the
decl with the one that has a body since getBody is guaranteed to search the
redeclarations regardless.

How about just removing both calls to isThisDeclarationADefinition() (from
includeInGraph) and replacing the call to getBody with hasBody (to avoid
the unnecessary AST de-serialization of the body)? That should work and
would simplify the function greatly. What do you think?

I think replacing getBody() with hasBody() is a good idea regardless of the
approach taken to fix the missing edges. And avoiding node replacement
would make it cleaner.

I am not sure why some of the checks in your first patch were necessary

(like this one).
+ // Skip definitions with previous declarations
+ if (FD->getPreviousDecl())
+ return false;
+

The reason I put that in was to avoid inserting redundant nodes. Just
removing isThisDeclarationADefinition() yields:

void a();
void b() { a(); }
void a() { }

--- Call graph Dump ---
  Function: < root > calls: a b a
  Function: a calls:
  Function: b calls: a
  Function: a calls:

Since call expressions seem to reference the first declaration, I followed
suit. I still need to verify this is always the case.

The other check for templates is unnecessary.

Oh, I see! getOrInsertNode() would not insert an existing declaration, but it does not check for re-declarations.

But would your approach work for this set?
void a();

void a() { }
void b() { a(); } // Here, a call to “a()” might not be added since “a()” has a previous declaration.

Sounds like we would want to store canonical declarations, which brings us back to the suggestion of using “hasBody(Definition)” and only store the definitions that have bodies. Changing the signature of “includeInGraph” so that it mutates the Decl is fine; though, you might want to rename it as well.

Anna.

I believe with your patch, the node will get added, but the definition

used will not have the body attached to it, so the analyzer will not
process the body.

Hmm, doesn't the analyzer itself use getBody() to access the body,
thereby walking the list of redeclarations? It seems like it is still
processing the body of the function, but maybe I don't know what to look
for.

This is a good point. We might not be gaining anything by replacing the
decl with the one that has a body since getBody is guaranteed to search the
redeclarations regardless.

How about just removing both calls to isThisDeclarationADefinition()
(from includeInGraph) and replacing the call to getBody with hasBody (to
avoid the unnecessary AST de-serialization of the body)? That should work
and would simplify the function greatly. What do you think?

I think replacing getBody() with hasBody() is a good idea regardless of
the approach taken to fix the missing edges. And avoiding node replacement
would make it cleaner.

I am not sure why some of the checks in your first patch were necessary

(like this one).
+ // Skip definitions with previous declarations
+ if (FD->getPreviousDecl())
+ return false;
+

The reason I put that in was to avoid inserting redundant nodes. Just
removing isThisDeclarationADefinition() yields:

Oh, I see! getOrInsertNode() would not insert an existing declaration, but
it does not check for re-declarations.

But would your approach work for this set?
void a();
void a() { }
void b() { a(); } // Here, a call to "a()" might not be added since "a()"
has a previous declaration.

Alas, yes, you're right. I can't believe I didn't try that. Not going to
work.

Sounds like we would want to store canonical declarations, which brings us

back to the suggestion of using "hasBody(Definition)" and only store the
definitions that have bodies. Changing the signature of "includeInGraph" so
that it mutates the Decl is fine; though, you might want to rename it as
well.

Yep. The problem isn't with includeInGraph, though. It's with
hasBody(Definition). Since it takes a reference to a pointer-to-const,
using it get the canonical declaration means the canonical declaration
itself must be const. So pretty much all of the Decl* in CallGraph would
have to get changed to const.

I only know of two ways around it:
1) Change the FunctionDecl class
2) Walk redecls in some fashion similar to my second patch, which
essentially just re-implemented hasBody without forcing the declaration to
be const.

Any other way?

Looks like we can use getCanonicalDecl() instead (in both FunctionDecl and ObjCMethodDecl case).

Ah! Excellent. How aptly named :slight_smile:

Many thanks.

:slight_smile:

I’ll be on vacation next week.

Seems like we have a solution here. I might be able to approve the patch, but won’t be able to apply it. You should send the patch to cfe-dev and CC me.

Thank you!
Anna.

OK, here’s another go at it. This fixes the two test cases discussed, and doesn’t break anything else that I can see.

Because callees do not always match the canonical declaration, I swap FunctionDecls for their canonical counterparts when they are visited.

This differs from what we discussed by not using the canonical declaration for ObjCMethodDecls. This is because hasBody and getBody work differently for ObjCMethodDecls than FunctionDecls. Using the canonical declaration breaks the call graph. There isn’t a problem with the call graph created for ObjCMethodDecls, so I have left them alone. Also for ObjCMethodDecls, isThisDeclarationADefinition is just { return hasBody(); }, so that includeInGraph check is redundant.

Thoughts?

Hi Daniel,

This one prints “Function: eee calls:” twice. I think the check should be done inside getOrInsertNode (instead of VisitFunctionDecl). What do you think? If this looks good, I am going to go ahead and commit the patch. (I’ll also replace CHECK with CHECK-NEXT.)

CallGraphNode *CallGraph::getOrInsertNode(Decl *F) {
if (F && !isa(F))
F = F->getCanonicalDecl();

Anna.

Anna,

Yes, this is better. I had originally done the same but, unfortunately, moved it to avoid isa<> and didn’t notice the extra line. It makes more sense to have it in getOrInsertNode.

Thanks you for your help and patience,

Daniel

Committed in r224398.

Thank you!
Anna.