Replacing the analyzer's CallOrObjCMessage with something more general

Hi, Ted, Anna. In order to pave the way for proper handling of destructors and new-expressions, I'm working on replacing CallOrObjCMessage with a new class tree, CallAction. Like CallOrObjCMessage, CallAction bundles together a particular function/method/block invocation and a snapshot of where it occurred; unlike CallOrObjCMessage, it is done using "proper" OO and is a lot more maintainable.

The idea is that in many cases, we don't really care /what/ is being called as long as we can do something useful with it. Currently there are a lot of places that either switch based on what's being called (ExprEngine::invalidateArguments), or funnel a bunch of other checks into a common method (CallAndMessageChecker, RetainCountChecker -- currently using CallOrObjCMessage). Other checks like AttrNonNullChecker would probably be useful for constructors and possibly Objective-C methods, but currently just check CallExprs.

The other flaw with CallOrObjCMessage is that a lot of the checkers that use it assume that there is an expression in the source that corresponds to the call or for each argument, which is not true for destructors or operator new size arguments. CallAction will at least make it safe to ask for the SVal corresponding to an argument that isn't in the AST, and it should provide a source range for each argument that can be used for diagnostics.

Currently I have it set up like this:

CallAction

-AnyFunctionCall (things that might have FunctionDecls)
>-SimpleCall (things that are written as CallExprs)
> >-FunctionCall (simple C functions, and anything unknown)
> >-CXXMemberCall (calls to C++ non-static member functions)
> \-BlockCall (calls to blocks)***
>-CXXConstructorCall (calls to constructors)
>-*CXXDestructorCall
>-*CXXAllocator
>-*CXXDeallocator
\-*AttributeCleanup (__attribute__((cleanup))-style destructors)

\-ObjCMessageInvocation (explicit sends, property accesses, and eventually subscripting, via ObjCMessage)

The ones with * are not implemented yet. BlockCall is annotated with *** because while blocks are written as CallExprs, they do not have an associated FunctionDecl, meaning that SimpleCall's access to the arguments can be inherited by AnyFunctionCall's type-checking and such must be overridden. (Lambdas do not have this problem because they are actually objects with an operator(); they will show up in CXXMemberCall.)

Eventually I'd like to completely eliminate CallOrObjCMessage. ObjCMessage might still be good to keep around, since it's a little lighter (it doesn't wrap state and location context in with the expressions). We can then have new checker callbacks preCall and postCall, which don't care how a function or method is being invoked. (evalCall would probably also use CallAction, but still not be called for Objective-C messages.)

This first draft does use virtual methods. I haven't done any performance testing yet, but if it turns out to be an issue I can change to using "static dynamic" dispatch using a switch.

Finally, the names are definitely tentative. Any better suggestions?

Comments much appreciated,
Jordan

(attached: the base CallAction header file and one big diff of what I have so far)

CallAction.patch (62.5 KB)

Call.h (9.84 KB)

Basic performance testing (i.e., timing "TESTDIRS=Analysis make test" and "scan-build xcodebuild" on an existing Objective-C open source project) shows no change (within noise). I'm not sure of the best way to test the impact of this more deeply, but I would like to not be prematurely devirtualizing if there's no payoff for it.

Jordan

Hi Jordan,

I think this is great. It's a great infrastructural improvement that will serve as well. Comments inline.

Hi, Ted, Anna. In order to pave the way for proper handling of destructors and new-expressions, I'm working on replacing CallOrObjCMessage with a new class tree, CallAction. Like CallOrObjCMessage, CallAction bundles together a particular function/method/block invocation and a snapshot of where it occurred; unlike CallOrObjCMessage, it is done using "proper" OO and is a lot more maintainable.

Using proper OO is fine, and it remains lightweight.

The idea is that in many cases, we don't really care /what/ is being called as long as we can do something useful with it. Currently there are a lot of places that either switch based on what's being called (ExprEngine::invalidateArguments), or funnel a bunch of other checks into a common method (CallAndMessageChecker, RetainCountChecker -- currently using CallOrObjCMessage). Other checks like AttrNonNullChecker would probably be useful for constructors and possibly Objective-C methods, but currently just check CallExprs.

Makes sense. As you well know, many checkers *do* care what is being called, but I don't see why this model doesn't allow more specialized behavior when necessary.

The other flaw with CallOrObjCMessage is that a lot of the checkers that use it assume that there is an expression in the source that corresponds to the call or for each argument, which is not true for destructors or operator new size arguments. CallAction will at least make it safe to ask for the SVal corresponding to an argument that isn't in the AST, and it should provide a source range for each argument that can be used for diagnostics.

Sounds great to me. It abstracts away this detail, which is something checker author's shouldn't be worrying about anyway.

Currently I have it set up like this:

CallAction
>-AnyFunctionCall (things that might have FunctionDecls)
> >-SimpleCall (things that are written as CallExprs)
> > >-FunctionCall (simple C functions, and anything unknown)
> > >-CXXMemberCall (calls to C++ non-static member functions)
> > \-BlockCall (calls to blocks)***
> >-CXXConstructorCall (calls to constructors)
> >-*CXXDestructorCall
> >-*CXXAllocator
> >-*CXXDeallocator
> \-*AttributeCleanup (__attribute__((cleanup))-style destructors)
\-ObjCMessageInvocation (explicit sends, property accesses, and eventually subscripting, via ObjCMessage)

The ones with * are not implemented yet. BlockCall is annotated with *** because while blocks are written as CallExprs, they do not have an associated FunctionDecl, meaning that SimpleCall's access to the arguments can be inherited by AnyFunctionCall's type-checking and such must be overridden. (Lambdas do not have this problem because they are actually objects with an operator(); they will show up in CXXMemberCall.)

Sounds good. Note that many of these will eventually have proper CFG-support as well, including __attribute((cleanup)) style destructors. Having CallAction wrap the appropriate CFG elements might be the way to go, but we can implement those as the needed functionality comes up in libAnalysis.

Eventually I'd like to completely eliminate CallOrObjCMessage. ObjCMessage might still be good to keep around, since it's a little lighter (it doesn't wrap state and location context in with the expressions). We can then have new checker callbacks preCall and postCall, which don't care how a function or method is being invoked. (evalCall would probably also use CallAction, but still not be called for Objective-C messages.)

Is there really any benefit to keeping ObjCMessage around and just use ObjCMethodInvocation? The cost you are talking about is insignificant compare to where the real performance issues are with the analyzer. These objects are created far too infrequently to matter in practice.

This first draft does use virtual methods. I haven't done any performance testing yet, but if it turns out to be an issue I can change to using "static dynamic" dispatch using a switch.

I think virtual methods are fine. Again, this logic is called for too infrequently to really matter in practice. The cost of the dynamic dispatch is dwarfed by the work in the checkers themselves.

Finally, the names are definitely tentative. Any better suggestions?

CallAction seems like the name of a callback object. What about something like "Call". That's what you name the header file anyway. It's succinct, and is in a completely different namespace then the rest of the compiler.

My other suggestion is to make ObjCMessageInvocation be ObjCMethodCall. The "Call" part keeps the name matching with the rest of the subclasses of CallAction. While the language construct is a message expression, the semantics we are trying to model is a call. This seems natural to me since CallAction deals with the semantics, not the syntax.

Hi, Ted, Anna. In order to pave the way for proper handling of destructors and new-expressions, I'm working on replacing CallOrObjCMessage with a new class tree, CallAction. Like CallOrObjCMessage, CallAction bundles together a particular function/method/block invocation and a snapshot of where it occurred; unlike CallOrObjCMessage, it is done using "proper" OO and is a lot more maintainable.

The idea is that in many cases, we don't really care /what/ is being called as long as we can do something useful with it. Currently there are a lot of places that either switch based on what's being called (ExprEngine::invalidateArguments), or funnel a bunch of other checks into a common method (CallAndMessageChecker, RetainCountChecker -- currently using CallOrObjCMessage). Other checks like AttrNonNullChecker would probably be useful for constructors and possibly Objective-C methods, but currently just check CallExprs.

Makes sense. As you well know, many checkers *do* care what is being called, but I don't see why this model doesn't allow more specialized behavior when necessary.

Oops, this was supposed to be that most checkers don't care /how/ something is being called. This should giwe us a double-free warning, for example:

void *x __attribute__((cleanup(free)));
x = malloc(1);
free(x);

But yes, it should be easy to take apart this abstraction. And this is probably the right place for a few of the helper functions that we've been putting on CheckerContext.

Eventually I'd like to completely eliminate CallOrObjCMessage. ObjCMessage might still be good to keep around, since it's a little lighter (it doesn't wrap state and location context in with the expressions). We can then have new checker callbacks preCall and postCall, which don't care how a function or method is being invoked. (evalCall would probably also use CallAction, but still not be called for Objective-C messages.)

Is there really any benefit to keeping ObjCMessage around and just use ObjCMethodInvocation? The cost you are talking about is insignificant compare to where the real performance issues are with the analyzer. These objects are created far too infrequently to matter in practice.

I was thinking about our checker callbacks for Objective-C messages, and trying to keep a distinction between preObjCMessage and preGenericCall. The actual issue that arises there is that other checkers may have updated the state inside the Call. But I'm very carefully making the state and location context an internal detail of Call objects, so the only thing being exposed is values bound to expressions...which shouldn't be changed by callbacks… As long as we're careful about what accessors are actually state-dependent, it should be okay, and then it's a nice unification.

Finally, the names are definitely tentative. Any better suggestions?

CallAction seems like the name of a callback object. What about something like "Call". That's what you name the header file anyway. It's succinct, and is in a completely different namespace then the rest of the compiler.

I don't like "Call" because it's the natural variable name for these things. "C" is usually already a CheckerContext, and we're trying not to use lowercase variable names anymore. "Invocation"? "AbstractCall"? "AnyCall"?

Alternately, what's the natural variable name for a "Call" argument?

My other suggestion is to make ObjCMessageInvocation be ObjCMethodCall. The "Call" part keeps the name matching with the rest of the subclasses of CallAction. While the language construct is a message expression, the semantics we are trying to model is a call. This seems natural to me since CallAction deals with the semantics, not the syntax.

Fair enough.

(copying to list with compressed patches, because the series of patches resulted in a 400kb message)

All right, I think this ready to be merged into trunk!

Since there’s really way too much here, I’ll try to condense the important things:

  • CallEvent is a wrapper around some part of the source that triggers a call to something (function, constructor, method, etc.)
  • CallEvent is path-sensitive; it captures the state and location context where the call happens.
  • CallEvent and its subclasses replace CallOrObjCMessage everywhere possible.
  • CallEvent subclass ObjCMethodCall and its subclasses replace ObjCMessage everywhere possible.
  • Inlining for C++ constructors is now totally feasible, but still disabled because we don’t track destructors yet.
  • New checker callbacks check::preCall and check::postCall should be called any time ExprEngine processes something that could call outside code.
  • Calls.h attached for your convenience; everything else is in the ten patch files. (Sorry! If you’d prefer per-patch review or something, we can do that too.)

% git log1

5ed29a5 [analyzer] Convert existing checkers to use check::preCall and check::postCall.
a1940c6 [analyzer] Add generic preCall and postCall checks.
ef97dfa [analyzer] Convert CXXConstructExpr over to use CallEvent for evaluation.
ebf2c52 [analyzer] Use CallEvent for inlining and call default-evaluation.
94dde7d [analyzer] Finish replacing ObjCMessage with ObjCMethodDecl and friends.
7afaef7 [analyzer] Begin replacing ObjCMessage with ObjCMethodCall and friends.
ba3cb76 [analyzer] Move the last bits of CallOrObjCMessage over to CallEvent.
321bb87 [analyzer] Convert CallAndMessageChecker and ObjCSelfInitChecker to CallEvent.
30502d6 [analyzer] Convert RetainCountChecker to use CallEvent as much as possible.
bc61f9d [analyzer] Add a new abstraction over all types of calls: CallEvent

Calls.h (13.3 KB)

Patches.zip (67.9 KB)

Looks great, this makes my C++ constructor inlining significantly less hacky. I’ll have to integrate and see how it works with my branch, but I’m hoping to be able clean it up enough to contribute new/delete handling very soon, including MallocChecker support.

Tom

That would be awesome. Please take note of http://llvm.org/bugs/show_bug.cgi?id=12014 if you work on new/delete, though -- the upshot is that the visit order for the sub-operations in a new-expression is pretty unusual.

For my part, I'm planning to work on enabling destructors once this gets in; turning on constructors without destructors pretty much results in false positives everywhere. Once we have ctors, dtors, new, and delete, we'll probably be able to turn on analysis for a huge portion of C++ code.

Jordan

Landed in r159554-159563!