Adding a CFG node for the allocation call in a new-expression

Hi, Ted (and list). After the negative feedback on representing new-expression allocations using CXXOperatorCallExpr, I went back to the original idea of adding a CFG node. Without properly modelling constructors this doesn't gain us very much right now, but it does set the stage for eventually being able to get all of the behavior of a new-expression into the analyzer.

Missing pieces:

- What does it mean to be PreStmt<CXXNewExpr> if the allocation and the initializer have already happened? (Note that right neither PreStmt nor PostStmt checks are done on CXXNewExpr, but that's easily fixed.)

- How can we represent "calls" that don't have associated expressions? I'm starting to think CallOrObjCMessage needs to turn into a real "abstract call" model which may or may not have expressions for the arguments and can handle regular calls, messages, constructors, destructors, allocations, and deletions. New checks: preCall and postCall. But that's a bit outside the scope of this patch.

Anyway, how does this part look?
Jordy

CFGAllocation.patch (25.6 KB)

Updated patch that doesn't crash horribly on trunk.

CFGAllocation.patch (25.6 KB)

I’m a bit late to the party, but…

I like this approach, it should be enough to start with. I have a very hacked approach in my own tree for handling new re: checker work I’m doing, but I like the idea of the CFG node better.

Regarding the PreStmt, perhaps we could trigger it when encountering the allocation CFG node (before allocation/initialisation), and then trigger the PostStmt as normal after encounting the new expression? It kind of breaks the generalised pre/post visit that we plan to implement, but maybe that will be a problem solved by the abstract call model you propose.

Another thing that Ted, Zhongxing and I discussed off-list was making a new memory region type specifically for C++ allocated memory regions. We don’t currently have anything that completely matches the model (typed, system allocated, independent.) I’ve been using a hacked together version of this to help detect mismatched (de)allocations involving new/delete.

Looking forward to other opinions.

Tom

Hi, Ted (and list). After the negative feedback on representing new-expression allocations using CXXOperatorCallExpr, I went back to the original idea of adding a CFG node. Without properly modelling constructors this doesn't gain us very much right now, but it does set the stage for eventually being able to get all of the behavior of a new-expression into the analyzer.

Missing pieces:

- What does it mean to be PreStmt<CXXNewExpr> if the allocation and the initializer have already happened? (Note that right neither PreStmt nor PostStmt checks are done on CXXNewExpr, but that's easily fixed.)

What are the issues with calling PreStmt<CXXNewExpr> before evaluation of any statements related to new (except that it's more difficult to implement)? That's what checker writers would expect from the callback.

- How can we represent "calls" that don't have associated expressions? I'm starting to think CallOrObjCMessage needs to turn into a real "abstract call" model which may or may not have expressions for the arguments and can handle regular calls, messages, constructors, destructors, allocations, and deletions. New checks: preCall and postCall. But that's a bit outside the scope of this patch.

Is this needed to allow callbacks on the allocation? Do we have to generalize here or could we just provide a separate callback for it (if someone needs it)?

- What does it mean to be PreStmt<CXXNewExpr> if the allocation and the initializer have already happened? (Note that right neither PreStmt nor PostStmt checks are done on CXXNewExpr, but that's easily fixed.)

What are the issues with calling PreStmt<CXXNewExpr> before evaluation of any statements related to new (except that it's more difficult to implement)? That's what checker writers would expect from the callback.

Now that the CFG is linearized, it's hard to know what that means. For a normal call, the arguments are evaluated before the PreStmt<CallExpr> callback, then the call itself is inlined or modelled, then there's the PostStmt<CallExpr> callback. For a CXXNewExpr, though, the CFG looks like this:

1. Evaluate all placement args.
2. CFGAllocation ***
3. Evaluate all constructor args.
4. Evaluate the constructor.
5. Evaluate the CXXNewExpr.

It's impractical to put the PreStmt<CXXNewExpr> call before all of that, because that removes the whole effort of a linearized CFG. The constructor args really might not be evaluated until after the allocation, according to the standard, but moving them is impractical since there /is/ a CXXConstructExpr (or similar) inside the CXXNewExpr, and the CFG treats that like any other expression. IIRC, we don't have any other Pre/Post pairs that stretch across multiple CFG nodes.

- How can we represent "calls" that don't have associated expressions? I'm starting to think CallOrObjCMessage needs to turn into a real "abstract call" model which may or may not have expressions for the arguments and can handle regular calls, messages, constructors, destructors, allocations, and deletions. New checks: preCall and postCall. But that's a bit outside the scope of this patch.

Is this needed to allow callbacks on the allocation? Do we have to generalize here or could we just provide a separate callback for it (if someone needs it)?

It's certainly possible to add a separate callback; what I'm thinking of, though, is inlining and generic checks. Right now RetainCountChecker implements several callbacks and awkwardly filters them into a single evalSummary. AttrNonNullChecker checks the "nonnull" attribute, which ought to apply to the allocation call as much as anything else. I feel like the majority of call checkers that /don't/ deal with specific functions (i.e. unlike CStringChecker and such) probably need to reason about /every/ sort of call.

This is a long-term idea, though; I'm not angling to put it in right now.

Jordy

I've looked at this a little, too; it'd be nice to consider malloc() regions to be known-heap regions. Part of the problem is that we can't have a symbol associated with a region /and/ store information about the region. Figuring out a way to support conjured regions with known super-regions, and/or associate non-conjured regions with symbols might turn out to be a good general solution.

(Just throwing ideas out, haven't thought it all the way through.)

Jordy

- What does it mean to be PreStmt<CXXNewExpr> if the allocation and the initializer have already happened? (Note that right neither PreStmt nor PostStmt checks are done on CXXNewExpr, but that's easily fixed.)

What are the issues with calling PreStmt<CXXNewExpr> before evaluation of any statements related to new (except that it's more difficult to implement)? That's what checker writers would expect from the callback.

Now that the CFG is linearized, it's hard to know what that means. For a normal call, the arguments are evaluated before the PreStmt<CallExpr> callback, then the call itself is inlined or modelled, then there's the PostStmt<CallExpr> callback. For a CXXNewExpr, though, the CFG looks like this:

1. Evaluate all placement args.
2. CFGAllocation ***
3. Evaluate all constructor args.
4. Evaluate the constructor.
5. Evaluate the CXXNewExpr.

It's impractical to put the PreStmt<CXXNewExpr> call before all of that, because that removes the whole effort of a linearized CFG. The constructor args really might not be evaluated until after the allocation, according to the standard, but moving them is impractical since there /is/ a CXXConstructExpr (or similar) inside the CXXNewExpr, and the CFG treats that like any other expression. IIRC, we don't have any other Pre/Post pairs that stretch across multiple CFG nodes.

If I understand correctly, we want to evaluate the arguments of the constructor before PreStmt<CallExpr>, but it is difficult to do because their evaluation is a part of evaluating CXXConstructExpr, which occurs after CFGAllocation in the CFG. But the following order would be best but difficult to implement. The standard allows evaluation of constructor args before allocation, correct? And the allocation is a part of CXXNewExpr as far as AST is concerned?
(just trying to understand the issue..)

1. Evaluate all placement args.
2. Evaluate all constructor args.
4. Evaluate PreStmt<CXXNewExpr>
4. Evaluate PreStmt< CXXConstructExpr >
5. Evaluate the constructor.
6. Evaluate CXXNewExpr.
...

- How can we represent "calls" that don't have associated expressions? I'm starting to think CallOrObjCMessage needs to turn into a real "abstract call" model which may or may not have expressions for the arguments and can handle regular calls, messages, constructors, destructors, allocations, and deletions. New checks: preCall and postCall. But that's a bit outside the scope of this patch.

Is this needed to allow callbacks on the allocation? Do we have to generalize here or could we just provide a separate callback for it (if someone needs it)?

It's certainly possible to add a separate callback; what I'm thinking of, though, is inlining and generic checks. Right now RetainCountChecker implements several callbacks and awkwardly filters them into a single evalSummary. AttrNonNullChecker checks the "nonnull" attribute, which ought to apply to the allocation call as much as anything else. I feel like the majority of call checkers that /don't/ deal with specific functions (i.e. unlike CStringChecker and such) probably need to reason about /every/ sort of call.

I agree about having a generic pre/post call callbacks. I was not sure why you'd need to have CallOrObjCMessage with no expression. I am guessing the issue is that you don't want to treat CXXNewExpr as one call, but 2 calls: calling the constructor and calling the allocation. In this case, you don't have an expression associated with an allocation (unless you associate something like CXXNewExpr for example).

- What does it mean to be PreStmt<CXXNewExpr> if the allocation and the initializer have already happened? (Note that right neither PreStmt nor PostStmt checks are done on CXXNewExpr, but that's easily fixed.)

What are the issues with calling PreStmt<CXXNewExpr> before evaluation of any statements related to new (except that it's more difficult to implement)? That's what checker writers would expect from the callback.

Now that the CFG is linearized, it's hard to know what that means. For a normal call, the arguments are evaluated before the PreStmt<CallExpr> callback, then the call itself is inlined or modelled, then there's the PostStmt<CallExpr> callback. For a CXXNewExpr, though, the CFG looks like this:

1. Evaluate all placement args.
2. CFGAllocation ***
3. Evaluate all constructor args.
4. Evaluate the constructor.
5. Evaluate the CXXNewExpr.

It's impractical to put the PreStmt<CXXNewExpr> call before all of that, because that removes the whole effort of a linearized CFG. The constructor args really might not be evaluated until after the allocation, according to the standard, but moving them is impractical since there /is/ a CXXConstructExpr (or similar) inside the CXXNewExpr, and the CFG treats that like any other expression. IIRC, we don't have any other Pre/Post pairs that stretch across multiple CFG nodes.

If I understand correctly, we want to evaluate the arguments of the constructor before PreStmt<CallExpr>, but it is difficult to do because their evaluation is a part of evaluating CXXConstructExpr, which occurs after CFGAllocation in the CFG. But the following order would be best but difficult to implement. The standard allows evaluation of constructor args before allocation, correct? And the allocation is a part of CXXNewExpr as far as AST is concerned?
(just trying to understand the issue..)

1. Evaluate all placement args.
2. Evaluate all constructor args.
4. Evaluate PreStmt<CXXNewExpr>
4. Evaluate PreStmt< CXXConstructExpr >
5. Evaluate the constructor.
6. Evaluate CXXNewExpr.
...

The allocation is just the CXXNewExpr in the AST, but the reason the allocation ought to happen /before/ the constructor is so that the constructor can initialize the right region. Right now that doesn't matter since we're just conjuring a symbol, but if we even get to modelling placement new, let alone handling arbitrary allocators or a new region type like Tom mentioned, we're going to need to do better.

- How can we represent "calls" that don't have associated expressions? I'm starting to think CallOrObjCMessage needs to turn into a real "abstract call" model which may or may not have expressions for the arguments and can handle regular calls, messages, constructors, destructors, allocations, and deletions. New checks: preCall and postCall. But that's a bit outside the scope of this patch.

Is this needed to allow callbacks on the allocation? Do we have to generalize here or could we just provide a separate callback for it (if someone needs it)?

It's certainly possible to add a separate callback; what I'm thinking of, though, is inlining and generic checks. Right now RetainCountChecker implements several callbacks and awkwardly filters them into a single evalSummary. AttrNonNullChecker checks the "nonnull" attribute, which ought to apply to the allocation call as much as anything else. I feel like the majority of call checkers that /don't/ deal with specific functions (i.e. unlike CStringChecker and such) probably need to reason about /every/ sort of call.

I agree about having a generic pre/post call callbacks. I was not sure why you'd need to have CallOrObjCMessage with no expression. I am guessing the issue is that you don't want to treat CXXNewExpr as one call, but 2 calls: calling the constructor and calling the allocation. In this case, you don't have an expression associated with an allocation (unless you associate something like CXXNewExpr for example).

Right. And I'm figuring the same infrastructure would be used for destructors (and [[cleanup]] attributes).

Correct, they're indeterminately sequenced. AFAIK, in practice, everyone evaluates the constructor arguments after allocation. This generally creates better code — for example, if the constructor is an elidable copy-construction, you can just evaluate the constructor argument in-place. For example:
  struct A { ...; static A make(); };
  ...
  new A(A::make());

Obviously you don't necessarily need to model it that way in the CFG, though.

John.

Now that the CFG is linearized, it’s hard to know what that means. For a normal call, the arguments are evaluated before the PreStmt callback, then the call itself is inlined or modelled, then there’s the PostStmt callback. For a CXXNewExpr, though, the CFG looks like this:

  1. Evaluate all placement args.
  1. CFGAllocation ***
  1. Evaluate all constructor args.
  1. Evaluate the constructor.
  1. Evaluate the CXXNewExpr.

It’s impractical to put the PreStmt call before all of that, because that removes the whole effort of a linearized CFG. The constructor args really might not be evaluated until after the allocation, according to the standard, but moving them is impractical since there /is/ a CXXConstructExpr (or similar) inside the CXXNewExpr, and the CFG treats that like any other expression. IIRC, we don’t have any other Pre/Post pairs that stretch across multiple CFG nodes.

If I understand correctly, we want to evaluate the arguments of the constructor before PreStmt, but it is difficult to do because their evaluation is a part of evaluating CXXConstructExpr, which occurs after CFGAllocation in the CFG. But the following order would be best but difficult to implement. The standard allows evaluation of constructor args before allocation, correct? And the allocation is a part of CXXNewExpr as far as AST is concerned?

(just trying to understand the issue…)

  1. Evaluate all placement args.
  1. Evaluate all constructor args.
  1. Evaluate PreStmt
  1. Evaluate CFGAllocation
  1. Evaluate PreStmt< CXXConstructExpr >
  1. Evaluate the constructor.
  1. Evaluate rest of CXXNewExpr.

The allocation is just the CXXNewExpr in the AST, but the reason the allocation ought to happen /before/ the constructor is so that the constructor can initialize the right region. Right now that doesn’t matter since we’re just conjuring a symbol, but if we even get to modelling placement new, let alone handling arbitrary allocators or a new region type like Tom mentioned, we’re going to need to do better.

I’ve just realized that I omitted the allocation from the sequence. I think it might be best if we process PreStmt before evaluating the constructor AND CFGAllocation. This might not be trivial to implement, but seems to be the right thing to do.