Representing the allocation in CXXNewExpr

Hi, all. PR12014 is tracking the analyzer's model of operator new. The bug was originally from Sebastian's refactoring of CXXNewExpr to use a subexpression for the initializer; coupled with the linearization of the analysis CFG, we ended up getting that part of the analysis for free. (That is, just looking at the subexpressions of CXXNewExpr will cause us to analyze the embedded initializer like any other constructor, for those new-expressions that use constructors.)

What we don't have a model for, however, is the allocation, and because of this the constructor doesn't actually have a reasonable "memory region" the analyzer can reason about later. While we could augment the CFG to put a new "allocation" node before the constructor, Ted came up with the question of whether this information ought to be in the AST itself.

I'm definitely not a lawyer on what is supposed to be explicit in the AST and what isn't; this seems to fall on the borderline of the general guideline of "preserve the structure in the original source". Looking at the existing appearances of CXXNewExpr doesn't help either -- adding an allocation sub-expression wouldn't significantly affect any of the existing code positively or negatively, AFAICT.

On the other hand, a new-expression's /behavior/ is basically this:
1. Call the allocation function.
2. If allocation succeeds, call the constructor.
3. If the constructor fails, call the deallocation function.*

Seeing each of those "calls" makes me think something should be abstracted out. The deallocation is a little funny because the arguments have already been evaluated, but the allocation function call really is equivalent to calling an "operator new" function directly.

(That made me think CXXOperatorCallExpr, but the source range for a new-allocation is a lot messier. Possibly a new CallExpr subclass?)

Any ExprCXX experts have thoughts on this? It would be very tidy for the analyzer, but of course that alone shouldn't be the reason for a major representation change like this.

Thank you,
Jordy

* Today I learned about placement delete. Yuck.

I think it would make sense to represent at least the call to operator new as its own CallExpr. I'd leave the destructor as a simple FunctionDecl pointer in the CXXNewExpr, though, since correctly modeling that call sounds hard.

Sebastian

Hi, all. PR12014 is tracking the analyzer's model of operator new. The bug was originally from Sebastian's refactoring of CXXNewExpr to use a subexpression for the initializer; coupled with the linearization of the analysis CFG, we ended up getting that part of the analysis for free. (That is, just looking at the subexpressions of CXXNewExpr will cause us to analyze the embedded initializer like any other constructor, for those new-expressions that use constructors.)

What we don't have a model for, however, is the allocation, and because of this the constructor doesn't actually have a reasonable "memory region" the analyzer can reason about later. While we could augment the CFG to put a new "allocation" node before the constructor, Ted came up with the question of whether this information ought to be in the AST itself.

I'm definitely not a lawyer on what is supposed to be explicit in the AST and what isn't; this seems to fall on the borderline of the general guideline of "preserve the structure in the original source". Looking at the existing appearances of CXXNewExpr doesn't help either -- adding an allocation sub-expression wouldn't significantly affect any of the existing code positively or negatively, AFAICT.

On the other hand, a new-expression's /behavior/ is basically this:
1. Call the allocation function.
2. If allocation succeeds, call the constructor.
3. If the constructor fails, call the deallocation function.*

Seeing each of those "calls" makes me think something should be abstracted out. The deallocation is a little funny because the arguments have already been evaluated, but the allocation function call really is equivalent to calling an "operator new" function directly.

I think it would make sense to represent at least the call to operator new as its own CallExpr.

Yes, that makes sense.

I'd leave the destructor as a simple FunctionDecl pointer in the CXXNewExpr, though, since correctly modeling that call sounds hard.

Agreed, and there's precedent for just keeping around this pointer rather than forming an AST to call the destructor.

Okay, here's my first pass on this. It's not going to be as useful as I thought because the size argument /still/ needs to be treated specially in a lot of places, but it still helps the analyzer (or will when I put the appropriate changes through).

This passes the test suite, but I'm not sure if I missed any edge cases, especially when the expression is type-dependent.

CXXNewExpr-allocation.patch (29.1 KB)

Hi Jordan,

// Was the usage ::new, i.e. is the global new to be used?
bool GlobalNew : 1;

  • // Do we allocate an array? If so, the first SubExpr is the size expression.
  • bool Array : 1;
    // If this is an array allocation, does the usual deallocation
    // function for the allocated type want to know the allocated size?
    bool UsualArrayDeleteWantsSize : 1;
  • // The number of placement new arguments.
  • unsigned NumPlacementArgs : 13;

With the number of bits needed by flags down this much, is it now feasible to sink this into the shared bitfield of Stmt?

  • // FIXME: Should these be removed?
    typedef ExprIterator arg_iterator;
    typedef ConstExprIterator const_arg_iterator;

Really depends on who uses them and what usage patterns these places have. Does it make sense for them to iterate over the placement-new arguments?

— a/include/clang/StaticAnalyzer/Core/PathSensitive/ObjCMessage.h
+++ b/include/clang/StaticAnalyzer/Core/PathSensitive/ObjCMessage.h
@@ -196,10 +196,18 @@ public:

/// Check if the callee is declared in the system header.
bool isInSystemHeader() const {

  • if (const Decl *FD = getDecl()) {
  • const SourceManager &SM =
  • State->getStateManager().getContext().getSourceManager();
  • return SM.isInSystemHeader(FD->getLocation());
  • if (const Decl *D = getDecl()) {
  • SourceLocation Loc = D->getLocation();
  • if (Loc.isValid()) {
  • const SourceManager &SM =
  • State->getStateManager().getContext().getSourceManager();
  • return SM.isInSystemHeader(Loc);
  • } else if (const FunctionDecl *FD = dyn_cast(D)) {
  • // Operator new and operator delete may be implicitly declared.
  • // We can’t just check /any/ implicit declarations, though, because
  • // C89 allows any function call to create an implicit declaration.
  • return FD->isOverloadedOperator() && FD->isImplicit();
  • }
    }
    return false;
    }

This looks unrelated.

  • // FIXME: Can the array size ever throw?
  • if (const Expr *ArraySize = cast(this)->getArraySize())
  • CT = MergeCanThrow(CT, ArraySize->CanThrow(C));

Of course it can throw.
new int[callSomeFunction()]

  • } else if (Kind == OO_New || Kind == OO_Array_New) {
  • // FIXME: Should the array brackets/array size be included for new?
  • if (getRParenLoc().isValid())
  • // There are explicit placement args.
  • return SourceRange(getArg(0)->getSourceRange().getBegin(), getRParenLoc());
  • else
  • // There are no explicit placement args.
  • return getArg(0)->getSourceRange();

I think it would be better, but perhaps inconsistent with non-array new, since I don’t think we include the type range there, do we?

— a/lib/AST/StmtPrinter.cpp
+++ b/lib/AST/StmtPrinter.cpp
@@ -1154,6 +1154,8 @@ void StmtPrinter::VisitCXXOperatorCallExpr(CXXOperatorCallExpr *Node) {
OS << ‘[’;
PrintExpr(Node->getArg(1));
OS << ‘]’;

  • } else if (Kind == OO_New || Kind == OO_Array_New) {
  • llvm_unreachable(“should be handled within CXXNewExpr”);
    } else if (Node->getNumArgs() == 1) {
    OS << OpStrings[Kind] << ’ ';
    PrintExpr(Node->getArg(0));
    @@ -1407,13 +1409,17 @@ void StmtPrinter::VisitCXXNewExpr(CXXNewExpr *E) {
    if (E->isGlobalNew())
    OS << “::”;
    OS << "new ";
  • unsigned NumPlace = E->getNumPlacementArgs();
  • if (NumPlace > 0) {
  • if (E->getNumPlacementArgs() > 0) {
    OS << “(”;
  • PrintExpr(E->getPlacementArg(0));
  • for (unsigned i = 1; i < NumPlace; ++i) {
  • OS << ", ";
  • PrintExpr(E->getPlacementArg(i));
  • bool NeedComma = false;
  • for (CXXNewExpr::arg_iterator P = E->placement_arg_begin(),
  • PEnd = E->placement_arg_end();
  • P != PEnd; ++P) {
  • if (NeedComma)
  • OS << ", ";
  • else
  • NeedComma = true;
  • PrintExpr(*P);
    }
    OS << ") ";
    }

I think it would make more sense to leave the printing of the new and the placement arguments to the CXXOperatorCallExpr. But I’m not very sure about this opinion.

It appears that there are quite a few places that get more complicated with this, e.g. the need to represent the allocation size expression in the AST. We need to find a way to improve this.

Sebastian

With the number of bits needed by flags down this much, is it now feasible to sink this into the shared bitfield of Stmt?

Good point, didn't think of that. Changed.

--- a/include/clang/StaticAnalyzer/Core/PathSensitive/ObjCMessage.h
+++ b/include/clang/StaticAnalyzer/Core/PathSensitive/ObjCMessage.h

This looks unrelated.

Without it, the analyzer crashes when trying to evaluate a CXXOperatorCallExpr that uses an implicitly-declared new. It's not /directly/ related but it's part of "not breaking tests". (Before this we didn't even look at the allocation.)

+ // FIXME: Can the array size ever throw?
+ if (const Expr *ArraySize = cast<CXXNewExpr>(this)->getArraySize())
+ CT = MergeCanThrow(CT, ArraySize->CanThrow(C));

Of course it can throw.
new int[callSomeFunction()]

At some point I read about C++ not having VLAs and crossed wires in my head; when I realized my mistake I never removed the comment. Thanks.

+ } else if (Kind == OO_New || Kind == OO_Array_New) {
+ // FIXME: Should the array brackets/array size be included for new?
+ if (getRParenLoc().isValid())
+ // There are explicit placement args.
+ return SourceRange(getArg(0)->getSourceRange().getBegin(), getRParenLoc());
+ else
+ // There are no explicit placement args.
+ return getArg(0)->getSourceRange();

I think it would be better, but perhaps inconsistent with non-array new, since I don't think we include the type range there, do we?

Right...we /could/ include the type range there as well, but that seems wrong. "new Foo" is not exactly a single unit in "new Foo(1,2,3)", especially if the allocator being used is global. It's a tradeoff.

I think it would make more sense to leave the printing of the new and the placement arguments to the CXXOperatorCallExpr. But I'm not very sure about this opinion.

It appears that there are quite a few places that get more complicated with this, e.g. the need to represent the allocation size expression in the AST. We need to find a way to improve this.

This is essentially the cause of ALL the complications. I considered a new expression type CXXAllocSizeExpr, or a new kind of UnaryExprOrTypeTraitExpr, but even then there's still cases where we treat the size argument differently:

- Printing the new-expression. CXXOperatorCallExpr would be taught to skip the size argument when it's an OO_New or OO_Array_New.
- Itanium name-mangling, which currently skips the size argument. The Itanium spec does seem to imply that this is correct (the size is skipped for the expression but not for the allocator function's name), but there's no clause dealing specifically with new and new.
- TreeTransform, which currently rebuilds the new expression /and/ the allocation call from scratch (because the two aren't distinct in Sema). This probably wouldn't be /too/ hard to change; I just haven't done it yet.
- CodeGen: Delete cleanups if the allocator fails. We're already keeping all the placement args around, though...no reason not to just extract the size argument from there too.
- CodeGen: We need to know the array+cookie size for the call to the allocator and the array size alone for the call to the initializer.

That last one is the killer. Teaching CXXOperatorCallExpr to just skip its first argument for printing and mangling wouldn't be so bad, but I'm not so happy about computing the array size again for the initializer call. And if we extract it from the arguments list, we just have to subtract the cookie size off again.

I do want to point out a couple good things I /hadn't/ put in the first version that this would allow -- namely, killing an old FIXME about using CheckFunctionCall, which we can actually do now. (Though right now CheckFunctionCall skips C++ operators, even though there could easily be nonnull/format attributes attached for the placement args.) If we can solve the CodeGen problem, then we also get to re-use all the real argument emission code for that too. (Of course, we could do that anyway by adding a flag to EmitCallArgs.)

(Slightly) updated patch attached...
Jordy

CXXNewExpr-allocation.patch (31.8 KB)

- Itanium name-mangling, which currently skips the size argument. The Itanium spec does seem to imply that this is correct (the size is skipped for the expression but not for the allocator function's name), but there's no clause dealing specifically with new and new.

This is actually a bug, since the size operand of a new could be instantiation-dependent and therefore affect overload resolution. I'll raise that with the ABI list.

- TreeTransform, which currently rebuilds the new expression /and/ the allocation call from scratch (because the two aren't distinct in Sema). This probably wouldn't be /too/ hard to change; I just haven't done it yet.
- CodeGen: Delete cleanups if the allocator fails. We're already keeping all the placement args around, though...no reason not to just extract the size argument from there too.
- CodeGen: We need to know the array+cookie size for the call to the allocator and the array size alone for the call to the initializer.

That last one is the killer. Teaching CXXOperatorCallExpr to just skip its first argument for printing and mangling wouldn't be so bad, but I'm not so happy about computing the array size again for the initializer call. And if we extract it from the arguments list, we just have to subtract the cookie size off again.

And the fact that IR-gen has to give special treatment to the fact that the computation can overflow. I assume that's been covered elsewhere here.

I do want to point out a couple good things I /hadn't/ put in the first version that this would allow -- namely, killing an old FIXME about using CheckFunctionCall, which we can actually do now. (Though right now CheckFunctionCall skips C++ operators, even though there could easily be nonnull/format attributes attached for the placement args.) If we can solve the CodeGen problem, then we also get to re-use all the real argument emission code for that too. (Of course, we could do that anyway by adding a flag to EmitCallArgs.)

To be perfectly honest, it sounds like this proposal is just complicating the representation of CXXNewExpr without significantly reducing the amount of special-case work that needs to be done in the clients.

John.

- Itanium name-mangling, which currently skips the size argument. The Itanium spec does seem to imply that this is correct (the size is skipped for the expression but not for the allocator function's name), but there's no clause dealing specifically with new and new.

This is actually a bug, since the size operand of a new could be instantiation-dependent and therefore affect overload resolution. I'll raise that with the ABI list.

I think this is actually not an issue: C++11[basic.std.dynamic.allocation]p1 says "The first parameter shall have type std::size_t". So it can't affect overload resolution, right?

- CodeGen: We need to know the array+cookie size for the call to the allocator and the array size alone for the call to the initializer.

That last one is the killer. Teaching CXXOperatorCallExpr to just skip its first argument for printing and mangling wouldn't be so bad, but I'm not so happy about computing the array size again for the initializer call. And if we extract it from the arguments list, we just have to subtract the cookie size off again.

And the fact that IR-gen has to give special treatment to the fact that the computation can overflow. I assume that's been covered elsewhere here.

Right, this behavior is already correct now, and we'd wrap it up in the handling of "CXXAllocSizeExpr" or whatever. Still doesn't solve the double-calculation problem, though. (We don't /really/ need to handle the overflow case for the second calculation, though, because our recovery is just to make the allocation fail by passing an all-ones value.)

To be perfectly honest, it sounds like this proposal is just complicating the representation of CXXNewExpr without significantly reducing the amount of special-case work that needs to be done in the clients.

I'm still on the fence myself; the original impetus for this (for me) was to improve the analyzer's model of new-expressions. If everyone agrees CXXOperatorCallExpr is the wrong way to go, we'll make up a new CFG node to represent the allocation instead.

Jordy

  • Itanium name-mangling, which currently skips the size argument. The Itanium spec does seem to imply that this is correct (the size is skipped for the expression but not for the allocator function’s name), but there’s no clause dealing specifically with new and new.

This is actually a bug, since the size operand of a new could be instantiation-dependent and therefore affect overload resolution. I’ll raise that with the ABI list.

I think this is actually not an issue: C++11[basic.std.dynamic.allocation]p1 says “The first parameter shall have type std::size_t”. So it can’t affect overload resolution, right?

Is not std::size_t a typedef though ? I think the underlying type could vary, especially between 32 bits and 64 bits platform.

– Matthieu.

- Itanium name-mangling, which currently skips the size argument. The Itanium spec does seem to imply that this is correct (the size is skipped for the expression but not for the allocator function's name), but there's no clause dealing specifically with new and new.

This is actually a bug, since the size operand of a new could be instantiation-dependent and therefore affect overload resolution. I'll raise that with the ABI list.

I think this is actually not an issue: C++11[basic.std.dynamic.allocation]p1 says "The first parameter shall have type std::size_t". So it can't affect overload resolution, right?

The size operand of an array allocation can fail to instantiate at particular template arguments, causing the template to be ignored by SFINAE.

- CodeGen: We need to know the array+cookie size for the call to the allocator and the array size alone for the call to the initializer.

That last one is the killer. Teaching CXXOperatorCallExpr to just skip its first argument for printing and mangling wouldn't be so bad, but I'm not so happy about computing the array size again for the initializer call. And if we extract it from the arguments list, we just have to subtract the cookie size off again.

And the fact that IR-gen has to give special treatment to the fact that the computation can overflow. I assume that's been covered elsewhere here.

Right, this behavior is already correct now, and we'd wrap it up in the handling of "CXXAllocSizeExpr" or whatever. Still doesn't solve the double-calculation problem, though. (We don't /really/ need to handle the overflow case for the second calculation, though, because our recovery is just to make the allocation fail by passing an all-ones value.)

Yeah, I think when you're adding new AST nodes which cannot conceivably be generalized, you're not actually improving the abstraction.

And even though I implemented it in clang, I've always been a bit skeptical of the "passing -1 to operator new" trick, because it's not actually sanctioned by the standard. I would treat that as an ABI-specific implementation detail rather than something that should necessarily be modeled by the CFG. The language just says that overflow causes an exception to be thrown.

To be perfectly honest, it sounds like this proposal is just complicating the representation of CXXNewExpr without significantly reducing the amount of special-case work that needs to be done in the clients.

I'm still on the fence myself; the original impetus for this (for me) was to improve the analyzer's model of new-expressions. If everyone agrees CXXOperatorCallExpr is the wrong way to go, we'll make up a new CFG node to represent the allocation instead.

It's interesting, because I can understand the desire to catch bugs involving the allocation and deallocation operators. On the other hand, there is a *lot* of situation-specific special case logic for this stuff.

John.

All right. Guess I'll shelve this idea and keep the model in the analyzer for now. Thanks for the guidance.

Jordy