APValue lifetime problem when evaluating constant expressions

The following reproducer does not work in clang with C++20 right now. If D128248 is applied, clang will show an error about the initializer not being a constant expression. If it is not applied, clang will crash because and assertion is hit.

struct Foo {
  int a;
  constexpr Foo()
      : a(get_int()) {
  }

  constexpr int get_int() {
    return 5;
  }
};

static constexpr Foo bar[2][1] = {
    {{}},
};

godbolt: Compiler Explorer (assertions are not enabled here)

So, I went looking why this happens. When assertions are enabled, clang ultimately runs into an assertion because all APValue accessors check for e.g. isArray() etc. which simply check that this->Kind == Array etc., however, when the above code is compiled, Kind is a garbage value.

When one tries hard enough, a debugger will reveal that Kind is set to a garbage value in APValue::~APValue() when called in expandArray().

expandArray() is a function in ExprConstant.cpp which takes an array APValue, creates a new one, swaps all the values and expands the array by filling it with the array’s array-filler: llvm-project/ExprConstant.cpp at main · llvm/llvm-project · GitHub

So, notably, all elements of the Array parameter of expandArray() are replaced and thus cease to exist.

One of the callers of expandArray() is findSubobject(): llvm-project/ExprConstant.cpp at main · llvm/llvm-project · GitHub, this passes *O as Array, which in our case is simply the Obj.Value given by the Obj parameter. Note here that expandArray modifies Obj.Value, even though “findSubobject” doesn’t sound like it would modify anything.

In the reproducer above, this is a problem because the Obj passed to findSubobject can come from findCompleteObject, which in our case return the APValue for the full bar array: llvm-project/ExprConstant.cpp at main ¡ llvm/llvm-project ¡ GitHub

So… while evaluating one of the elements of bar, the APValue for bar is completely redone in expandArray, which kind of breaks everything.

I’ve tried to modify findSubobject to not modify anything but it seems like a lot of the code (seemingly implicitly) depends on this behavior.

Does anyone have a better idea on how to fix this?

CC @zygoloid @AaronBallman @shafik

Thanks

1 Like

Oofda, thank you for looking into this, that sounds like a hairy thing to have tracked down!

Yeah, I think trying to modify findSubobject() to not modify contents will be a bit of a challenge; I suspect things expect the array to have been expanded once you get the subobject so that the caller doesn’t have to deal with the filler.

What specifically breaks? e.g., does knowing that the object has been invalidated out from under you help you to avoid the breakage? Can you write the code to assume the object is always invalidated?

I suppose another way to consider (no idea how good or bad of an idea this is) would be to leave all the filler in place and make the callers handle replacing filler. But, I suspect that’s a risky and involved change (with potentially interesting problems with designated initializers and things like that).

It’s hard to say what breaks specifically, but e.g. int n = 1; while (n--) {} is an infinite loop afterwards, so the decrement is not being applied for instance.

Over the weekend I came up with another idea of fixing this problem. If the array simply got expanded before evaluating the filler expression, the problem wouldn’t even exist. I tried a few things, but the patch in the end is basically equivalent to:

diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index 6ffb65d8e71d..c093a0e031c5 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -10732,7 +10732,7 @@ bool ArrayExprEvaluator::VisitInitListExpr(const InitListExpr *E,

   // If the initializer might depend on the array index, run it for each
   // array element.
-  if (NumEltsToInit != NumElts && MaybeElementDependentArrayFiller(FillerExpr))
+  //if (NumEltsToInit != NumElts && MaybeElementDependentArrayFiller(FillerExpr))
     NumEltsToInit = NumElts;

   LLVM_DEBUG(llvm::dbgs() << "The number of elements to initialize: "

However, that breaks a test in clang/test/SemaCXX/aggregate-initialization.cpp:

namespace HugeArraysUseArrayFiller {
  // All we're checking here is that initialization completes in a reasonable
  // amount of time.
  struct A { int n; int arr[1000 * 1000 * 1000]; } a = {1, {2}};
}

because LLVM runs out of memory while trying to allocate APValues. So, the test case works I guess. Adding a static_assert(a.arr[large_value] == 0) doesn’t break it either, the expansion needs to happen while constructing the array.

The test above still breaks down them the array is of a struct type and not an integer though. So… not much progress here. One possibility could still be to change the special case from above with MaybeElemetDependentArrayFiller(FillerExpr) to also happen for multidimensional arrays.

I’ll try to investigate more and try your other suggestions, thanks for the feedback so far.

My initial thought was “oh, good idea!” but my second thought was “I wonder how that impacts overhead and compile times”, so your second observation makes sense to me.

One thing I hadn’t noticed before is this comment on EvaluateInPlace():

/// EvaluateInPlace - Evaluate an expression in-place in an APValue. In some
/// cases, the in-place evaluation is essential, since later initializers for
/// an object can indirectly refer to subobjects which were initialized earlier.

Perhaps what’s missing is an in-place evaluation of the array expansion rather than a replacement of the prior object? (Unsure how helpful this question is, so take it with a grain of salt.)

Okay, so I’ve now tried to make findSubobject() not modify anything and instead leave that responsibility to the caller. However, that doesn’t even matter in the end.

In the reproducer I posted, clang will expand the array (the outermost APValue, which is being returned by findCompleteObject() while evaluating its array filler…

Perhaps what’s missing is an in-place evaluation of the array expansion rather than a replacement of the prior object?

… which is also why this doesn’t work (even tried adding expandArray() to APValue before you suggested this). The APValue that’s being evaluated when things go wrong is not the array itself, but an element of the array (or rather its filler, but that doesn’t matter in this case).

So, since the expandArray() in findSubobject is a requirement (so huge arrays only get expanded when accessed), I’m basically out of ideas.

Aaron sniped me into looking into this. IMO, there is an architectural problem here, in that the ArrayExprEvalutor is allowed to have a reference to the APValue that isn’t necessarily ‘final’, as it is possible to be replaced during array-expansion.

My only thought is that perhaps ArrayExprEvaluator should NOT be allowed to take a direct reference to the APValue and instead should store some sort of ‘proxy’ object that can have its APValue replaced out from under it? Something where it stores a “Top Level Entity ref” PLUS some sort of “How do I get to my Result”.

Unfortunately I haven’t spent sufficient time working through our constant evaluator to know the answer to how to do this, or what the right answer is here.

But the problem basically comes down to: We have a recursive creation model with a top-level owner, BUT it is also possible for the evaluation of a bottom-level thing to modify things ‘upwards’ to the point that it breaks things it does not own. AND in this case, the thing it breaks is ‘itself’.

ONE thought I have is to have the ArrayExprEvaluator::VisitInitListExpr have to “update” its reference to Result (that is, it is no longer a &, but a pointer) after every call to EvaluateInPlace. HOWEVER, I THINK this also means that its reference of This (stored in SubObject) ALSO needs to be updated as it goes.

1 Like

Also played around with this a bit, to see if the APValues processed during findSubobject could be fixed on the fly so that their non-constness would not be problematic (in this case anyway), and found that the following hack prevents the assert from being tripped, though the error remains:

  1. Add a setArrayInitializedElt(unsigned, APValue &Val) to APValue.
  2. Add variables
APValue *LastArray = nullptr;
uint64_t LastArrayIndex = 0;

before the subobject path iteration:

  1. Fix LastArray’s data after you access an array element; i.e. change
    https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ExprConstant.cpp#L3730-L3736
    to
      APValue *CurArray = O;
     
      if (O->getArrayInitializedElts() > Index)
        O = &O->getArrayInitializedElt(Index);
      else if (!isRead(handler.AccessKind)) {
        expandArray(*O, Index);
        O = &O->getArrayInitializedElt(Index);
      } else
        O = &O->getArrayFiller();
      
      // CurArray may have been changed, so update the 
      // outer array's data accordingly.
      if (LastArray)
        LastArray->setArrayInitializedElt(LastArrayIndex, *CurArray);
      LastArray = CurArray;
      LastArrayIndex = Index;

Significantly, the assert is still tripped if you only update the outer array data immediately after the expandArray call; at least one of those other if-branches seems to also be complicit.

Okay, I think I came up with something: ⚙ D131155 [clang] Expand array expressions if the filler expression's filler is element dependent