Skipping constexpr initializer expression evaluations

The Clang Static Analyzer uses the CFG for walking the expressions.
In the case of DeclStmts, the CFG will emit the subexpressions prior to the declaration itself.
This is in line with the expectation that we need to evaluate those expressions prior to binding any value to the declaration.

However, in constexpr variable declarations we can assume that the initializer expression is free of undefined behavior, provided the code compiles.
Consequently, (most of) the Static Analyzer reports coming from such initializers are false positives, at least in the given calling context. Some code-smell warnings, such as dead-store or similar warnings, might still be justified, but that’s a different topic.

So, we have false-positives in constexpr initializers, which we want to get rid of.
The D117553 patch would suppress these reports, but that solution is less than ideal.
We should not evaluate any of the initializer subexpressions in the first place if all the reports coming from there will be suppressed.

The idea is that we (probably) shouldn’t emit such initializer expressions in the CFG in the first place.
One can easily omit these expressions by modifying the CFG construction in just a few lines, but this would result in losing precision since we no longer bind anything to the VarDecl in question.
We need to get the result of Expr::EvaluateAsInitializer() of the constexpr initializer, transform that value into an SVal and bind it to the VarDecl.
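A minimal sketch of that flow, assuming a hypothetical convertAPValueToSVal() helper (sketched further below) and using VarDecl::evaluateValue() as a convenience wrapper around the constant evaluator:

#include "clang/AST/Decl.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h"

using namespace clang;
using namespace clang::ento;

// Hypothetical helper; a sketch for the Int and Array cases follows the
// APValue list below.
SVal convertAPValueToSVal(const APValue &V, QualType Ty, SValBuilder &SVB);

// Sketch: instead of walking the initializer subexpressions, ask the constant
// evaluator for the initializer's APValue and bind the converted result to the
// constexpr variable.
ProgramStateRef bindConstexprInit(ProgramStateRef State, const VarDecl *VD,
                                  const LocationContext *LCtx,
                                  SValBuilder &SVB) {
  // VarDecl::evaluateValue() caches the evaluated initializer; alternatively
  // one could call Expr::EvaluateAsInitializer() on VD->getInit() directly.
  if (const APValue *Val = VD->evaluateValue()) {
    SVal Init = convertAPValueToSVal(*Val, VD->getType(), SVB);
    Loc VarLoc = State->getLValue(VD, LCtx);
    return State->bindLoc(VarLoc, Init, LCtx);
  }
  return State; // evaluation failed; fall back to the current behavior
}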

Other tools consuming the CFG might need to do this as well, or we should introduce an option for turning this feature on/off.

There are different kinds of clang::APValues:

  • None, Indeterminate: I think we shouldn’t expect these if the code compiles.
  • Int: Take the APSInt and construct a concrete integer SVal from it (see the sketch after this list).
  • Float, FixedPoint, ComplexInt, ComplexFloat, LValue, Vector, AddrLabelDiff, Union: IDK, probably just return UnknownVal() for now.
  • Array: Take the individual initializer APValue elements and transform them into SVals recursively, then create a compound value from this SVal list.
  • Struct: I was thinking about constructing a CXXTempObjectRegion from which I could get FieldRegions, bind the initializer SVals there, and return a MemRegionVal{tmpregion} SVal.
  • MemberPointer: I haven’t thought about it yet. I don’t expect it to be more difficult than the struct case.
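Here is a rough sketch of such a conversion, covering only the Int and Array cases and falling back to UnknownVal() for everything else (convertAPValueToSVal is a made-up name, not an existing API):

#include "clang/AST/APValue.h"
#include "clang/AST/ASTContext.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h"

using namespace clang;
using namespace clang::ento;

SVal convertAPValueToSVal(const APValue &V, QualType Ty, SValBuilder &SVB) {
  switch (V.getKind()) {
  case APValue::Int:
    // Wrap the APSInt as a concrete integer value.
    return SVB.makeIntVal(V.getInt());
  case APValue::Array: {
    // Convert the initialized elements recursively and collect them into a
    // nonloc::CompoundVal. NOTE: the array filler is ignored for brevity.
    QualType ElemTy = SVB.getContext().getAsArrayType(Ty)->getElementType();
    llvm::ImmutableList<SVal> Vals =
        SVB.getBasicValueFactory().getEmptySValList();
    // Prepending reverses the order, so walk the elements backwards.
    for (unsigned I = V.getArrayInitializedElts(); I > 0; --I)
      Vals = SVB.getBasicValueFactory().prependSVal(
          convertAPValueToSVal(V.getArrayInitializedElt(I - 1), ElemTy, SVB),
          Vals);
    return SVB.makeCompoundVal(Ty, Vals);
  }
  default:
    return UnknownVal(); // Float, LValue, Struct, etc. are left for later
  }
}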

I’ve got slightly confused: in each case I’m constructing an SVal directly, without binding anything to the Store; but in the case of structs I suddenly need to add bindings to the Store. I guess it’s because the this pointer should point somewhere, right?

I just wanted to have some discussion about this topic before I dive into the implementation details.

PS: We could take this idea one step further. Beyond constexpr initializer expressions, we could say that for any function call of which we know all the arguments, we could create an argument substitution and call Expr::EvaluateWithSubstitution(); then convert the resulting clang::APValue back to the SVals domain.
However, I would expect this to happen sooo rarely that, honestly, I’m not too excited about it.
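For illustration (a made-up snippet), this would cover calls like the following, where all the arguments are concrete even though the callee is not constexpr:

int twice(int x) { return 2 * x; } // not constexpr, but evaluable

void g() {
  int y = twice(21); // all arguments are known: the engine could constant-fold
                     // this to 42 instead of inlining the call
  (void)y;
}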

What do you think, is it worth the effort to explore the constexpr initializer case?
@NoQ @Xazax-hun @martong @Szelethus


Example code:

constexpr int inc(int x) { return x + 1; }
void fn() {
  constexpr int a[] = {0, inc(1), 1, inc(2)};
}

CFG previously:

[B2 (ENTRY)]
  Succs (1): B1

[B1]
   1: 0
   2: inc
   3: [B1.2] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(int))
   4: 1
   5: [B1.3]([B1.4])
   6: 1
   7: inc
   8: [B1.7] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(int))
   9: 2
  10: [B1.8]([B1.9])
  11: {[B1.1], [B1.5], [B1.6], [B1.10]}
  12: constexpr int a[] = {0, inc(1), 1, inc(2)};
  Preds (1): B2
  Succs (1): B0

[B0 (EXIT)]
  Preds (1): B1

CFG after omitting all the initializer expressions of constexpr variables:

[B2 (ENTRY)]
  Succs (1): B1

[B1]
  1: constexpr int a[] = {0, inc(1), 1, inc(2)};
  Preds (1): B2
  Succs (1): B0

[B0 (EXIT)]
  Preds (1): B1

Hi!

However, in constexpr variable declarations we can assume that the initializer expression is free of undefined behavior, provided the code compiles.

I am really skeptical of this statement. While I understand that this is what the standard expects compilers to do, I have seen many examples where major compilers missed certain UB in constexpr code. I think the implementations are simply not yet capable of catching every bit of UB, and I am also not convinced that they ever will be.

But more importantly, the static analyzer is not only about undefined behavior. There are many checks and potential checks that can catch portability, performance, or logic issues. One such example is the dead store check, which can hint that one forgot to use a variable, probably due to a copy-paste error. A potential example for a performance check could be suggesting a reserve call before pushing back to a vector in a loop. Now that we have constexpr vectors, it is possible to reach such code from a constexpr initializer. I am pretty sure we could find more examples where a check is looking for some undesired behavior that is not UB.

I wanted to add some actual proof for this argument, so here is a code snippet courtesy of Cody Miller: Compiler Explorer

This code invokes undefined behavior in a constexpr context, and that undefined behavior is missed by both MSVC and GCC. Clang at least detects that there is a problem, but the diagnostic is confusing and misleading. This is the current state of the art for constexpr code.

I kind of like the idea that we should do exactly the same thing that CodeGen does, even if it drops some warnings. If there’s literally nothing going on at runtime, why not do the same during symbolic execution? Even in that example there’s no actual stack UAF at runtime, right? And it’ll never be there, because the compiler will always simply replace the call with a compile-time constant? So I think this is an ok thing to do; it may even improve precision in some cases where we’d otherwise mis-model something(?)

No-no you cannot simply change this in the middle of the object’s lifetime. We should either make a region for the constexpr variable and put them there, or if we don’t want store bindings at all, we can keep them around as nonloc::CompoundVal and bind it whenever it actually materializes.

No-no you cannot simply change this in the middle of the object’s lifetime. We should either make a region for the constexpr variable and put them there, or if we don’t want store bindings at all, we can keep them around as nonloc::CompoundVal and bind it whenever it actually materializes.

I tried that direction, but I found the representation of nonloc::CompoundVals lacking. They are capable of holding a list of SVals, but a struct might not be ‘just as flat’. Consider the case when a class with a constexpr constructor inherits from some other class: example. I’m not even sure if we could reconstruct the structure.
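A made-up illustration of such a case:

struct Base {
  int b;
  constexpr Base(int b) : b(b) {}
};

struct Derived : Base {
  int d;
  constexpr Derived(int b, int d) : Base(b), d(d) {}
};

constexpr Derived obj(1, 2);
// The APValue for 'obj' is a Struct holding a base-class subobject plus its
// own field, whereas a nonloc::CompoundVal would only give us a flat list of
// SVals with no record of that structure.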
Any other ideas?

Great initiative!

IMHO, we don’t need an option, we should just have this feature enabled always. I don’t see any reason to analyze an init expression that can be evaluated as a constant expression.

So, IIUC we could initialize not just constexpr but const variables if a function can be evaluated during “compile time” (and that’s what happens when we codegen). That does not seem rare at all and I am excited about that.

Why is that a problem? This is exactly what we do in the ExprEngine when we evaluate an expression (with the help of the SValBuilder).

You must create a binding in the Environment, where we keep the Expr->SVal bindings. Once that is done then the next step could be to create bindings in the Store e.g. for the Array case.

So, IIUC we could initialize not just constexpr but const variables if a function can be evaluated during “compile time” (and that’s what happens when we codegen). That does not seem rare at all and I am excited about that.

IDK why I did not think about that. I agree.

You must create a binding in the Environment, where we keep the Expr->SVal bindings. Once that is done then the next step could be to create bindings in the Store e.g. for the Array case.

The problem is that if we omit the initializer subexpressions, the analyzer won’t see them; hence even if I bound values to them in the Environment, the engine still wouldn’t make use of that. So, in that sense, we don’t have expressions, yet I still want to bind some SVal to the VarDecl. To add a little bit more context, I’ve updated the initial proposal to include an example CFG to demonstrate why this won’t work.

While it is true that there will be no UB at runtime, I am not sure such constexpr code would always provide the intended behavior when the compiler does not diagnose the UB at compile time; e.g., it might compute the wrong value. Moreover, we have checks like alpha.core.Conversion that could also uncover unexpected behavior without invoking UB. My other concern is that constexpr initializers often contain calls to constexpr functions. Similar calling contexts in those initializers might happen at run time as well, but maybe the analyzer could not explore those due to some limitations, lack of CTU, or the restrictions on the number of hops between TUs.

I think in this case simulating the exact runtime behavior and finding unexpected results are at odds. I was wondering if there were any real user complaints for warnings coming from constexpr initializers. If no user complained about those warnings, is it worth reducing the usefulness of the SA for constexpr code? Wouldn’t it be confusing for the users when certain warnings disappear after they mark some code constexpr? I know that this would not be the only case where an edit unrelated to a problem would make a warning disappear/appear. It looks like I am the only one concerned about the lost warnings, so I think in that case it is probably ok to go ahead with this. I was just surprised that on the one hand we wanted to change CTU to avoid losing warnings as much as possible, but on the other hand we are ok with losing warnings in another context.

Moreover, we have checks like alpha.core.Conversion that could also uncover unexpected behavior without invoking UB.

We do have checkers for detecting code smells, but clang-tidy is probably a better fit for that task.
Those types of bugs are probably less context-sensitive compared to multi-step UB scenarios. Anyway, it’s more of a detail whether we decide to enable this by default or leave it as an opt-in feature. Measurements should tell the difference.

My other concern is that constexpr initializers often contain calls to constexpr functions. Similar calling contexts in those initializers might happen at run time as well, but maybe the analyzer could not explore those due to some limitations, lack of CTU, or the restrictions on the number of hops between TUs.

I cannot follow how CTU is affected. Any functions that a constexpr function calls have to be defined in the same translation unit. We would only skip evaluating constexpr context initializers ATM. So, at any other place we would still evaluate it step-by-step.

I was wondering if there were any real user complaints for warnings coming from constexpr initializers

We had internally. IDK about upstream reports.

Wouldn’t it be confusing for the users when certain warnings disappear after they mark some code constexpr?
I know that this would not be the only case where an edit unrelated to a problem would make a warning disappear/appear.

As far as my prototype goes, the situation is even worse. If we have calls to a constexpr function only from constexpr contexts, the constexpr function would be analyzed as a top-level function as usual. But as soon as we have a call to it from a regular context, the analyzer would prefer that over analyzing it in a top-level context. This way the user can see ‘disappearing’ reports basically by adding a call to it :smiley: It might make sense to rework this behavior though.

My other concern is that the Clang Static Analyzer might run out of budget during a constexpr initializer evaluation, at which point we lose precision compared to just relying on the constexpr evaluator. There could be other cases where an engine designed to deal with concrete values can outperform the accuracy of the Analyzer, which focuses on symbolic values for the most part.

It looks like I am the only one concerned about the lost warnings, so I think in that case it is probably ok to go ahead with this. I was just surprised that on the one hand we wanted to change CTU to avoid losing warnings as much as possible, but on the other hand we are ok with losing warnings in another context.

What if we could achieve both?
Somehow we should suppress bug reports that don’t make much sense in constexpr initializers, such as the infamously known arraybound reports, or any other fatal bug reports; the constexpr interpreter probably has an edge on tracking those accesses anyway. Meanwhile, when reaching the VarDecl, we would still call the constexpr evaluator and bind its result to the variable instead.
This way we would benefit from more accurate values and still have warnings for code smells.
On the other hand, we would get even slower.

The alpha.core.Conversion check is not a syntactic check; it reasons about the values of the variables. So it is just as multi-step, context-, and path-sensitive as any other path-sensitive check in the CSA. We also have soft-UB checks, like the use-after-move check. Use after move might not trigger UB in the language, but it could trigger unexpected behavior in the library. Those would not be diagnosed by the constexpr interpreter. Admittedly, most checks in the CSA are looking for UB, but that might also change over time.

Let’s consider:

constexpr int val = foo(args);

We might have a similar foo(args) call in a different TU in a non-constexpr context, and as far as I am aware it is perfectly legal to do this. So you might not be able to evaluate it in the other TU in the non-constexpr context.

While this is theoretically true, I am not convinced this is actually the case for the current implementations. See MSVC and GCC failing to diagnose the UB in the godbolt link above.

Feels like we should go through the usual empirical measurement process. I mean let’s implement the prototype, which would turn this on, then let’s compare the runtime and the quality of bugreports with the baseline.

Well, I still have a problem :smiley:

As I posted earlier, I don’t really know how to convert struct APValues into SVals.
Thus, I cannot move forward with the prototype. Any ideas?

What if you’d target Int and Array first? Just to see some initial results. I mean it would be great to see how many const(expr) variables can be evaluated this way in a project. Might turn out that this is very rare, or who knows. Could you find an open source project that has many const(expr)s in it?