Constant expression checking rewrite

The other day, I ran into an annoying false positive in the constant
expression-checking code; that code has actually been annoying me for
a while, so I decided that the code needed to be replaced with some
more accurate checking. Attached patch does this; it passes make
test, so it's at least mostly right. It's also significantly stricter
about what it allows through as a constant expression, so it shouldn't
let through expressions that can't be codegen'ed properly. The
implementation is complete in the sense that it follows all the C99
rules for constant expressions.

I'll put together some additional tests to cover some of the new cases
I'm checking before I commit.

I haven't really carefully considered the diagnostics yet; we probably
want more than just the generic "expression isn't constant" error, but
I'm not sure exactly what. I've also marked some places that we might
want to warn in -pedantic mode with FIXMEs. That said, the section
about constant expressions in C99 is a bit messy, and I'm not sure
what warnings are necessary/useful.

If Expr::isConstantExpr in the sense that it is currently implemented
is useful for outside code to query, I can refactor the code to
calculate it. However, that will make things more complicated, and I
don't think we need it. Being a constant expression in the C99 sense
is not really an interesting property for any purpose I can think of.
Besides Sema, there are only a couple of other users in the current
codebase, and neither of them really want precisely what
Expr::isConstantExpr returns.

-Eli

t.txt (14 KB)

Hi Eli,

I'm not parsing the first couple sentences of this paragraph very well. Can you please elaborate? I'm also not certain what you mean by "Being a constant expression in the C99 sense is not really an interesting property for any purpose I can think of."

It is very useful for isConstantExpr to evaluate the actual value of the expression as an APSInt constant.

Ted

I think you're confusing Expr::isConstantExpr and Expr::isIntegerConstantExpr

Okay, I'll try to rephrase the paragraph more simply: I don't think we
need Expr::isConstantExpr outside of Sema; the current users don't
really want it, and I can't think of any other interesting uses.

-Eli

You have indeed identified my confusion. Thanks for the clarification!

One potential client outside of Sema for Expr::isConstantExpr are refactoring clients that wish to determine if a transformed AST is constant expression in the C99 sense. Even this isn't strictly needed; transformed code can be checked by just running the parser+semantic analyzer on it.

The clients of Expr::isConstantExpr that I see outside of Sema are the DeadStores analysis and CGExprAgg. What interface do you suggest that these clients use instead?

Ted

CGExprAgg isn't really using it (it's an if around a FIXME), and the
code that might go there should really be checking the LLVM values,
not the AST value.

I'm not exactly sure why the DeadStores pass cares if the value is
constant; maybe it should just be checking for zero and other common
"dummy" initializations?

-Eli

I found that defining the scope of dummy initializations is not really tractable. Simply checking for a constant assignment was 99% accurate in pruning out dead stores resulting from defensive programming.

Aside from the dead store checker, I can see other cases in the static analyzer (and other clients such as refactoring) where knowing whether or not an expression is a constant expression is very useful.

I apologize for digressing this thread, so I want to return to the original point. I think having Sema more accurately reflect what is a C99 constant expression is good thing (as done by your patch).

My question now is what is the harm about Expr::isConstantExpr? This hasn't been made clear in this thread. Is the name just misleading? Does it return true when something isn't a constant expression? I don't have a problem removing it, but I'm trying to understand what the problem is, especially since there are clients of it outside of Sema. For the clients outside of Sema (including potential future clients who want to do similar queries on expressions), I want to understand what is fundamentally wrong or limited about Expr::isConstantExpr as it is now.

I understand that there is an issue of cleanly implementing the checking of constant expressions in Sema so that we can report diagnostics (as done in your patch), and that keeping isConstantExpr in Expr makes this a little more difficult to do cleanly. Is this the main issue?

Essentially, yes; that's the biggest issue. Implementing this stuff
outside of Sema would require a complicated return convention for
reasonable diagnostics. It might end up being the best approach, but
it's a lot more complicated.

Also, there's really a few different kinds of constant expressions.
First, there are integer constant expressions, which are precisely
defined in the C99 standard to be essentially integers and integer
arithmetic.

Then, there are general constant expressions, also defined in the C99
standard, which are essentially either arithmetic constants (computed
with integer/floating-point arithmetic), or an easily computable
address (the details are slightly more complicated). This is what my
patch implements.

Then, there is what Expr::isConstantExpr returns, which isn't exactly
clear; it's currently buggy, but it apparently tries to compute
whether an expression is constant in the sense that it doesn't depend
on the values of any variables/globals. This could potentially be
useful, but it's not acceptable for the Sema checking; we cannot
accept all expressions in this category as constant expressions
because some of them might not be computable by the linker.

One issue with my patch I haven't mentioned: it doesn't try to check
whether an expression has a defined result when doing
constant-expression checking. Unfortunately, this will significantly
complicate the code, and I'm not sure what the best approach is yet
(maybe leaving it to an analysis pass would be best). I wouldn't
normally worry about that sort of thing, but division by zero in a
global currently crashes llc, so it would be best if we emitted a hard
error.

-Eli

Essentially, yes; that's the biggest issue. Implementing this stuff
outside of Sema would require a complicated return convention for
reasonable diagnostics. It might end up being the best approach, but
it's a lot more complicated.

Understood. The patch as you have also may just be an intermediate design to a final solution, as there is also the issue of your last point about undefined operations) where one may need to (partially) compute constant values (see below).

Also, there's really a few different kinds of constant expressions.
First, there are integer constant expressions, which are precisely
defined in the C99 standard to be essentially integers and integer
arithmetic.

Then, there are general constant expressions, also defined in the C99
standard, which are essentially either arithmetic constants (computed
with integer/floating-point arithmetic), or an easily computable
address (the details are slightly more complicated). This is what my
patch implements.

Then, there is what Expr::isConstantExpr returns, which isn't exactly
clear; it's currently buggy, but it apparently tries to compute
whether an expression is constant in the sense that it doesn't depend
on the values of any variables/globals. This could potentially be
useful, but it's not acceptable for the Sema checking; we cannot
accept all expressions in this category as constant expressions
because some of them might not be computable by the linker.

That is an extremely lucid answer. Thank you!

One issue with my patch I haven't mentioned: it doesn't try to check
whether an expression has a defined result when doing
constant-expression checking. Unfortunately, this will significantly
complicate the code, and I'm not sure what the best approach is yet
(maybe leaving it to an analysis pass would be best). I wouldn't
normally worry about that sort of thing, but division by zero in a
global currently crashes llc, so it would be best if we emitted a hard
error.

Forgive my naivete, could Expr::isIntegerConstantExpr be used for some of these purposes? It seems like there are cases where you actually want to compute the constant value to do such checking, and this might be a valid approach (although I can see where it would be incomplete). I'm not certain if it would satisfy all the restrictions of a C99 constant expression.

Alternatively, it does seem to me that to catch such cases in general one would need to recursively evaluate the subexpressions, compute the actual constants (when necessary), and see in what cases where an undefined result could occur (e.g., divide-by-zero, shift by too many bits). The checking done in your patch is written in a recursive fashion; it seems like the evaluation of the constant values could be done in this way also. I think the main complication it would add to the code is that it would create several more recursive methods to do the checking (e.g., one for checking integer binary operations, floating point binary operations, pointer arithmetic, etc., instead of just having a single case for BinaryOperator). Such logic would not be all that different from what is going on in the static analysis engine when evaluating constant values along paths.

isIntegerConstantExpr doesn't accept general arithmetic expressions;
only integer arithmetic is allowed.

-Eli