integer constant expression oh my!

I wrote up some thoughts about how I think that constants should eventually be handled in clang:
http://clang.llvm.org/docs/InternalsManual.html#Constants

The goal is for Expr::isConstantExpr and many other things to eventually go away, and only have one piece of code that does the constant folding tree walk. Before we get too far along this path, I thought it would be good to see if this makes sense to others... so, what say you all? :slight_smile:

-Chris

I think it's generally a good direction; it's certainly a lot less
confusing than having multiple routines doing very similar things.

My only real concern about the approach is that putting ICE checking
into Expr::Evaluate might not be a good idea. isICE has to enforce
restrictions that seem unintuitive to enforce in Evaluate, like
restrictions on unevaluated sub-expressions. I think it would be
better to rewrite ICE checking around Evaluate, but keep the
ICE-specific logic separate.

Also, something you didn't mention: do we want to return information
about integer/pointer overflow (PR2729)? I'm guessing that we do,
although it'll be a bit of a pain to implement.

"__builtin_choose_expr: The condition is required to be an integer
constant expression..." Our implementation currently has this
requirement, but where in the gcc docs does it imply that it needs to
be an ICE? Not that I really trust the gcc docs at all... "This
built-in function returns exp1 if [...] Otherwise it returns 0."

-Eli

Excellent Chris, the direction looks good to me.

I wrote up some thoughts about how I think that constants should
eventually be handled in clang:
http://clang.llvm.org/docs/InternalsManual.html#Constants

The goal is for Expr::isConstantExpr and many other things to
eventually go away, and only have one piece of code that does the
constant folding tree walk. Before we get too far along this path, I
thought it would be good to see if this makes sense to others... so,
what say you all? :slight_smile:

I think it's generally a good direction; it's certainly a lot less
confusing than having multiple routines doing very similar things.

My only real concern about the approach is that putting ICE checking
into Expr::Evaluate might not be a good idea. isICE has to enforce
restrictions that seem unintuitive to enforce in Evaluate, like
restrictions on unevaluated sub-expressions. I think it would be
better to rewrite ICE checking around Evaluate, but keep the
ICE-specific logic separate.

Hmmm, this is true, but currently having two implementations that do
very similar traversals is a bit confusing and has been error prone
(i.e., bugs in one and not the other, patches that only fix one and
not the other).

I'm not sure what you mean about rewrite ICE checking around Evaluate.
Is this just about replacing pieces like sizeof with calls to
Evaluate?

Also, something you didn't mention: do we want to return information
about integer/pointer overflow (PR2729)? I'm guessing that we do,
although it'll be a bit of a pain to implement.

Seems like we should.

"__builtin_choose_expr: The condition is required to be an integer
constant expression..." Our implementation currently has this
requirement, but where in the gcc docs does it imply that it needs to
be an ICE? Not that I really trust the gcc docs at all... "This
built-in function returns exp1 if [...] Otherwise it returns 0."

It is explicit:
"You can use the built-in function __builtin_choose_expr ... if
const_exp, which is a constant expression that must be able to be
determined at compile time, ..."

- Daniel

I'm thinking the following: except for a few exceptional cases (like
ICEs containing a comma operator), isICE doesn't care about what value
the subexpressions evaluate to. Therefore, we can split isICE into
two parts: verifying that the expression has the structure of an ICE,
and actually evaluating it. The structure verification has relatively
little overlap with Evaluate, and the evaluation part can be taken
care of with a simple call to Evaluate.

-Eli

Ah, this makes total sense and I think you are right this will come
out cleaner. Plus it concentrates all the ICE logic in one place which
should make it easier to match to the standard.

- Daniel

I'd really like that, except that integer overflow prevents things from being an ICE. Doesn't that mean isICE needs to evaluate the constants?

-Chris

I don't think that is a problem. isICE just calls Evaluate to see if
it has overflow and if it can eval the expression, then does its own
little verification traversal to make sure it matches the ICE rules.

- Daniel

Chris Lattner wrote:

I wrote up some thoughts about how I think that constants should eventually be handled in clang:
http://clang.llvm.org/docs/InternalsManual.html#Constants

The goal is for Expr::isConstantExpr and many other things to eventually go away, and only have one piece of code that does the constant folding tree walk. Before we get too far along this path, I thought it would be good to see if this makes sense to others... so, what say you all? :slight_smile:
  

One thing to be aware of in the design is that we'll very likely want this to handle support for the C++0x constexpr feature. To sum it up for those who're not following the new standard: Simple functions can be marked constexpr. The compiler is required to evaluate them at compile time if all arguments are themselves constant expressions, and the result is again a constant expression. Objects constructed with a constexpr constructor have the status of object literals, and an access to their members is again a constant expression. There's also constexpr user-defined literal converters to consider, but they mostly act like ordinary constexpr functions in this regard.

I think the design as discussed in this thread can handle this.

Sebastian

This also means that one can end up instantiating a template as part
of the evaluation of an integral constant expression (if that template
is a constexpr function template). I think the only real effect this
has on the design is that the ICE checking and expression evaluation
code will need to move from AST into Sema.

  - Doug

Don't we have to instantiate the correct template in Sema as part of
typechecking? And isn't that before we might want to evaluate an
expression using the template in question?

-Eli

We have to instantiate the correct template *declaration* as part of
type-checking. We don't have to instantiation the template
*definition* until we try to evaluate the value of the expression.

  - Doug

Right... but I'm pretty sure we're still going to instantiate it
before we try to evaluate the value of the expression. Take the
following program:

template <int i> int f(int (*a)[-i]) {
  int b[i]; return 2;
}
int b() {return 0 ? f<-1>(0) : 0;}

I'm pretty sure we're still supposed to emit a diagnostic (at the very
least, g++ does). In -fsyntax-only mode, there isn't anything after
Sema; therefore, Sema has to force the instantiation of the definition
at some point. The only reasonable place to force the instantiation
of the definition is when Sema processes the call to f<-1>. This is
before the call exists in the AST, so the template will be
instantiated before we might want to evaluate it.

I don't really want to attempt to drag this out before we actually
have a plan for how we're going to implement templates, but if there
really is an architectural issue here, I would prefer to solve it
sooner rather than later.

-Eli

We have to instantiate the correct template *declaration* as part of
type-checking. We don't have to instantiation the template
*definition* until we try to evaluate the value of the expression.

Right... but I'm pretty sure we're still going to instantiate it
before we try to evaluate the value of the expression. Take the
following program:

template <int i> int f(int (*a)[-i]) {
int b[i]; return 2;
}
int b() {return 0 ? f<-1>(0) : 0;}

I'm pretty sure we're still supposed to emit a diagnostic (at the very
least, g++ does).

Yes, it emits a diagnostic. However, the diagnostic actually comes
later on, since GCC (like many compilers) delays the instantiation of
function templates until the end of the translation unit. (Or later,
if one is using a prelinker-based approach like the one EDG supports).

In -fsyntax-only mode, there isn't anything after
Sema; therefore, Sema has to force the instantiation of the definition
at some point. The only reasonable place to force the instantiation
of the definition is when Sema processes the call to f<-1>. This is
before the call exists in the AST, so the template will be
instantiated before we might want to evaluate it.

If you add a bit of bogus code that doesn't involve templates to the
end of your example and run it through GCC, you'll see that the error
message for the problem inside f<-1> comes *after* the error messages
for that bogus code. That's the delay in performing the instantiation.
It can be helpful: for one, it means that error messages due to failed
instantiations of function templates don't come from inside the bodies
of other functions that just happen to use the broken template.

I don't really want to attempt to drag this out before we actually
have a plan for how we're going to implement templates, but if there
really is an architectural issue here, I would prefer to solve it
sooner rather than later.

I don't think it's a major architectural issue, I just think that
we'll eventually have to move the evaluation code to Sema to deal with
constexpr function templates.

  - Doug

We have to instantiate the correct template *declaration* as part of
type-checking. We don't have to instantiation the template
*definition* until we try to evaluate the value of the expression.

Right... but I'm pretty sure we're still going to instantiate it
before we try to evaluate the value of the expression. Take the
following program:

template <int i> int f(int (*a)[-i]) {
int b[i]; return 2;
}
int b() {return 0 ? f<-1>(0) : 0;}

I'm pretty sure we're still supposed to emit a diagnostic (at the very
least, g++ does).

Yes, it emits a diagnostic. However, the diagnostic actually comes
later on, since GCC (like many compilers) delays the instantiation of
function templates until the end of the translation unit. (Or later,
if one is using a prelinker-based approach like the one EDG supports).

In -fsyntax-only mode, there isn't anything after
Sema; therefore, Sema has to force the instantiation of the definition
at some point. The only reasonable place to force the instantiation
of the definition is when Sema processes the call to f<-1>. This is
before the call exists in the AST, so the template will be
instantiated before we might want to evaluate it.

If you add a bit of bogus code that doesn't involve templates to the
end of your example and run it through GCC, you'll see that the error
message for the problem inside f<-1> comes *after* the error messages
for that bogus code. That's the delay in performing the instantiation.
It can be helpful: for one, it means that error messages due to failed
instantiations of function templates don't come from inside the bodies
of other functions that just happen to use the broken template.

Hmmm, interesting.

Actually, it looks like C++0x is making some significant changes to
the definitions of constant expressions. I think we'll just have to
wait until C++0x stabilizes a bit more, then figure out what's going
on... the current draft looks convenient to implement, but it seems
like they didn't think through the effect of the changes on
floating-point arithmetic, and I'm not sure it leaves constexpr as
usable as they might want.

I don't really want to attempt to drag this out before we actually
have a plan for how we're going to implement templates, but if there
really is an architectural issue here, I would prefer to solve it
sooner rather than later.

I don't think it's a major architectural issue, I just think that
we'll eventually have to move the evaluation code to Sema to deal with
constexpr function templates.

I guess we'll just have to wait until we get there... C++ is a
horribly complicated language, and it's not getting better in that
respect.

-Eli

Eli Friedman a écrit :

N2798 is a first public draft. The language can still change
significantly, if we find problems that need to be fixed. No new
features will be added, but it's possible that features in the
document will be removed or changed.

That said, this isn't the right forum for such discussions; I suggest
comp.lang.c++.moderated or comp.std.c++.

  - Doug

This is great as it's horrible to find the same bug over and over again in slightly different places because the code is duplicated.

One thing I would like to add (that I was discussing yesterday in IRC) is a Expr::EvaluateToBool() (just export the HandleConversionToBool() from ExprConstant.cpp). This function is important to fix a few bugs in codegen and sema. What do you think?

Nuno