Unravelling i-c-e + the gnu mess

Ok, I'm preparing to do battle with Expr::isIntegerConstantExpr and want to consult the guru's to see if this is a sane plan. :slight_smile:

Problem statement:
   1) we want to support fully conformant C code with GNU extensions disabled.
   2) We want to support GNU extensions, but emit ext-warns when they are used.
   3) If a subexpr is a GNU ice, we don't want to emit warnings multiple times in the subexpr. The canonical horrible example would be test/Sema/darwin-align-cast.c

This same problem occurs with both i-c-e's and c-e's, but I'm only going to poke at i-c-e's now.

My proposed fix is:

1. Introduce a new static function in Expr.cpp, something like this:

static unsigned isIntegerConstExprHelper(const Expr *E, llvm::APSInt &Result,
                                          ASTContext &Ctx, SourceLocation &Loc,
                                          bool isEvaluated, bool AllowGNU);

When AllowGNU is false, it evaluates the expr for i-c-e'ness strictly by the language standard indicated by ASTContext. If the expr *is* an ICE, it returns zero and fills in
Result. If it is not an ICE, it puts a caret position in "Loc" and returns a diag # to issue that explains why not.

When AllowGNU is true, it does the same, but 1) allows GNU craziness and 2) allows Expr to have pointer type (which is part of gnu craziness I suppose).

2. Change Expr::isIntegerConstantExpr to be something like this:

   // See if this expr is a real i-c-e.
   unsigned Diag = isIntegerConstExprHelper(this, ..., false)
   if (Diag == 0) return true;

   // See if it is a gnu one.
   if (!LangOptions.NoExtensions) {
     if (isIntegerConstExprHelper(this, ..., false) == 0) {
       Emit Warning about using GNU i-c-e.
       return true;
     }
   }
   return false;

3. Change Expr::isIntegerConstantExpr to return the diag reason up to its callers so they can emit more specific diags in some cases.

Does this seem like a reasonable approach?

-Chris

Ok, I'm preparing to do battle with Expr::isIntegerConstantExpr and
want to consult the guru's to see if this is a sane plan. :slight_smile:

Problem statement:
  1) we want to support fully conformant C code with GNU extensions
disabled.
  2) We want to support GNU extensions, but emit ext-warns when they
are used.
  3) If a subexpr is a GNU ice, we don't want to emit warnings
multiple times in the subexpr. The canonical horrible example would
be test/Sema/darwin-align-cast.c

I agree so far.

2. Change Expr::isIntegerConstantExpr to be something like this:

  // See if this expr is a real i-c-e.
  unsigned Diag = isIntegerConstExprHelper(this, ..., false)
  if (Diag == 0) return true;

  // See if it is a gnu one.
  if (!LangOptions.NoExtensions) {
    if (isIntegerConstExprHelper(this, ..., false) == 0) {
      Emit Warning about using GNU i-c-e.
      return true;
    }
  }
  return false;

This is where it get really messy... the issue is that saying that
something is an integer constant expression has indirect effects in
some cases. Non-ICE array bounds indicate a VLA, and the
compatibility rules for those are different from those for constant
arrays. A conditional expression with a null pointer constant can
silently end up with a different type from an conditional with a
pointer that happens to evaluate to zero. These are edge cases, but
-pedantic shouldn't affect semantics.

I think the right approach is what we're doing in
Sema::TryFixInvalidVariablyModifiedType: if we run into a situation
where we print a warning/error, but gcc accepts the code, try to
recover at the point where we would print the diagnostic.

If it's convenient, we could add a "RequireIntegerConstantExpr" method
which does roughly "if (Expr->ICE()) return val; if
(Expr->TryEvaluate()) {pedwarn(); return val;} error(); return
failure;".

Downsides to this approach: it's possible it will let stuff slip by
without a warning in non-pedantic mode that gcc wouldn't accept, and
this will leave constructs like the glibc tgmath.h broken.

On a related note, I'm considering rewriting
Expr::isIntegerConstantExpression() to delegate actual calculation to
Expr::TryEvaluate() at some point, to reduce code duplication.

3. Change Expr::isIntegerConstantExpr to return the diag reason up to
its callers so they can emit more specific diags in some cases.

We should definitely do this eventually in any case.

-Eli

This is where it get really messy... the issue is that saying that
something is an integer constant expression has indirect effects in
some cases.

Yeah, I know :(. Enabling extensions will change how code is sema'd. Unfortunately, I don't think this is avoidable.

Non-ICE array bounds indicate a VLA, and the
compatibility rules for those are different from those for constant
arrays. A conditional expression with a null pointer constant can
silently end up with a different type from an conditional with a
pointer that happens to evaluate to zero. These are edge cases, but
-pedantic shouldn't affect semantics.

I agree... unfortunately, it does with GCC. GCC also handles enums differently in some cases when -pedantic is enabled.

I think the right approach is what we're doing in
Sema::TryFixInvalidVariablyModifiedType: if we run into a situation
where we print a warning/error, but gcc accepts the code, try to
recover at the point where we would print the diagnostic.

If it's convenient, we could add a "RequireIntegerConstantExpr" method
which does roughly "if (Expr->ICE()) return val; if
(Expr->TryEvaluate()) {pedwarn(); return val;} error(); return
failure;".

Downsides to this approach: it's possible it will let stuff slip by
without a warning in non-pedantic mode that gcc wouldn't accept, and
this will leave constructs like the glibc tgmath.h broken.

Yeah, another way to say this is that some real code *depends* on GCC's mis-parsing code. For example, the "generous" null-pointer-constant case is needed in some situations.

On a related note, I'm considering rewriting
Expr::isIntegerConstantExpression() to delegate actual calculation to
Expr::TryEvaluate() at some point, to reduce code duplication.

Ok!

-Chris

This is where it get really messy... the issue is that saying that
something is an integer constant expression has indirect effects in
some cases.

Yeah, I know :(. Enabling extensions will change how code is sema'd.
Unfortunately, I don't think this is avoidable.

We *really* want to avoid this if at all possible. For issues
affecting system headers, this merely adds one more step to the steps
to reproduce: "add -pedantic to the clang commandline". And it's
extremely unintuitive.

Non-ICE array bounds indicate a VLA, and the
compatibility rules for those are different from those for constant
arrays. A conditional expression with a null pointer constant can
silently end up with a different type from an conditional with a
pointer that happens to evaluate to zero. These are edge cases, but
-pedantic shouldn't affect semantics.

I agree... unfortunately, it does with GCC. GCC also handles enums
differently in some cases when -pedantic is enabled.

15236 – pedantic switch modifies treatment of non-ISO compliant enumerations? Code depending on
that isn't even compatible with gcc without -pedantic, so I don't
think we need to worry about that.

I think the right approach is what we're doing in
Sema::TryFixInvalidVariablyModifiedType: if we run into a situation
where we print a warning/error, but gcc accepts the code, try to
recover at the point where we would print the diagnostic.

If it's convenient, we could add a "RequireIntegerConstantExpr" method
which does roughly "if (Expr->ICE()) return val; if
(Expr->TryEvaluate()) {pedwarn(); return val;} error(); return
failure;".

Downsides to this approach: it's possible it will let stuff slip by
without a warning in non-pedantic mode that gcc wouldn't accept, and
this will leave constructs like the glibc tgmath.h broken.

Yeah, another way to say this is that some real code *depends* on GCC's
mis-parsing code. For example, the "generous" null-pointer-constant case is
needed in some situations.

The only code which can't be fixed using my approach is code which
depends on types being resolved in a non-compliant way. The only such
code we've run into so far is glibc's tgmath.h. However, since that's
a system header, it can't be fixed in any way that depends on the
compiler flags. So your proposal doesn't actually solve any more
known broken cases than mine does.

If we really do end up having to do something like your proposal, it
at least shouldn't depend on the -pedantic flag. This might be a case
where should distinguish between c99 and gnu99 modes.

-Eli