[RFC] Rejecting WG14 DR312/N2713 (what is a valid integer constant expression?)

AaronBallman · June 8, 2022, 7:25pm

The C standard allows implementations to define other forms of constant expressions. DR312 went on to clarify that these additional constant expressions are not integer constant expressions (specifically). N2713 was adopted into C2x to ensure the changes from the DR are properly reflected by the standard.

The tl;dr is: we don’t conform to that, conforming to it could plausibly break code, and I’m wondering if we would like to explicitly reject that DR as not being plausible for us to implement. At this point, my recommendation is that we reject it.

Minutiae

In C, an integer constant expression is specifically defined as (C2x 6.6p6): An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, predefined constants, sizeof expressions whose results are integer constants, alignof expressions, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the typeof operators, sizeof operator, or alignof operator.

Whether something is an integer constant expression or not has further ramifications on the language. For example, with array declarators (C2x 6.7.6.2p4): … If the size is an integer constant expression and the element type has a known constant size, the array type is not a variable length array type; otherwise, the array type is a variable length array type.
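As a rough sketch of that array declarator rule (the names `fixed_len` and `vla_len` are made up for illustration and do not come from the thread): the first array is sized by an enumeration constant, so it has an ordinary array type, while the second is sized by a parameter, so it is a VLA whose `sizeof` is computed at run time.

```c
#include <stddef.h>

enum { N = 4 };

/* Size is an integer constant expression: ordinary (non-VLA) array type. */
size_t fixed_len(void) {
  int fixed[N + 1];
  return sizeof(fixed) / sizeof(fixed[0]);
}

/* Size depends on a run-time value: variable length array type.
   Assumes n >= 0 and a compiler that supports VLAs. */
size_t vla_len(int n) {
  int vla[n + 1];
  return sizeof(vla) / sizeof(vla[0]);
}
```

Both functions report the same element count for matching inputs; the difference is only in the type of the array, which is what the rest of this post is about.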

Examples

// Clang treats this as a constant array type, not a VLA. However, an ICE cannot
// include a function call, and calling a builtin is a function call. It's valid for us to
// treat this as an extension to the constant expression rules, but it's not valid for
// us to treat it as an ICE.
int array1[__builtin_constant_p((1,2))]; // int[1] (which is also wrong!)
// NB: Clang treats many builtins as valid in an ICE, so this is not specific to just
// this one builtin.

int array2[(1,2)]; // Amusingly, we get it correct here and create a VLA (which we
                   // then constant fold as an extension to be a constant array).

// _Generic is not one of the listed valid expressions for an ICE.
int array3[_Generic(1, int : 10, default : 0)]; // int[10] instead of a VLA

// The declaration of a bit-field also requires an integer constant expression.
// A compound literal is not a valid integer constant expression, but
// Clang accepts this code anyway as a GCC extension. GCC does not
// accept this code in C.
struct S {
  int i : (int){12}; // Should be an error, but Clang accepts with an extension warning
};

// ?: is not an allowed expression within an ICE.
int array4[1 ? 1 : 1]; // int[1] instead of VLA

This demonstrates that the situation is pretty complicated for us. Even if the above examples seem a bit silly, 1) it’s not an exhaustive list, I’m reasonably sure there are more examples, 2) there are plenty of demonstrations that we do things differently than the standard allows. But changing the behavior runs the risk of breaking code both loudly and quietly. In the best cases, the user will start to get new diagnostics about using an invalid ICE when one is required. In the worst cases, array types will suddenly change from VLAs to not be VLAs or vice versa.

Because of the rather high potential for breaking code in difficult to track ways, I figured it was worth asking the Clang community whether we expect to ever implement this DR (and related C2x paper). If we don’t expect to change, I can update our status pages to clarify the situation and give some explanation as to why we deviate.

erichkeane · June 8, 2022, 7:36pm

What are the cases we have where this turns a non-VLA into a VLA? Also, what observable fallout comes from the inverse (that is, treating a VLA as a stack variable)? It would seem to me that “Allocate a VLA at compile-time instead of alloca’ing it at runtime” would be a valid implementation, though there are some type-system-based issues with this, right?

I’m a little sympathetic toward the examples with comma, generic, and conditional operators (as these are not extensions, they’re places where we just do the standards-wrong-thing), but this is a case where I don’t see why WG14 shouldn’t just ‘fix’ it to bless Clang’s implementation. I would imagine ‘side effects’ in a conditional, _Generic, or comma operator would already be a non-ICE, so concerns related to that aren’t interesting.

As far as the builtins and ‘S’ example (which I think are your concerns with DR312?): I tend to agree that we should reject the DR. Telling an implementation that “you can have a constant expression that results in an integer, but it cannot be considered an ICE” (would an __int128 literal not be an ICE here?) is overstepping on WG14’s part. And I don’t think we should give our users a worse experience here because of it.

jyknight · June 8, 2022, 8:00pm

Interesting!

Some of your examples seem like they are standards defects (or maybe “missing features in the standard”): it especially seems problematic that _Generic and ?: are not supposed to be ICEs. I think they ought to be.

__builtin_constant_p (and friends) are not really a problem: these are reserved identifiers, so we can define them to have absolutely whatever meaning we wish. That’s just a general rule, not specific to ICEs, and an ICE rule doesn’t override it.

For the remaining issues, though – it surprises me that floating-point expressions are accepted as an ICE. I had actually mistakenly thought Clang was trying to be strict as to what it accepts as an ICE vs the “wild-west” of impl-defined constant-expressions it accepted for e.g. a global initializer. But I guess that’s not actually the case. It might be worth exploring how much code it’d break to fix this subset of non-conformances.

AaronBallman · June 8, 2022, 8:05pm

Situations where we used to treat something as a valid ICE but would then have to treat it as a non-ICE, and the expression was used in an array declarator. Also, some code constructs disallow VM types (like, you can’t use a VM type as an association in a _Generic selection expression).

I think this is somewhat safer (we already turn VLAs into constant arrays in some circumstances), but stack allocation strategies may differ, C’s version of ODR violations (one TU decides it’s a constant expression and another says it’s now a VLA), that sort of thing.

Yup, and it’s those type-system-based issues that make me think we shouldn’t implement this DR. We already support __typeof__ which means the effects can cascade out to other declarations. I think we’re safe from _Generic being a way to see the issue (aside from getting new diagnostics where you didn’t previously).
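To illustrate the cascading concern, here is a minimal sketch (the function name `cascade_len` is hypothetical, and `__typeof__` is the GNU extension spelling accepted by GCC and Clang). Whatever type the first declaration ends up with is silently picked up by the second one, so a change in ICE classification at one declaration would flow into every declaration derived from it.

```c
#include <stddef.h>

/* Assumes n > 0 and a compiler with VLA and __typeof__ support. */
size_t cascade_len(int n) {
  int first[n];              /* VLA: n is not a constant expression */
  __typeof__(first) second;  /* inherits the same variably modified type */
  return sizeof(second) / sizeof(second[0]);
}
```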

I tend to agree; it also seems pretty silly that comma expressions aren’t allowed (within a paren expression) except if it’s unevaluated. e.g., int array[(1, 4)]; makes a VLA type while int array[sizeof((1, 4))]; makes a constant array type. However, this is really a question of whether we want to conform to what the standard actually says or whether we want to do something more user-friendly.
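A small sketch of the sizeof case, using made-up names (`by_sizeof`, `by_sizeof_len`): placing the declaration at file scope, where a VLA is not permitted, shows that the comma operator is acceptable only because the operand of sizeof is not evaluated.

```c
#include <stddef.h>

/* File scope, so the size must be an integer constant expression.
   sizeof((1, 4)) is sizeof(int); the comma expression is never evaluated. */
static int by_sizeof[sizeof((1, 4))];

/* By contrast, int by_comma[(1, 4)]; is not an ICE under the standard's
   wording, so it is left out of this sketch. */

size_t by_sizeof_len(void) {
  return sizeof(by_sizeof) / sizeof(by_sizeof[0]);
}
```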

I don’t agree with this interpretation. The resolution to DR312 is extremely clear – we may not accept other forms of integer constant expressions. If you think about it – the whole rule is about how extensions behave, really. And calling a builtin is another form of constant expression that we absolutely can support. But we can’t say it’s an ICE, at least according to the standard.

jyknight · June 8, 2022, 8:53pm

However, this is really a question of whether we want to conform to what the standard actually says or whether we want to do something more user-friendly.

If we hadn’t already done so, I’d definitely say we should not have added these extensions ahead of their being accepted by the standard.

But, at this point – since we did (and since they seem pretty unobjectionable), I think it makes more sense to go to WG14 and propose to officially allow them as ICEs in the standard (either as a DR or a future standard). And IMO, we should continue to accept them in all standards modes without complaint, pending a decision. If they’re accepted, leave it be…if they’re rejected…then we have a decision to make at that point.

I don’t agree with this interpretation. The resolution to DR312 is extremely clear – we may not accept other forms of integer constant expressions.

We must not treat other forms as ICEs in a valid C program. But a program using __builtin_constant_p is not a valid C program anymore. That’s the way nonstandard extensions using reserved identifiers always work – and that must be the case. I mean, __builtin_constant_p is certainly not a function call – it doesn’t evaluate its arguments the way a function call would. So, regardless of the ICE issue, what the heck is it?

And, like, __attribute__((...))? That definitely uses the “non-standard, undefined behavior, compiler can do whatever it likes” rule to the max!
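A quick sketch of the "not a function call" point, assuming GCC or Clang (the function names are invented for illustration). As I understand both compilers, an argument with side effects makes the builtin yield 0 and the side effect is discarded, which no ordinary function call could do.

```c
/* Assumes GCC or Clang; __builtin_constant_p is a compiler extension. */
int counter_after_query(void) {
  int counter = 0;
  /* A real function call would have to evaluate counter++ first. */
  int is_const = __builtin_constant_p(counter++);
  (void)is_const;
  return counter;
}

/* An integer literal is reported as a compile-time constant. */
int literal_is_constant(void) {
  return __builtin_constant_p(42);
}
```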

Anyways – what to do about floating-point seems like the really tricky issue here. It is quite clear that an FP expression like int[(int)+1.0] must be a VLA – that was the explicit intent of the DR. And we get that wrong. Can we make that change without breaking the world? If we can, I’d argue that we should – and if we cannot, we should go back and tell WG14 about that implementation experience.

jyknight · June 8, 2022, 8:57pm

And we get that wrong

Er…actually, no, we don’t get that wrong. We do consider that a VLA already. Phew!

(Sorry for the confusion!)
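For reference, a sketch of the floating-point distinction being discussed (names are hypothetical). A floating constant that is the immediate operand of a cast is allowed in an ICE, so the first array can live at file scope; with a unary + in between, the size is no longer an ICE and the array is a VLA under the DR312 reading, so it is declared at block scope here.

```c
#include <stddef.h>

/* (int)3.0: floating constant as the immediate operand of a cast, an ICE. */
static int direct_cast[(int)3.0];

size_t direct_cast_len(void) {
  return sizeof(direct_cast) / sizeof(direct_cast[0]);
}

/* (int)+3.0: the cast's operand is a unary expression, not a floating
   constant, so this is not an ICE. Assumes a compiler with VLA support. */
size_t indirect_cast_len(void) {
  int indirect_cast[(int)+3.0];
  return sizeof(indirect_cast) / sizeof(indirect_cast[0]);
}
```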

efriedma-quic · June 8, 2022, 10:45pm

As far as I know, clang complies with DR312/N2713. There was a relatively recent fix to ensure compliance in -std=gnu99 mode; see D89523 PR44406: Follow behavior of array bound constant folding in more recent versions of GCC.

As @jyknight notes, the standard intentionally does not try to define any rules for code that uses reserved identifiers, to allow room for compilers to implement extensions.

Can you cite the standard here? As far as I can tell, conditional operators and _Generic are allowed.

reinterpretcast · June 8, 2022, 11:20pm

I agree; however, how the restrictions upon “operands” apply to _Generic is not clear:

void g();
int f() {
  int x[_Generic(42.f, int: 42, float: 13)]; // is the "controlling expression" an "operand"?
  return sizeof(*(g(), &x));
}

AaronBallman · June 9, 2022, 11:04am

That is basically the goal I’m trying to march towards. First, I need the Clang community to agree that we won’t be implementing that DR/paper. From there, I can go to WG14 and say “here’s implementation feedback, we’re vetoing this for these reasons, can WG14 instead change the standard so Clang is conforming here?”

Are you confusing a strictly conforming program and a conforming program? Using an extension does not render a C program invalid, just renders it not strictly conforming.

And to be clear, the concerns here are not about user requirements (whether they have a conforming program or not), it’s about implementation requirements (whether Clang is a conforming implementation or not).

It’s an implementation-defined extension that should be documented in terms of its effects within the abstract machine. But regardless of how we define it, unless it’s defined in terms of what’s already allowed in an ICE, we can’t extend it to be an ICE.

It’s cited at the top of [RFC] Rejecting WG14 DR312/N2713 (what is a valid integer constant expression?). It says that an ICE shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, predefined constants (_Generic is not a constant so none of these apply), sizeof expressions whose results are integer constants, alignof expressions, and floating-point constants that are the immediate operand of casts (_Generic is none of these expressions). There are some further constraints about casts which do not apply to this situation.

This appears to be a list of what’s explicitly allowed and anything not on the list is therefore not allowed. p10 then goes on to tell us we can’t extend the list of what’s allowed.

(FWIW, I think this is a bad idea and my goal here is for us to not do this. Alternatively, I think the standard is really unclear and needs to be clarified.)

I think it’s recursive and the _Generic is (also) the “top-level” operand of the constant expression. Otherwise how is a DeclRefExpr an “operand” in int array[some_enum_constant]?

kosarev · June 9, 2022, 11:16am

That cited part doesn’t seem to forbid ?: in ICEs?

AaronBallman · June 9, 2022, 11:26am

Yup, I’m rethinking my position on that particular bit (I think I got it wrong, maybe?). This whole section is confusing (and we’re now discussing it on the WG14 reflectors as well). For example, a primary expression is also not an “operand” so what does “shall only have operands that are” actually mean? e.g., enum E { constant = 10 }; struct S { int bit_field : constant; }; what is the operand to that constant expression?

kosarev · June 9, 2022, 11:58am

I’m also not sure how any conforming offsetof() implementation is possible with the new wording.

AaronBallman · June 9, 2022, 12:26pm

Implementations are required to make that work via whatever hand-waving exercises we need to go through. 7.20p3 says that the offsetof macro “… expands to an integer constant expression that has type size_t, …”

So by definition, offsetof results in an ICE (however we accomplish it).

kosarev · June 9, 2022, 12:52pm

If I understand it right that N2713 effectively prevents conforming implementations from extending what counts as an ICE beyond what is mentioned in p6.6 #6, then I’m not sure I see what those exercises could possibly be.

AaronBallman · June 9, 2022, 1:14pm

We handwave it by implementing offsetof in <stddef.h> as __builtin_offsetof which we claim is an Expr and not a CallExpr (we probably should have claimed it was a ConstantExpr instead…) and treat as though it was an integer constant expression per the standard’s requirements.

So we’re not extending what counts as an ICE with offsetof, we’re implementing offsetof in a way that the standard requires, which is that it be treated as an ICE.
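A minimal sketch of what "treated as an ICE" buys the user (the struct and names are hypothetical): each use of offsetof below sits in a context that requires an integer constant expression.

```c
#include <stddef.h>

struct packet {      /* hypothetical type, for illustration only */
  char tag;
  int length;
  char payload[16];
};

/* A static assertion requires an ICE. */
_Static_assert(offsetof(struct packet, tag) == 0, "tag is the first member");

/* A file-scope array size requires an ICE. */
static char header_bytes[offsetof(struct packet, payload)];

/* An enumerator value requires an ICE. */
enum { PAYLOAD_OFFSET = offsetof(struct packet, payload) };

size_t header_len(void) { return sizeof(header_bytes); }
```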

kosarev · June 9, 2022, 1:20pm

Right, so how is __builtin_offsetof() not one of N2713’s ‘other forms of constant expressions’?

AaronBallman · June 9, 2022, 1:32pm

By definition? I feel like I must be missing something here. The standard requires offsetof to result in an integer constant expression (by definition calls to offsetof are an ICE). We implement that requirement by doing special work to ensure that offsetof results in an integer constant expression. We’re not defining another form of integer constant expression, we’re implementing the standard’s requirement for offsetof.

__builtin_strlen() or other such builtins are not the same thing. These are pure extensions; we could define them in terms that the standard allows for an ICE, but I don’t believe we have (we call them builtin functions and handle them as function call expressions: Compiler Explorer).

kosarev · June 9, 2022, 2:29pm

With N2713 in place that special work seems to formally violate p6.6 #6, I’m afraid, and the ‘which expands to an integer constant expression’ bit doesn’t sound to me like it makes whatever we decide our offsetof() should expand to a legal ICE. Quite the opposite: it sounds like it is our responsibility to make sure it’s going to be an ICE given the definition in clause 6.

Formalities apart, I think the point was that if you are going to pursue this topic in WG14, maybe these offsetof() concerns could help to add to doubts that N2713 is a good idea considering existing practice, which includes compilers whose offsetof() implementations rely on the normal address-of and member-dereference operators.
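For readers who have not seen one, this is roughly what such a historical-style definition looks like. It is a sketch only: `SKETCH_OFFSETOF` and `struct pair` are invented names, the pointer-to-integer cast is not something 6.6p6 allows in an ICE, and the standard does not define the member access through a null pointer, so it relies on the compiler tolerating the idiom (GCC and Clang do in practice).

```c
#include <stddef.h>

struct pair { char first; double second; };  /* hypothetical type */

/* Historical-style definition built from address-of and member access.
   Not an ICE under 6.6p6 because it casts a pointer to an integer type. */
#define SKETCH_OFFSETOF(type, member) ((size_t)&(((type *)0)->member))

size_t second_offset_sketch(void) {
  return SKETCH_OFFSETOF(struct pair, second);
}

size_t second_offset_standard(void) {
  return offsetof(struct pair, second);
}
```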

AaronBallman · June 9, 2022, 2:49pm

I’m not certain I agree with that interpretation. The definition of offsetof is not providing a constraint on a program (“shall be an integer constant expression”), it’s a requirement for the implementation (it’s defining what offsetof is required to expand to). As another example of this same thing, <errno.h> says “The macros are EDOM, EILSEQ, ERANGE which expand to integer constant expressions with type int, …” and we use that same phrasing for other macro definitions where the expansion is up to the implementation. Similar wording is used for the definition of the INTN_C function-like macros.
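As a small sketch of why that wording matters to users (function and enumerator names are made up): the library macros get used in places that demand an integer constant expression, whatever the implementation expands them to.

```c
#include <errno.h>
#include <stdint.h>

/* case labels must be integer constant expressions. */
int is_math_error(int code) {
  switch (code) {
  case EDOM:
  case ERANGE:
    return 1;
  default:
    return 0;
  }
}

/* An enumerator value must also be an integer constant expression. */
enum { LIMIT = INT32_C(1000) };

int limit_value(void) { return LIMIT; }
```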

Ah, I see where you’re coming from now, thanks! I’m not certain if it’ll be persuasive in WG14 given what I wrote above, but at the same time, I think it is another demonstration that the standard is unclear.

I should note that there’s a fairly recent WG14 document in this area that I hadn’t spotted until its author pointed it out to me: Primary expressions and constant expressions. This hasn’t been seen by WG14 yet, so there’s no disposition for it, but it goes to show that others find this area to also need some tightening. I think that document helps clarify things somewhat, assuming the committee agrees with the intent of it.

reinterpretcast · June 9, 2022, 3:49pm

If it is recursive and applies at every level, then the rule also prohibits (1 + 2) + 3 because (1 + 2) is not one of the explicitly allowed operands.

The alternative argument for disallowing _Generic would be to consider it as being indivisible (i.e., it is a leaf operand).
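To make the (1 + 2) + 3 point concrete, here is a sketch with hypothetical names. As far as I know, GCC and Clang both accept the nested expression in contexts that require an ICE, which suggests the strictly recursive reading cannot be what the wording intends.

```c
#include <stddef.h>

/* Both contexts require an integer constant expression. */
enum { SUM = (1 + 2) + 3 };
static int sized_by_sum[(1 + 2) + 3];

int sum_value(void) { return SUM; }

size_t sum_len(void) {
  return sizeof(sized_by_sum) / sizeof(sized_by_sum[0]);
}
```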
