On Saturday, 01.07.2023 at 14:40 +0000, Aaron Ballman via LLVM Discussion Forums wrote:
GCC's extension with forward declared parameters is also suboptimal, which is why WG14 keeps having the discussion about what a decent approach looks like. Forward declared parameters have been rejected multiple times at this point because of the concerns with them as well as concerns with the timing of making changes to the standard.
There have now been several positive votes showing that we want to have this feature. It was rejected
for wording and timing issues. If not for timing, we would have this now in C23.
For example:
- In all other aspects of the language,
type ident;
introduces an object declaration, except for forward declarations of parameters
I do not understand this. A parameter (forward) declaration is also an object declaration. Where do
you see any inconsistency?
- That inconsistency is hard to explain, especially given that related languages like C++ already let you do
type ident;
in places where you cannot traditionally do so in C but do introduce declarations (e.g., if (int x; foo(x)) {}).
I think this is nicely consistent. This is also a declaration which allows you to refer to an
object "x" of type "int". (And I would be in favor of adding this C++ feature also to C.)
- Forward declared parameters run into all sorts of unpleasant edge cases due to the way C works:
I think the nice thing about forward declarations is that they do not really
introduce new edge cases. The implementation in GCC is quite simple.
void func(struct S { int a; } x; int whatever, struct S s); // Is this okay?
This would not be allowed because there is no parameter declaration for "x".
This is potentially the only special rule: that we allow forward declarations
only for later parameter declarations. It would also be possible to just allow
arbitrary object declarations that do not correspond to a parameter, and this would
also not cause any fundamental problems, but with this rule I think we can
give better diagnostics.
void func(struct S { int a; } x; int whatever, struct S { int a; } x); // What about this?
In C23 this would be allowed because you can redeclare "struct S"
with content. But this is not a special thing about parameter forward
declarations; this is exactly how it works everywhere else:
// allowed before C23
static struct S { int a; } x;
static struct S x;
// allowed with C23:
static struct S { int a; } x;
static struct S { int a; } x;
void func(struct S { int a; } x; int whatever, struct S { int b; } x); // This?
With C23 we require redeclaration of structs to have identical content. Again, the same
rules also apply everywhere else, e.g.,
static struct S { int a; } x;
static struct S { int b; } x; // invalid in C23
void func(int oh_no[g()]; int whatever, int oh_no[g()]); // How many times is g() called?
For simplicity and consistency, I think it should be called two times
because I do not see why there should be an exceptional rule for parameter
forward declarations that would have its size expression not evaluated when
it is evaluated for other declarations. This is also how it works in GCC, so there is
existing practice.
I think that size expressions with side effects are generally confusing and
a compiler should warn about those in general. This would affect parameter
declarations, type names in casts and typeof, and potentially VLAs.
We currently do not have the exact same situation for other redeclarations in
the standard (I think), because we have a constraint in other cases that avoids
external or internal linkage (where redeclaration is allowed) with variably
modified types. So adding a similar constraint, simply forbidding VM types
here, would also be an option.
We would have to answer the same question if we allowed (as discussed) redeclarations
for compatible typedefs:
typedef int a[g()];
typedef int a[g()];
And I think we should do this to avoid the "same type" mess, and I think that then, for
consistency, g() should also be called twice.
void func(int no, int oh_no[no = 12]; int whatever, int no, int oh_no[no]); // What is the value of no on entry to the function?
I think it should have the value 12 to be consistent with
the following example without forward declaration:
void foo(int n, int a[n = 12])
{
// n has value 12
}
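The claim about the example without forward declaration can be checked with a small sketch. The helper call_foo_with() is a hypothetical name added for illustration. Note the assumption: the size expression in an outermost array parameter is evaluated on entry by GCC, and to my knowledge also by Clang, but the parameter is adjusted to a pointer, so treat this as observed compiler behavior rather than a firm guarantee of the standard.

```c
#include <assert.h>

/* Mirrors the foo() from the post, but returns the value of n that is
   visible inside the body so that a caller can inspect it. */
int foo(int n, int a[n = 12])
{
    (void)a;   /* the array itself is not needed for this demonstration */
    return n;
}

/* Hypothetical helper: calls foo() with some other initial value of n. */
int call_foo_with(int initial_n)
{
    int buf[12] = {0};

    assert(initial_n != 12);   /* otherwise the experiment shows nothing */
    return foo(initial_n, buf);
}
```

Calling call_foo_with(3) is expected to return 12 with compilers that evaluate the size expression on entry, which matches the behavior described above.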
- There are challenges with recovering from incorrect syntax with forward declared parameters in order to give good QoI, e.g.,
void f(int a, int b, int a);
was this a typo with the third parameter or was this a missing semicolon after the first? Or void f(int a; int b, int c);
was this a typo with using a semicolon when a comma was intended, or did the user try to rename one identifier and forgot the other?
We do not really have a negative experience in GCC with this feature being
activated by default for many decades. Both your examples would violate a
constraint and there would be a corresponding error message that would tell
the user what is wrong:
https://godbolt.org/z/TqW8nrGex
In fact, after studying different options, I believe that with delayed parsing such
questions will become more difficult and it will be much harder to give good
diagnostics to users. (Also compilers will have to add a lot of infrastructure that
clang may have for C++ but other compilers currently do not need.)
- Forward declared parameters don't generalize at all into solving the same problem elsewhere in the language. As this RFC points out, there is a need to refer to structure member declarations the same as referring to parameter declarations.
We could also consider some kind of forward declarations in structs.
But I think the situation there is a bit different. The size expressions in
structs would not be evaluated when the declaration of the struct is
encountered, but when a member is referenced later.
So semantically, this is very different and it may make sense to keep the
syntax and semantics different:
In GCC, we have
void foo(int n)
{
struct foo { int (*x)[g(n)]; } a ;
}
where the size expression is evaluated when the declaration is encountered
similar to:
void foo(int n)
{
int vla[g(n)];
}
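A minimal sketch of this behavior for an ordinary block-scope VLA, in standard C with VLA support. The counter g_calls, the body of g(), and the function name vla_elems_after_change() are assumptions made for illustration; the post only uses g() as a placeholder.

```c
#include <assert.h>
#include <stddef.h>

int g_calls;   /* hypothetical counter for how often g() has run */

/* Stand-in for the g() used in the discussion. */
int g(int n)
{
    g_calls++;
    return n + 1;
}

/* Declares a VLA, then changes n, then reports the element count. */
size_t vla_elems_after_change(int n)
{
    assert(n >= 0);

    int vla[g(n)];   /* g(n) is evaluated here, when the declaration is reached */

    n = 100;         /* later changes to n do not alter the array type */

    /* The size was fixed at the declaration; sizeof does not call g() again. */
    return sizeof vla / sizeof vla[0];
}
```

With g_calls reset to 0, vla_elems_after_change(4) returns 5 and leaves g_calls at 1, so the size expression runs once at the declaration and not at later uses of the array.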
For structs if allowed
int n;
struct foo {
char (*buf)[g(n)];
int n;
} s;
then there is an ambiguity about what this means. The same is true
for attributes:
int n;
struct foo {
char *buf __attribute__((bounds(g(n))));
int n;
} s;
When we later access "s->buf" we need to know the value of the size
expression. I am not sure we want to trigger arbitrary (re-)computations
of complex size expressions simply when accessing a member of a struct.
So I think we should have different and new syntax for this feature,
because it will have different semantics from the simple declarations that
we already have in C (in contrast to parameters which I think should
work exactly like other declarations).
A potential solution is to add new syntax with constraints that avoid side effects:
struct foo {
char (*buf)[.n];
int n;
};
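To illustrate the intended semantics in today's C, here is a hand-written sketch in which the bound is a plain member that is read at the time of the access, with no side effects. This is not the proposed syntax; the accessor foo_at() is a hypothetical name, and the checks it performs are only what I assume a compiler or static analyzer could derive from an annotation like [.n].

```c
#include <assert.h>
#include <stddef.h>

/* Plain C stand-in for "char (*buf)[.n]": the bound lives in a sibling
   member instead of being part of the type. */
struct foo {
    char *buf;
    int n;     /* number of valid elements in buf */
};

/* Returns a pointer to element i, or NULL if i is outside [0, n).
   The bound is read from s->n at the time of the access. */
char *foo_at(struct foo *s, int i)
{
    assert(s != NULL);

    if (i < 0 || i >= s->n)
        return NULL;
    return &s->buf[i];
}
```

Because the bound is just a member read, changing s->n changes which accesses are accepted, and nothing more complex than a load has to be recomputed at each access.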
- We can't reuse these facilities to solve the structure problem because the syntax picked already means "declare a member subobject" today.
Yes, just using a ";" would not work for structs.
Don't get me wrong, late parsing isn't perfect either. e.g., the idea of "let me find any identifier that would be in scope once the scope is finally closed" applies anywhere in the TU (file-scope declarations, local variables, etc.) and we don't want to turn C into a language needing two-pass facilities everywhere.
Yes, this is my concern. The problem is that we already allow basically all
of the language in prototypes via size expressions. So once you allow it
there, you already have all the complexity.
I think it was a bad idea not to constrain size expressions to very simple
expressions (e.g. just identifiers), but this is what we have now.
However, late parsing does work reasonably well for the cases where declaration order matters and cannot be changed due to ABI (function signatures and structure layouts) which is the hard problem to solve.
It is the problem forward declarations also solve.
Users can rearrange declarations more easily in other circumstances, but there's still a circular reference issue where you want to support something like type foo [[attr(bar)]]; type bar [[attr(foo)]];
Late parsing does not add significant parse time overhead because it's a pay-for-what-you-use implementation detail.
It adds time when it is used. We want this feature to be used.
It also adds substantial complexity that we currently do not require from
compilers. Clang has a philosophy of just accepting this complexity. I assume
this is because it is a C++ compiler that accepts C in the same FE,
so it has to accept this cost anyway. But this is not at all true for other compilers.
When parsing a context where this can occur, if you see a reference to an identifier that cannot be found, you only have to late parse that one declaration otherwise declaration parsing occurs as usual.
Sure, but this late parsing can add arbitrary cost in the situations where you have to do it.
And it adds huge complexity to implement it in the first place. You need to be
able to store the full generic AST before type checking for arbitrarily complex expressions
and revisit it later.
This is not how many C compilers work, because in C you can do type checking already
when parsing. Smaller C compilers (and also GCC) do this and at that time also emit
diagnostics. So other C compilers may not currently have the support to do late
parsing, and this is a substantial burden (essentially a complete FE rewrite) that you
would impose on everyone by changing this fundamental property of the language.
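The difference between the two models can be sketched with a toy name resolver. Everything here is hypothetical and heavily simplified: struct param reduces a parameter to its name plus the single identifier its size expression refers to, and the two functions only answer whether that reference can be resolved. The toy deliberately leaves out the expensive part, namely keeping unchecked syntax trees around and type checking them later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model of one parameter: its name and the identifier used in its
   size expression (NULL if there is none). */
struct param {
    const char *name;
    const char *size_ref;
};

/* Is id declared among the first limit parameters? */
static bool declared_before(const struct param *p, size_t limit, const char *id)
{
    for (size_t j = 0; j < limit; j++)
        if (strcmp(p[j].name, id) == 0)
            return true;
    return false;
}

/* One-pass model: a reference is checked as soon as it is parsed, so it
   can only see parameters declared earlier in the list. */
bool resolves_one_pass(const struct param *p, size_t count)
{
    assert(p != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (p[i].size_ref && !declared_before(p, i, p[i].size_ref))
            return false;
    return true;
}

/* Late-parsing model: references are checked only after the whole list
   is known, so they can also see parameters declared later. */
bool resolves_late(const struct param *p, size_t count)
{
    assert(p != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        if (p[i].size_ref && !declared_before(p, count, p[i].size_ref))
            return false;
    return true;
}
```

For a list modeling (char buf[n], int n), resolves_one_pass() fails while resolves_late() succeeds; for (int n, char buf[n]) both succeed. A forward declaration restores the second ordering for a one-pass compiler without the deferred step.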
Given that the vast majority of uses of function declarations will not involve forward declared parameters, the hit to compile times is negligible. You are definitely right that it adds implementation complexity though.
Such costs tend to accumulate. I guess you could say this also for most C++ features.
Still, in practice, compile time for larger C++ projects is bad.
uecker:
In contrast, the GCC extension is rather simple, fits well with the rules of the C language, is very flexible and powerful without introducing semantic problems with cycles. Just implement this. It may look ugly at first, but technically it is far better in so many ways.
I don't agree that it fits well within the rules of the C language, it definitely is not flexible (it's only plausible in parameter declarations),
It is very flexible. You mean it can not be applied to structs? This is true, but I think this
needs to have a semantically different solution anyway. So I am fine with new syntax
there.
and it introduces new kinds of problems to be concerned about. This is not a situation
where "just implement this" is a good way forward; we've not implemented this feature despite otherwise going for GCC compatibility specifically because the feature has these undesirable properties and the extension is not
I do not see any undesirable properties. The GCC experience is that it is simple to
implement and does not cause problems. It also solves the problem we have now
just fine for GCC. If clang supported it, we could move on and annotate a
lot of existing interfaces, which would be a huge step forward for safety.
particularly popular in the wild: context:global (?m:^(\w+… - Sourcegraph (my ability to write Re2 regexes is pretty limited, so there may be a better way to do that, but spot-checking those results shows almost no usage of the feature; I'd be curious to know if GCC folks have more details on how frequently they see the extension used).
I think this argument is irrelevant. Any other new solution you want to invent also
has no current users. The reason this is not used is that a) it was useless until
recently (this changed now with better static analysis in GCC) and b) it is not
portable (which would change if clang implements it).
Given the lack of clarity coming out of the WG14 meetings last week (no approach looked to be a "slam dunk" within the committee) and that the committee won't be considering new features for a while now, I think this proposal should proceed with whatever approach the authors think best so long as it can be justified to the code reviewers.
I think the authors should consider the technical arguments I gave.
Martin.