Better type-specifier representation

Here's a proposal, collecting the arguments into one post:

Modifying Type to better represent 'type-specifier' in the syntax:

As Ted mentioned, there's a problem on how we represent struct declarations inside type-specifiers:

Example:

void f() {
   typedef struct x { int y; } foo; // #1
}

Syntactically this is what we have for #1:

typedef struct x { int y; } foo; -> 'typedef' type-specifier 'foo' ';'

and we treat it as: 'typedef' ['struct x' reference] 'foo' ';'

There's no information that 'struct x' is defined inside the typedef declaration. Also it's not clear who owns the RecordDecl.
I suggest that we treat it as:
'typedef' ['struct x' definition] 'foo' ';'
resulting in a RecordTypeDef Type for the TypedefDecl. This gives us the information that we have a 'struct x' definition inside the type-specifier of #1.

What about ownership of RecordDecl ?

There will be a 1-to-1 mapping between RecordTypes and RecordDecls,

struct x; -> A new incomplete RecordDecl is created here.
struct x {}; -> The previously created RecordDecl is fully defined.

and RecordType will own it. What is the advantage of that ? Ownership matters when a client wants to modify the AST. When the client says "I want to replace this RecordDecl with another one" it usually means "Everywhere in this translation unit, if a Type references this RecordDecl, I want it to reference this RecordDecl instead.". The client will accomplish this by replacing the one RecordDecl owned by the RecordType with another one.
The alternative would be replacing all the RecordDecls (forward decls and defs) present, *and* replacing the RecordDecl that the RecordType references.
Also, by having only one RecordDecl we avoid the overhead of creating multiple RecordDecls.

If the two constructs above are not RecordDecls owned by the translation unit, how are they represented in the AST ?

Syntactically they look like this:

struct x; -> type-specifier ';'
struct x {}; -> type-specifier ';'

similar to this, which we currently ignore:
int; -> type-specifier ';'

I suggest representing them with a kind of 'TypeSpecDecl' declaration node. This will allow us to also capture "int;" in the AST:
int; -> TypeSpecDecl with type 'int'
struct x {}; -> TypeSpecDecl with type RecordTypeDef defining RecordDecl 'x'
struct x; -> TypeSpecDecl with type RecordTypeRef, referencing RecordDecl 'x'

Other occurences of type-specifier are represented in a consistent way:

int foo(int x, struct { int a; } y);

'y' gets a new RecordTypeDef type, which owns the RecordDecl. We avoid the overhead of using multiple DeclGroups for the parameters.

sizeof (struct { int x;}) -> 'sizeof' '(' type-specifier ')' -> A new RecordTypeDef type passed to sizeof.

This approach of handling records will also apply to enums, basically it's about handling the tag declarations appearing inside type-specifiers. Consequently, this approach does not apply to typedefs:

typedef int x; -> syntactically (and semantically) 'typedef' type-specifier 'x' ';' -> a TypedefDecl 'x'

Typedefs don't have the complications of tag types, they will be handled the same way as now.

Any thoughts?

-Argiris

Hey Argiris,

Question below...

Here's a proposal, collecting the arguments into one post:

Modifying Type to better represent 'type-specifier' in the syntax:

As Ted mentioned, there's a problem on how we represent struct
declarations inside type-specifiers:

Example:

void f() {
  typedef struct x { int y; } foo; // #1
}

Syntactically this is what we have for #1:

typedef struct x { int y; } foo; -> 'typedef' type-specifier 'foo' ';'

and we treat it as: 'typedef' ['struct x' reference] 'foo' ';'

There's no information that 'struct x' is defined inside the typedef
declaration. Also it's not clear who owns the RecordDecl.
I suggest that we treat it as:
'typedef' ['struct x' definition] 'foo' ';'
resulting in a RecordTypeDef Type for the TypedefDecl. This gives us the
information that we have a 'struct x' definition inside the
type-specifier of #1.

I don't understand why this problem is specific to typedef's.

Don't we have the same problem for the following...

void f() {
   struct bar;
   struct bar { int a; } barVar1;
   struct bar barVar2;
}

How would this be represented?

snaroff

Hi Steve,

steve naroff wrote:

I don't understand why this problem is specific to typedef's.

Don't we have the same problem for the following...

void f() {
  struct bar;
  struct bar { int a; } barVar1;
  struct bar barVar2;
}

How would this be represented?

You are absolutely right, it's not specific to typedef's (I just took Ted's first example), thanks for pointing it out.

I suggest handling your example like this:

struct bar; -> TypeSpecDecl of type RecordRefType (a incomplete RecordDecl will be created)
struct bar { int a; } barVar1; -> DeclGroup with one VarDecl of type RecordDefType (the previous RecordDecl will be defined)
struct bar barVar2; -> DeclGroup with one VarDecl of type RecordRefType (the previous RecordDecl will be referenced)

The RecordDecl will be owned by one RecordType. The RecordRefType/RecordDefType resolve to this one RecordType.
Or just have a RecordDefType, not RecordRefType (the RecordType could be used for reference).

-Argiris

I think what is clear from this discussion is that we need "something else" to represent the information that is missing from the ASTs. Whether or not we call it a type-specifier, a DeclGroup, or something else, I think we are fundamentally trying to recover some syntactic information that we are now losing.

That said, I think changing the semantics of Type and its subclasses in order to accomplish this will have a variety of unpleasant complications. Right now, because types are uniqued, we can easily tell whether or not two variables are of the same type or of compatible types. This is really an elegant design from the perspective of clients (as well as Sema). By introducing different Type objects for what is conceptually speaking the same type (in the type system) that clean representation of the concept of types is discarded. This will seriously complicate clients of the ASTs that want to reason about types. Changing the meaning of types in this way thus seems like a pyrrhic victory; while we will recover some syntactic information, we will lose a great deal of semantic information (which will ultimately complicate the implementation of many clients of the ASTs).

I think what is clear from this discussion is that we need "something else" to represent the information that is missing from the ASTs. Whether or not we call it a type-specifier, a DeclGroup, or something else, I think we are fundamentally trying to recover some syntactic information that we are now losing.

That said, I think changing the semantics of Type and its subclasses in order to accomplish this will have a variety of unpleasant complications. Right now, because types are uniqued, we can easily tell whether or not two variables are of the same type or of compatible types. This is really an elegant design from the perspective of clients (as well as Sema). By introducing different Type objects for what is conceptually speaking the same type (in the type system) that clean representation of the concept of types is discarded. This will seriously complicate clients of the ASTs that want to reason about types. Changing the meaning of types in this way thus seems like a pyrrhic victory; while we will recover some syntactic information, we will lose a great deal of semantic information (which will ultimately complicate the implementation of many clients of the ASTs).

Hi Ted,

Ted Kremenek wrote:

I think what is clear from this discussion is that we need "something else" to represent the information that is missing from the ASTs. Whether or not we call it a type-specifier, a DeclGroup, or something else, I think we are fundamentally trying to recover some syntactic information that we are now losing.

That said, I think changing the semantics of Type and its subclasses in order to accomplish this will have a variety of unpleasant complications. Right now, because types are uniqued, we can easily tell whether or not two variables are of the same type or of compatible types. This is really an elegant design from the perspective of clients (as well as Sema). By introducing different Type objects for what is conceptually speaking the same type (in the type system) that clean representation of the concept of types is discarded. This will seriously complicate clients of the ASTs that want to reason about types

I don't think it will complicate the clients, the clients that reason about types use 'canonical' types:

typedef int A;
int x;
A y;

if a client wants to see if the types of x,y are compatible it uses canonical types.
If it wants to see if they are exactly the same type, it uses the type pointers.

struct S{} x;
struct S y;

if a client wants to see if the types of x,y are compatible it uses canonical types, no complication involved.
If it wants to see if they are exactly the same type, what should the answer be ? I think this the essential difference in discussion, is "struct S" and "struct S{}" exactly the same type ?

The parser knows the difference between a type reference and a type definition, shouldn't the type system know too ? What problems can we expect for clients caused by saying "struct S" is not exactly the same as "struct S{}" ?

(The memory ownership of RecordDecls, is a non-issue; we can have all RecordDecls owned by the translation unit and types don't have to do anything with this.)

-Argiris

Ted/Argiris,

Comments below...

I think what is clear from this discussion is that we need "something
else" to represent the information that is missing from the ASTs.
Whether or not we call it a type-specifier, a DeclGroup, or something
else, I think we are fundamentally trying to recover some syntactic
information that we are now losing.

That said, I think changing the semantics of Type and its subclasses
in order to accomplish this will have a variety of unpleasant
complications. Right now, because types are uniqued, we can easily
tell whether or not two variables are of the same type or of
compatible types. This is really an elegant design from the
perspective of clients (as well as Sema). By introducing different
Type objects for what is conceptually speaking the same type (in the
type system) that clean representation of the concept of types is
discarded. This will seriously complicate clients of the ASTs that
want to reason about types. Changing the meaning of types in this way
thus seems like a pyrrhic victory; while we will recover some
syntactic information, we will lose a great deal of semantic
information (which will ultimately complicate the implementation of
many clients of the ASTs).

I agree. Designing the AST's to serve multiple masters can be challenging...

complicate semantic analysis and code generation (or result in slower compile-time)

It's clear that we call agree it would be nice if...

a) User-defined types (structs/unions/enums) could distinguish between references and definitions.
b) Declarators could preserve grouping information.

In Ted's proposal, DeclGroup includes the user-defined type as an explicit "Decl".
In Argiris's proposal, the user-defined type is excluded from the DeclGroup and implemented as an explicit "Type".

Since Ted's proposal doesn't effect the Type system, I understand why it wouldn't complicate semantic analysis/etc.

Argiris, have you looked at how your proposal would effect the Type system?

Curious,

snaroff

Hi Argiris,

To start, let me just say that I think there are a few different
things going on in this
thread, and its a bit hard to discuss any of them (especially over
email), so it would
be good to break them up into separate problems where possible.

A few comments:

First, I am not clear exactly what problem you are solving here. Are
you proposing
this as an alternative to DeclGroups? And what are the problems you
are addressing?

Second, it would be nice to conceptually separate the solutions to the multiple
RecordDecl issue and the DeclGroup proposal. They overlap to some extent, it
is true, but I think they are separate issues and should be solved as such.

Third, in my mind encoding any part of the AST using Types is not a
good design. Having
them be related to one another is ok, but having things embedded in
the Type should
be an implementation detail, not something forced by the Parser or
even Sema. I am
especially wary of a design which has Types owning part of the AST.

Fourth, the main idea of DeclGroup was simply to capture the
"declarator-list" part of the
C grammar. I think this is what Ted meant about begin more faithful to
the grammer.

Fifth, if the problem you are solving is the ambiguity in whether a
structure use is a
definition or a declaration, then I agree that this should be solved,
but I do not think
it should be solved by introducing new types. I argued with Ted that
we should introduce
additional Decls to capture the syntactic (and semantic) distinction between:
   a. Introducing a new structure definition (which may be incomplete)
into scope.
   b. Referring to a previously declared structure.
However, in the end I decided that the introduction of this could be
delayed until after
the other changes to RecordDecl and the introduction of DeclGroup.
Those are more
important for solving ownership issues, which are in turn blocking
serialization.

More concretely, I believe there are some issues with your proposal:

(1) What is the representation of
  typedef struct x { ... } a, b, d, e ..., z;
Representing this, and capturing ownership, was really the key part of
DeclGroup.

(2) Re:

'y' gets a new RecordTypeDef type, which owns the RecordDecl. We avoid
the overhead of using multiple DeclGroups for the parameters.

This is a bit misleading, for all the common important cases DeclGroup can be
encoded in a way that doesn't have any additional memory overhead (and the
performance overhead of access through a smart pointer should be negligible).
Actually by my calculations we were going to save memory using DeclGroup.

I'd like to make sure we are all on the same page about what problems we
are solving (and break them down as much as possible) before going into to
many specifics about the solution.

- Daniel

All good points…

Sometimes it’s useful to consult the spec to help weigh various design points.

That said, C99 6.7.2.3p3 says…

All declarations of structure, union, or enumerated types that have the same scope and
use the same tag declare the same type. The type is incomplete109) until the closing brace
of the list defining the content, and complete thereafter.

Since both x and y have the same type, I don’t see any compelling reason to alter the type system.

struct S{} x;
struct S y;

If not, I think the DeclGroup solution will work fine (and appears simpler).

snaroff

Hi Daniel,

Daniel Dunbar wrote:

Fifth, if the problem you are solving is the ambiguity in whether a
structure use is a
definition or a declaration, then I agree that this should be solved,
  
Yes, this is the problem.

More concretely, I believe there are some issues with your proposal:

(1) What is the representation of
  typedef struct x { ... } a, b, d, e ..., z;
Representing this, and capturing ownership, was really the key part of
DeclGroup.
  
A DeclGroup, of TypedefDecls, which have type RecordDefType, pointing at RecordDecl 'x'.
The DeclGroup doesn't have to point and own the RecordDecl.
The RecordDecl can be uniqued and be owned by the translation unit.

but I do not think
it should be solved by introducing new types. I argued with Ted that
we should introduce
additional Decls to capture the syntactic (and semantic) distinction between:
   a. Introducing a new structure definition (which may be incomplete)
into scope.
   b. Referring to a previously declared structure.
However, in the end I decided that the introduction of this could be
delayed until after
the other changes to RecordDecl and the introduction of DeclGroup.
Those are more
important for solving ownership issues, which are in turn blocking
serialization

This is what I'm suggesting against: over-complicating the AST by introducing various Decls or Exprs for providing information for something that is fundamentally part of a type-specifier.

-We will introduce additional Decls for distinction between new type definition or reference in declarations (simpler option is not to introduce additional decls)
-Use DeclGroup for function parameters (even if there's no overhead, using ParmVarDecl is the simpler option)
-introduce some kind of decl or expression for the struct in "sizeof (struct {})" (simpler option is to use a Type, as we do now)
-The same for casts "(struct{}) x"
-in C++ use additional decl (or DeclGroup ?) for a condition declaration: "if (struct {}* x = 0) {}" (simpler option is to use a single VarDecl as now)
-Something to distinguish C++ new: "new struct{}()"

There must be other places where trying to distinguish between reference/definition is going to complicate things, basically anywhere that a type may be used, this is off the top of my head.

All these cases can be handled uniformly and in a much simpler way by introducing the RecordDefType, and making the distinction based on the thing which actually carries the ambiguity (the type-specifier).

-Argiris

steve naroff wrote:

All good points...

Sometimes it's useful to consult the spec to help weigh various design points.

That said, C99 6.7.2.3p3 says...

All declarations of structure, union, or enumerated types that have the same scope and use the same tag declare the same type. The type is incomplete109) until the closing brace of the list defining the content, and complete thereafter.

Since both x and y have the same type, I don't see any compelling reason to alter the type system.

struct S{} x;
struct S y;

C99 6.7.7p7 says:
All three of the following declarations of the signal function specify exactly the same type

typedef void fv(int), (*pfv)(int);
void (*signal(int, void (*)(int)))(int);
fv *signal(int, fv *);
pfv signal(int, pfv);

And we don't actually use "exactly the same type", we use TypedefType to capture more syntactic information.

From my perspective, we need to have a compelling reason to fold this into the type system.

If not, I think the DeclGroup solution will work fine (and appears simpler).

See here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-September/002807.html
about why I think folding it in the type is simpler.

-Argiris

Okay, I think I understand where you are coming from. And I agree
about everything except using the types to encode the type specifier. :slight_smile:

If we want to explicitly represent the type specifier, what is wrong with
actually making this a separate node (not a Type and not a Decl). We can
probably present the abstraction that the AST has TypeSpecifier nodes
without actually creating them (by encoding in DeclGroup or other tricks).

This also gives a clear place to attach source range, makes clear what the
representation of sizeof(type) is (a SizeofAlignOfTypeExpr which points to
a TypeSpecifier which in turn points to the type), and should make ownership
clear.

I don't see any clear reason why, if we want this, we should embed it into
the type (which is fundamentally incorrect, a type specifier is not a type, many
things have different type specifiers but the same type).

- Daniel

Hi Argiris,

I'm fine with introducing the notion of a type-specifier into the AST, and it would potentially clean up a whole bunch of loose ends and more closely follow the grammar. From the points you have made, I can see how it would be very useful, and is compatible with the notion of a DeclGroup. In fact, the "prefix" of the DeclGroup could just be a type-specifier object.

My feeling is that type-specifiers are different than types. They aren't synonymous; one is a syntactic construct, the other a semantic concept, and they shouldn't be treated as being the same in the ASTs. I think that type-specifiers could possibly be encoded as a smart pointer or a similar class (similar to QualType) to make the overhead negligible.

Ted

To both Ted and Daniel:

Ted Kremenek wrote:

My feeling is that type-specifiers are different than types. They aren't synonymous; one is a syntactic construct, the other a semantic concept, and they shouldn't be treated as being the same in the ASTs. I think that type-specifiers could possibly be encoded as a smart pointer or a similar class (similar to QualType) to make the overhead negligible.

Daniel Dunbar wrote:

I don't see any clear reason why, if we want this, we should embed it into
the type (which is fundamentally incorrect, a type specifier is not a type, many
things have different type specifiers but the same type).
  
These are good points, I agree. Putting type-spec info in the type is the wrong way to go.

Daniel Dunbar wrote:

If we want to explicitly represent the type specifier, what is wrong with
actually making this a separate node (not a Type and not a Decl). We can
probably present the abstraction that the AST has TypeSpecifier nodes
without actually creating them (by encoding in DeclGroup or other tricks).

This also gives a clear place to attach source range, makes clear what the
representation of sizeof(type) is (a SizeofAlignOfTypeExpr which points to
a TypeSpecifier which in turn points to the type), and should make ownership
clear.

This is a great idea!

How about using something like this:

-In the AST, Decls do not have Types, they have TypeSpecifiers. Exprs have Types.
-TypeSpecifiers can own RecordDecls
-DeclGroups are used for declaration lists:
      int x,y,z; -> DeclGroup
  but not for parameters. If the ParmVarDecl has a TypeSpecifier, we don't have to use a DeclGroup in its place.

Let me know what you think.

-Argiris

This sounds like the right approach to me!

The one outstanding grossness that I can think of right now is VLAs. How should we represent:

   int x[foo()][bar()][...], y;

Right now, the type for the VLA owns the expressions foo(), bar(), and so on. Both DeclGroups and TypeSpecifiers don't solve this problem (I believe).

steve naroff wrote:

All good points…

Sometimes it’s useful to consult the spec to help weigh various design points.

That said, C99 6.7.2.3p3 says…

All declarations of structure, union, or enumerated types that have the same scope and use the same tag declare the same type. The type is incomplete109) until the closing brace of the list defining the content, and complete thereafter.

Since both x and y have the same type, I don’t see any compelling reason to alter the type system.

struct S{} x;

struct S y;

C99 6.7.7p7 says:
All three of the following declarations of the signal function specify exactly the same type

typedef void fv(int), (*pfv)(int);
void (signal(int, void ()(int)))(int);
fv *signal(int, fv *);
pfv signal(int, pfv);

And we don’t actually use “exactly the same type”, we use TypedefType to capture more syntactic information.

Typedef’s are an explicit language feature for introducing user-defined synonyms.

The only reason for representing TypedefType explicitly in the type system is to output user-defined names in diagnostics. In fact, I wish we didn’t have to deal with them explicitly in the type system. When I started writing Sema, I can’t tell you how many bugs resulted in accidentally operating on the non-canonical type. Nevertheless, the pain was worth the gain (for typedefs). I’m not convinced adding more pain for the case above is worth it. Nevertheless, I see the point you are making a bit more clearly…

From my perspective, we need to have a compelling reason to fold this into the type system.

If not, I think the DeclGroup solution will work fine (and appears simpler).

See here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-September/002807.html
about why I think folding it in the type is simpler.

This helps me understand the problems you are considering, however I still don’t think folding it into the type system is the right thing. Adorning a Decl with more type-specifier info is fine though.

snaroff

Ted Kremenek wrote:

The one outstanding grossness that I can think of right now is VLAs. How should we represent:

  int x[foo()][bar()][...], y;

Right now, the type for the VLA owns the expressions foo(), bar(), and so on. Both DeclGroups and TypeSpecifiers don't solve this problem (I believe).

Is this restricted to VLAs ?
Doesn't this have the same issue ?:

const int x=10;
int y[x+1]; // type owning expression "x+1"

Huh. Does gcc treat 'y' and as a VLA or as a constant-sized array? I guess gcc's constant folding built into the frontend would make that a constant-sized array.

I guess this isn't (strictly-speaking) a VLA since this is example is C++ code.

Nevertheless, should we treat these as VLAs in the ASTs?

Ted Kremenek wrote:

Ted Kremenek wrote:

The one outstanding grossness that I can think of right now is VLAs. How should we represent:

int x[foo()][bar()][...], y;

Right now, the type for the VLA owns the expressions foo(), bar(), and so on. Both DeclGroups and TypeSpecifiers don't solve this problem (I believe).

Is this restricted to VLAs ?
Doesn't this have the same issue ?:

const int x=10;
int y[x+1]; // type owning expression "x+1"

Huh. Does gcc treat 'y' and as a VLA or as a constant-sized array? I guess gcc's constant folding built into the frontend would make that a constant-sized array.

Ah I should have mentioned that I was considering the above from the point of C++, which doesn't have the VLA concept.