PATCH: Cleanup function redeclaration representations

While working on some of the preliminaries for overloading, I stumbled
across an issue with the way Clang represents redeclarations of
functions. The basic problem is that, although Clang recognizes
redeclarations, it does not represent them as such. Here's a simple
little example that can be used to illustrate the problem:

    int f(int x) { return x; } // #1
    int f(int x); // #2
    int f(int x) { return x; } // #3

That's an error (f is defined twice), but Clang doesn't diagnose it.
If you swap the first two f's, or the last two f's, Clang does
diagnose it properly.

Why the ordering dependency? Well, when we see the 2nd declaration of
f, Clang notices that there is a previous declaration of 'f'. It then
merges the declarations (as it should) and considers the second
declaration valid... so it pushes the second declaration on shadowed
stack. Now we have two f's in scope, although both refer to the same
function.

When we get around to the third declaration (which is also a
definition, of course), we look for a previous declaration... and find
the second declaration. Since that declaration has no body, Clang
parses the body of f again and attaches it to the third declaration of
f (which has also been pushed onto the stack of shadowed variables and
put into the scope). So we end up with three FunctionDecls on the
stack of shadowed decls for the identifier "f", all of which
conceptually represent the same function.

In this patch, when we notice that we're redeclaring an existing
function, we merge the declarations together but *do not* introduce
the new declaration into the scope (Argiris is doing the same thing
with namespaces in his patch). That way, everyone who refers to a
function "f" will refer to the same FunctionDecl, which will always be
updated with all of the information we have about that function.

Some notes:
  - FunctionDecl::RedeclaredBy is used to update the main FunctionDecl
with information from the newest FunctionDecl, and update the chain of
previous declarations.

  - FunctionDecl::getBody actually traverses the list of
redeclarations to find the function definition. That way, we know
*which* declaration is also a definition.

  - This change breaks CodeGen for the case where one sees the
prototype of a function, generates a call to that function, and later
sees the definition; see test/CodeGen/functions.c.
CodeGenModule::GetAddrOfFunctionDecl currently relies on the later
definition having a different FunctionDecl node (which used to be the
case, but isn't with this patch). I can provide a follow-up patch to
fix this issue, unless someone more familiar with the CodeGen module
wants to deal with it.

  - Doug

clang-redecl.patch (12 KB)

While working on some of the preliminaries for overloading, I stumbled
across an issue with the way Clang represents redeclarations of
functions. The basic problem is that, although Clang recognizes
redeclarations, it does not represent them as such. Here's a simple
little example that can be used to illustrate the problem:

   int f(int x) { return x; } // #1
   int f(int x); // #2
   int f(int x) { return x; } // #3

That's an error (f is defined twice), but Clang doesn't diagnose it.
If you swap the first two f's, or the last two f's, Clang does
diagnose it properly.

Ah, this is an interesting problem and an interesting patch.

In this patch, when we notice that we're redeclaring an existing
function, we merge the declarations together but *do not* introduce
the new declaration into the scope (Argiris is doing the same thing
with namespaces in his patch). That way, everyone who refers to a
function "f" will refer to the same FunctionDecl, which will always be
updated with all of the information we have about that function.

Ok. Makes sense.

Some notes:
- FunctionDecl::RedeclaredBy is used to update the main FunctionDecl
with information from the newest FunctionDecl, and update the chain of
previous declarations.

- FunctionDecl::getBody actually traverses the list of
redeclarations to find the function definition. That way, we know
*which* declaration is also a definition.

Ok.

- This change breaks CodeGen for the case where one sees the
prototype of a function, generates a call to that function, and later
sees the definition; see test/CodeGen/functions.c.
CodeGenModule::GetAddrOfFunctionDecl currently relies on the later
definition having a different FunctionDecl node (which used to be the
case, but isn't with this patch). I can provide a follow-up patch to
fix this issue, unless someone more familiar with the CodeGen module
wants to deal with it.

I can fix the codegen part after you commit this patch, no worries, just let the test fail.

Some minor thoughts:

What is the intended ownership model of the previous declarations? I see that your patch has functiondecl delete other definitions when the functiondecl is deleted, is this intended, or do you think TranslationUnit should own the previous declarations? Is the intent that a client would walk a declaration list for the translation unit and only see each function once... walking the prev declaration list to see other declarations? This model makes sense to me, but it would be nice to explicitly say this somewhere in a prominent comment.

It is a little strange to me that getBody() can return non-null if isDefinition() return false. How about renaming getBody() -> findBody() and having getBody() just return the local function? Also, is there ever a case where getBody could ever return a FunctionDecl other than 'this'? I thought definitions were always at the front of the list.

Please rename 'RedeclaredBy' as 'MarkRedeclaredBy' or something, to make it more verby. Also, since it is mostly only useful during parsing time, should it be moved to Sema instead of living in the AST?

Overall, the patch looks great. Please feel free to commit when you're happy with the above issues,

-Chris

> While working on some of the preliminaries for overloading, I stumbled
> across an issue with the way Clang represents redeclarations of
> functions. The basic problem is that, although Clang recognizes
> redeclarations, it does not represent them as such. Here's a simple
> little example that can be used to illustrate the problem:

I can fix the codegen part after you commit this patch, no worries, just
let the test fail.

Thanks!

Some minor thoughts:

What is the intended ownership model of the previous declarations? I see
that your patch has functiondecl delete other definitions when the
functiondecl is deleted, is this intended, or do you think TranslationUnit
should own the previous declarations? Is the intent that a client would
walk a declaration list for the translation unit and only see each function
once... walking the prev declaration list to see other declarations? This
model makes sense to me, but it would be nice to explicitly say this
somewhere in a prominent comment.

Yes, this was the intent. Since all of the function declarations
represent the same function, I think it makes the most sense for that
function to only have one entry in the translation unit's list of
declarations, because there really is only one function being
declared. Those (probably few) clients that want to know about *all*
declarations of a function can walk the previous-declaration chains.

It is a little strange to me that getBody() can return non-null if
isDefinition() return false. How about renaming getBody() -> findBody() and
having getBody() just return the local function?

I think perhaps the name of isDefinition is really the cause of the
problem here, and that's the name we should change. I've gone with the
*very* explicit isThisDeclarationADefinition, because this is a
special-purpose question about (essentially) the ordering of the
re-declarations. I only expect it to be used in a few diagnostics,
where we want to say "f is (declared|defined) here".

I think splitting getBody into findBody() (which walks the tree) and
getBody() will end up being confusing. Most of the time, all we care
about is "is there a definition for this function?" Having two
functions to answer that question (which produce the same answer
except in very weird cases; see below) is going to cause clients
trouble.

Also, is there ever a case
where getBody could ever return a FunctionDecl other than 'this'? I thought
definitions were always at the front of the list.

Surprisingly, no. One can actually write, e.g.,

  void f(int x) { }
  void f(int x = 5);

Please rename 'RedeclaredBy' as 'MarkRedeclaredBy' or something, to make it
more verby.

"AddRedeclaration". Given that Argiris is doing the same thing for
namespaces, we should consider moving this up the Decl hierarchy and
making it virtual.

Also, since it is mostly only useful during parsing time,
should it be moved to Sema instead of living in the AST?

I feel like this is an important part of the consistency of the AST.
By keeping this member as a part of the FunctionDecl it allows the
members of FunctionDecl to remain protected, and (I hope) it will be
far easier to remember to keep this function in sync with changes to
FunctionDecl.

Overall, the patch looks great. Please feel free to commit when you're
happy with the above issues,

Thanks! Done:

  http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080414/005376.html

  - Doug

Hi Doug,

When a function is redeclared, the parser reports the same decl pointer to the ASTConsumer, basically the ASTConsumer gets a new object but with the same
pointer as the previous decl. I think that this complicates things a bit from the aspect of a ASTConsumer.

For example, there's a consumer in clang (SerializationTest used with '-test-pickling'), that stores the decls reported from HandleTopLevelDecl into a vector
so that it can later 'feed' them in the same order to another consumer. This breaks when a redeclaration is reported because it then stores the same pointer
for each redeclaration.

How about instead of linking with previous decl, linking with next decl is used instead. For example:

   int f(int x) { return x; } // #1
   int f(int x); // #2
   int f(int x) { return x; } // #3

-For #1 , parser reports it
-For #2, it is merged with #1, #2 is not introduced in scope, #1 points to #2 (as next decl), and parser reports the decl pointer of #2
-For #3, it is merged with #1, #3 is not introduced in scope, #2 points to #3 (as next decl), and parser reports the decl pointer of #3

I think this is simpler and there's no need to have in AST the addDeclaration method that swaps the contents of decl objects.
The drawback is that there needs to be an additional connection, like #2 and #3 both pointing to #1 as the decl that represents the function.

What do you think ?

-Argiris

Hi Argiris,

When a function is redeclared, the parser reports the same decl pointer to
the ASTConsumer, basically the ASTConsumer gets a new object but with the
same
pointer as the previous decl. I think that this complicates things a bit
from the aspect of a ASTConsumer.

For example, there's a consumer in clang (SerializationTest used with
'-test-pickling'), that stores the decls reported from HandleTopLevelDecl
into a vector
so that it can later 'feed' them in the same order to another consumer.
This breaks when a redeclaration is reported because it then stores the same
pointer
for each redeclaration.

Perhaps the real issue is that we shouldn't be calling
HandleTopLevelDecl for re-declarations, because they aren't declaring
unique entities, e.g., HandleTopLevelRedeclaration.

How about instead of linking with previous decl, linking with next decl is
used instead. For example:

  int f(int x) { return x; } // #1
  int f(int x); // #2
  int f(int x) { return x; } // #3

-For #1 , parser reports it
-For #2, it is merged with #1, #2 is not introduced in scope, #1 points to
#2 (as next decl), and parser reports the decl pointer of #2
-For #3, it is merged with #1, #3 is not introduced in scope, #2 points to
#3 (as next decl), and parser reports the decl pointer of #3

I think this is simpler and there's no need to have in AST the
addDeclaration method that swaps the contents of decl objects.
The drawback is that there needs to be an additional connection, like #2
and #3 both pointing to #1 as the decl that represents the function.

What do you think ?

I had a patch that was a little like this, with one difference: once
#1 was merged into #2 (then #2 into #3), I removed the previous
declaration from the scope and pushed the new declaration. That way,
subsequent lookups of the function get the latest declaration. This is
really important, both for semantic and usability reasons:
semantically, we need all of the information from later declarations
in the FunctionDecl that name lookup finds, e.g.,

  void f(int x); // #1
  void f(int x = 2); // #2
  void g() { f(); } // need to see the default argument from declaration #2

The same thing happens in C, if the first declaration doesn't have a prototype.

So, if name lookup is going to continue to find the first declaration
of a function (as it does in your suggestion and in the patch I
committed), that first declaration either needs to accumulate
information from redeclarations or access to its internals needs to
walk through the chain of redeclarations, e.g., to find default
arguments.

The comment I have about this approach is that we'll end up with
different calls to the same function pointing at different
FunctionDecl nodes. For example:

  void f(int x); // #1
  void g(int x) { f(x); } // call points to #1
  void f(int x); // #2
  void h(int x) { f(x); } // call points to #2

I don't know whether that's a benefit or not :slight_smile: In some sense, it
accurately represents the source code (that's good), but it's also a
little confusing: they both refer to exactly the same function, so why
do they refer to different FunctionDecl nodes?

There's some philosophy behind my approach to function redeclarations:
I want "normal" users of the AST to only see a single FunctionDecl for
each function, and that FunctionDecl should contain all of the
information that the AST contains about that function. Advanced
clients that need to know about re-declarations can then walk the
redeclaration chain from that one FunctionDecl.

  - Doug

Doug Gregor wrote:

Perhaps the real issue is that we shouldn't be calling
HandleTopLevelDecl for re-declarations, because they aren't declaring
unique entities, e.g., HandleTopLevelRedeclaration.
  
There are types of ASTConsumers that would expect to get re/declarations as the parser encounters them. Think of a consumer that colorizes variable/param names
for an IDE. If we hide the redeclarations we make things complicated for them, I'm not sure what the benefit would be.

How about instead of linking with previous decl, linking with next decl is
used instead. For example:

  int f(int x) { return x; } // #1
  int f(int x); // #2
  int f(int x) { return x; } // #3

-For #1 , parser reports it
-For #2, it is merged with #1, #2 is not introduced in scope, #1 points to
#2 (as next decl), and parser reports the decl pointer of #2
-For #3, it is merged with #1, #3 is not introduced in scope, #2 points to
#3 (as next decl), and parser reports the decl pointer of #3

I think this is simpler and there's no need to have in AST the
addDeclaration method that swaps the contents of decl objects.
The drawback is that there needs to be an additional connection, like #2
and #3 both pointing to #1 as the decl that represents the function.

What do you think ?
    
I had a patch that was a little like this, with one difference: once
#1 was merged into #2 (then #2 into #3), I removed the previous
declaration from the scope and pushed the new declaration. That way,
subsequent lookups of the function get the latest declaration. This is
really important, both for semantic and usability reasons:
semantically, we need all of the information from later declarations
in the FunctionDecl that name lookup finds, e.g.,

  void f(int x); // #1
  void f(int x = 2); // #2
  void g() { f(); } // need to see the default argument from declaration #2

The same thing happens in C, if the first declaration doesn't have a prototype.

So, if name lookup is going to continue to find the first declaration
of a function (as it does in your suggestion and in the patch I
committed), that first declaration either needs to accumulate
information from redeclarations or access to its internals needs to
walk through the chain of redeclarations, e.g., to find default
arguments.
  
Yes, that's what I'm proposing. The first decl accumulates the information and/or walks the redeclarations for info.
Like the getBody() in your patch that walks the redeclarations to get the body.

The comment I have about this approach is that we'll end up with
different calls to the same function pointing at different
FunctionDecl nodes. For example:

  void f(int x); // #1
  void g(int x) { f(x); } // call points to #1
  void f(int x); // #2
  void h(int x) { f(x); } // call points to #2

I don't know whether that's a benefit or not :slight_smile: In some sense, it
accurately represents the source code (that's good), but it's also a
little confusing: they both refer to exactly the same function, so why
do they refer to different FunctionDecl nodes?
  
I don't quite understand what you mean. Name lookup should only see and return the first decl (as the function decl that represents 'f' function).
Both calls should point to #1, why will they refer to different FunctionDecl nodes ?

There's some philosophy behind my approach to function redeclarations:
I want "normal" users of the AST to only see a single FunctionDecl for
each function, and that FunctionDecl should contain all of the
information that the AST contains about that function.

I agree. The only difference is in case of

  void f(int x); // #1
  void f(int x = 2); // #2

instead of merging to #2 and swaping contents with #1, thus pointer of #1 now points to #2,
I suggest merging to #1.
In both cases, every use of 'f' will point to the same FunctionDecl node.

That way ASTConsumers like the syntax-colorizer I mentioned and pretty-printer (SerializationTest uses this one), can use redeclarations the same way
as single declarations. If a ASTConsumer cares only about single decls, it can check a decl.isRedeclaration() method and act accordingly.

-Argiris

Argiris Kirtzidis wrote:

There's some philosophy behind my approach to function redeclarations:
I want "normal" users of the AST to only see a single FunctionDecl for
each function, and that FunctionDecl should contain all of the
information that the AST contains about that function.

I agree. The only difference is in case of

void f(int x); // #1
void f(int x = 2); // #2

instead of merging to #2 and swaping contents with #1, thus pointer of #1 now points to #2,
I suggest merging to #1.
In both cases, every use of 'f' will point to the same FunctionDecl node.

The thing is, it seems to me that 'swaping contents' complicates things (from the perspective of a ASTConsumer) without offering some benefit.
Could you explain why it's useful ?

I believe this was Doug’s reason:

That way, subsequent lookups of the function get the latest declaration. This is really important, both for semantic and usability reasons: semantically, we need all of the information from later declarations in the FunctionDecl that name lookup finds, e.g.,

void f(int x); // #1
void f(int x = 2); // #2
void g() { f(); } // need to see the default argument from declaration #2

The same thing happens in C, if the first declaration doesn’t have a prototype.

However, I’m curious if his approach would be able to handle this:

void f(int x = 2); // #1
void f(int x); // #2
void g() { f(); } // need to see the default argument from declaration #1

Doug also said (emphasis added):

So, if name lookup is going to continue to find the first declaration of a function (as it does in your suggestion and in the patch I committed), that first declaration either _*needs to accumulate information from redeclarations*_ or access to its internals needs to walk through the chain of redeclarations, e.g., to find default arguments.

Perhaps accumulation is the approach most agreeable?

  • John

P.S. Howdy everyone! I’ve been watching the mailing list for awhile now, but this is my first correspondence.

Hi Argiris,

Doug Gregor wrote:

> Perhaps the real issue is that we shouldn't be calling
> HandleTopLevelDecl for re-declarations, because they aren't declaring
> unique entities, e.g., HandleTopLevelRedeclaration.
>
>

There are types of ASTConsumers that would expect to get re/declarations as
the parser encounters them. Think of a consumer that colorizes
variable/param names
for an IDE. If we hide the redeclarations we make things complicated for
them, I'm not sure what the benefit would be.

There are other types of ASTConsumers that don't want to see
redeclarations, e.g., a consumer that is parsing documentation for
declarations or generating Python wrappers. We need both declarations
and redeclarations passed to the consumer, and the consumer needs to
be able to tell which is which. That needs to be present in any
scheme.

> So, if name lookup is going to continue to find the first declaration
> of a function (as it does in your suggestion and in the patch I
> committed), that first declaration either needs to accumulate
> information from redeclarations or access to its internals needs to
> walk through the chain of redeclarations, e.g., to find default
> arguments.
>
>

Yes, that's what I'm proposing. The first decl accumulates the information
and/or walks the redeclarations for info.
Like the getBody() in your patch that walks the redeclarations to get the
body.

There's a lot of walking to do, here. We'll need to walk the
redeclaration chain to get parameter names, default arguments, the
return type, the source location, etc., because we want to use the
most recent redeclaration for diagnostic purposes. For example:

  void f(int, int);
  void f(int, int y = 5);

  void g() {
    f(); // error: not enough arguments in call; should refer to the
most recent "f" declaration
  }

Redeclarations can change a lot of things... one can use typedefs for
parameter types (C and C++), rename parameters, add default arguments,
add a prototype (C), change to a different "compatible" type such as
int -> enum (C), etc. All of these would have to walk the
redeclaration chain.

> The comment I have about this approach is that we'll end up with
> different calls to the same function pointing at different
> FunctionDecl nodes. For example:
>
> void f(int x); // #1
> void g(int x) { f(x); } // call points to #1
> void f(int x); // #2
> void h(int x) { f(x); } // call points to #2
>
> I don't know whether that's a benefit or not :slight_smile: In some sense, it
> accurately represents the source code (that's good), but it's also a
> little confusing: they both refer to exactly the same function, so why
> do they refer to different FunctionDecl nodes?
>
>

I don't quite understand what you mean. Name lookup should only see and
return the first decl (as the function decl that represents 'f' function).
Both calls should point to #1, why will they refer to different
FunctionDecl nodes ?

If we agree that both should point to #1, that's great; one of my
alternative patches used a slightly different scheme where the first
call would point to #1 and the second to #2. We can ignore that
approach.

> There's some philosophy behind my approach to function redeclarations:
> I want "normal" users of the AST to only see a single FunctionDecl for
> each function, and that FunctionDecl should contain all of the
> information that the AST contains about that function.
>

I agree. The only difference is in case of

  void f(int x); // #1
  void f(int x = 2); // #2

instead of merging to #2 and swaping contents with #1, thus pointer of #1
now points to #2,
I suggest merging to #1.
In both cases, every use of 'f' will point to the same FunctionDecl node.

Ah, I understand what you mean. I think the important question is: how
much do you merge into #1, and do you preserve the previous
information in #1? Does #1 get the new parameter names from #2? New
typedefs in the parameter types? What about the source location?

  void f(int x); // #1
  typedef int MyFlags;
  void f(MyFlags flags = 0); // #2

What parts of the declaration at #2 will go into the FunctionDecl for
#1? Additionally: when information gets merged into #1, is that same
information that was in the FunctionDecl for #1 lost? (e.g., the
parameter name "x" it used?)

That way ASTConsumers like the syntax-colorizer I mentioned and
pretty-printer (SerializationTest uses this one), can use redeclarations the
same way
as single declarations. If a ASTConsumer cares only about single decls, it
can check a decl.isRedeclaration() method and act accordingly.

We can design the interface either way... a syntax-colorizer could
just override HandleRedeclaration to call the same routine that deals
with single declarations, because it treats them the same. Other tools
can handle redeclarations differently.

  - Doug

Don't forget that redeclarations needn't be at global scope, and the
composite type only applies where the redeclaration is visible; as
soon as it goes out of scope the type of the visible declaration is
the semantic type.

Neil.

Hi John,

However, I'm curious if his approach would be able to handle this:

void f(int x = 2); // #1

void f(int x); // #2
void g() { f(); } // need to see the default argument from declaration #1

Yes, it does. Default arguments from previous declarations get merged
into subsequent declarations. That's why one can write strange things
like:

  void f(int x, int y = 2);
  void f(int x = 1, int y);

That second declaration looks like nonsense, but it works because the
first declaration has a default argument for the second function
parameter.

  - Doug

Neil Booth wrote:-

Don't forget that redeclarations needn't be at global scope, and the
composite type only applies where the redeclaration is visible; as
soon as it goes out of scope the type of the visible declaration is
the semantic type.

Also, there is nothing specific to functions here. It all applies to
variables too.

Neil.

In C++, it isn't a redeclaration if it's not in the same scope
([over.dcl]p1 says as such). One can have multiple declarations at
different scopes that end up referring to the same function, e.g.,

  extern "C" void foo(int);

could occur in several local scopes. However, this doesn't meet the
C++ definition of a redeclaration. Perhaps C is different?

Clang currently considers declarations from different scopes to be
completely different entities. A function declared in a local scope
will be popped off the stack once Clang leaves that local scope, so it
"does the right thing" with these.

  - Doug

Doug Gregor wrote:-

In C++, it isn't a redeclaration if it's not in the same scope
([over.dcl]p1 says as such). One can have multiple declarations at
different scopes that end up referring to the same function, e.g.,

  extern "C" void foo(int);

could occur in several local scopes. However, this doesn't meet the
C++ definition of a redeclaration. Perhaps C is different?

C doesn't define redeclaration; I'm using it to mean a declaration
of an entity already declared. C does specify the (type) semantics in
such cases.

Clang currently considers declarations from different scopes to be
completely different entities. A function declared in a local scope
will be popped off the stack once Clang leaves that local scope, so it
"does the right thing" with these.

For semantic type, provided you get the composite type correct, that
is almost fine (beware interposed local variables blocking visibility
of the outer declaration. Yay for corner cases - there are a scary
number in this area).

However, it's nice to complain about / reject the "x" in things like

void foo1 (void) { extern int x(void); }
void foo2 (void) { extern double x(void); }

$ gcc /tmp/bug.c
/tmp/bug.c: In function 'foo2':
/tmp/bug.c:2: error: conflicting types for 'x'
/tmp/bug.c:1: error: previous declaration of 'x' was here

This is undefined behaviour according to the C standard; but there
is little value in letting such stuff pass.

GCC does this really well now; it gets almost anything no matter how
you might try to fool it. It used to (<3.4 ?) be much less precise.

Neil.

Very nice... it would definitely be helpful to diagnose this sort of problem.

  - Doug

Doug Gregor wrote:-

Hi Doug,

Doug Gregor wrote:

So, if name lookup is going to continue to find the first declaration
of a function (as it does in your suggestion and in the patch I
committed), that first declaration either needs to accumulate
information from redeclarations or access to its internals needs to
walk through the chain of redeclarations, e.g., to find default
arguments.

Yes, that's what I'm proposing. The first decl accumulates the information
and/or walks the redeclarations for info.
Like the getBody() in your patch that walks the redeclarations to get the
body.
    
There's a lot of walking to do, here. We'll need to walk the
redeclaration chain to get parameter names, default arguments, the
return type, the source location, etc., because we want to use the
most recent redeclaration for diagnostic purposes. For example:

  void f(int, int);
  void f(int, int y = 5);

  void g() {
    f(); // error: not enough arguments in call; should refer to the
most recent "f" declaration
  }
  
Are you saying that the most recent declaration is always best to use for diagnostic purposes or are you talking about the specific example ?
What if it was:

  void f(int, int y = 5);
  void f(int, int);

Which decl should the diagnostic use ?

There's some philosophy behind my approach to function redeclarations:
I want "normal" users of the AST to only see a single FunctionDecl for
each function, and that FunctionDecl should contain all of the
information that the AST contains about that function.

I agree. The only difference is in case of

  void f(int x); // #1
  void f(int x = 2); // #2

instead of merging to #2 and swaping contents with #1, thus pointer of #1
now points to #2,
I suggest merging to #1.
In both cases, every use of 'f' will point to the same FunctionDecl node.
    
Ah, I understand what you mean. I think the important question is: how
much do you merge into #1, and do you preserve the previous
information in #1? Does #1 get the new parameter names from #2? New
typedefs in the parameter types? What about the source location?

  void f(int x); // #1
  typedef int MyFlags;
  void f(MyFlags flags = 0); // #2

What parts of the declaration at #2 will go into the FunctionDecl for
#1? Additionally: when information gets merged into #1, is that same
information that was in the FunctionDecl for #1 lost? (e.g., the
parameter name "x" it used?)
  
These are questions that apply to either case of merging:

  typedef int MyFlags;
  void f(MyFlags flags = 0); // #1
  void f(int x); // #2

What parts of the declaration at #1 will go into the FunctionDecl for
#2? Additionally: when information gets merged into #2, is that same
information that was in the FunctionDecl for #2 lost? (e.g., the
parameter name "x" it used?)

That way ASTConsumers like the syntax-colorizer I mentioned and
pretty-printer (SerializationTest uses this one), can use redeclarations the
same way
as single declarations. If a ASTConsumer cares only about single decls, it
can check a decl.isRedeclaration() method and act accordingly.
    
We can design the interface either way... a syntax-colorizer could
just override HandleRedeclaration to call the same routine that deals
with single declarations, because it treats them the same. Other tools
can handle redeclarations differently.
  
The difference is that with 'swaping contents' the consumer cannot assume that it will get different pointers for different decl objects.
SerializationTest pushes the decl pointers into a vector to feed them later to another consumer. With 'swaping contents' it gets
the same pointer for each redeclaration. This translates to feeding the same decl multiple times.
This is what I mean about complicating things with the 'swaping content' bit.

-Argiris

Argiris Kirtzidis wrote:

There's some philosophy behind my approach to function redeclarations:
I want "normal" users of the AST to only see a single FunctionDecl for
each function, and that FunctionDecl should contain all of the
information that the AST contains about that function.

I agree. The only difference is in case of

  void f(int x); // #1
  void f(int x = 2); // #2

instead of merging to #2 and swaping contents with #1, thus pointer of #1
now points to #2,
I suggest merging to #1.
In both cases, every use of 'f' will point to the same FunctionDecl node.
    
Ah, I understand what you mean. I think the important question is: how
much do you merge into #1, and do you preserve the previous
information in #1? Does #1 get the new parameter names from #2? New
typedefs in the parameter types? What about the source location?

  void f(int x); // #1
  typedef int MyFlags;
  void f(MyFlags flags = 0); // #2

What parts of the declaration at #2 will go into the FunctionDecl for
#1? Additionally: when information gets merged into #1, is that same
information that was in the FunctionDecl for #1 lost? (e.g., the
parameter name "x" it used?)
  
These are questions that apply to either case of merging:

typedef int MyFlags;
void f(MyFlags flags = 0); // #1
void f(int x); // #2

What parts of the declaration at #1 will go into the FunctionDecl for
#2? Additionally: when information gets merged into #2, is that same
information that was in the FunctionDecl for #2 lost? (e.g., the
parameter name "x" it used?)

The point here is, given a decl #1 and a redecl #2, is there an advantage in merging #1 -> #2, instead of #2 -> #1 ?

Merging #1 into #2 means that both #1 and #2 contain all of the
information that its corresponding declarations contains. #1 contains
all of the information about the first declaration of "f", including
the exact type (with typedefs and such), parameter names, and which
default arguments were present. #2 contains all of the information
about the second declaration, its exact parameter types, parameter
names, etc., which includes the merging of default arguments.
(Semantically, the merged default arguments become a part of the
declaration at #2).

So, merging #1 into #2 this gives us very precise information about
each of the declaration. I don't see how we can keep all of that
information when merging #2 into #1, unless we do something like the
"swap" trick that I'm already doing. Just merging #2 into #1 directly
means that we'll lose the source information that was in #1. Or
perhaps you are thinking of a different way of doing this merge?

  - Doug

Doug Gregor wrote:

  

The point here is, given a decl #1 and a redecl #2, is there an advantage
in merging #1 -> #2, instead of #2 -> #1 ?
    
Merging #1 into #2 means that both #1 and #2 contain all of the
information that its corresponding declarations contains. #1 contains
all of the information about the first declaration of "f", including
the exact type (with typedefs and such), parameter names, and which
default arguments were present. #2 contains all of the information
about the second declaration, its exact parameter types, parameter
names, etc., which includes the merging of default arguments.
(Semantically, the merged default arguments become a part of the
declaration at #2).

So, merging #1 into #2 this gives us very precise information about
each of the declaration. I don't see how we can keep all of that
information when merging #2 into #1, unless we do something like the
"swap" trick that I'm already doing. Just merging #2 into #1 directly
means that we'll lose the source information that was in #1. Or
perhaps you are thinking of a different way of doing this merge?
  
How about having a merge more like #1 <-> #2 :

(1) -#2 merges default args from #1
(2) -#2 is checked for error diagnostics about default args
(3) -#1 gets default args from #2 \ (4) -(#1)->setNextDeclaration(#2) ---- (maybe 3,4 are performed in addRedeclaration method)
(5) -#2 is returned by the parser but only #1 is visible in scope, thus all uses of the function will point to the same FunctionDecl node (#1)

In the end, both #1 and #2 contain all of the information that its corresponding declarations contain.
Merged default arguments become a part of both declarations.
Name lookup returns #1, which has all the necessary semantic information about default args.

Will this work or am I missing something ?

-Argiris