[PATCH] Function redeclaration and PR2360

The issue in PR2360 (http://llvm.org/bugs/show_bug.cgi?id=2360) is
essentially that the destruction sees the first function definition
multiple times, so the program essentially crashes trying to free it
twice. The first function definition is seen multiple times because
of the funny rearrangements that

This patch rearranges things so that each top-level declaration is
added to the scope in the usual way, rather than messing with the old
declaration. This means that references to a function before a
redeclaration refer to the old declaration, and references to a
function after a redeclaration refer to the new declaration. The
original arrangement where the old declaration was rearranged to look
like new declaration was done in r50021; however, that rearrangement
isn't needed to fix the original bug (test/Sema/redefinition.c).

A nice side effect of this patch is that it makes the representation
of function declarations in the AST more faithful to the original
source, which might be useful for other purposes.

I know there was an extremely long mailing list discussion related to
this stuff, but I'm not entirely sure if any of it is relevant to this
patch.

There's one regression that I found with this patch relating to
codegen of static forward declarations, but I don't think it's
actually a bug in my patch; I'll have to study it a bit more, and I'll
check in a fix for that separately before I check this in.

-Eli

t.txt (4.44 KB)

Hi Eli,

Eli Friedman wrote:

The issue in PR2360 (http://llvm.org/bugs/show_bug.cgi?id=2360) is
essentially that the destruction sees the first function definition
multiple times, so the program essentially crashes trying to free it
twice. The first function definition is seen multiple times because
of the funny rearrangements that

[...]

I know there was an extremely long mailing list discussion related to
this stuff, but I'm not entirely sure if any of it is relevant to this
patch.
  
These kind of issues with "swapping" decls triggered that discussion.

This patch rearranges things so that each top-level declaration is
added to the scope in the usual way, rather than messing with the old
declaration. This means that references to a function before a
redeclaration refer to the old declaration, and references to a
function after a redeclaration refer to the new declaration. The
original arrangement where the old declaration was rearranged to look
like new declaration was done in r50021; however, that rearrangement
isn't needed to fix the original bug (test/Sema/redefinition.c).

A nice side effect of this patch is that it makes the representation
of function declarations in the AST more faithful to the original
source, which might be useful for other purposes.

The consensus was that the same declaration node should be used throughout the AST for the same function, regardless of the redeclarations.
The purpose of "swapping" decls were to allow only one declaration node to be visible in scope and also allow walking through all redeclarations if the client needs them.
During that discussion in mailing list, a better alternative was suggested to accomplish this:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-May/001644.html

IMHO, your patch should go in until someone implements that "proper" solution.

-Argiris

I like the idea of having multiple FunctionDecls, and this will be useful for any client that wishes to perform source-level transformations. For example, refactoring clients will need to know the location of every function declaration corresponding to the same function.

One caveat: if a client inspects two separate DeclRefExprs, each one referring to a different FunctionDecl for the same function, is there a good, standard way to compare if they refer to the same function? It would be nice if we didn't have to do a full traversal of the "PreviousDeclaration" list for both FunctionDecls. From an efficiency standpoint, the number of redeclarations is small, so doing the list traversal itself might not matter, but it might be nice to have something like a Decl::isEqual method that could be used to compare if two Decls are "the same" from a semantic perspective.

I think we'd need some sort of map for global names to implement this;
two completely unrelated FunctionDecls can refer to the same function
if they're in disjoint scopes.

-Eli

Yes, that makes sense.

Do you guys disagree with the suggestion here ? : http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-May/001644.html
I was going to implement that one. The result will be that DeclRefExprs that refer to the same function will always refer to the same FunctionDecl as well.

-Argiris

Ted Kremenek wrote:

That sounds fine to me. We will still need a global map when doing things like inter-procedural analysis or analysis across translation units, but having all DeclRefExprs refer to the first FunctionDecl seems like a nice simplification. I agree with Doug's email that clients interested in source information can just walk the chain of Decls.

Mmm... the issue with that exact approach is that we lose information:
once the AST is completely constructed, you can't tell which
declaration a DeclRef refers to. Usually, it doesn't really matter;
however, a rewriting tool might be interested in knowing which
declaration is referred to, and we'd have funny cases where the type
of the DeclRef and the type of the Decl itself are different. Still
might be workable, though.

-Eli

Do you guys disagree with the suggestion here ? :
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-May/001644.html
I was going to implement that one. The result will be that DeclRefExprs that
refer to the same function will always refer to the same FunctionDecl as
well.

Mmm... the issue with that exact approach is that we lose information:
once the AST is completely constructed, you can't tell which
declaration a DeclRef refers to.

This makes sense.

Usually, it doesn't really matter;
however, a rewriting tool might be interested in knowing which
declaration is referred to, and we'd have funny cases where the type
of the DeclRef and the type of the Decl itself are different.

Interesting. Can you give an example of this in the case of FunctionDecls?

Still
might be workable, though.

It can be difficult to accurately reconstruct information after an "abstraction leak", but a global map (as you suggested) would provide the same functionality as Argiris's proposed change. And, as I already mentioned, we will need some global name resolution anyway, especially when dealing with multiple translation units. Handling multiple Decls within the same translation unit that refer to the same entity might just be a special case within that more general problem.

int a();
int b(void) {a(1);}
int a(int i) {return i;}
int c(void) {return a(2);}

The resolved "a" in b has type int(), while the one in c has type
int(int). Not usually significant, but could matter; for example,
int(void) is compatible with int(), but not int(int).

-Eli

Okay. This is what I thought you were referring to, but I wanted to make sure. Thanks!

Eli Friedman wrote:

This means that references to a function before a
redeclaration refer to the old declaration, and references to a
function after a redeclaration refer to the new declaration.

How about these cases:

int a(int); // #1
int a(); // #2
int b(void) {a();} // should refer to #2 or is more accurate to refer to #1 ?

And:

int a(int x = 5); // #1
int a(int x); // #2
int b(void) {a();} // Shouldn't it refer to #1 ?

-Argiris

Eli Friedman wrote:

This means that references to a function before a
redeclaration refer to the old declaration, and references to a
function after a redeclaration refer to the new declaration.

How about these cases:

int a(int); // #1
int a(); // #2
int b(void) {a();} // should refer to #2 or is more accurate to refer to #1
?

a refers to #2; but note that per the rules about redeclaration, the
type of #2 is actually int(int). (See C99 6.2.7 p3 and p4.) clang
doesn't implement this bit correctly yet, though.

And:

int a(int x = 5); // #1
int a(int x); // #2
int b(void) {a();} // Shouldn't it refer to #1 ?

a again refers to #2; the only way that really makes sense here is to
have #2 point to the default argument from #1. Take the following
example:

int a(int x, int y = 3);
int a(int x = 5, int y);

Neither version of a has all the arguments; we have to propagate them
forward somehow. This is currently done in
Sema::MergeCXXFunctionDecl.

Note that there is currently a small bug here currently: declarations
can't determine whether they own a default argument, so we end up
leaking them.

-Eli