C++ default arguments patch, rework Parse-Sema interaction for function parameters

The attached patch implements support for C++ default arguments, e.g.,

  void g(int x, int y, int z = 0);

The patch does the following:
  - Allows the definition of default arguments for function parameters
(C++ only, of course)
  - Checks that the default arguments are well-formed w.r.t their
parameter type
  - Handles merging of default arguments when merging function declarations
  - Handles calls to functions that use default arguments
  - Introduces parameter names into the "prototype scope" when they
are parsed, rather than waiting for the function definition.

All but the last are relatively straightforward. Up until now,
ParseFunctionDeclarator had a simple notion of a function prototype
scope, but it was never actually used because parameters were never
put into that scope. For example, when parsing

  void f(int x, int y);

the parameters 'x' and 'y' would never be put into the scope of that
function prototype. Parameters were only introduced into the function
scope at definition time, e.g.,

  void f(int x, int y) { /* x and y introduced into scope for the
definition */ }

For default arguments, we need to put these parameters into prototype
scope as soon as we see them, to get the right name-lookup behavior:

  void f(int x, int y = x); // name lookup of 'x' in default argument
of 'y' should refer to the parameter x.

Among other things, this problem means that some ill-formed code will
not be diagnosed because name lookup is wrong, e.g.,

  int x;
  void f(int x, int y = x); // currently, Clang finds ::x.

Interestingly, without support for default arguments, we can still use
C extensions to get different behavior than we'd expect:

  struct S { } s;
  void f(int s, typeof(s) t);

  void g() { f(1, 2); }

GCC compiles this, because the 's' in typeof(s) refers to the
parameter 's', so the parameter 't' is an int.
Clang currently rejects this, because name lookup of 's' finds the
global struct 's', which isn't compatible with an int.
With this patch, Clang does what GCC does, because we always put
parameters into prototype scope.

Changes to look out for in this patch:
  - Action::ActOnParamDeclaratorType has been replaced with
Action::ActOnParamDeclarator; the latter actually introduces the
parameter into the current scope
  - Action::ActOnParamDefaultArgument handles default arguments; the
C++-specific code for Semi is in the new SemaDeclCXX.cpp
  - The Objective-C Sema code for creating ParmVarDecls has not be
changed to use Sema::ActOnParamDeclarator, because my understanding of
Objective-C is far too weak.

You'll see a few new FIXMEs in this code that I haven't attacked yet.
The most annoying one (for me) is that

  void f(int x, int y = x);

doesn't actually produce an error. We do name lookup of 'x' in the
default argument of 'y' properly, but I don't have a feel for how we
should check that parameter names shall not be used in the default
argument. It could be done "quickly" by adding a flag in the parser
(DontAllowParamIdentifiers) or could be separated from the main flow
by passing over the AST for the default arguments within
Sema::ActOnParamDefaultArgument.

I ran "make test" within the Clang directory, and there were no
regressions on x86 Linux. I'm not sure what else I'm supposed to test
before submitting patches.

Comments welcome :slight_smile:

  - Doug

clang-default-arg-try2.patch (48.1 KB)

The attached patch implements support for C++ default arguments, e.g.,

void g(int x, int y, int z = 0);

Very nice!

The patch does the following:
- Allows the definition of default arguments for function parameters
(C++ only, of course)
- Checks that the default arguments are well-formed w.r.t their
parameter type
- Handles merging of default arguments when merging function declarations
- Handles calls to functions that use default arguments

Great!

- Introduces parameter names into the "prototype scope" when they
are parsed, rather than waiting for the function definition.

All but the last are relatively straightforward. Up until now,
ParseFunctionDeclarator had a simple notion of a function prototype
scope, but it was never actually used because parameters were never
put into that scope.

Funny you mention this, I was just working on fixing that for http://llvm.org/PR2046 . C89+ requires this to handle cases like:

int aa(int b, int x[sizeof b]) {}

and C99 allows:

int bb(int sz, int ar[sz][sz])..

Your patch is exactly what I was going to do :slight_smile:

For example, when parsing

void f(int x, int y);

the parameters 'x' and 'y' would never be put into the scope of that
function prototype. Parameters were only introduced into the function
scope at definition time, e.g.,

void f(int x, int y) { /* x and y introduced into scope for the
definition */ }

Right.

Interestingly, without support for default arguments, we can still use
C extensions to get different behavior than we'd expect:

struct S { } s;
void f(int s, typeof(s) t);

void g() { f(1, 2); }

GCC compiles this, because the 's' in typeof(s) refers to the
parameter 's', so the parameter 't' is an int.
Clang currently rejects this, because name lookup of 's' finds the
global struct 's', which isn't compatible with an int.
With this patch, Clang does what GCC does, because we always put
parameters into prototype scope.

Thanks, it sounds like you fixed the bug for me :slight_smile:

Changes to look out for in this patch:
- Action::ActOnParamDeclaratorType has been replaced with
Action::ActOnParamDeclarator; the latter actually introduces the
parameter into the current scope
- Action::ActOnParamDefaultArgument handles default arguments; the
C++-specific code for Semi is in the new SemaDeclCXX.cpp

Ok.

- The Objective-C Sema code for creating ParmVarDecls has not be
changed to use Sema::ActOnParamDeclarator, because my understanding of
Objective-C is far too weak.

No problem. Others can handle that as soon as the patch goes in.

Some thoughts on the patch:

Now that the decls are being added to scopes, repeated parameter names should be found automatically by the normal decl mechanism. This means you can completely remove 'ParamsSoFar' and the #ifdef'd code from Parser::ParseFunctionDeclarator.

Also in ParseFunctionDeclarator, I think this:

+ DeclTy *Param = Actions.ActOnParamDeclarator(CurScope, ParmDecl);

> All but the last are relatively straightforward. Up until now,
> ParseFunctionDeclarator had a simple notion of a function prototype
> scope, but it was never actually used because parameters were never
> put into that scope.
>

Funny you mention this, I was just working on fixing that for
http://llvm.org/PR2046 . C89+ requires this to handle cases like:

[snip]

> Interestingly, without support for default arguments, we can still use
> C extensions to get different behavior than we'd expect:
>
> struct S { } s;
> void f(int s, typeof(s) t);
>
> void g() { f(1, 2); }
>
> GCC compiles this, because the 's' in typeof(s) refers to the
> parameter 's', so the parameter 't' is an int.
> Clang currently rejects this, because name lookup of 's' finds the
> global struct 's', which isn't compatible with an int.
> With this patch, Clang does what GCC does, because we always put
> parameters into prototype scope.
>

Thanks, it sounds like you fixed the bug for me :slight_smile:

I just checked the test cases in 2046 and 2195; both work with this
patch, and I've added them as test cases.

Some thoughts on the patch:

Now that the decls are being added to scopes, repeated parameter names
should be found automatically by the normal decl mechanism. This means you
can completely remove 'ParamsSoFar' and the #ifdef'd code from
Parser::ParseFunctionDeclarator.

Oops, I meant to kill that code outright.

Also in ParseFunctionDeclarator, I think this:

+ DeclTy *Param = Actions.ActOnParamDeclarator(CurScope, ParmDecl);
+
+ // Parse the default argument, if any.
+ if (Tok.is(tok::equal)) {

When I first saw this, I thought it should be: if (Tok.is(tok::equal) &&
getLang().CPlusPlus). Please add a comment that says that Sema handles
rejecting default arguments in C.

Okay.

I'm not sure that the handling of 'implicit' typespecs is really the way to
go for ObjC 'self' and C++ 'this'. [To be clear before I go into it, this
isn't your fault: this is a pre-existing problem in the code.] The issue is
that the code currently models 'self' as a ParamDecl, which is reflects its
common implementation, but doesn't accurately reflect its place in the
language.

I think it's accurate to say that they are implicit parameters, even
in the language sense. I could imagine having an ImplicitParmVarDecl
that inherits ParmVarDecl, if that distinction is important enough to
warrant a new class. An "isImplicit" flag might be just as good.

I think it would be better for implicitly defined names like self/this to
be a separate kind of decl (ImplicitDecl?)

which would allow us to get rid
of the tie in to 'ActOnParamDeclarator', and eliminate the need for
self/this to be involved at all with the parser-level DeclSpec object.

Looking back at this, my approach here was a bit lame. There are other
ways to write CreateImplicitParameter without calling
ActOnParamDeclarator directly.

Doing this has been on my todo list for a long time, but there was never a
forcing function. :slight_smile: I'd prefer to not add 'TST_implicit' though. Would
you like me to do this change, or would you like to finish this patch and I
can do it later?

How about this: in the revised patch (attached), I've eliminated
TST_implicit by implementing CreateImplicitParameter directly. I think
this is a cleaner solution. If you still think you need an
ImplicitDecl/ImplicitParmVarDecl, I'll leave that up to you.

[snipped stuff below means I agree and have fixed the issue]

+++ lib/Sema/SemaExpr.cpp (working copy)

Your changes to ActOnCallExpr seem quite reasonable, except that they pass
in default arguments from the decl into the call when needed. This violates
the ownership property of the ASTs (that a call, f.e., owns its children
pointers). Instead of doing this, I think it would make sense to update the
comments above the CallExpr AST node in Expr.h to say that calls can have
fewer arguments than the callee if it has default arguments. In addition to
solving ownership issues, this makes the AST more accurately reflect the
code the user wrote. Does this seem reasonable?

The problem is that this pushes the handling of default arguments into
every consumer of the CallExpr AST, and default arguments won't always
be as simple as looking at the corresponding parameter in the function
declaration. For example, default arguments in function templates
aren't instantiated until they are actually used, so in code like
this:

  template<typename T>
  void f(T x, T y = T()) { }

  void g(int x) { f(x); }

If we just store the 'x' in the CallExpr to f, consumers of that
CallExpr will have to look at 'f' to determine that it's a function
template specialization, then go back to the function template it was
instantiated from to re-instantiate the default argument. That's going
to cost compile time (instantiation will never be cheap), but I'm more
concerned about the fact that anyone looking at call arguments will
need to know about templates.

Is there, perhaps, a way to clone an expression node, so that the
CallExpr could own the clone? The clone could have its source location
set to the end of the CallExpr, which could be used as a cheap way to
determine that the default argument was used. (e.g.,
CallExpr::IsDefaultArg(param_index)), so we would have all of the
source information we need.

@@ -768,7 +817,39 @@ Sema::ActOnDeclarator(Scope *S, Declarat

+ // When we have a prototype for a named function, create the
+ // parameter declarations for NewFD. C++ needs these early, to
+ // deal with merging default arguments properly.

I'm not sure what this is doing. It seems that it could be factored better
somehow, but I'm not exactly sure what is going on. Can you give an example
of when this is needed?

I've updated the comment to this:

    // Copy the parameter declarations from the declarator D to
    // the function declaration NewFD, if they are available.

Basically, the parameter declarations have all been created for the
function declarator (D)... the parser has called ActOnParamDeclarator
for each parameter, and put them into the function declarator. This
code is copying those parameter declarations into the FunctionDecl for
the function. We used to create the parameter declarations at the same
time we filled in the FunctionDecl (at definition time), but now we do
it as early as possible.

> You'll see a few new FIXMEs in this code that I haven't attacked yet.
>

As a general comment, I tend to prefer patches to be broken down into
distinct unrelated pieces to make development more incremental. For
example, it should be possible to split the 'make decls for params' part of
this patch out and get it committed separately. Making incremental patches
makes them easier to review and discuss. There is no need to do it for this
patch though, it looks like it is close to ready to going in. However, I'd
suggest working getting what you have in first, and finishing these details
for the next pass :slight_smile:

Oh, certainly. You only got the whole enchilada in one shot because I
didn't think of using sizeof() to make the parameter-scoping thing
matter in straight C.

> The most annoying one (for me) is that
>
> void f(int x, int y = x);
>
> doesn't actually produce an error. We do name lookup of 'x' in the
> default argument of 'y' properly, but I don't have a feel for how we
> should check that parameter names shall not be used in the default
> argument. It could be done "quickly" by adding a flag in the parser
> (DontAllowParamIdentifiers) or could be separated from the main flow
> by passing over the AST for the default arguments within
> Sema::ActOnParamDefaultArgument.
>

I'm not sure the right way to go here. I strongly prefer to avoid a global
'DontAllowParamIdentifiers' flag. Would it be sufficient to do a quick
recursive walk of the initializer expression looking for ParamDecls?

Yes, we can do a recursive walk. Default arguments are often small, so
the cost shouldn't be prohibitive. That'll be a separate patch.

  - Doug

clang-default-arg-try3.patch (49.1 KB)

Watch out here: there are other similar restrictions. For example,
the following is illegal:
void f()
{
    int i;
    extern void g(int x = sizeof(i));
}

Probably better to deal with this in Sema; any other behavior would
involve completely butchering scopes.

+ // Check for C99 6.7.5.3p10 - foo(void) is a non-varargs
+ // function that takes no arguments, not a function that takes a
+ // single void argument. FIXME: Is this really the right place
+ // to check for this? C++ says that the parameter list (void) is
+ // the same as an empty parameter list, whereas the parameter
+ // list (T) (with T typedef'd to void) is not. For C++, this
+ // should be handled in the parser. Check C89 and C99 standards
+ // to see what the correct behavior is.

AFAIK, clang currently handles C89/99 correctly for this case. Not
sure how this change affects that, though.

-Eli

+// RUN: clang -fsyntax-only -verify %s
+void f(int i);
+void f(int i = 0); // expected-error {{previous definition is here}}
+void f(int i = 17); // expected-error {{redefinition of default argument}}

Nit: those are really declarations not definitions. The problem may
have been there already.

Neil.

Thanks, it sounds like you fixed the bug for me :slight_smile:

I just checked the test cases in 2046 and 2195; both work with this
patch, and I’ve added them as test cases.

Nice.

I’m not sure that the handling of ‘implicit’ typespecs is really the way to

go for ObjC ‘self’ and C++ ‘this’. [To be clear before I go into it, this

isn’t your fault: this is a pre-existing problem in the code.] The issue is

that the code currently models ‘self’ as a ParamDecl, which is reflects its

common implementation, but doesn’t accurately reflect its place in the

language.

I think it’s accurate to say that they are implicit parameters, even
in the language sense. I could imagine having an ImplicitParmVarDecl
that inherits ParmVarDecl, if that distinction is important enough to
warrant a new class. An “isImplicit” flag might be just as good.

Ok.

I think it would be better for implicitly defined names like self/this to

be a separate kind of decl (ImplicitDecl?)

which would allow us to get rid

of the tie in to ‘ActOnParamDeclarator’, and eliminate the need for

self/this to be involved at all with the parser-level DeclSpec object.

Looking back at this, my approach here was a bit lame. There are other
ways to write CreateImplicitParameter without calling
ActOnParamDeclarator directly.

Right. I’m most concerned with not making the DeclSpec object more complex just to handle “self” and friends.

Doing this has been on my todo list for a long time, but there was never a

forcing function. :slight_smile: I’d prefer to not add ‘TST_implicit’ though. Would

you like me to do this change, or would you like to finish this patch and I

can do it later?

How about this: in the revised patch (attached), I’ve eliminated
TST_implicit by implementing CreateImplicitParameter directly. I think
this is a cleaner solution. If you still think you need an
ImplicitDecl/ImplicitParmVarDecl, I’ll leave that up to you.

Sounds good.

+++ lib/Sema/SemaExpr.cpp (working copy)

Your changes to ActOnCallExpr seem quite reasonable, except that they pass

in default arguments from the decl into the call when needed. This violates

the ownership property of the ASTs (that a call, f.e., owns its children

pointers). Instead of doing this, I think it would make sense to update the

comments above the CallExpr AST node in Expr.h to say that calls can have

fewer arguments than the callee if it has default arguments. In addition to

solving ownership issues, this makes the AST more accurately reflect the

code the user wrote. Does this seem reasonable?

The problem is that this pushes the handling of default arguments into
every consumer of the CallExpr AST, and default arguments won’t always
be as simple as looking at the corresponding parameter in the function
declaration. For example, default arguments in function templates
aren’t instantiated until they are actually used, so in code like
this:

template
void f(T x, T y = T()) { }

void g(int x) { f(x); }

If we just store the ‘x’ in the CallExpr to f, consumers of that
CallExpr will have to look at ‘f’ to determine that it’s a function
template specialization, then go back to the function template it was
instantiated from to re-instantiate the default argument. That’s going
to cost compile time (instantiation will never be cheap), but I’m more
concerned about the fact that anyone looking at call arguments will
need to know about templates.

Ok, you’re right. This does get ugly fast :slight_smile:

Is there, perhaps, a way to clone an expression node, so that the
CallExpr could own the clone? The clone could have its source location
set to the end of the CallExpr, which could be used as a cheap way to
determine that the default argument was used. (e.g.,
CallExpr::IsDefaultArg(param_index)), so we would have all of the
source information we need.

We could do this, but here are two alternative approaches that I like better :).

  1. we could introduce a new DefaultArgExpr node, which maintains a pointer to a FunctionDecl or ParamDecl (or whatever else is needed). In this case, we’d have an AST for “f(x, DefaultArgExpr(f))”. For a client that wants to recursively traverse the AST (e.g. to do code generation) it could just skip into the default expr for the nested decl directly with no other context.

  2. we could introduce a second sort of DefaultArgExpr node, which is a noop node whose single child is a clone of the default expr. This allows us to capture in the AST the fact that the argument wasn’t provided, but clients can just ignore it.

Of the two, I like #1 the best: it is very explicit, and it doesn’t require cloning around ASTs unnecessarily. The down side about it is that clients have to handle one more node type, but I don’t think this is a killer. Being an explicit node like this makes it easier to handle, and we keep the invariant that the number of arguments to a call is >= the of formal arguments for the callee. What do you think?

@@ -768,7 +817,39 @@ Sema::ActOnDeclarator(Scope *S, Declarat

  • // When we have a prototype for a named function, create the
  • // parameter declarations for NewFD. C++ needs these early, to
  • // deal with merging default arguments properly.

I’m not sure what this is doing. It seems that it could be factored better

somehow, but I’m not exactly sure what is going on. Can you give an example

of when this is needed?

I’ve updated the comment to this:

// Copy the parameter declarations from the declarator D to
// the function declaration NewFD, if they are available.

Basically, the parameter declarations have all been created for the
function declarator (D)… the parser has called ActOnParamDeclarator
for each parameter, and put them into the function declarator. This
code is copying those parameter declarations into the FunctionDecl for
the function. We used to create the parameter declarations at the same
time we filled in the FunctionDecl (at definition time), but now we do
it as early as possible.

Ok, I guess my basic objection is the duplication of this logic:

  • if (FTI.NumArgs == 1 && !FTI.isVariadic && FTI.ArgInfo[0].Ident == 0 &&
  • FTI.ArgInfo[0].Param &&
  • !((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType().getCVRQualifiers() &&
  • ((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType()->isVoidType()) {

Can this be factored out into a predicate method and shared with the other code that does it?

I’m not sure the right way to go here. I strongly prefer to avoid a global

‘DontAllowParamIdentifiers’ flag. Would it be sufficient to do a quick

recursive walk of the initializer expression looking for ParamDecls?

Yes, we can do a recursive walk. Default arguments are often small, so
the cost shouldn’t be prohibitive. That’ll be a separate patch.

Sounds great! Thanks again Doug,

-Chris

We could do this, but here are two alternative approaches that I like better
:).

1) we could introduce a new DefaultArgExpr node, which maintains a pointer
to a FunctionDecl or ParamDecl (or whatever else is needed). In this case,
we'd have an AST for "f(x, DefaultArgExpr(f<int>))". For a client that
wants to recursively traverse the AST (e.g. to do code generation) it could
just skip into the default expr for the nested decl directly with no other
context.

[snip #2]

Of the two, I like #1 the best: it is very explicit, and it doesn't require
cloning around ASTs unnecessarily. The down side about it is that clients
have to handle one more node type, but I don't think this is a killer.
Being an explicit node like this makes it easier to handle, and we keep the
invariant that the number of arguments to a call is >= the of formal
arguments for the callee. What do you think?

That's reasonable. The updated patch contains a new node
CXXDefaultArgExpr; most of the changes in this patch just deal with
that new node in the statement printer, serialization, code generator,
etc.
The code generator bits were no fun at all, because I ended up having
to duplicate the handling of CXXDefautArgExpr for CGExprComplex,
CGExprConstant, CGExprScalar, and CGExprAgg. I hope there's a better
way to do this, but I didn't dig deep into the CodeGen module before I
convinced myself there wasn't. CXXDefaulArgExpr is a *great* candidate
for a pre-CodeGen lowering phase :slight_smile:

There's a new test for the CodeGen bits, but I don't know how to
automate the checks we need. If you build it with
-DCLANG_GENERATE_KNOWN_GOOD, it writes out the default arguments
explicitly at the call site; otherwise, it lets the front end fill in
the default arguments. Ideally, we would build this test both ways and
"diff" the byte-code produced from each (that's what I did to verify
the correctness of the CodeGen additions), but I didn't implement that
in the test harness.

[Aside: StmtPrinter did me a great favor by failing to link when I
forgot to add in the CXXDefaultArgExpr case. The serialization code
and CodeGen modules, and perhaps even the isConstantExpr/isLvalue/etc
routines, could all benefit from this kind of built-in safety.]

@@ -768,7 +817,39 @@ Sema::ActOnDeclarator(Scope *S, Declarat

+ // When we have a prototype for a named function, create the
+ // parameter declarations for NewFD. C++ needs these early, to
+ // deal with merging default arguments properly.

I'm not sure what this is doing. It seems that it could be factored better
somehow, but I'm not exactly sure what is going on. Can you give an example
of when this is needed?

I've updated the comment to this:

    // Copy the parameter declarations from the declarator D to
    // the function declaration NewFD, if they are available.

Basically, the parameter declarations have all been created for the
function declarator (D)... the parser has called ActOnParamDeclarator
for each parameter, and put them into the function declarator. This
code is copying those parameter declarations into the FunctionDecl for
the function. We used to create the parameter declarations at the same
time we filled in the FunctionDecl (at definition time), but now we do
it as early as possible.

Ok, I guess my basic objection is the duplication of this logic:

+ if (FTI.NumArgs == 1 && !FTI.isVariadic && FTI.ArgInfo[0].Ident == 0
&&
+ FTI.ArgInfo[0].Param &&
+
!((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType().getCVRQualifiers() &&
+ ((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType()->isVoidType()) {

Can this be factored out into a predicate method and shared with the other
code that does it?

At the moment, this is the only place that does this check. It used to
be in ActOnStartOfFunctionDef, but I've moved it here. Granted, I
don't like that we do this test in Sema... I would much rather handle
(void) parameter lists in the parser, and just put a flag on the decl
to say whether "void" was given to denote an empty parameter list (or,
to be very picky, we could have an EmptyParamListType node that has
the actual "type" the user wrote, if any). Handling it in the parser
makes especially good sense for C++ (and perhaps C89? I don't have the
C standards handy) where the parameter list has to be spelled
(void)... it can't be (T) where T is a typedef of void as I think it
can in C99.

  - Doug

clang-default-arg-try4.patch (59.9 KB)

No, it's correct. We talking about redefinition of the default
argument, not of the function or of the parameter; the C++ standard
refers to these as definitions of default arguments, and says
explicitly that default arguments shall not be "redefined".

  - Doug

Of the two, I like #1 the best: it is very explicit, and it doesn't require
cloning around ASTs unnecessarily. The down side about it is that clients
have to handle one more node type, but I don't think this is a killer.
Being an explicit node like this makes it easier to handle, and we keep the
invariant that the number of arguments to a call is >= the of formal
arguments for the callee. What do you think?

That's reasonable. The updated patch contains a new node
CXXDefaultArgExpr; most of the changes in this patch just deal with
that new node in the statement printer, serialization, code generator,
etc.

Looks great, I like this approach. Instead of dwelling on details and forcing you to keep the patch up to date, I committed it here (after resolving a few conflicts that came up in the short time since you sent it out!):
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080407/005079.html

Some more nit-picky stuff is included below.

The code generator bits were no fun at all, because I ended up having
to duplicate the handling of CXXDefautArgExpr for CGExprComplex,
CGExprConstant, CGExprScalar, and CGExprAgg. I hope there's a better
way to do this, but I didn't dig deep into the CodeGen module before I
convinced myself there wasn't. CXXDefaulArgExpr is a *great* candidate
for a pre-CodeGen lowering phase :slight_smile:

:slight_smile: The funny thing about it is that the code being lowered isn't *exactly* the same (the methods return different things). In llvm-gcc, we actually only have two code paths (one for scalars and one for aggregates) instead of three... and they are actually interlaced together and active in a context sensitive way. Not having a codepath specialized for _Complex means that we get substantially inferior code in some cases, and having them interlaced together makes the code much more complex and difficult to understand.

I won't claim with certainty that the approach taken by the clang codegen is the one will end up with, but it's a step forward. A bigger issue is that a lot of it has to be duplicated (same in llvm-gcc) for constant initializers, which is annoying, but doesn't affect default arguments at least. :slight_smile:

There's a new test for the CodeGen bits, but I don't know how to
automate the checks we need. If you build it with
-DCLANG_GENERATE_KNOWN_GOOD, it writes out the default arguments
explicitly at the call site; otherwise, it lets the front end fill in
the default arguments. Ideally, we would build this test both ways and
"diff" the byte-code produced from each (that's what I did to verify
the correctness of the CodeGen additions), but I didn't implement that
in the test harness.

The codegen tests currently mostly test that the compiler doesn't crash on some language feature. I think this is fairly reasonable (and follows what we do with llvm-gcc). More comprehensive tests will include the whole programs from the llvm-test suite, and ultimately building llvm itself with clang ;-).

[Aside: StmtPrinter did me a great favor by failing to link when I
forgot to add in the CXXDefaultArgExpr case. The serialization code
and CodeGen modules, and perhaps even the isConstantExpr/isLvalue/etc
routines, could all benefit from this kind of built-in safety.]

This is a great idea. Some of the code could probably be auto-generated as well. I'll add it to my (huge) todo list.

@@ -768,7 +817,39 @@ Sema::ActOnDeclarator(Scope *S, Declarat

+ // When we have a prototype for a named function, create the
+ // parameter declarations for NewFD. C++ needs these early, to
+ // deal with merging default arguments properly.

I'm not sure what this is doing. It seems that it could be factored better
somehow, but I'm not exactly sure what is going on. Can you give an example
of when this is needed?

I've updated the comment to this:

   // Copy the parameter declarations from the declarator D to
   // the function declaration NewFD, if they are available.

Basically, the parameter declarations have all been created for the
function declarator (D)... the parser has called ActOnParamDeclarator
for each parameter, and put them into the function declarator. This
code is copying those parameter declarations into the FunctionDecl for
the function. We used to create the parameter declarations at the same
time we filled in the FunctionDecl (at definition time), but now we do
it as early as possible.

Ok!

Ok, I guess my basic objection is the duplication of this logic:

+ if (FTI.NumArgs == 1 && !FTI.isVariadic && FTI.ArgInfo[0].Ident == 0
&&
+ FTI.ArgInfo[0].Param &&
+
!((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType().getCVRQualifiers() &&
+ ((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType()->isVoidType()) {

Can this be factored out into a predicate method and shared with the other
code that does it?

At the moment, this is the only place that does this check.

Ah ok! Please disregard my objection, the code looked familiar so I was sure it was copied from somewhere. I didn't realize you also *deleted* the original copy.

It used to
be in ActOnStartOfFunctionDef, but I've moved it here. Granted, I
don't like that we do this test in Sema... I would much rather handle
(void) parameter lists in the parser, and just put a flag on the decl
to say whether "void" was given to denote an empty parameter list (or,
to be very picky, we could have an EmptyParamListType node that has
the actual "type" the user wrote, if any). Handling it in the parser
makes especially good sense for C++ (and perhaps C89? I don't have the
C standards handy) where the parameter list has to be spelled
(void)... it can't be (T) where T is a typedef of void as I think it
can in C99.

I'm not sure about that. clang tracks typedefs (and lack thereof) well enough that Sema can decide whether it was spelled "(void)" or "(T)". Also, we do have to handle the "(T)" case for C99. It seems to me that handling this in the parser would require handling it in both the parser and Sema.

Here are some of the increasingly minor nit picky details from your patch:

+++ include/clang/AST/ExprCXX.h (working copy)

I changed the comment above the CXXDefaultArgExpr to use ///'s so that doxygen picks it up.

+ // Retrieve the parameter that the argument was created from
+ const ParmVarDecl *getParam() const { return Param; }

Please terminate sentence comments with a '.'. Yes it's pedantic, but it makes the comments more consistent and read more nicely :). This occurs in many places in your patch.

+ virtual SourceRange getSourceRange() const {
+ return Param->getDefaultArg()->getSourceRange();
+ }

I don't think that this is right. The source range of the CXXDefaultArgExpr should be empty, not the source range of the input. The default argument doesn't occur in the source, so it shouldn't have a location.

+++ lib/AST/ExprCXX.cpp (working copy)

+// CXXDefaultArgExpr
+Stmt::child_iterator CXXDefaultArgExpr::child_begin() {
+ return reinterpret_cast<Stmt**>(Param->getDefaultArg());
+}

This is subtly wrong. The child iteration stuff should walk the expression tree in the current function, it shouldn't "jump the gap here". I suspect that -ast-dump will have problems with CXXDefaultArgExpr's due to this. I'd just have CXXDefaultArgExpr just return an empty child range.

  +void
+Sema::ActOnParamDefaultArgument(DeclTy *param, SourceLocation EqualLoc,
+ ExprTy *defarg) {
...
+ // Some default arguments were missing. Clear out all of the
+ // default arguments up to (and including) the last missing
+ // default argument, so that we leave the function parameters
+ // in a semantically valid state.
+ for (p = 0; p <= LastMissingDefaultArg; ++p) {
+ ParmVarDecl *Param = FD->getParamDecl(p);
+ if (Param->getDefaultArg()) {
+ delete Param->getDefaultArg();
+ Param->setDefaultArg(0);
+ }
+ }

Should these ParmVarDecl's be marked as erroneous? This might be a good idea if there are secondary errors that can occur because they are missing default initializers. I can't think of any specific examples off the top of my head though, your call.

+++ lib/Sema/SemaExpr.cpp (working copy)

    // Make the call expr early, before semantic checks. This guarantees cleanup
    // of arguments and function on error.
- llvm::OwningPtr<CallExpr> TheCall(new CallExpr(Fn, Args, NumArgs,
+ if (getLangOptions().CPlusPlus && FDecl && NumArgs < FDecl- >getNumParams())
+ NumArgsPassed = FDecl->getNumParams();
+ llvm::OwningPtr<CallExpr> TheCall(new CallExpr(Fn, Args, NumArgsPassed,
                                                   Context.BoolTy, RParenLoc));

I don't think this is safe. The CallExpr ctor reads the range [Args,Args+NumArgsPassed), so changing NumArgsPassed in this way makes it potentially read beyond the end of the args array.

+ // If too few arguments are available (and we don't have default
+ // arguments for the remaining parameters), don't make the call.
+ if (NumArgs < NumArgsInProto) {
+ if (getLangOptions().CPlusPlus &&
+ FDecl &&
+ FDecl->getParamDecl(NumArgs)->getDefaultArg()) {
+ // Use default arguments for missing arguments
+ NumArgsToCheck = NumArgsInProto;
+ } else
+ return Diag(RParenLoc, diag::err_typecheck_call_too_few_args,
+ Fn->getSourceRange());
+ }

This logic is reasonable, but what do you think about adding a new method to FunctionDecl, something like "unsigned getMinRequiredArguments() const" or something like that, which would scan the argument list for default arguments. I think this is a generally useful method, and could simplify this down to:

if (NumArgs < NumArgsInProto) {
   if (FDecl && NumArgs >= FDecl->getMinRequiredArguments())
      ...

which would be simpler :). It could also be used to simplify the logic above.

+++ lib/Sema/SemaType.cpp (working copy)
@@ -394,7 +394,7 @@ QualType Sema::GetTypeForDeclarator(Decl
          llvm::SmallVector<QualType, 16> ArgTys;

          for (unsigned i = 0, e = FTI.NumArgs; i != e; ++i) {
- QualType ArgTy = QualType::getFromOpaquePtr(FTI.ArgInfo[i].TypeInfo);
+ QualType ArgTy = ((ParmVarDecl *)FTI.ArgInfo[i].Param)- >getType();

Please do the cast into a temporary 'ParmVarDecl*' at the top of the loop instead of doing the cast three times in the loop.

+++ lib/Sema/SemaDeclObjC.cpp (working copy)

  + // Introduce all of the othe parameters into this scope
    for (unsigned i = 0, e = MDecl->getNumParams(); i != e; ++i) {

typo "othe", should end with a period.

+++ lib/AST/Expr.cpp (working copy)

@@ -1009,6 +1021,9 @@ bool Expr::isNullPointerConstant(ASTCont
      // Accept ((void*)0) as a null pointer constant, as many other
      // implementations do.
      return PE->getSubExpr()->isNullPointerConstant(Ctx);
+ } else if (const CXXDefaultArgExpr *DefaultArg = dyn_cast<CXXDefaultArgExpr>(this)) {
+ // See through default argument expressions
+ return DefaultArg->getExpr()->isNullPointerConstant(Ctx);
    }

80 columns.

+++ lib/Parse/ParseDecl.cpp (working copy)
@@ -1232,8 +1232,10 @@ void Parser::ParseParenDeclarator(Declar

  /// declaration-specifiers declarator
+/// [C++] declaration-specifiers declarator '=' assignment-expression
  /// [GNU] declaration-specifiers declarator attributes
  /// declaration-specifiers abstract-declarator[opt]
+/// [C++] declaration-specifiers abstract-declarator[opt] '=' assignment-expression
  /// [GNU] declaration-specifiers abstract-declarator[opt] attributes

80 columns.

Thanks again Doug, this is great work!

-Chris

> That's reasonable. The updated patch contains a new node
> CXXDefaultArgExpr; most of the changes in this patch just deal with
> that new node in the statement printer, serialization, code generator,
> etc.
>

Looks great, I like this approach. Instead of dwelling on details and
forcing you to keep the patch up to date, I committed it here (after
resolving a few conflicts that came up in the short time since you sent it
out!):

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080407/005079.html

Thanks! I'm following this up with another patch (attached) that fixes
the remaining nits below. It also implements proper checking for
ill-formed code like

  void f(int x, int y = x);

and, when in non-C99 mode, warns about this as an extension:

  typedef void T;
  void f(T);

I'm not sure about that. clang tracks typedefs (and lack thereof) well
enough that Sema can decide whether it was spelled "(void)" or "(T)". Also,
we do have to handle the "(T)" case for C99. It seems to me that handling
this in the parser would require handling it in both the parser and Sema.

Okay. Implemented in Sema, now.

+ virtual SourceRange getSourceRange() const {
+ return Param->getDefaultArg()->getSourceRange();
+ }

I don't think that this is right. The source range of the
CXXDefaultArgExpr should be empty, not the source range of the input. The
default argument doesn't occur in the source, so it shouldn't have a
location.

Okay.

+++ lib/AST/ExprCXX.cpp (working copy)

+// CXXDefaultArgExpr
+Stmt::child_iterator CXXDefaultArgExpr::child_begin() {
+ return reinterpret_cast<Stmt**>(Param->getDefaultArg());
+}

This is subtly wrong. The child iteration stuff should walk the expression
tree in the current function, it shouldn't "jump the gap here". I suspect
that -ast-dump will have problems with CXXDefaultArgExpr's due to this. I'd
just have CXXDefaultArgExpr just return an empty child range.

Okay, understood.

  +void
+Sema::ActOnParamDefaultArgument(DeclTy *param, SourceLocation EqualLoc,
+ ExprTy *defarg) {
...
+ // Some default arguments were missing. Clear out all of the
+ // default arguments up to (and including) the last missing
+ // default argument, so that we leave the function parameters
+ // in a semantically valid state.
+ for (p = 0; p <= LastMissingDefaultArg; ++p) {
+ ParmVarDecl *Param = FD->getParamDecl(p);
+ if (Param->getDefaultArg()) {
+ delete Param->getDefaultArg();
+ Param->setDefaultArg(0);
+ }
+ }

Should these ParmVarDecl's be marked as erroneous? This might be a good
idea if there are secondary errors that can occur because they are missing
default initializers. I can't think of any specific examples off the top of
my head though, your call.

Secondary errors could occur as the result of a function call that
relies on those default arguments, but there isn't much we can do
about that. I say we just leave the parameters alone: other than not
having default arguments any more, there isn't anything wrong with
them. (Most importantly: the AST is still in a perfectly consistent
state after removing default arguments)

+++ lib/Sema/SemaExpr.cpp (working copy)

   // Make the call expr early, before semantic checks. This guarantees
cleanup
   // of arguments and function on error.
- llvm::OwningPtr<CallExpr> TheCall(new CallExpr(Fn, Args, NumArgs,
+ if (getLangOptions().CPlusPlus && FDecl && NumArgs <
FDecl->getNumParams())
+ NumArgsPassed = FDecl->getNumParams();
+ llvm::OwningPtr<CallExpr> TheCall(new CallExpr(Fn, Args, NumArgsPassed,
                                                  Context.BoolTy,
RParenLoc));

I don't think this is safe. The CallExpr ctor reads the range
[Args,Args+NumArgsPassed), so changing NumArgsPassed in this way makes it
potentially read beyond the end of the args array.

/me smacks forehead, fixes bug

+ // If too few arguments are available (and we don't have default
+ // arguments for the remaining parameters), don't make the call.
+ if (NumArgs < NumArgsInProto) {
+ if (getLangOptions().CPlusPlus &&
+ FDecl &&
+ FDecl->getParamDecl(NumArgs)->getDefaultArg()) {
+ // Use default arguments for missing arguments
+ NumArgsToCheck = NumArgsInProto;
+ } else
+ return Diag(RParenLoc, diag::err_typecheck_call_too_few_args,
+ Fn->getSourceRange());
+ }

This logic is reasonable, but what do you think about adding a new method
to FunctionDecl, something like "unsigned getMinRequiredArguments() const"
or something like that, which would scan the argument list for default
arguments. I think this is a generally useful method, and could simplify
this down to:

if (NumArgs < NumArgsInProto) {
  if (FDecl && NumArgs >= FDecl->getMinRequiredArguments())
     ...

which would be simpler :). It could also be used to simplify the logic
above.

Sure, although I'm going to have to point out that we're replacing a
constant-time check (is a particular pointer non-NULL) with a loop
that spins through the parameter list :slight_smile:

  - Doug

clang-default-arg-fixes.patch (15.8 KB)

Doug Gregor wrote:-

and, when in non-C99 mode, warns about this as an extension:

  typedef void T;
  void f(T);

Nothing special about C99 there, it must be accepted in C90 too.

Neil.

You're certain? EDG diagnoses this code with an extension warning in
it's C90 mode, and I can't find a copy of the C90 specification to
verify (ISO doesn't sell it anymore).

  - Doug

Doug Gregor wrote:-

2008/4/9 Neil Booth <neil@daikokuya.co.uk>:

Doug Gregor wrote:-

Doug Gregor wrote:-

and, when in non-C99 mode, warns about this as an extension:

typedef void T;
void f(T);

Nothing special about C99 there, it must be accepted in C90 too.

You’re certain? EDG diagnoses this code with an extension warning in
it’s C90 mode, and I can’t find a copy of the C90 specification to
verify (ISO doesn’t sell it anymore).

Yeah, it’s DR 157 to C90. EDG make several mistakes too :slight_smile:

Neil.

Then the online Comeau demonstration does the same mistake. Or it really is in the C90 standard (I’ll admit not having a copy too).

Comeau C/C++ 4.3.9 (Mar 27 2007 17:24:47) for ONLINE_EVALUATION_BETA1
Copyright 1988-2007 Comeau Computing. All rights reserved.
MODE:strict errors C90noC++0x_extensions

“ComeauTest.c”, line 2: error: declaring a void parameter list with a typedef is
nonstandard
void f(T);
^

1 error detected in the compilation of “ComeauTest.c”.

Note that this passes in C99 mode.

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080407/005079.html

Thanks! I'm following this up with another patch (attached) that fixes
the remaining nits below. It also implements proper checking for
ill-formed code like

void f(int x, int y = x);

Nice.

and, when in non-C99 mode, warns about this as an extension:

typedef void T;
void f(T);

I tweaked this for c89.

Should these ParmVarDecl's be marked as erroneous? This might be a good
idea if there are secondary errors that can occur because they are missing
default initializers. I can't think of any specific examples off the top of
my head though, your call.

Secondary errors could occur as the result of a function call that
relies on those default arguments, but there isn't much we can do
about that. I say we just leave the parameters alone: other than not
having default arguments any more, there isn't anything wrong with
them. (Most importantly: the AST is still in a perfectly consistent
state after removing default arguments)

Sounds good. Keeping the AST consistent on errors is a high priority, thanks :slight_smile:

This logic is reasonable, but what do you think about adding a new method
to FunctionDecl, something like "unsigned getMinRequiredArguments() const"
or something like that, which would scan the argument list for default
arguments. I think this is a generally useful method, and could simplify
this down to:

if (NumArgs < NumArgsInProto) {
if (FDecl && NumArgs >= FDecl->getMinRequiredArguments())
    ...

which would be simpler :). It could also be used to simplify the logic
above.

Sure, although I'm going to have to point out that we're replacing a
constant-time check (is a particular pointer non-NULL) with a loop
that spins through the parameter list :slight_smile:

You're definitely right. If it ever becomes a problem we can reevaluate it :).

The patch looks great except some nit-picky stuff, I applied your patch here:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080407/005098.html

The only change I made was to fold a few regtests together into one file. While our current testsuite doesn't have much specific structure, for language features I'd generally prefer to have one file for each "feature" and have multiple tests within that file. This keeps the # files down, which speeds up testing and will hopefully keep the suite reasonable. This is only possible for stuff that uses -verify of course.

Here is the C89 tweak:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080407/005099.html

Here are the nit-picks:

+++ include/clang/AST/ExprCXX.h (working copy)
@@ -149,7 +149,9 @@ namespace clang {
      Expr *getExpr() { return Param->getDefaultArg(); }

      virtual SourceRange getSourceRange() const {
- return Param->getDefaultArg()->getSourceRange();
+ // Default argument expressions have no represntation in the
+ // source, so they have an empty source range.
+ return SourceRange();

spello represntation

+++ include/clang/AST/Decl.h (working copy)
@@ -352,6 +352,7 @@ public:
      return ParamInfo[i];
    }
    void setParams(ParmVarDecl **NewParamInfo, unsigned NumParams);
+ unsigned getMinRequiredArguments() const;

Please add a copy of your doxygen comment from the cpp file. Please mention "C++ default arguments".

+++ lib/Sema/SemaExpr.cpp (working copy)
@@ -637,11 +634,10 @@ ActOnCallExpr(ExprTy *fn, SourceLocation
      // If too few arguments are available (and we don't have default
      // arguments for the remaining parameters), don't make the call.
      if (NumArgs < NumArgsInProto) {
- if (getLangOptions().CPlusPlus &&
- FDecl &&
- FDecl->getParamDecl(NumArgs)->getDefaultArg()) {
+ if (FDecl && NumArgs >= FDecl->getMinRequiredArguments()) {
          // Use default arguments for missing arguments
          NumArgsToCheck = NumArgsInProto;
+ TheCall->setNumArgs(NumArgsInProto);
        } else

Nice!

+++ lib/Sema/SemaDeclCXX.cpp (working copy)

+/// CheckDefaultArgumentVisitor - Traverses the default argument of a
+/// parameter to determine whether it contains any ill-formed
+/// subexpressions. For example, this will diagnose the use of local
+/// variables or parameters within the default argument expression.
+class VISIBILITY_HIDDEN CheckDefaultArgumentVisitor

Please wrap this class in an anonymous namespace. Is there a spec citation that you can give for this? Maybe C++ [dcl.fct.default] is enough.

+bool CheckDefaultArgumentVisitor::VisitExpr(Expr *Node) {
+ bool IsInvalid = false;
+ for (Stmt::child_iterator first = Node->child_begin(),
+ last = Node->child_end();
+ first != last; ++first)
+ IsInvalid |= Visit(*first);

Hi Chris,

+++ lib/Sema/SemaDeclCXX.cpp (working copy)

+/// CheckDefaultArgumentVisitor - Traverses the default argument of a
+/// parameter to determine whether it contains any ill-formed
+/// subexpressions. For example, this will diagnose the use of local
+/// variables or parameters within the default argument expression.
+class VISIBILITY_HIDDEN CheckDefaultArgumentVisitor

Please wrap this class in an anonymous namespace. Is there a spec citation
that you can give for this? Maybe C++ [dcl.fct.default] is enough.

It checks semantics for a couple different paragraphs in
[dcl.fct.default], so I've used that.

+bool CheckDefaultArgumentVisitor::VisitExpr(Expr *Node) {
+ bool IsInvalid = false;
+ for (Stmt::child_iterator first = Node->child_begin(),
+ last = Node->child_end();
+ first != last; ++first)
+ IsInvalid |= Visit(*first);
+
+ return IsInvalid;
+}

Does it make sense to report multiple diagnostics for a malformed default?

Yes, I think it does make sense to report multiple diagnostics,
because they are all independent problems.

+ // Default arguments are evaluated each time the function is
+ // called. The order of evaluation of function arguments is
+ // unspecified. Consequently, parameters of a function shall not
+ // be used in default argument expressions, even if they are not
+ // evaluated. Parameters of a function declared before a default
+ // argument expression are in scope and can hide namespace and
+ // class member names.
+ return S->Diag(DRE->getSourceRange().getBegin(),
+ diag::err_param_default_argument_references_param,
+ Param->getName());

It's a very minor thing, but it would be nice to highlight the sourcerange
for the entire default arg expr as well, giving something like:

void foo(int i, int j, int k = i+j)
                               ^~~

This would require capturing the root of the default arg expr in the
CheckDefaultArgumentVisitor, but that doesn't seem too evil :slight_smile:

Okay; that's not evil at all. Actually, it looks pretty nice.

+ // C++ [dcl.fct.default]p7
+ // Local variables shall not be used in default argument
+ // expressions.
+ return S->Diag(DRE->getSourceRange().getBegin(),
+ diag::err_param_default_argument_references_local,
+ BlockVar->getName());

Note that this will also exclude local static variables and local extern
variables. For example:

void foo() {
  extern int x;
  void bar(int z = x);
}

Is that expected?

Yes, that's expected.

Cleanup patch attached.

  - Doug

clang-default-arg-fixes.patch (8.72 KB)

The patch looks great, applied:
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080407/005157.html

Thanks Doug,

-Chris