Of the two, I like #1 the best: it is very explicit, and it doesn't require
cloning around ASTs unnecessarily. The down side about it is that clients
have to handle one more node type, but I don't think this is a killer.
Being an explicit node like this makes it easier to handle, and we keep the
invariant that the number of arguments to a call is >= the of formal
arguments for the callee. What do you think?
That's reasonable. The updated patch contains a new node
CXXDefaultArgExpr; most of the changes in this patch just deal with
that new node in the statement printer, serialization, code generator,
etc.
Looks great, I like this approach. Instead of dwelling on details and forcing you to keep the patch up to date, I committed it here (after resolving a few conflicts that came up in the short time since you sent it out!):
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20080407/005079.html
Some more nit-picky stuff is included below.
The code generator bits were no fun at all, because I ended up having
to duplicate the handling of CXXDefautArgExpr for CGExprComplex,
CGExprConstant, CGExprScalar, and CGExprAgg. I hope there's a better
way to do this, but I didn't dig deep into the CodeGen module before I
convinced myself there wasn't. CXXDefaulArgExpr is a *great* candidate
for a pre-CodeGen lowering phase 
The funny thing about it is that the code being lowered isn't *exactly* the same (the methods return different things). In llvm-gcc, we actually only have two code paths (one for scalars and one for aggregates) instead of three... and they are actually interlaced together and active in a context sensitive way. Not having a codepath specialized for _Complex means that we get substantially inferior code in some cases, and having them interlaced together makes the code much more complex and difficult to understand.
I won't claim with certainty that the approach taken by the clang codegen is the one will end up with, but it's a step forward. A bigger issue is that a lot of it has to be duplicated (same in llvm-gcc) for constant initializers, which is annoying, but doesn't affect default arguments at least. 
There's a new test for the CodeGen bits, but I don't know how to
automate the checks we need. If you build it with
-DCLANG_GENERATE_KNOWN_GOOD, it writes out the default arguments
explicitly at the call site; otherwise, it lets the front end fill in
the default arguments. Ideally, we would build this test both ways and
"diff" the byte-code produced from each (that's what I did to verify
the correctness of the CodeGen additions), but I didn't implement that
in the test harness.
The codegen tests currently mostly test that the compiler doesn't crash on some language feature. I think this is fairly reasonable (and follows what we do with llvm-gcc). More comprehensive tests will include the whole programs from the llvm-test suite, and ultimately building llvm itself with clang ;-).
[Aside: StmtPrinter did me a great favor by failing to link when I
forgot to add in the CXXDefaultArgExpr case. The serialization code
and CodeGen modules, and perhaps even the isConstantExpr/isLvalue/etc
routines, could all benefit from this kind of built-in safety.]
This is a great idea. Some of the code could probably be auto-generated as well. I'll add it to my (huge) todo list.
@@ -768,7 +817,39 @@ Sema::ActOnDeclarator(Scope *S, Declarat
+ // When we have a prototype for a named function, create the
+ // parameter declarations for NewFD. C++ needs these early, to
+ // deal with merging default arguments properly.
I'm not sure what this is doing. It seems that it could be factored better
somehow, but I'm not exactly sure what is going on. Can you give an example
of when this is needed?
I've updated the comment to this:
// Copy the parameter declarations from the declarator D to
// the function declaration NewFD, if they are available.
Basically, the parameter declarations have all been created for the
function declarator (D)... the parser has called ActOnParamDeclarator
for each parameter, and put them into the function declarator. This
code is copying those parameter declarations into the FunctionDecl for
the function. We used to create the parameter declarations at the same
time we filled in the FunctionDecl (at definition time), but now we do
it as early as possible.
Ok!
Ok, I guess my basic objection is the duplication of this logic:
+ if (FTI.NumArgs == 1 && !FTI.isVariadic && FTI.ArgInfo[0].Ident == 0
&&
+ FTI.ArgInfo[0].Param &&
+
!((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType().getCVRQualifiers() &&
+ ((ParmVarDecl*)FTI.ArgInfo[0].Param)->getType()->isVoidType()) {
Can this be factored out into a predicate method and shared with the other
code that does it?
At the moment, this is the only place that does this check.
Ah ok! Please disregard my objection, the code looked familiar so I was sure it was copied from somewhere. I didn't realize you also *deleted* the original copy.
It used to
be in ActOnStartOfFunctionDef, but I've moved it here. Granted, I
don't like that we do this test in Sema... I would much rather handle
(void) parameter lists in the parser, and just put a flag on the decl
to say whether "void" was given to denote an empty parameter list (or,
to be very picky, we could have an EmptyParamListType node that has
the actual "type" the user wrote, if any). Handling it in the parser
makes especially good sense for C++ (and perhaps C89? I don't have the
C standards handy) where the parameter list has to be spelled
(void)... it can't be (T) where T is a typedef of void as I think it
can in C99.
I'm not sure about that. clang tracks typedefs (and lack thereof) well enough that Sema can decide whether it was spelled "(void)" or "(T)". Also, we do have to handle the "(T)" case for C99. It seems to me that handling this in the parser would require handling it in both the parser and Sema.
Here are some of the increasingly minor nit picky details from your patch:
+++ include/clang/AST/ExprCXX.h (working copy)
I changed the comment above the CXXDefaultArgExpr to use ///'s so that doxygen picks it up.
+ // Retrieve the parameter that the argument was created from
+ const ParmVarDecl *getParam() const { return Param; }
Please terminate sentence comments with a '.'. Yes it's pedantic, but it makes the comments more consistent and read more nicely :). This occurs in many places in your patch.
+ virtual SourceRange getSourceRange() const {
+ return Param->getDefaultArg()->getSourceRange();
+ }
I don't think that this is right. The source range of the CXXDefaultArgExpr should be empty, not the source range of the input. The default argument doesn't occur in the source, so it shouldn't have a location.
+++ lib/AST/ExprCXX.cpp (working copy)
+// CXXDefaultArgExpr
+Stmt::child_iterator CXXDefaultArgExpr::child_begin() {
+ return reinterpret_cast<Stmt**>(Param->getDefaultArg());
+}
This is subtly wrong. The child iteration stuff should walk the expression tree in the current function, it shouldn't "jump the gap here". I suspect that -ast-dump will have problems with CXXDefaultArgExpr's due to this. I'd just have CXXDefaultArgExpr just return an empty child range.
+void
+Sema::ActOnParamDefaultArgument(DeclTy *param, SourceLocation EqualLoc,
+ ExprTy *defarg) {
...
+ // Some default arguments were missing. Clear out all of the
+ // default arguments up to (and including) the last missing
+ // default argument, so that we leave the function parameters
+ // in a semantically valid state.
+ for (p = 0; p <= LastMissingDefaultArg; ++p) {
+ ParmVarDecl *Param = FD->getParamDecl(p);
+ if (Param->getDefaultArg()) {
+ delete Param->getDefaultArg();
+ Param->setDefaultArg(0);
+ }
+ }
Should these ParmVarDecl's be marked as erroneous? This might be a good idea if there are secondary errors that can occur because they are missing default initializers. I can't think of any specific examples off the top of my head though, your call.
+++ lib/Sema/SemaExpr.cpp (working copy)
// Make the call expr early, before semantic checks. This guarantees cleanup
// of arguments and function on error.
- llvm::OwningPtr<CallExpr> TheCall(new CallExpr(Fn, Args, NumArgs,
+ if (getLangOptions().CPlusPlus && FDecl && NumArgs < FDecl- >getNumParams())
+ NumArgsPassed = FDecl->getNumParams();
+ llvm::OwningPtr<CallExpr> TheCall(new CallExpr(Fn, Args, NumArgsPassed,
Context.BoolTy, RParenLoc));
I don't think this is safe. The CallExpr ctor reads the range [Args,Args+NumArgsPassed), so changing NumArgsPassed in this way makes it potentially read beyond the end of the args array.
+ // If too few arguments are available (and we don't have default
+ // arguments for the remaining parameters), don't make the call.
+ if (NumArgs < NumArgsInProto) {
+ if (getLangOptions().CPlusPlus &&
+ FDecl &&
+ FDecl->getParamDecl(NumArgs)->getDefaultArg()) {
+ // Use default arguments for missing arguments
+ NumArgsToCheck = NumArgsInProto;
+ } else
+ return Diag(RParenLoc, diag::err_typecheck_call_too_few_args,
+ Fn->getSourceRange());
+ }
This logic is reasonable, but what do you think about adding a new method to FunctionDecl, something like "unsigned getMinRequiredArguments() const" or something like that, which would scan the argument list for default arguments. I think this is a generally useful method, and could simplify this down to:
if (NumArgs < NumArgsInProto) {
if (FDecl && NumArgs >= FDecl->getMinRequiredArguments())
...
which would be simpler :). It could also be used to simplify the logic above.
+++ lib/Sema/SemaType.cpp (working copy)
@@ -394,7 +394,7 @@ QualType Sema::GetTypeForDeclarator(Decl
llvm::SmallVector<QualType, 16> ArgTys;
for (unsigned i = 0, e = FTI.NumArgs; i != e; ++i) {
- QualType ArgTy = QualType::getFromOpaquePtr(FTI.ArgInfo[i].TypeInfo);
+ QualType ArgTy = ((ParmVarDecl *)FTI.ArgInfo[i].Param)- >getType();
Please do the cast into a temporary 'ParmVarDecl*' at the top of the loop instead of doing the cast three times in the loop.
+++ lib/Sema/SemaDeclObjC.cpp (working copy)
+ // Introduce all of the othe parameters into this scope
for (unsigned i = 0, e = MDecl->getNumParams(); i != e; ++i) {
typo "othe", should end with a period.
+++ lib/AST/Expr.cpp (working copy)
@@ -1009,6 +1021,9 @@ bool Expr::isNullPointerConstant(ASTCont
// Accept ((void*)0) as a null pointer constant, as many other
// implementations do.
return PE->getSubExpr()->isNullPointerConstant(Ctx);
+ } else if (const CXXDefaultArgExpr *DefaultArg = dyn_cast<CXXDefaultArgExpr>(this)) {
+ // See through default argument expressions
+ return DefaultArg->getExpr()->isNullPointerConstant(Ctx);
}
80 columns.
+++ lib/Parse/ParseDecl.cpp (working copy)
@@ -1232,8 +1232,10 @@ void Parser::ParseParenDeclarator(Declar
/// declaration-specifiers declarator
+/// [C++] declaration-specifiers declarator '=' assignment-expression
/// [GNU] declaration-specifiers declarator attributes
/// declaration-specifiers abstract-declarator[opt]
+/// [C++] declaration-specifiers abstract-declarator[opt] '=' assignment-expression
/// [GNU] declaration-specifiers abstract-declarator[opt] attributes
80 columns.
Thanks again Doug, this is great work!
-Chris