Hello

Hi,

I just took my first steps into clang development (downloading and compiling the source and subscribing to this list) and I thought I'd say hello. I'm a CS student and mostly interested in C++, so that's where I'll be working. I think I'll like this work. Expect lots of questions.

Sebastian

Welcome. For each question you ask, you get to implement one missing feature of C++. :slight_smile: Feel free to ask any hard questions you want, as templates aren't done yet. :slight_smile:

Mike Stump wrote:

mostly interested in C++, so that's where I'll be working. I think I'll like this work. Expect lots of questions.

Welcome. For each question you ask, you get to implement one missing feature of C++. :slight_smile: Feel free to ask any hard questions you want, as templates aren't done yet. :slight_smile:

Hehe :slight_smile:

I think I'll go for namespaces first. However, it seems that there is absolutely no support for nested names yet. Is that correct? Such support would have to be added to the parser and to the semantic analysis at least. MinimalAction too? It would have to keep track of identifiers, if we don't want horrendous misparses of code when not using the full semantic analysis.

namespace foo
{
        typedef int bar;
}
foo::bar i;

cxx-parse-nested-specifier.cpp:7:4: error: expected '=', ',', ';', 'asm', or '__attribute__' after declarator

Sebastian

Hi Sebastian,

Sebastian Redl wrote:

I think I'll go for namespaces first. However, it seems that there is absolutely no support for nested names yet. Is that correct?

I have a set of patches for nested names, waiting to be "unleashed on the world" when the conditions are right.
On the other hand, here are a couple of missing C++ features (the "good thing" about so much missing C++ support is that there are lots of options to choose from :slight_smile: )

--Unnamed namespaces:

namespace { void f(); }
void g() { f(); }

main.cpp:2:12: error: use of undeclared identifier 'f'

--'using' directives:

using namespace foo;
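
For reference, a minimal sketch of both features as a conforming compiler should accept them. This is ordinary standard C++; the names `answer`, `g`, and `checkLookup` are made up for illustration, and the functions are given bodies so the snippet links.

```cpp
#include <cassert>

// Unnamed namespace: f is only reachable from this translation unit,
// but it is visible to unqualified lookup in the rest of the file.
namespace { int f() { return 1; } }

namespace foo { int answer() { return 42; } }

// Using directive: names from foo become visible to unqualified lookup.
using namespace foo;

// Both calls are unqualified; neither should produce
// "use of undeclared identifier".
int g() { return f() + answer(); }

// Small self-check of the lookup results.
void checkLookup() { assert(g() == 43); }
```

Calling `checkLookup()` from a test driver exercises both lookups.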

Such support would have to be added to the parser and to the semantic analysis at least. MinimalAction too?

That is a good question. Is MinimalAction expected to eventually support the full C++ type system ? (nested names, templates)

-Argiris

Such support would have to be added to the parser and to the semantic analysis at least. MinimalAction too?

That is a good question. Is MinimalAction expected to eventually support the full C++ type system ? (nested names, templates)

I don't think so. Supporting templates would need nearly all (or at least a good part) of C++ semantics, and that is not the goal.
A heuristic could get pretty good results on basic code, but I think that if someone wants to use a heuristic, they should write their own Action class.

Sounds great. Don't be shy about submitting as you develop. People can steer you away from pitfalls faster, and others might do some of the work so you don't have to do it all.

Sebastian Redl wrote:

Argiris Kirtzidis wrote:

I have a set of patches for nested names, waiting to be "unleashed on the world" when the conditions are right.

Could you unleash them on me at least for now? There's no point in trying to implement much regarding namespaces without support for them.

Here's an initial discussion about nested-names:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002417.html

To be more specific about "the conditions", I'm talking about having some kind of "annotation tokens" like I propose here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002664.html

These will be useful for both nested-names and ambiguity resolution (discussed here: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002594.html)

Please let me know if you have some thoughts on the subject.

-Argiris

Argiris Kirtzidis wrote:

Here's an initial discussion about nested-names:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002417.html

To be more specific about "the conditions", I'm talking about having some kind of "annotation tokens" like I propose here:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002664.html

These will be useful for both nested-names and ambiguity resolution (discussed here: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-August/002594.html)

Please let me know if you have some thoughts on the subject.

I had always thought that it was possible to parse the non-template part of nested name specifiers without having to refer to sema at all. In my opinion, C++ would have been better if the specification of nested-id simply used 'identifier' instead of 'class-or-namespace-name'.
But then, a lot of C++ would be a lot easier (not only to parse) if the designers had avoided semantically loaded terms in the grammar and actually looked out for and prevented syntactic ambiguities.

Anyway, I have thought about the old discussions a lot this last week, and most of my comments turned out to be already answered. Below is what remains.

I had some thoughts of parsing the entire thing ahead of time and then having the sema resolve all issues at once, but there is one test case that really makes this impractical:

typedef int foo;
namespace abc { foo bar(); }
foo::abc::bar()
{
  // ...
}

Thus, there is one statement of yours in the discussion that is wrong:

Anyway, I don't think we need to worry about whether it should be "A::B ::C" or "A ::b::C" or whatever.
  

We do have to worry. GCC parses the above example correctly as foo ::abc::bar. It doesn't stumble if foo is in its own namespace, or even in a class template, either. (In my opinion, that's really a defect in the standard. Nor do I think that there is any program out there that really relies on this.)
Consider also:

I think this is correct since the spec says nothing about "'::' associativity".
  

:: is part of productions further down the tree than anything else, so it binds more strongly than anything else. However, it only binds to left-hand identifiers that actually name a namespace or class. Otherwise, it's the global scope.
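
A toy sketch of that binding rule, to make the split concrete. Everything here is an assumption for illustration: `ScopeNames` stands in for whatever Sema would answer, `typeSpecifierLength` is a hypothetical helper, and lookup is flat rather than performed inside the scope named so far.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for Sema: names that denote a namespace or class.
using ScopeNames = std::set<std::string>;

// Returns how many leading tokens form the type specifier, under the reading
// that "X ::" only binds when X names a namespace or class.
std::size_t typeSpecifierLength(const std::vector<std::string>& toks,
                                const ScopeNames& scopes) {
    std::size_t i = 0;
    // Consume "Scope ::" pairs for as long as the left-hand name is a scope.
    while (i + 1 < toks.size() && toks[i + 1] == "::" && scopes.count(toks[i]))
        i += 2;
    // The next identifier ends the type specifier; a "::" after it would
    // start a globally qualified declarator.
    return i + 1;
}

void checkSplit() {
    ScopeNames scopes;
    scopes.insert("abc");
    scopes.insert("ns");

    // typedef int foo;  foo::abc::bar()  is read as  foo  ::abc::bar
    std::vector<std::string> plain = {"foo", "::", "abc", "::", "bar"};
    assert(typeSpecifierLength(plain, scopes) == 1);

    // The same with the typedef inside a namespace:  ns::foo  ::abc::bar
    std::vector<std::string> nested = {"ns", "::", "foo", "::", "abc", "::", "bar"};
    assert(typeSpecifierLength(nested, scopes) == 3);
}
```

The point of the sketch is only that the split position depends on what each left-hand identifier names, which the parser cannot know without asking.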

It probably could still be done, but it would be very complex code - the sema would have to report that there are two identifiers, it would have to report the split position, and the parser would have to adjust its state according to this new revelation. It would shift a job to the sema that really is the parser's problem, and all that for probably no performance gain at all.
(Hm, or is that what your patch does? I haven't actually looked at any code.)

Here's a nice pathological case. GCC is so confused by it that it aborts processing. It doesn't even take any instantiations - the definition is enough. (Of course, to actually be a definition of the second template function, there would have to be a typename before the B::A.)

namespace A { template <typename B> B f(); }
template <typename B> typename B::A f();

template <typename B>
B::A::f()
{
  return 0;
}

struct s { typedef int A; };
void foo()
{
        f<int>();
        f<s>();
}

It's not surprising that this confuses GCC. This matter is actually unspecified in the 2003 standard, see DR215. DR215 is in fact very important to this whole matter, as is DR125, because the resolutions to these issues render the first example I've given invalid. The resolution to 125 has been voted into the C++0x paper in 2004, the one to 215 in 2007. Under these rules, the class-or-namespace-name production disappears and is replaced by the more sensible type-or-namespace-name, with qualified name lookup amended to fail if the type name doesn't denote a class. (A type template parameter is a type-name.)

The question is whether we want to follow the broken rules of C++03, as GCC does, or the updated rules of C++0x, which introduce a tiny incompatibility with GCC.

I think that we should forbid using anything but Sema as the C++ Action. At least when we get to implementing templates, implementing isTypeName and similar functions is such a burden that it just doesn't make sense to use a different action, and there is no heuristic that can guess at types vs non-types with any reasonable accuracy - at least not without parsing ahead, and an Action can't do that.
The alternative is moving all type analysis, including that from Sema, to MinimalAction, and deriving Sema from MinimalAction. That would seriously slow down MinimalAction for C, though, I think.

Parser::isTokenStreamTypeName() - if I understand its purpose correctly, eventually this function will have to distinguish between types, templates, perhaps even concepts, and objects/functions. This is a lot for a function whose name suggests a boolean distinction.
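
One way to picture a non-boolean answer is an enumerated classification. This is only a sketch: `NameKind`, `SymbolTable`, and `classifyName` are hypothetical names, not anything in the clang tree, and the map stands in for the real lookup machinery.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical result type covering the distinctions listed above.
enum class NameKind { Type, Template, Concept, Value, Unknown };

// Toy symbol table standing in for real name lookup.
using SymbolTable = std::map<std::string, NameKind>;

// Classifies a name instead of answering a yes/no "is it a type?" question.
NameKind classifyName(const SymbolTable& syms, const std::string& name) {
    SymbolTable::const_iterator it = syms.find(name);
    return it == syms.end() ? NameKind::Unknown : it->second;
}

void checkClassify() {
    SymbolTable syms;
    syms["bar"] = NameKind::Type;          // typedef int bar;
    syms["templname"] = NameKind::Template;
    syms["objname"] = NameKind::Value;
    assert(classifyName(syms, "bar") == NameKind::Type);
    assert(classifyName(syms, "templname") == NameKind::Template);
    assert(classifyName(syms, "missing") == NameKind::Unknown);
}
```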

Does whichever resolution to the name lookup issue is finally chosen handle reentrancy? Does the inner lookup work here?
ns1::templname<ns2::typename>::objname

I don't have any comments on the recently committed disambiguation code.

Sebastian

Hi Sebastian,

Sebastian Redl wrote:

I had some thoughts of parsing the entire thing ahead of time and then having the sema resolve all issues at once, but there is one test case that really makes this impractical:

typedef int foo;
namespace abc { foo bar(); }
foo::abc::bar()
{
// ...
}

Thus, there is one statement of yours in the discussion that is wrong:

Anyway, I don't think we need to worry about whether it should be "A::B ::C" or "A ::b::C" or whatever.
  

We do have to worry. GCC parses the above example correctly as foo ::abc::bar. It doesn't stumble if foo is in its own namespace, or even in a class template, either. (In my opinion, that's really a defect in the standard. Nor do I think that there is any program out there that really relies on this.)

Interesting find.

Consider also:

I think this is correct since the spec says nothing about "'::' associativity".
  

:: is part of productions further down the tree than anything else, so it binds more strongly than anything else. However, it only binds to left-hand identifiers that actually name a namespace or class. Otherwise, it's the global scope.

Hmm, the standard says at 3.4.3p1: "During the lookup for a name preceding the '::' scope resolution operator, object, function, and enumerator names are ignored. If the name found is not a class-name or namespace-name, the program is ill-formed".
It seems to me that '::' binds to left-hand identifiers and if the identifier is not a namespace or class, we can consider it an error. I can't find anything about resorting to the global scope when the identifier exists and it's not a class or namespace.
For comparison, both MSVC and Comeau report something like "error: name followed by '::' must be a class or namespace".
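
A toy model of the 3.4.3p1 wording quoted above, as I read it. All names are hypothetical: `DeclKind`, `QualifierLookup`, and `lookupBeforeColonColon` are made up, and `NonClassType` stands for something like `typedef int foo;`.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class DeclKind { Namespace, Class, NonClassType, Object, Function, Enumerator };
enum class QualifierLookup { Ok, NotFound, IllFormed };

// Toy declaration table; a name may have several declarations.
using DeclTable = std::multimap<std::string, DeclKind>;

// Looks up a name that precedes "::".
QualifierLookup lookupBeforeColonColon(const DeclTable& decls,
                                       const std::string& name) {
    bool foundNonScope = false;
    std::pair<DeclTable::const_iterator, DeclTable::const_iterator> range =
        decls.equal_range(name);
    for (DeclTable::const_iterator it = range.first; it != range.second; ++it) {
        switch (it->second) {
        case DeclKind::Object:
        case DeclKind::Function:
        case DeclKind::Enumerator:
            continue;  // ignored during this lookup
        case DeclKind::Namespace:
        case DeclKind::Class:
            return QualifierLookup::Ok;
        case DeclKind::NonClassType:
            foundNonScope = true;  // found, but not a class or namespace
            break;
        }
    }
    return foundNonScope ? QualifierLookup::IllFormed : QualifierLookup::NotFound;
}

void checkQualifierLookup() {
    DeclTable decls;
    decls.emplace("foo", DeclKind::NonClassType);  // typedef int foo;
    decls.emplace("abc", DeclKind::Namespace);     // namespace abc { ... }
    decls.emplace("stat", DeclKind::Function);     // a function named stat ...
    decls.emplace("stat", DeclKind::Class);        // ... and a struct named stat
    decls.emplace("n", DeclKind::Object);          // int n;
    assert(lookupBeforeColonColon(decls, "foo") == QualifierLookup::IllFormed);
    assert(lookupBeforeColonColon(decls, "abc") == QualifierLookup::Ok);
    assert(lookupBeforeColonColon(decls, "stat") == QualifierLookup::Ok);
    assert(lookupBeforeColonColon(decls, "n") == QualifierLookup::NotFound);
}
```

Under this reading there is no fallback to the global scope: `foo::` with `foo` a typedef for `int` is simply an error.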

It probably could still be done, but it would be very complex code - the sema would have to report that there are two identifiers, it would have to report the split position, and the parser would have to adjust its state according to this new revelation. It would shift a job to the sema that really is the parser's problem, and all that for probably no performance gain at all.

I completely agree, I don't think it's worth it.

Here's a nice pathological case. GCC is so confused by it that it aborts processing. It doesn't even take any instantiations - the definition is enough. (Of course, to actually be a definition of the second template function, there would have to be a typename before the B::A.)

namespace A { template <typename B> B f(); }
template <typename B> typename B::A f();

template <typename B>
B::A::f()
{
return 0;
}

struct s { typedef int A; };
void foo()
{
       f<int>();
       f<s>();
}

It's not surprising that this confuses GCC. This matter is actually unspecified in the 2003 standard, see DR215. DR215 is in fact very important to this whole matter, as is DR125, because the resolutions to these issues render the first example I've given invalid. The resolution to 125 has been voted into the C++0x paper in 2004, the one to 215 in 2007. Under these rules, the class-or-namespace-name production disappears and is replaced by the more sensible type-or-namespace-name, with qualified name lookup amended to fail if the type name doesn't denote a class. (A type template parameter is a type-name.)

The question is whether we want to follow the broken rules of C++03, as GCC does, or the updated rules of C++0x, which introduce a tiny incompatibility with GCC.

As I already mentioned, it's not clear that even C++03 allows this.

I think that we should forbid using anything but Sema as the C++ Action. At least when we get to implementing templates, implementing isTypeName and similar functions is such a burden that it just doesn't make sense to use a different action, and there is no heuristic that can guess at types vs non-types with any reasonable accuracy - at least not without parsing ahead, and an Action can't do that.
The alternative is moving all type analysis, including that from Sema, to MinimalAction, and deriving Sema from MinimalAction. That would seriously slow down MinimalAction for C, though, I think.

Parser::isTokenStreamTypeName() - if I understand its purpose correctly, eventually this function will have to distinguish between types, templates, perhaps even concepts, and objects/functions. This is a lot for a function whose name suggests a boolean distinction.

Yeah, Parser::isTokenStreamTypeName() wasn't such a good approach. Currently I'm leaning towards "annotation tokens". The way I see them working is like this:
-At various points in the parser, when a nested-name is encountered, Parser::ParseCXXScopeSpec will parse it and leave a "scope spec token" in the token stream.
-This "scope spec token" can later be used by passing a "CXXScopeTy*" object (which the "scope spec token" will contain) to various Sema actions.

For example:
  namespace foo { unsigned bar(); }
  unsigned foo::bar(); // #1

At #1, Parser::ParseDeclarationSpecifiers will:
-parse 'unsigned'
-parse 'foo::' (using Parser::ParseCXXScopeSpec), leaving a "scope spec token" in the token stream.
-Check whether "foo::bar" is a typename by calling Action.isTypeName passing 'bar' identifier along with a CXXScopeTy* object (which the "scope spec token" can provide)
-finish parsing declaration specifiers since "foo::bar" is not a type (the "scope spec token" is still the current token).

Later, Parser::ParseDeclarator can see that there's a "scope spec token" in the token stream and take it into account when considering the declarator identifier.
The net result will be that each ParseXXX function will view a token stream appropriate to the parsing context, and without having to 're-parse' stuff (like re-parsing nested names).
The code will be simpler and more maintainable this way.
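
To illustrate the idea, here is a toy sketch of collapsing a nested-name prefix into a single annotation token. It is not the actual patch: `TokKind`, `Token`, and `annotateScopeSpecs` are hypothetical, the set of scope names stands in for Sema, and the annotation carries a string where the real token would carry a CXXScopeTy*.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class TokKind { Identifier, ColonColon, ScopeAnnotation, Other };

struct Token {
    TokKind kind;
    std::string text;  // spelling; for ScopeAnnotation, the scope path
};

// Replaces each run of "scope ::" pairs with one annotation token, so later
// parsing steps never have to re-parse the nested name.
std::vector<Token> annotateScopeSpecs(const std::vector<Token>& in,
                                      const std::set<std::string>& scopes) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::string path;
        std::size_t j = i;
        while (j + 1 < in.size() && in[j].kind == TokKind::Identifier &&
               in[j + 1].kind == TokKind::ColonColon &&
               scopes.count(in[j].text)) {
            path += in[j].text + "::";
            j += 2;
        }
        if (j != i) {
            out.push_back(Token{TokKind::ScopeAnnotation, path});
            i = j;
        } else {
            out.push_back(in[i]);
            ++i;
        }
    }
    return out;
}

void checkAnnotate() {
    // unsigned foo::bar();
    std::vector<Token> in = {
        {TokKind::Other, "unsigned"}, {TokKind::Identifier, "foo"},
        {TokKind::ColonColon, "::"},  {TokKind::Identifier, "bar"},
        {TokKind::Other, "("},        {TokKind::Other, ")"},
        {TokKind::Other, ";"}};
    std::set<std::string> scopes;
    scopes.insert("foo");

    std::vector<Token> out = annotateScopeSpecs(in, scopes);
    assert(out.size() == 6);
    assert(out[1].kind == TokKind::ScopeAnnotation);
    assert(out[1].text == "foo::");
    assert(out[2].text == "bar");
}
```

After annotation, the declaration-specifier step and the declarator step both see the same single token for `foo::`.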

Let me know what you think.

Does whichever resolution to the name lookup issue is finally chosen handle reentrancy? Does the inner lookup work here?
ns1::templname<ns2::typename>::objname

I don't know how exactly templates will work, but I think there won't be reentrancy issues.

-Argiris

Argiris Kirtzidis wrote:

Hi Sebastian,

Sebastian Redl wrote:

Consider also:

I think this is correct since the spec says nothing about "'::' associativity".
  

:: is part of productions further down the tree than anything else, so it binds more strongly than anything else. However, it only binds to left-hand identifiers that actually name a namespace or class. Otherwise, it's the global scope.

Hmm, the standard says at 3.4.3p1: "During the lookup for a name preceding the '::' scope resolution operator, object, function, and enumerator names are ignored. If the name found is not a class-name or namespace-name, the program is ill-formed".
It seems to me that '::' binds to left-hand identifiers and if the identifier is not a namespace or class, we can consider it an error. I can't find anything about resorting to the global scope when the identifier exists and it's not a class or namespace.
For comparison, both MSVC and Comeau report something like "error: name followed by '::' must be a class or namespace".

The contradiction between 3.4.3 and 5.1 was part of the defect report.

Sebastian

Sebastian Redl wrote:

Argiris Kirtzidis wrote:

Hmm, the standard says at 3.4.3p1: "During the lookup for a name preceding the '::' scope resolution operator, object, function, and enumerator names are ignored. If the name found is not a class-name or namespace-name, the program is ill-formed".
It seems to me that '::' binds to left-hand identifiers and if the identifier is not a namespace or class, we can consider it an error. I can't find anything about resorting to the global scope when the identifier exists and it's not a class or namespace.
For comparison, both MSVC and Comeau report something like "error: name followed by '::' must be a class or namespace".

The contradiction between 3.4.3 and 5.1 was part of the defect report.

Ah ok, I guess C++03 wasn't clear about the '::' binding.

About the example:

typedef int foo;
namespace abc { foo bar(); }
foo::abc::bar()
{
// ...
}

Are you suggesting that we should behave identically to gcc (not emitting an error)?
This doesn't seem so important, particularly considering that it is going to be explicitly stated that an error is necessary.
We already have minor incompatibilities with gcc, like the scoping of the 'condition' declarations in selection/iteration statements and some cases of declaration/expression ambiguity resolution.

-Argiris

If the standard says we're required to produce an error, then we
should produce an error. We don't want to get into the game of trying
to parse ill-formed code that other compilers happen to parse.
Building a standards-conforming C++ front end is hard enough as it is
:slight_smile:

The best thing to do would be to make sure that there is a bug report
about each of these issues in GCC's bug database. If they fix the
problem, they'll be helping to migrate user code to be more
standards-conforming, and we'll have fewer issues when that code gets
compiled with Clang. And even if GCC doesn't fix the problem, at least
we can point to the report and say, "look, GCC knows the code is
wrong, too, but they haven't gotten around to diagnosing it."

It would also be useful to keep a list of these places where existing
compilers accept ill-formed code that Clang rejects. This list could
eventually be turned into some kind of "porting to Clang" guide.

  - Doug

Argiris Kirtzidis wrote:

Are you suggesting that we should behave identically to gcc (not emitting an error)?

Absolutely not! Especially since MSVC and Comeau reject it. (What does EDG do?)

We already have minor incompatibilities with gcc, like the scoping of the 'condition' declarations in selection/iteration statements and some cases of declaration/expression ambiguity resolution.

This incompatibility is not so minor, on the other hand. Many old C++ programs rely on incorrect for-scoping, and many compilers support this compatibility feature.
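
For readers unfamiliar with the for-scoping issue, a small sketch of the standard rule (function names are made up). As I understand the old pre-standard rule, the loop variable's scope extended to the end of the enclosing block, so code written for it may use the variable after the loop, and code like the following would conflict with it.

```cpp
#include <cassert>

// Under the standard rule the loop's i lives only inside the for statement,
// so the outer i is untouched and is what the return statement refers to.
int outerAfterLoop() {
    int i = -1;
    for (int i = 0; i < 3; ++i) {
        // the inner i shadows the outer one here
    }
    return i;  // the outer i
}

void checkForScope() { assert(outerAfterLoop() == -1); }
```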

Still, extremely low priority. As Doug says, implementing the standard is hard enough without implementing compatibility features and compiler quirks.

Sebastian

FWIW, I also agree here: In terms of core C++ support (name resolution, templates, etc), clang should aim to be as conformant as possible. Accepting broken code is more important for C than C++. That said, I think it will be important for the C++ front-end to accept some of the grossness that comes from the common subset of C and C++ (e.g. "extended" integer constant expressions, etc), and we'll probably end up supporting the GNU C++ extensions as well someday.

For better or worse, C++ programmers are much more used to having to tweak their code when updating compilers than C programmers, and there are many more 'dusty decks' of C programs out there.

-Chris

My guidance would be to follow the new rules, and if that poses a problem for real code (or if people just want to donate extra goodness), then add the old semantics under 03 (or earlier).

There's a difference here between things that were clarified after
C++98 or C++03 and things that were actually changed. If it's
something that was clarified (e.g., via a defect report), we should
just implement the new semantics unless it poses a problem for real
code. If it's some new feature in C++0x, or something that
intentionally changed the behavior, we should implement both... and,
ideally, have some kind of C++0x-compatibility warning. Examples of
such changes include "auto" changing from a storage specifier to a
type-specifier, and >> changing meanings within a template argument
list.
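
A short sketch of those two changes, written against the C++0x meanings (the function names are made up). Both marked lines are rejected by a C++03 compiler: `auto` is a storage class specifier there, and `>>` is lexed as a right shift, so C++03 needs `> >`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

int sumOfOnes() {
    // C++0x: "auto" is a type-specifier deduced from the initializer.
    auto total = 0;

    // C++0x: ">>" closes two template argument lists.
    std::vector<std::vector<int>> rows(2, std::vector<int>(3, 1));

    for (std::size_t r = 0; r < rows.size(); ++r)
        for (std::size_t c = 0; c < rows[r].size(); ++c)
            total += rows[r][c];
    return total;  // 2 rows of 3 ones
}

void checkSum() { assert(sumOfOnes() == 6); }
```

A C++0x-compatibility warning of the kind suggested above would flag exactly these two lines when compiling in C++03 mode.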

In this particular case, I think we just implement the clarified
semantics and move on.

  - Doug

FWIW, I also agree here: In terms of core C++ support (name
resolution, templates, etc), clang should aim to be as conformant as
possible. Accepting broken code is more important for C than C++.
That said, I think it will be important for the C++ front-end to
accept some of the grossness that comes from the common subset of C
and C++ (e.g. "extended" integer constant expressions, etc), and we'll
probably end up supporting the GNU C++ extensions as well someday.

I think the philosophy should be exactly the same as that for C: we
try to be as standards-compliant as possible, try to allow
non-standard constructs in such a way that compliant code doesn't
break, and deal with any conflicts on a case-by-case basis. Conflicts
are generally rare... the only ones I can think of that we've had to
deal with for C are trigraph support and integer constant expressions
like those in tgmath.h.

For better or worse, C++ programmers are much more used to having to
tweak their code when updating compilers than C programmers, and there
are many more 'dusty decks' of C programs out there.

g++ can get away with breaking code because of its market-share; I
don't think that's really relevant for clang. I'm sure we'll run into
plenty of places where we'll have to extend the semantics for
non-compliant programs.

-Eli

Doug Gregor wrote:

It would also be useful to keep a list of these places where existing
compilers accept ill-formed code that Clang rejects. This list could
eventually be turned into some kind of "porting to Clang" guide.

This sounds good, how about a simple text file in the 'docs' directory ?
And should it keep track of incompatibilities with msvc too or only focus on gcc ?

-Argiris

Doug Gregor wrote:

It would also be useful to keep a list of these places where existing
compilers accept ill-formed code that Clang rejects. This list could
eventually be turned into some kind of "porting to Clang" guide.

This sounds good, how about a simple text file in the 'docs' directory ?

Let's do HTML, since the internals manual is already HTML. I suggest
"Porting.html", for Porting to Clang.

And should it keep track of incompatibilities with msvc too or only focus on
gcc ?

Both, in separate sections.

Thanks!

  - Doug