C++ using-directive parsing

Hi all,

I have played bit with clang, and implemented parsing of C++ using-directive.
I would like to work on Sema support for it too, but I'm still
unfamiliar with project,
and not really sure if I am going in *right* direction. Hints and
comments are highly welcome :slight_smile:
Attached contains things I have done so far. Before I will continue
adding AST and Sema support,
I would like to explain what in am planing to do, because I might be
missing something.

  Using directive basically fits concept of ScopedDecl, but not really
well NamedDecl concept.
It is not really clear, how it should be represented. And ofcourse
lookup will have to deal with it
in special way.
My idea is to represent it as ScopedDecl and introduce special name
(DeclarationNameExtra instance),
shared by all UsingDirectiveDecl's. Those would be added to
DeclContext in which they appear
(in case of namespace only to original namespace). Does it sound sane?
On lookup, list of all namespaces, both introduced directly and
indirectly by using directive
would be build. I belive it would be wise to cache it, which would
introduce another member to DeclContext.
DeclContext should be pretty common citizen in AST, well maybe not in
headers, so it could be some memory
overhead, especially for C. Or Maybe I just worry to early.

There are few possible options to deal with it not increasing size of
DeclContext:
- Add this member to each concrete ScopedDecl in which it may appear,
or even new base class and MI
- (Ab)use DeclContext.LookupPtr structure and associate this list with
special name,
  (same as used by UsingDirectiveDecl's?)
- any others?

First is probably more clean, but IMO using directives are not so
often, and it would introduce some
memory overhead for Decl's that don't use it.

Second feels like ugly hack, but it doesn't share this disadvantage
with first, but requires some overhead
(hashtable lookup). Initially I would be in favour of implementing of
second option. There is also FIXME,
that says about need of replacing structure held by LookupPtr, so it
could be smartly abstracted later.

Another minor issue is keeping list of used namespaces up to date,
which probably will require some
knowledge from namespace about its 'users'. Here I would like to
introduce distinction between original-namespace,
and extended-namespace. This not really required, but seems to be good
idea, because original namespace will
have to keep bit more information than extended namespace, later also
things like C++0x
inline namespace/gnu strong-using extension. This could be done same
*transparent* way, that was suggested
by Mr. Douglas Gregor some time ago (strangely couldn't find pointers
to this message now).

I have also attached 'make report' results, since I have noticed
CodeGenObjC/encode-test.m failure,
not introduced by my change. I'am using gentoo gcc-4.3.2 x86 linux.
so it might be compiler issue.

Thanks, Piotr

Initial-Parser-support-for-C-using-directive.patch (18.4 KB)

make-report-result-r61444.txt (2.65 KB)

Hi Piotr,

I have played bit with clang, and implemented parsing of C++ using-directive.

Great!

I would like to work on Sema support for it too, but I'm still
unfamiliar with project,
and not really sure if I am going in *right* direction. Hints and
comments are highly welcome :slight_smile:
Attached contains things I have done so far. Before I will continue
adding AST and Sema support,
I would like to explain what in am planing to do, because I might be
missing something.

Okay, I'll have some comments on your patch below.

Using directive basically fits concept of ScopedDecl, but not really
well NamedDecl concept.

There are other kinds of entities that are ScopedDecls/NamedDecls but don't actually have names.

It is not really clear, how it should be represented. And ofcourse
lookup will have to deal with it
in special way.

Definitely.

My idea is to represent it as ScopedDecl and introduce special name
(DeclarationNameExtra instance),
shared by all UsingDirectiveDecl's. Those would be added to
DeclContext in which they appear
(in case of namespace only to original namespace). Does it sound sane?

That's interesting. It'll have to be an overloaded name, just like function names can be overloaded, but it does make it easy to fit into the existing name lookup mechanisms (in DeclContext). That said, it seems slightly weird to have an actual name for using directives... perhaps we can hide the fact that they have a name, encapsulating it inside the logic of DeclContext, DeclarationName, and Sema?

On lookup, list of all namespaces, both introduced directly and
indirectly by using directive
would be build.

My initial reaction is that keeping track of namespaces introduced indirectly will add more memory and complexity overhead than it will save in time. But, let's explore this further.

I belive it would be wise to cache it, which would
introduce another member to DeclContext.
DeclContext should be pretty common citizen in AST, well maybe not in
headers, so it could be some memory
overhead, especially for C. Or Maybe I just worry to early.

It's good to keep DeclContext small, if we can, because it is used in many places. Interestingly, namespace directives can also be used within blocks, which aren't DeclContexts. I wonder if that will change, or if we'll instead want to put something into the Scope for blocks.

There are few possible options to deal with it not increasing size of
DeclContext:
- Add this member to each concrete ScopedDecl in which it may appear,
or even new base class and MI
- (Ab)use DeclContext.LookupPtr structure and associate this list with
special name,
(same as used by UsingDirectiveDecl's?)
- any others?

First is probably more clean, but IMO using directives are not so
often, and it would introduce some
memory overhead for Decl's that don't use it.

I agree.

Second feels like ugly hack, but it doesn't share this disadvantage
with first, but requires some overhead
(hashtable lookup). Initially I would be in favour of implementing of
second option. There is also FIXME,
that says about need of replacing structure held by LookupPtr, so it
could be smartly abstracted later.

I like this second option, and I think that if we abstract it well---e.g., by providing special methods inside DeclContext to lookup the list of using directives rather than having the client create a special DeclarationName each time---and document the slightly sneaky use of DeclarationName, it'll be clean.

Another minor issue is keeping list of used namespaces up to date,
which probably will require some
knowledge from namespace about its 'users'. Here I would like to
introduce distinction between original-namespace,
and extended-namespace. This not really required, but seems to be good
idea, because original namespace will
have to keep bit more information than extended namespace,

I think this complexity comes from trying to keep the list of namespaces indirectly searched, but I don't think that's an important case to optimize for. It only takes a single hash-table lookup into each DeclContext to find the using directives at each level, and I'm guessing that it's rare for there to be more than two or three levels of indirection to deal with in real programs. Let's just start by keeping track of the direct using directives; it'll save space and keep the complexity down. If we find that the depth of the using-directive search is getting large, we'll consider doing something more involved.

later also
things like C++0x
inline namespace/gnu strong-using extension. This could be done same
*transparent* way, that was suggested
by Mr. Douglas Gregor some time ago (strangely couldn't find pointers
to this message now).

I'm still working on that one. Inline namespace should be pretty easy once I've finished that work into transparent DeclContexts.

(Oh, and I'm just "Doug" <g>)

I have also attached 'make report' results, since I have noticed
CodeGenObjC/encode-test.m failure,
not introduced by my change. I'am using gentoo gcc-4.3.2 x86 linux.
so it might be compiler issue.

It's not failing on my Mac, so we'll have to have someone familiar with Objective C (and who has a Linux machine!) take a look. Thanks for pointing this out!

Some comments on your patch:

@@ -225,6 +227,9 @@ Parser::DeclTy *Parser::ParseDeclaration(unsigned Context) {
      return ParseTemplateDeclaration(Context);
    case tok::kw_namespace:
      return ParseNamespace(Context);
+ case tok::kw_using:
+ if (isCXXUsingDirective()) return ParseUsingDirective(Context);
+ else return ParseUsingDeclaration(Context); //FIXME: it is just stub
    default:
      return ParseSimpleDeclaration(Context);

Instead of having the isCXXUsingDirective---which requires lookahead---could you just have a ParseUsingDirectiveOrDeclaration function that eats the "using" and then decides whether it goes on to parse a using directive or using declaration? It would be slightly more efficient.

+/// ParseNamespaceName - Parse c++ namespace-name, returns null and issues
+/// diagnostics if current token is not identifier or it is not namespace
+/// name. Otherwise returns found namespace declaration.
+Parser::DeclTy *Parser::ParseNamespaceName(const CXXScopeSpec &SS) {

I think ParseNamespaceName is trying to do more work than it should. It's calling isNamespaceName to check whether the identifier is a namespace, effectively performing some of the semantic analysis for using directives. However, in this case the parse won't parse differently dependent on whether a name is a namespace-name, class-name, or something else. So, I suggest just parsing the identifier as an identifier (in ParseUsingDirective) and then passing the identifier and its source location directly to Actions.ActOnUsingDirective. Then, semantic analysis can lookup the identifier in the scope of 'SS', complain if it's not a namespace-name, and do whatever other AST-building and semantic-analysis tasks it needs to. That would probably mean that we would no longer need isNamespaceName (!).

+// Defined out-of-line here because of dependecy on AttributeList
+Action::DeclTy *Action::ActOnUsingDirective(Scope *CurScope,
+ SourceLocation UsingLoc,
+ SourceLocation NamespaceLoc,
+ const CXXScopeSpec &SS,
+ DeclTy *Namespace,
+ AttributeList *AttrList) {

Douglas Gregor wrote:

Hi Piotr,

  +// Defined out-of-line here because of dependecy on AttributeList
+Action::DeclTy *Action::ActOnUsingDirective(Scope *CurScope,
+ SourceLocation UsingLoc,
+ SourceLocation NamespaceLoc,
+ const CXXScopeSpec &SS,
+ DeclTy *Namespace,
+ AttributeList *AttrList) {
+
+ // FIXME: Parser seems to assume that Action::ActOn* takes ownership over
+ // passed AttributeList, however other actions don't free it, is it
+ // temporary state or bug?
+ delete AttrList;
+ return 0;
+}

We're still working on the ownership issues with the Parse-Sema interaction (with Sebastian Redl leading the charge), so this is a good FIXME to leave in place to remind us to deal with these issues.

AttributeList ... interesting. So it's defined out-of-line since the delete needs the full definition, while all existing uses of AttributeList leak the object and thus are satisfied with the declaration.
I'll add AttributeList to the things that need to have their ownership straightened out.

Great work on the patch. I have nothing to add to Doug's comments.

Sebastian

Thanks for review, comments and anwsers. :slight_smile:

Hi Piotr,

My idea is to represent it as ScopedDecl and introduce special name
(DeclarationNameExtra instance),
shared by all UsingDirectiveDecl's. Those would be added to
DeclContext in which they appear
(in case of namespace only to original namespace). Does it sound sane?

That's interesting. It'll have to be an overloaded name, just like function
names can be overloaded, but it does make it easy to fit into the existing
name lookup mechanisms (in DeclContext). That said, it seems slightly weird
to have an actual name for using directives... perhaps we can hide the fact
that they have a name, encapsulating it inside the logic of DeclContext,
DeclarationName, and Sema?

That was idea, I think it is possible.

On lookup, list of all namespaces, both introduced directly and
indirectly by using directive
would be build.

My initial reaction is that keeping track of namespaces introduced
indirectly will add more memory and complexity overhead than it will save in
time. But, let's explore this further.

Ok.

I belive it would be wise to cache it, which would
introduce another member to DeclContext.
DeclContext should be pretty common citizen in AST, well maybe not in
headers, so it could be some memory
overhead, especially for C. Or Maybe I just worry to early.

It's good to keep DeclContext small, if we can, because it is used in many
places. Interestingly, namespace directives can also be used within blocks,
which aren't DeclContexts. I wonder if that will change, or if we'll instead
want to put something into the Scope for blocks.

I was not aware of this issue.

From my point of view it would be easier handle it consistenly in DeclContext,

however didn't looked at this code yet. I'll take a look how big
change it would be.

Second feels like ugly hack, but it doesn't share this disadvantage
with first, but requires some overhead
(hashtable lookup). Initially I would be in favour of implementing of
second option. There is also FIXME,
that says about need of replacing structure held by LookupPtr, so it
could be smartly abstracted later.

I like this second option, and I think that if we abstract it well---e.g.,
by providing special methods inside DeclContext to lookup the list of using
directives rather than having the client create a special DeclarationName
each time---and document the slightly sneaky use of DeclarationName, it'll
be clean.

True.

Another minor issue is keeping list of used namespaces up to date,
which probably will require some
knowledge from namespace about its 'users'. Here I would like to
introduce distinction between original-namespace,
and extended-namespace. This not really required, but seems to be good
idea, because original namespace will
have to keep bit more information than extended namespace,

I think this complexity comes from trying to keep the list of namespaces
indirectly searched, but I don't think that's an important case to optimize
for. It only takes a single hash-table lookup into each DeclContext to find
the using directives at each level, and I'm guessing that it's rare for
there to be more than two or three levels of indirection to deal with in
real programs. Let's just start by keeping track of the direct using
directives; it'll save space and keep the complexity down. If we find that
the depth of the using-directive search is getting large, we'll consider
doing something more involved.

Ok, I leave it for now, should not be to hard to add later.

later also
things like C++0x
inline namespace/gnu strong-using extension. This could be done same
*transparent* way, that was suggested
by Mr. Douglas Gregor some time ago (strangely couldn't find pointers
to this message now).

I'm still working on that one. Inline namespace should be pretty easy once
I've finished that work into transparent DeclContexts.

Great!

(Oh, and I'm just "Doug" <g>)

I'll try to remember. :slight_smile:

Some comments on your patch:

@@ -225,6 +227,9 @@ Parser::DeclTy *Parser::ParseDeclaration(unsigned
Context) {
    return ParseTemplateDeclaration(Context);
  case tok::kw_namespace:
    return ParseNamespace(Context);
+ case tok::kw_using:
+ if (isCXXUsingDirective()) return ParseUsingDirective(Context);
+ else return ParseUsingDeclaration(Context); //FIXME: it is just stub
  default:
    return ParseSimpleDeclaration(Context);

Instead of having the isCXXUsingDirective---which requires lookahead---could
you just have a ParseUsingDirectiveOrDeclaration function that eats the
"using" and then decides whether it goes on to parse a using directive or
using declaration? It would be slightly more efficient.

Yes, I initially had ParseUsingDirectiveOrDeclaration during evolution.
Later had short, failed attempt implementing c++0x:
using-directive:
      attribute-specifier[opt] using namespace ::[opt]
nested-name-specifier[opt] namespace-name ;

requiring tentative parsing, and forgot change it back. Fixed now in
attached version.

+/// ParseNamespaceName - Parse c++ namespace-name, returns null and issues
+/// diagnostics if current token is not identifier or it is not namespace
+/// name. Otherwise returns found namespace declaration.
+Parser::DeclTy *Parser::ParseNamespaceName(const CXXScopeSpec &SS) {

I think ParseNamespaceName is trying to do more work than it should. It's
calling isNamespaceName to check whether the identifier is a namespace,
effectively performing some of the semantic analysis for using directives.
However, in this case the parse won't parse differently dependent on whether
a name is a namespace-name, class-name, or something else. So, I suggest
just parsing the identifier as an identifier (in ParseUsingDirective) and
then passing the identifier and its source location directly to
Actions.ActOnUsingDirective. Then, semantic analysis can lookup the
identifier in the scope of 'SS', complain if it's not a namespace-name, and
do whatever other AST-building and semantic-analysis tasks it needs to. That
would probably mean that we would no longer need isNamespaceName (!).

Done, indeed, fixes -parse-noop.

In the new test (cxx-using-directive.cpp), I think we also need to test for
something where '::' prefixes the nested-name-specifier, e.g.,

       using namespace ::D::B;

(I tried it, and it works fine, but this will make sure it continues to
work.)

Done, and few more.

I think that with the change in the way ActOnUsingDirective works (and
removal of isNamespaceName, if that all works out), this will be ready to go
in. Thanks for working on this!

       - Doug

I made additional change, now ActOnUsingDirective is not called when
we fail parse CXXScopeSpec or identifier. Is this ok, or we should
create invalid decl in such cases?

Piotr

Initial-Parser-support-for-C-using-directive_v2.patch (16.6 KB)

I have changes fixing this leakages by delete'ing in my local branch.
Maybe just use llvm::OwningPtr<AttributeList> instead plain C pointers
in ActOn*.
This would enforce taking ownership by Action. If this is ok, please
let me know and I will submit this change.

Piotr

Piotr Rak wrote:

Yes, I initially had ParseUsingDirectiveOrDeclaration during evolution.
Later had short, failed attempt implementing c++0x:
using-directive:
     attribute-specifier[opt] using namespace ::[opt]
nested-name-specifier[opt] namespace-name ;

requiring tentative parsing, and forgot change it back. Fixed now in
attached version.

Thanks.

I made additional change, now ActOnUsingDirective is not called when
we fail parse CXXScopeSpec or identifier. Is this ok, or we should
create invalid decl in such cases?

This is okay. If the parsing failed and we've emitted an error already, there's no sense is calling into the Action module. If, for example, we had already creating the decl and then found a parse error, we'd want to call into the Action module to tell it that the decl is now invalid.

+ if (!(getLangOptions().CPlusPlus || getLangOptions().CPlusPlus0x)
+ && NamespaceNameOnly) {

You only need to check for getLangOptions().CPlusPlus here... we won't ever have CPlusPlus0x without CPlusPlus. Typically, the "if" condition should be folded into the assertion (so it all goes away in release mode), but in this case I don't think the assertion itself is all that important.

+ } else {
+ Diag(IdentLoc, diag::err_expected_namespace_name);
+ // FIXME: if SS.isSet() we could note about NamespcName not being
+ // member of DC, or not namespace name.
+ }

Instead of mentioning DC (which means trying to format it into a readable name in the error message), I suggest just underlying the range with, e.g.,

  Diag(IdentLoc, diag::err_expected_namespace_name) << SS.getSourceRange();

Anyway, these are minor things. If you would send the updated cxx-using-directive.cpp test case you mentioned, I'll be happy to integrate your patch into the Clang repository. Thanks!

  - Doug

+ } else {
+ Diag(IdentLoc, diag::err_expected_namespace_name);
+ // FIXME: if SS.isSet() we could note about NamespcName not being
+ // member of DC, or not namespace name.
+ }

Instead of mentioning DC (which means trying to format it into a readable
name in the error message), I suggest just underlying the range with, e.g.,

       Diag(IdentLoc, diag::err_expected_namespace_name) <<
SS.getSourceRange();

I meant additional giving note something like: 'C is not member of D::B'

Anyway, these are minor things. If you would send the updated
cxx-using-directive.cpp test case you mentioned, I'll be happy to integrate
your patch into the Clang repository. Thanks!

       - Doug

Sorry, about that, attached now!

Piotr

cxx-using-directive.cpp (768 Bytes)

+ } else {
+ Diag(IdentLoc, diag::err_expected_namespace_name);
+ // FIXME: if SS.isSet() we could note about NamespcName not being
+ // member of DC, or not namespace name.
+ }

Instead of mentioning DC (which means trying to format it into a readable
name in the error message), I suggest just underlying the range with, e.g.,

      Diag(IdentLoc, diag::err_expected_namespace_name) <<
SS.getSourceRange();

I meant additional giving note something like: 'C is not member of D::B'

We could do that, too, although we don't yet have the ability to print a CXXScopeSpec (this isn't hard to add).

Anyway, these are minor things. If you would send the updated
cxx-using-directive.cpp test case you mentioned, I'll be happy to integrate
your patch into the Clang repository. Thanks!

      - Doug

Sorry, about that, attached now!

Okay, committed. Thanks!

  - Doug