[Patch] Refactor Sema::CppLookupName

Hi,

Attached refactors Sema::CppNameLookup in way, we do not reapeat
looking into same DeclContexts, also looks now it looks less chaotic
IMO :slight_smile:
It also fixes MergeLookupResults. My recent patch caused it fail in
same cases, which is noticable with CppNameLookup change.

Piotr

0001-Refactor-CppLookupName.patch (13.9 KB)

This doesn't look right:

+ if (HasFuncs) {
+ // Filter out non-FunctionDecls from result set.
+ for (DeclsSetTy::iterator FD=DI; FD!=DEnd; FD++)
+ if (!isa<FunctionDecl>(*FD)) FoundDecls.erase(*FD);
+ FoundLen = FoundDecls.size();
+ }

It's not that when we have functions we ignore everything else. Rather, object names (for functions, enumerators, variables, etc.) hide tag names (structs, enums, etc.), but only if those names themselves aren't ambiguous. I added this to the end of using-directive.cpp to test my theory:

namespace Ni {
   int i();
}

namespace NiTest {
   using namespace A;
   using namespace Ni;

   int test() {
     return i; // expected-error{{ambiguous}}
   }
}

and, rather than producing an ambiguity, it complains that it can't return an int(void) from a function whose return type is "int". I suspect that A::i is getting thrown out by the loop mentioned above.

I really think that FoundOverloaded, and the optimization it's attempting to do, is overly complicating MergeLookupResults. I suggest that the FoundOverloaded optimization be removed for now, until we have the semantics solidly implemented. Every Decl that we see (either due to LResult::Found or LResult::FoundOverloaded) gets dumped into FoundDecls. For TagDecls, get the canonical tag decl before putting the decl into FoundDecls.

At the end of the main loop in MergeLookupResults, FoundDecls will contain all of the non-unique decls. We can go through those keeping track of what kinds of names we see: object names, function names, and tag names. Then, determine whether they conflict or one hides the other, generating an ambiguous lookup if they conflict and hiding the tag names if they don't conflict.

  - Doug

Hi Doug,

Attached refactors Sema::CppNameLookup in way, we do not reapeat
looking into same DeclContexts, also looks now it looks less chaotic
IMO :slight_smile:
It also fixes MergeLookupResults. My recent patch caused it fail in
same cases, which is noticable with CppNameLookup change.

This doesn't look right:

+ if (HasFuncs) {
+ // Filter out non-FunctionDecls from result set.
+ for (DeclsSetTy::iterator FD=DI; FD!=DEnd; FD++)
+ if (!isa<FunctionDecl>(*FD)) FoundDecls.erase(*FD);
+ FoundLen = FoundDecls.size();
+ }

It's not that when we have functions we ignore everything else. Rather,
object names (for functions, enumerators, variables, etc.) hide tag names
(structs, enums, etc.), but only if those names themselves aren't ambiguous.
I added this to the end of using-directive.cpp to test my theory:

namespace Ni {
int i();
}

namespace NiTest {
using namespace A;
using namespace Ni;

int test() {
   return i; // expected-error{{ambiguous}}
}
}

and, rather than producing an ambiguity, it complains that it can't return
an int(void) from a function whose return type is "int". I suspect that A::i
is getting thrown out by the loop mentioned above.

Indeed, thanks for pointing that out! Attached should fix that.

I really think that FoundOverloaded, and the optimization it's attempting to
do, is overly complicating MergeLookupResults. I suggest that the
FoundOverloaded optimization be removed for now, until we have the semantics
solidly implemented. Every Decl that we see (either due to LResult::Found or
LResult::FoundOverloaded) gets dumped into FoundDecls. For TagDecls, get the
canonical tag decl before putting the decl into FoundDecls.

It was not optimization any longer, since we started to have access
Decls by LookupResult::begin/end without, but I failed to notice that.
This version yields one OverloadFunctionDecl allocation too.

Current approach (LookupResult::OverloadedDeclSingleDecl), still has
one flaw, unfortunately consumers of LookupResult using new interface
(begin/end) to access overloads, need to check if we store
OverloadedFunctionDecl or iterators, to free memory. I don't see any
obvious solution for that now, I will keep that in mind.

At the end of the main loop in MergeLookupResults, FoundDecls will contain
all of the non-unique decls. We can go through those keeping track of what
kinds of names we see: object names, function names, and tag names. Then,
determine whether they conflict or one hides the other, generating an
ambiguous lookup if they conflict and hiding the tag names if they don't
conflict.

Yes, this is much cleaner way. I hope, I got it right this time.

Piotr

0001-Refactor-CppLookupName.v2.patch (20.5 KB)

Hi Piotr,

Hi Doug,

Hi Piotr,

Current approach (LookupResult::OverloadedDeclSingleDecl), still has
one flaw, unfortunately consumers of LookupResult using new interface
(begin/end) to access overloads, need to check if we store
OverloadedFunctionDecl or iterators, to free memory. I don't see any
obvious solution for that now, I will keep that in mind.

Okay. We can add a Destroy method that knows how to deallocate LookupResults
data structures. That would also be useful to deallocate base paths and
such.

I'll do that, but we will have to be extra careful to avoid leaks anyway.
Could we just add destructor only for debug mode, something like:

#ifndef NDEBUG
LookupResult::~LookupResult() {
  assert(<did_cleanup>);
}
#endif

+
+ // No name lookup results, return early.
+ if (I == End) return LResult::CreateLookupResult(Context, 0);
+

We could also return early if there's only one lookup result, right?

Indeed, I added this to my local copy (where I have basicly
implemented [namespace.qual]).
Unfortunately Sema::ActOnFunctionDeclarator() does pretty wired stuff,
things like:

namespace A {}

void A::f() {}

compiles silently with trunk, and few other funny things..., so I must
first straighten this out. I think I already know how to fix it.

Same bug, makes this fail:

namespace X {
  void foo();
}

namespace Y {
  using namespace X;
}

void Y::foo() {} //expected-error{{no member}}

I've attached my changes, but those are not ready to commit.

At the end of the main loop in MergeLookupResults, FoundDecls will
contain
all of the non-unique decls. We can go through those keeping track of
what
kinds of names we see: object names, function names, and tag names. Then,
determine whether they conflict or one hides the other, generating an
ambiguous lookup if they conflict and hiding the tag names if they don't
conflict.

Yes, this is much cleaner way. I hope, I got it right this time.

It's looking a lot better! I've committed it with some tweaks of my own,
which I wanted to comment on:

+ if (FoundDecls.size() == 1) {
+ // 1) Exactly one result.
+ } else if (FoundDecls.size() - TagNames == 1) {
+ // 2) Ordinary name hides tag.
+ NameHidesTags = true;

This means that an unambiguous object name (or set of function names) will
hide tag names, even if the tag names result in an ambiguity. However, we
need to report an ambiguity in this case. I've tweaked the code by adding an
"else if (TagNames > 1) Ambiguous = true;" between cases 1 and 2, along with
this test:

namespace OneTag {
struct X; // expected-note{{candidate found by name lookup is 'OneTag::X'}}
}

namespace OneFunction {
void X(); // expected-note{{candidate found by name lookup is
'OneFunction::X'}}
}

namespace TwoTag {
struct X; // expected-note{{candidate found by name lookup is 'TwoTag::X'}}
}

namespace FuncHidesTagAmbiguity {
using namespace OneTag;
using namespace OneFunction;
using namespace TwoTag;

void test() {
   (void)X(); // expected-error{{reference to 'X' is ambiguous}}
}
}

Right, I fail again.

Interesting, this test uncovered an unfortunate issue. Since we added code
to report ambiguous type-name lookups in Sema::getTypeName, we actually get
*two* sets of ambiguity diagnostics regarding

       X();

Do we want 'clang -verify' warn when such things happen? Now in such
case tests will silently passes.

The first diagnostic comes when we check whether 'X' is a type name (using
Sema::getTypeName, to see if this is a C++ functional cast), and then later
we check whether we can build an expression for 'X' (using
Sema::ActOnIdentifierExpr), and both of those issue the diagnostic. Ideas
welcome!

I will look into it.

Moving along with the patch...

+ if (NameHidesTags) {
+ // 2, 3): Remove tag-names hidden by ordinary-names.
+ DeclsSetTy::iterator DI = FoundDecls.begin(), DEnd =
FoundDecls.end();
+ for (DI = FoundDecls.begin(); (DI != DEnd) && (TagNames > 0); ++DI) {
+ if (isa<TagDecl>(*DI)) {
+ FoundDecls.erase(*DI);
+ --TagNames;
+ }
+ }
+ assert((TagNames == 0) && "Failed remove all TagDecls?");

It's almost never safe to erase something from a container when looping over
the container, and SmallPtrSet doesn't even provide an interface to make
this safe.

True, it derefences already invalid iterator.

That said, since NameHidesTags should only be true when there is
a single tag name left, so in the version I committed we keep track of that
one tag decl and remove it from FoundDecls. No more loop!

Great! Thanks for review and fixing my failures, again.

The commit is here:

http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20090202/011825.html

       - Doug

Piotr

namespace.qual.patch (11.5 KB)

Hi Piotr,

Current approach (LookupResult::OverloadedDeclSingleDecl), still has
one flaw, unfortunately consumers of LookupResult using new interface
(begin/end) to access overloads, need to check if we store
OverloadedFunctionDecl or iterators, to free memory. I don't see any
obvious solution for that now, I will keep that in mind.

Okay. We can add a Destroy method that knows how to deallocate LookupResults
data structures. That would also be useful to deallocate base paths and
such.

I'll do that, but we will have to be extra careful to avoid leaks anyway.
Could we just add destructor only for debug mode, something like:

#ifndef NDEBUG
LookupResult::~LookupResult() {
assert(<did_cleanup>);
}
#endif

Yes, we can do that. The trick is to keep LookupResult a POD, because everything related to name lookup is performance-critical.

+
+ // No name lookup results, return early.
+ if (I == End) return LResult::CreateLookupResult(Context, 0);
+

We could also return early if there's only one lookup result, right?

Indeed, I added this to my local copy (where I have basicly
implemented [namespace.qual]).
Unfortunately Sema::ActOnFunctionDeclarator() does pretty wired stuff,
things like:

namespace A {}

void A::f() {}

compiles silently with trunk, and few other funny things..., so I must
first straighten this out. I think I already know how to fix it.

Yikes! This is probably just a misplaced/missing diagnostic. I'll look into it.

Interesting, this test uncovered an unfortunate issue. Since we added code
to report ambiguous type-name lookups in Sema::getTypeName, we actually get
*two* sets of ambiguity diagnostics regarding

      X();

Do we want 'clang -verify' warn when such things happen? Now in such
case tests will silently passes.

It would be helpful to teach -verify to do that, yes.

The first diagnostic comes when we check whether 'X' is a type name (using
Sema::getTypeName, to see if this is a C++ functional cast), and then later
we check whether we can build an expression for 'X' (using
Sema::ActOnIdentifierExpr), and both of those issue the diagnostic. Ideas
welcome!

I will look into it.

Thanks!

Looking forward to qualified name lookup into namespaces with using directives :slight_smile:

  - Doug

Douglas Gregor wrote:

Hi Piotr,

Interesting, this test uncovered an unfortunate issue. Since we
added code
to report ambiguous type-name lookups in Sema::getTypeName, we
actually get
*two* sets of ambiguity diagnostics regarding

      X();

Do we want 'clang -verify' warn when such things happen? Now in such
case tests will silently passes.
    
It would be helpful to teach -verify to do that, yes.

How about something like

// expected-diag {{...}} -> allow the diagnostic 1+ times (in order to
not mess up current test cases)
// expected-diag 2 {{...}}} -> demand the diagnostic exactly twice

Sebastian

Great idea!

  - Doug

Hi Doug,

Hi Piotr,

Current approach (LookupResult::OverloadedDeclSingleDecl), still has
one flaw, unfortunately consumers of LookupResult using new interface
(begin/end) to access overloads, need to check if we store
OverloadedFunctionDecl or iterators, to free memory. I don't see any
obvious solution for that now, I will keep that in mind.

Okay. We can add a Destroy method that knows how to deallocate
LookupResults
data structures. That would also be useful to deallocate base paths and
such.

I'll do that, but we will have to be extra careful to avoid leaks anyway.
Could we just add destructor only for debug mode, something like:

#ifndef NDEBUG
LookupResult::~LookupResult() {
assert(<did_cleanup>);
}
#endif

Yes, we can do that. The trick is to keep LookupResult a POD, because
everything related to name lookup is performance-critical.

Ok, will do that.

+
+ // No name lookup results, return early.
+ if (I == End) return LResult::CreateLookupResult(Context, 0);
+

We could also return early if there's only one lookup result, right?

Indeed, I added this to my local copy (where I have basicly
implemented [namespace.qual]).
Unfortunately Sema::ActOnFunctionDeclarator() does pretty wired stuff,
things like:

namespace A {}

void A::f() {}

compiles silently with trunk, and few other funny things..., so I must
first straighten this out. I think I already know how to fix it.

Yikes! This is probably just a misplaced/missing diagnostic. I'll look into
it.

Thanks, for fixing it! Looks nicer than my initial attempt :slight_smile:

Looking forward to qualified name lookup into namespaces with using
directives :slight_smile:

       - Doug

Hopefully shouldn't be that far now :slight_smile:

Piotr

Douglas Gregor wrote:

How about something like

// expected-diag {{...}} -> allow the diagnostic 1+ times (in order to
not mess up current test cases)
// expected-diag 2 {{...}}} -> demand the diagnostic exactly twice

Great idea!

OK, I'll take a stab at it over the weekend.

Sebastian

I afraid that buys us nothing, or not much.
We would not notice such regressions, if we would not manualy change most of:

// expected-diag {{...}}
to:
// expected-diag 1 {{...}}

Later we would have to write most ensure we do that.

I would rather suggest:

// expected-diag {{...}} -> allow the diagnostic exactly 1 time

// expected-diag 2 {{...}}} -> demand the diagnostic exactly twice

Audit all tests that it breaks, if it is correct, if not mark with
FIXME or fix.
What do you think?

Piotr