Beginnings of C++ access control

Hello,

I've been working on implementing C++ access control in the Sema stage, and
now have a very preliminary patch (hopefully attached, otherwise inline in a
followup -- still not sure if the list supports attachments).

At the moment it's limited to explicit 'obj.mem' lookups, but I thought I'd
send it here to make sure I'm on roughly the right track before fiddling with
every single possible use. It should be reasonably easy to extent to other
contexts.

Here are a few things I'd particularly welcome comments on:

The beginning of AccessMemberFromRecord (importing anonymous struct/union defs
into the outer class) feels like it ought to be reinventing the wheel, is
there some handy function I should be using instead? If there isn't a reusable
version it should probably be involved in private resolution too in
CheckMemberAccess. I've held off putting it there for now.

I've reused the access specifier AS_none to denote an inaccessible member.
Would people prefer me to invent a new enum for this situation?

Tim.

access_first.diff (12.6 KB)

Tim Northover wrote:

Hello,

I've been working on implementing C++ access control in the Sema stage, and
now have a very preliminary patch (hopefully attached, otherwise inline in a
followup -- still not sure if the list supports attachments).
  

Hi,

Welcome to Clang. It's always nice to see new people join.
Your patch came through fine. Comments are below. One general comment:
please convert all tabs to spaces.

Index: test/SemaCXX/access.cpp

--- test/SemaCXX/access.cpp (revision 69439)
+++ test/SemaCXX/access.cpp (working copy)
  
+ x = bprota.x; // expected-error {{member 'A::x' is protected}}
+ x = bprota.y; // expected-error {{member 'A::y' is protected}}
+ x = bprota.z; // expected-error {{member 'A::z' is inaccessible}}
+
+ x = bpriva.x; // expected-error {{member 'A::x' is private}}
+ x = bpriva.y; // expected-error {{member 'A::y' is private}}
+ x = bpriva.z; // expected-error {{member 'A::z' is inaccessible}}
  

The distinction between private/protected and inaccessible is confusing.
When is which used, and can we get rid of 'inaccessible'?
After looking at the rest of the code, I can see where the problem with
that is. But I think this suggests a problem with the overall structure
of the code. I think your checker functions should be able to report the
class where the member was last accessible, so that we can formulate an
error like, "member A::y is private in 'class B'". It would be even
nicer if the message could mention that this is because B inherits
privately from A.

+ x = bprota.x; // expected-error {{member 'A::x' is protected}}
+ x = bprota.y; // expected-error {{member 'A::y' is protected}}
+ x = bprota.z; // expected-error {{member 'A::z' is inaccessible}}
+
+ x = bpriva.x; // expected-error {{member 'A::x' is private}}
+ x = bpriva.y; // expected-error {{member 'A::y' is private}}
+ x = bpriva.z; // expected-error {{member 'A::z' is inaccessible}}
  
+/// Finds out whether member is a public/private/protected part of the accessor.
+AccessSpecifier
+Sema::AccessMemberFromRecord(NamedDecl *MemberDecl,
+ const RecordType *AccessorType) {
  
I don't like the name of this function; it confuses me. Is AccessorType
the context of the function that accesses the field, or is it the type
that's supposed to contain the field? The first is suggested by the
name, but the second by the comment.
OK, looking at the code below, I can see it's the type through which the
field is accessed. I think this could be clearer. Call the function
"AccessMemberThroughRecord(NamedDecl *Member, const RecordType *Container)"

+
+ RecordDecl *BaseRecord = 0;
+ if (MemberDecl->getDeclContext()->isRecord())
+ BaseRecord = cast<RecordDecl>(MemberDecl->getDeclContext());
  
RecordDecl *BaseRecord = dyn_cast<RecordDecl>(MemberDecl->getDeclContext());

+ if (!BaseRecord || BaseRecord->isAnonymousStructOrUnion()) {
+ // Either not a record: it's a union or it's anonymous.
  
This is not a good comment. It explains the what of the condition above,
which is not very helpful (I can read code), but it doesn't explain the
why, which would be.

+ for (DeclContext *DC = MemberDecl->getDeclContext();
+ DC;
+ DC = DC->getParent()) {
  
Minor stylistic issue: we usually would keep condition and incrementer
on the same line if they fit. (That is, merge the second and third line
of this snippet.)

+ AccessSpecifier Perm = PropagateMemberAccess(IntrinsicPerm, Path);
  
This usage of PropagateMemberAccess suggests that the name isn't ideal
either. Perhaps something like "GetAccessThroughPath" would be more
appropriate.

+ if (BestPerm == AS_none)
+ BestPerm = Perm;
+ else if (Perm != AS_none && Perm < BestPerm)
+ BestPerm = Perm;
  
Merge the two.

+ // [class.access.base]:p5
+ // A member m is accessible at the point R when named in class N if:
  
Please observe the standard quotation style; there's an automated tool
that collects the quotes. The style is:

std sectionparagraph

e.g.
C++ [class.access.base]p5
or
C++ 5.16p4

All in all, this is pretty good work. Thanks for tackling this beast.

Sebastian

Tim Northover wrote:

  

Tim Northover wrote:
    

+ // [class.access.base]:p5
+ // A member m is accessible at the point R when named in class N if:
      

Please observe the standard quotation style; there's an automated tool
that collects the quotes. The style is:
    
I didn't realise there was an automated tool involved there, I'd assumed it
was more informal. So should what follows be a direct quotation from the
standard, or is paraphrasing preferred?
  

If you paraphrase, it's not a quote. So if you make a quote header,
quote exactly.
I paraphrase unnecessarily wordy parts sometimes, but if you do that, be
sure to wrap that part in [], e.g.
// The frooble has [congualized] members of type snook.

And presumably I shouldn't split up the standard comments after that without
including the header in each subpart?
  

Eh, I do that all the time. I don't think the tool extracts the actual
quote.

Anyway, onto the main question I have for now. The standard
([class.access.base]p5) says that a member m is "accessible at point R if
named in class N" if (amongst other options):

"m as a member of N is protected, and R occurs in a member or friend of class
N, or in a member or friend of a class P derived from N, where m as a member
of P is public, private, or protected".

I interpret that as saying the following is valid:
class N {
protected:
    int m;
};

class Y : private N {};

class P : public N {
    class B : public Y {
        int f(N x) {
            return x.x;
        }
    };
};

but gcc disagrees:
scope.cpp:1: error: ‘class N’ is inaccessible
scope.cpp:10: error: within this context
scope.cpp: In member function ‘int P::b::f(N)’:
scope.cpp:11: error: ‘class N’ has no member named ‘x’

Am I wrong in my interpretation of that particular paragraph; or is that
actually forbidden on other grounds (perhaps ambiguity resolution?); or is gcc
wrong?
  

Comeau agrees with GCC, so I'd say they're right. This is because the N
that is found would be the N within N that Y inherited as private.

That is, in every class, there exists an alias of that class by the same
name. This alias is inherited and underlies normal access restrictions.
In other words, Y contains an N that is private, since it was inherited
from the private base N. B then inherits this N, but cannot access it.
The name lookup in P::b::f finds this N, instead of the N that P
inherited, because P::B is the inner scope and P the outer scope.

Indeed, if I change f(N x) to f(P::N x) (access the N that P inherited)
or f(::N x) (access the global, original N directly), Comeau accepts
this part. It still doesn't accept accessing x.m, which is protected and
you don't access it through P, much less x.x, which doesn't exist.

The naming class 'N' for the N in the signature of f is B, and the
member 'm' actually found is P::N, which is inaccessible in B.

By the way, please use Reply All to reply to cfe-dev messages. Your
message didn't go to the list.

Sebastian

I didn’t realise there was an automated tool involved there, I’d assumed it
was more informal. So should what follows be a direct quotation from the
standard, or is paraphrasing preferred?

The tool is pretty stupid, it just scan for things that look like
// <dotted notation | C++ name>
or so. You can find the tool in utils/FindSpecRefs if you want the real details, and the output is at
http://t1.minormatter.com/~ddunbar/references.html
(which is linked from the menu bar on clang.llvm.org)

And presumably I shouldn’t split up the standard comments after that without

including the header in each subpart?

Eh, I do that all the time. I don’t think the tool extracts the actual
quote.

Correct. “Easy” to fix once our C++ support is done. :slight_smile:

  • Daniel

I seem to have found similar in an even simpler example where hiding private
bases doesn't come up:

class N {
protected:
        int m;
};

class P : public N {
  int getX(N n) {
    return n.m;
  }
};

g++ (and comeau) complain that m is protected here, but I just can't see how
this can fail the condition:

"— m as a member of N is protected, and R occurs [...] or in a member or
friend of a class P derived from N, where m as a member of P is public,
private, or protected, or"

I can see why it'd make sense to make some demand that N be a
pointer/reference that's actually convertible to a P, but I can't see where
the standard actually does so.

Anyone got any thoughts?

I've put the current version of what I'm working on up at
http://www.maths.ed.ac.uk/~s0677366/access_first.diff
instead of spamming the list with it in case anyone wants to see.

There's still a few more comments to improve, and this issue to get sorted.
And I've got various tests scattered around my directory that should be
integrated.

The bulk of CheckMemberAccess is probably untested too; I haven't come up with
a way of getting there without friends which aren't implemented yet. I view it
more as a scaffold that should be close to doing what I want when I do put
friends in.

Tim.

Keep reading the standard... Try all of 11.5 Protected member access, paying special attention to all the examples, then work back to the wording that gives rise to the examples.

Ah, I see! Thankyou. I hadn't even looked at the titles of the section beyond
what I was implementing. Couldn't see past the end of my nose.

Tim.

That's just the start. After that, you have to read:

   11 Member access control [class.access]

the chapter; it has a few more important details. :slight_smile:

I've now got the access control code to the point where I think it implements
a reasonably self-contained section of the standard, and I'm wondering what I
should do next.

At the moment the code is almost certainly too stringent -- it will disallow
valid C++ programs where that wouldn't happen before. However getting it to
the stage where the only change is to disallow invalid programs would lead to
a much larger patch (the main issue would be dealing with friends everywhere
needed) which I understand is discouraged.

The current patch is at http://www.maths.ed.ac.uk/~s0677366/access_first.diff
again.

So essentially I'm willing to do whatever people feel is best:
+ Work on problems with what I've got now.
+ Extend it to do what it does more correctly before submitting.
+ Extend it to do more things at the correctness it does now before
  submitting.
+ All of the above.

Tim.

Sorry for taking so long to respond.

Tim Northover wrote:

  

Tim Northover wrote:
    

Am I wrong in my interpretation of that particular paragraph; or is that
actually forbidden on other grounds (perhaps ambiguity resolution?); or
is gcc wrong?
      

Comeau agrees with GCC, so I'd say they're right. This is because the N
that is found would be the N within N that Y inherited as private.
    
I seem to have found similar in an even simpler example where hiding private
bases doesn't come up:

class N {
protected:
        int m;
};

class P : public N {
  int getX(N n) {
    return n.m;
  }
};

g++ (and comeau) complain that m is protected here, but I just can't see how
this can fail the condition:

"— m as a member of N is protected, and R occurs [...] or in a member or
friend of a class P derived from N, where m as a member of P is public,
private, or protected, or"

I can see why it'd make sense to make some demand that N be a
pointer/reference that's actually convertible to a P, but I can't see where
the standard actually does so.

Anyone got any thoughts?
  

Practically speaking, we have every compiler doing it one way, it makes
sense to do it this way, and the standard isn't explicit about it, so
obviously we should do it the same way as everyone else. The example
typically brought up is this:

class N {
protected:
        int m;
};

class O : public N {
};

class P : public N {
  int getX(O o) {
    return o.m; // P has no business messing with O's internals.
  }
};

So, it makes sense. I think the justification from the standard point of
view is in this sentence fragment: "where m as a member of P is public,
private, or protected". n.m (or o.m) here is not a member of P. If n was
a reference to N, then it could perhaps be one, but we can't prove it.
So it makes no sense to speak of m as a member of P, so the condition is
false.

Not a very good justification, I have to admit. You should ask on
comp.std.c++ for clarification.

I'll look at the patch in your later post.

Sebastian

Sebastian Redl wrote:

I think the justification from the standard point of
view is in this sentence fragment: "where m as a member of P is public,
private, or protected". n.m (or o.m) here is not a member of P. If n was
a reference to N, then it could perhaps be one, but we can't prove it.
So it makes no sense to speak of m as a member of P, so the condition is
false.

Not a very good justification, I have to admit. You should ask on
comp.std.c++ for clarification.
  

OK, actually I've found what we were looking for. The test case is
actually forbidden by 11.5p1 [class.protected]. It details access to
protected members. In particular, this:

"All other accesses involve a (possibly implicit) object expression. In
this case, the class of the object expression shall be C or a class
derived from C."
where C is the class in which the access occurs.

Sebastian

Sorry for taking so long to respond.

No worries, I appreciate the effort.

So, it makes sense. I think the justification from the standard point of
view is in this sentence fragment: "where m as a member of P is public,
private, or protected". n.m (or o.m) here is not a member of P. If n was
a reference to N, then it could perhaps be one, but we can't prove it.
So it makes no sense to speak of m as a member of P, so the condition is
false.

I think the separate(ish) section Daniel pointed out ([class.protected]) takes
care of it nicely. There are four cases for "m as a member of N":
public: no problem
private: enclosing scope has to /be/ N, so no problem.
protected: [class.protected] demands the sensible check.
none: effectively becomes a question of whether we can cast some objects to
make things work. Reduces to one of the above cases.

You seem to be suggesting it's a reasonably well-known issue, so perhaps
that's a new addition (I don't actually have C++98 itself, just one of the
drafts for a later version) to clarify things.

I'll look at the patch in your later post.

Thanks again for the time.

Tim.

Tim Northover wrote:

I've now got the access control code to the point where I think it implements
a reasonably self-contained section of the standard, and I'm wondering what I
should do next.

At the moment the code is almost certainly too stringent -- it will disallow
valid C++ programs where that wouldn't happen before. However getting it to
the stage where the only change is to disallow invalid programs would lead to
a much larger patch (the main issue would be dealing with friends everywhere
needed) which I understand is discouraged.
  

We don't even parse friends yet. So if the only valid programs rejected
are those that involve friends, that's fine. However, if the patch makes
Clang reject valid programs that don't involve friends, that's a
problem. Of course, if those programs are very obscure, you can still
commit.

The current patch is at http://www.maths.ed.ac.uk/~s0677366/access_first.diff
again.
  

Detailed review below.

So essentially I'm willing to do whatever people feel is best:
+ Work on problems with what I've got now.
+ Extend it to do what it does more correctly before submitting.
  

I'm not sure where the difference between the two is. Anyway, I think
you should bring it to the point where it doesn't reject any valid
programs and rejects all invalid programs, under the assumption that
friends don't exist. Then implement friends.

In particular, be sure to handle this:

class C {
  typedef int I;
  I f();
};
C::I f() { return 0; }

Index: lib/Sema/SemaAccess.cpp

--- lib/Sema/SemaAccess.cpp (revision 69821)
+++ lib/Sema/SemaAccess.cpp (working copy)
  
+ // C++ [class.access.base]p1
+ // Describes how access specifiers propagate through a heirarchy.
+ BasePath::const_reverse_iterator i = Path.rbegin(), e = Path.rend();
+ for(; i != e; ++i) {
  

A typo in "heirarchy", and put the iterator declarations into the for.
Yes, it results in a very long for condition, but that's the way we have
it throughout the code base.

+ if (BestPerm.Perm == AS_none
+ || (FoundPerm.Perm != AS_none && FoundPerm.Perm < BestPerm.Perm))
+ BestPerm = FoundPerm;
  
I wonder if any ill effects would come from simply reordering the AS_*
constants to none,private,protected,public and writing this (and
probably other places) as a single comparison.

+/// Determines whether a given context has privileged access to the members of a
+/// container.
  

This is not what the function does. What it does is, it looks up a
context that *has* such access in the context stack.

+ for(DeclContext *OuterContext = InnerContext;
+ OuterContext; OuterContext = OuterContext->getParent()) {
+ if (!OuterContext->isRecord())
+ continue;
  

Would it be worth it to abort the loop on encountering a namespace? Or
are namespaces likely to be so shallow that the repeated test hurts more
than it helps? I don't think there's any way to ever find out.
Intuitively, I'd say no, though. The assumption here is that most
programs are valid, so a privileged context is found before reaching
namespace level.

+bool Sema::CheckMemberAccessLimited(const RecordType *Container,
+ NamedDecl *MemberDecl,
+ DeclContext *ExprContext,
+ AccessResult &Permission) {
  

Is there any particular reason why MemberDecl and ExprContext are
non-const here? There's actually a lot of pointers in your patch that, I
think, could be made to-const.

Keep up the good work.

Sebastian