[PATCH] PCH Support for C++ decls

All,

Since I noticed Doug commit PCH support for NamespaceDecl’s the other day (week?), I thought I’d send an update to my last PCH patch. I’ve been sitting on this work for quite some time while I’ve been busy with other things.

The attached implements serialization for

  • namespaces, using directives, namespace aliases, using (and shadow) declarations
  • classes
  • methods, constructors, destructors, conversion functions
  • and linkage specifications
    There’s some support for serializing friend declarations, but only if it’s not a friend type. Fixing that should be relatively easy. There is also some work on serializing templates and dependent types, but it’s asserted out for now. I integrated Doug’s implementation of VisitNamespaceDecl (I wasn’t serializing the AnonymousNamespace) into this patch.

The patch adds a large number of getters and setters to AST declarations since the PCH code has to construct them piecemeal. I’ve had to expose some new functionality for creating CXXRecord::DefinitionData objects (see createDefinition() and shareDefinition() in the attached patch). I think it’s kind of nice. I also renamed one or two existing methods to make them have similar names as other functions–I forget which methods, though. These changes have been propagated to Sema.

I had to add new Create() functions for CXXMethod and its derivations since the normal Create() function asserts when you try to create these objects without names. As per a previous suggestion, these take an “EmptyShell” type, which is added as a public empty class in Decl (should it be private?)

WRT to the PCH reader/writer classes, I added some new functions for reading and writing base class specifiers and nested name specifiers. I also added a ReadSourceLocation and ReadSourceRange methods to the PCHReader to make their reading consistent with other operations (and easier in some cases).

I modified PCH/namespaces test to include some of the other namespace and using decls and linkage specs. There should be a new classes test in the patch with (let’s just say… “not exhaustive”) tests for serializing classes.

Please let me know if there is anything that I can do to help get this patch applied.

Andrew Sutton
andrew.n.sutton@gmail.com

pch.patch (64.6 KB)

All,

Since I noticed Doug commit PCH support for NamespaceDecl's the other day (week?), I thought I'd send an update to my last PCH patch. I've been sitting on this work for quite some time while I've been busy with other things.

The attached implements serialization for
- namespaces, using directives, namespace aliases, using (and shadow) declarations
- classes
- methods, constructors, destructors, conversion functions
- and linkage specifications
There's some support for serializing friend declarations, but only if it's not a friend type. Fixing that should be relatively easy. There is also some work on serializing templates and dependent types, but it's asserted out for now. I integrated Doug's implementation of VisitNamespaceDecl (I wasn't serializing the AnonymousNamespace) into this patch.

Nifty.

The patch adds a large number of getters and setters to AST declarations since the PCH code has to construct them piecemeal. I've had to expose some new functionality for creating CXXRecord::DefinitionData objects (see createDefinition() and shareDefinition() in the attached patch). I think it's kind of nice.

This didn't seem necessary to me; see comments below.

I had to add new Create() functions for CXXMethod and its derivations since the normal Create() function asserts when you try to create these objects without names. As per a previous suggestion, these take an "EmptyShell" type, which is added as a public empty class in Decl (should it be private?)

It's fine as a public empty class; that's what we do for expressions.

WRT to the PCH reader/writer classes, I added some new functions for reading and writing base class specifiers and nested name specifiers. I also added a ReadSourceLocation and ReadSourceRange methods to the PCHReader to make their reading consistent with other operations (and easier in some cases).

Good, good. At some point (but not as part of this patch!) we should go replace all of the manually-deserialized source locations and ranges with uses of these functions.

I modified PCH/namespaces test to include some of the other namespace and using decls and linkage specs. There should be a new classes test in the patch with (let's just say... "not exhaustive") tests for serializing classes.

I think we really need better tests. Due to PCH's laziness, it is extremely annoying to debug when it breaks, so small tests for each kind of declaration/statement/expression are important for catching problems quickly.

Please let me know if there is anything that I can do to help get this patch applied.

Some comments:

Index: include/clang/Frontend/PCHBitCodes.h

Since I noticed Doug commit PCH support for NamespaceDecl’s the other day (week?), I thought I’d send an update to my last PCH patch. I’ve been sitting on this work for quite some time while I’ve been busy with other things.

Nifty.

I’ve spent the morning addressing issues in the patch and had some questions, comments. I’m not sending up a new patch just yet, though.

I think we really need better tests. Due to PCH’s laziness, it is extremely annoying to debug when it breaks, so small tests for each kind of declaration/statement/expression are important for catching problems quickly.

Yes… it is very annoying to debug :slight_smile:

We should keep the existing DECL_* constants in order, since each reshuffle is (unnecessary) breakage in backward-compatibility for the PCH format.

I moved all of the constants down to the bottom (and reorganized a couple).

I don’t see why we should need this. We already have startDefinition()/completeDefinition(), which takes care of allocating the DefinitionData and linking it in.

Removed. I was concerned about side effects, but after looking I don’t think there are any. I’m still having problems understanding which classes should be serialized. More below.

Please store/load fields in the PCH record in the same order as they occur in the corresponding AST; it makes it easier to verify that everything is there. In this case, BaseOfClass and Virtual should be swapped (the same goes for the reader).

I think I’ve addressed all instances that you’ve pointed out, but haven’t checked every decl yet. I wish there was an easier way to do this. Maybe I could use clang to dump the order of members…

Why const-qualify the NestedNameSpecifier? You’re just const_cast’ing it away anyway.

It’s a non-modifying operation. I was only const-casting it away because getPrefix() returns a non-const ptr. It’s probably easier just to remove the const in the function that try to make NNS more const-friendly.

Why not push all of the NestedNameSpecifiers onto a stack and then emit them in reverse order? That would make it far easier for the reader to reconstruct them properly, since the prefix will come before the nested-name-specifier itself.

This won’t work with nested-name-specifiers that have prefixes, because we see the nested-name-specifier before its prefix, and end up trying to build the nested-name-specifier backwards. Use the stack I mentioned previously to address this problem, and you can then get rid of the NNS/P dance, since the last nested-name-specifier you build will be the complete one.

For some reason, I thought that trying to write this without a stack would be a better solution. Other than that no real reason. I’ll rewrite this in the next day or two.

FWIW, I’ve typically been ordering the Visit functions the same way as the Decl nodes are ordered in DeclNodes.def.

Fair enough. Would it be worthwhile reordering all of the ObjC visitors too?

Aside: anonymous namespaces are somewhat interesting for PCH, because we need to make sure that an anonymous namespace defined in the source file gets linked to an anonymous namespace defined in the PCH file, since they are effectively the same anonymous namespace. One fun way to test this is:

// In PCH:
namespace {
class X;
}

// In source file (that includes PCH):
namespace {
class X;
}

I’ll add this to the test tomorrow.

+void PCHDeclReader::VisitCXXRecordDecl(CXXRecordDecl *D) {

  • VisitRecordDecl(D);
  • // Injected class names are, I think, actually empty.

Yes, they are. Aside from making sure to get them into the right declaration context and link up their previous declaration chains, there shouldn’t be much to serialize for them.

isInjectedClassName() is the wrong check for this. Instead, PCHDeclWriter::VisitCXXRecordDecl should serialize all of the bits in the definition when D->getDefinition() == D, i.e., the declaration you’re serializing is the definition.

This is where I’m a little lost w.r.t. class decls, defs, and injected names… Here’s my current understanding.

class C; // call this CXXRecordDecl* decl;
class C { }; // call this CXXRecordDecl* def;
// call C::C (the injected name) CXXRecordDecl *inj;

decl->getDefinition() == def
def->getDefinition() == def
decl in def->redeclarations() // pseduo python-ese
inj->getDefinition() == def // ???
decl->DefinitionData == def->DefinitionData == inj->DefinitionData // ???
decl->DefiniotinData->Definition == def // ???

And then for a slightly different case, if C isn’t defined (so def == inj == 0), then:
decl->getDefinition() == 0?

Is this about right or is there something I might be missing?

Also, w.r.t. class serialization. Should virtual bases be serialized the same way that regular bases are serialized? What b

+void PCHDeclReader::VisitCXXConstructorDecl(CXXConstructorDecl *D) {

  • VisitCXXMethodDecl(D);
  • D->setExplicitSpecified(Record[Idx++]);
  • if (D->isThisDeclarationADefinition())
  • D->setImplicitlyDefined(Record[Idx++]);
  • // FIXME: Serialize member initializers?
    +}

I suggest that the PCH reader/writer avoid cleverness, and serialize exactly what is in the AST (rather than doing the isThisDeclarationADefinition() check).

This is a problem since isImplicitlyDefined and setImplicitlyDefined will assert if isThisDeclarationADefintion returns false. Would it be an acceptable workaround to add non-asserting accessors?

Back to work…

Andrew Sutton
andrew.n.sutton@gmail.com

Since I noticed Doug commit PCH support for NamespaceDecl’s the other day (week?), I thought I’d send an update to my last PCH patch. I’ve been sitting on this work for quite some time while I’ve been busy with other things.

Nifty.

I’ve spent the morning addressing issues in the patch and had some questions, comments. I’m not sending up a new patch just yet, though.

I think we really need better tests. Due to PCH’s laziness, it is extremely annoying to debug when it breaks, so small tests for each kind of declaration/statement/expression are important for catching problems quickly.

Yes… it is very annoying to debug :slight_smile:

We should keep the existing DECL_* constants in order, since each reshuffle is (unnecessary) breakage in backward-compatibility for the PCH format.

I moved all of the constants down to the bottom (and reorganized a couple).

Thanks.

I don’t see why we should need this. We already have startDefinition()/completeDefinition(), which takes care of allocating the DefinitionData and linking it in.

Removed. I was concerned about side effects, but after looking I don’t think there are any. I’m still having problems understanding which classes should be serialized. More below.

Please store/load fields in the PCH record in the same order as they occur in the corresponding AST; it makes it easier to verify that everything is there. In this case, BaseOfClass and Virtual should be swapped (the same goes for the reader).

I think I’ve addressed all instances that you’ve pointed out, but haven’t checked every decl yet. I wish there was an easier way to do this. Maybe I could use clang to dump the order of members…

Well, Clang can clearly parse itself, which means that there is a fun (but time-consuming) answer to this.

Why const-qualify the NestedNameSpecifier? You’re just const_cast’ing it away anyway.

It’s a non-modifying operation. I was only const-casting it away because getPrefix() returns a non-const ptr. It’s probably easier just to remove the const in the function that try to make NNS more const-friendly.

A const getPrefix (which does a const_cast internally) would be fine. NestedNameSpecifiers are logically immutable anyway.

Why not push all of the NestedNameSpecifiers onto a stack and then emit them in reverse order? That would make it far easier for the reader to reconstruct them properly, since the prefix will come before the nested-name-specifier itself.

This won’t work with nested-name-specifiers that have prefixes, because we see the nested-name-specifier before its prefix, and end up trying to build the nested-name-specifier backwards. Use the stack I mentioned previously to address this problem, and you can then get rid of the NNS/P dance, since the last nested-name-specifier you build will be the complete one.

For some reason, I thought that trying to write this without a stack would be a better solution. Other than that no real reason. I’ll rewrite this in the next day or two.

SmallVector makes a good stack.

FWIW, I’ve typically been ordering the Visit functions the same way as the Decl nodes are ordered in DeclNodes.def.

Fair enough. Would it be worthwhile reordering all of the ObjC visitors too?

Not in this patch. We can tweak that later.

Aside: anonymous namespaces are somewhat interesting for PCH, because we need to make sure that an anonymous namespace defined in the source file gets linked to an anonymous namespace defined in the PCH file, since they are effectively the same anonymous namespace. One fun way to test this is:

// In PCH:
namespace {
class X;
}

// In source file (that includes PCH):
namespace {
class X;
}

I’ll add this to the test tomorrow.

+void PCHDeclReader::VisitCXXRecordDecl(CXXRecordDecl *D) {

  • VisitRecordDecl(D);
  • // Injected class names are, I think, actually empty.

Yes, they are. Aside from making sure to get them into the right declaration context and link up their previous declaration chains, there shouldn’t be much to serialize for them.

isInjectedClassName() is the wrong check for this. Instead, PCHDeclWriter::VisitCXXRecordDecl should serialize all of the bits in the definition when D->getDefinition() == D, i.e., the declaration you’re serializing is the definition.

This is where I’m a little lost w.r.t. class decls, defs, and injected names… Here’s my current understanding.

class C; // call this CXXRecordDecl* decl;
class C { }; // call this CXXRecordDecl* def;
// call C::C (the injected name) CXXRecordDecl *inj;

decl->getDefinition() == def
def->getDefinition() == def
decl in def->redeclarations() // pseduo python-ese
inj->getDefinition() == def // ???

Yes

decl->DefinitionData == def->DefinitionData == inj->DefinitionData // ???
decl->DefiniotinData->Definition == def // ???

Yes. An injected-class-name is just a redeclaration with a funny scope. And getDefinition always points at the definition.

And then for a slightly different case, if C isn’t defined (so def == inj == 0), then:
decl->getDefinition() == 0?

Yes.

Is this about right or is there something I might be missing?

I think you have it correct.

Also, w.r.t. class serialization. Should virtual bases be serialized the same way that regular bases are serialized? What b

They’re all the same. Decls are serialized as a reference to somewhere else in the PCH, so it doesn’t matter how many times we reference that decl.

+void PCHDeclReader::VisitCXXConstructorDecl(CXXConstructorDecl *D) {

  • VisitCXXMethodDecl(D);
  • D->setExplicitSpecified(Record[Idx++]);
  • if (D->isThisDeclarationADefinition())
  • D->setImplicitlyDefined(Record[Idx++]);
  • // FIXME: Serialize member initializers?
    +}

I suggest that the PCH reader/writer avoid cleverness, and serialize exactly what is in the AST (rather than doing the isThisDeclarationADefinition() check).

This is a problem since isImplicitlyDefined and setImplicitlyDefined will assert if isThisDeclarationADefintion returns false. Would it be an acceptable workaround to add non-asserting accessors?

Oh, I missed the assert. In that case, please ignore my comment.

Aside: anonymous namespaces are somewhat interesting for PCH, because we need to make sure that an anonymous namespace defined in the source file gets linked to an anonymous namespace defined in the PCH file, since they are effectively the same anonymous namespace. One fun way to test this is:

// In PCH:
namespace {
class X;
}

// In source file (that includes PCH):
namespace {
class X;
}

X *x; // okay, but will likely fail with an ambiguity if the two anonymous namespaces don’t get linked.

Just a quick update/observation on this point. It looks like this was easily solved by making sure that the TU serializes its anonymous namespace, which it wasn’t. I’ve fixed this in the patch that I’ll probably push tomorrow. I need to write some more tests for classes.

Andrew Sutton
andrew.n.sutton@gmail.com

I’ve spent the morning addressing issues in the patch and had some questions, comments. I’m not sending up a new patch just yet, though.

I had planned to send this up on Saturday, but couldn’t connect to the server in the morning so I worked on other stuff instead. Attached is a new version of C++/PCH Decl support. I think I’ve addressed all of your comments. I’ve also added support for

  • unresolved value/type decls
  • friend type references
  • ctor member initializers
  • stubbed a function for friend template decls

There are 3 issues with this patch that I either don’t like or are cause problems.

  1. I needed a way to attach the DefinitionData of a CXXRecordDecl that is not a definition to a previously defined record declaration via the PCH reader (hence the attachDefinition()) function in CXXRecordDecl. It basically addresses this problem:

class C { };
class C; // DefinitionData not attached during reading

The DefinitionData isn’t attached but the previous declaration is set correctly. IIRC, the same situation happens with injected class names.

I’m not thrilled with this solution. I think the best solution would be to make setPreviousDeclaration automatically share (forward?) DefinitionData. I just didn’t want to open that can of worms in this patch.

  1. Anonymous namespaces within other namespaces appear to be incorrectly linked in semantic analysis but not PCH serialization. Of course, my interpretation of C++ semantics may also be wrong :slight_smile: Let’s say we have:

// Yes, there are two of them
namesapce N { namespace { class C; } } // 1
namesapce N { namespace { class C; } } // 2
N::C* p; // ERROR ambiguous lookup for C.

My understanding is that the anonymous namespace should be linked so that 1 and 2 refer to the same class. Is that right? I added a test to SemaCXX/namespaces.cpp to check.

Strangely, the PCH serialization seems to deal with this correctly. Emphasis on “seems to”. It doesn’t generate an error, and it doesn’t crash… I can only guess that this is somehow an issue with lookup or a bizarre artifact of lazy deserialization.

  1. There seems to be a bug in the PCH reader when you explicitly initialize member variables of a base class. Let’s say you have:

struct B : A {
B() : A(), x() { } // ← ASSERT
int x;
};

The assertion occurs in the deserialization (ReadStmt) of the ImplicitValueInit expr for x(). Somehow, after visiting that stmt/expr, the Idx == Record.size() - 1. Curiously, if B has no base classes, this seems to work fine.

I’m not sure what the status of this patch is going to be since a) clang/llvm seems to be in a code freeze and b) this patch introduces an assertion. I have a lot of other things to take care of so I may not be able to address those issues in a timely manner.

Andrew Sutton
andrew.n.sutton@gmail.com

pch.patch (79.2 KB)

I’ve spent the morning addressing issues in the patch and had some questions, comments. I’m not sending up a new patch just yet, though.

I had planned to send this up on Saturday, but couldn’t connect to the server in the morning so I worked on other stuff instead. Attached is a new version of C++/PCH Decl support. I think I’ve addressed all of your comments. I’ve also added support for

  • unresolved value/type decls
  • friend type references
  • ctor member initializers
  • stubbed a function for friend template decls

It would be much easier to handle such additions as separate patches once the main patch has gone in. Big patches take much longer to review. We haven’t quite dealt with all of the issues there yet.

Also, friends are in a state of flux, so it’s not worth trying to (de-)serialize them now. Once the AST settles down and access control is turned on by default, it will make sense to implement PCH support with friends.

There are 3 issues with this patch that I either don’t like or are cause problems.

  1. I needed a way to attach the DefinitionData of a CXXRecordDecl that is not a definition to a previously defined record declaration via the PCH reader (hence the attachDefinition()) function in CXXRecordDecl. It basically addresses this problem:

class C { };
class C; // DefinitionData not attached during reading

The DefinitionData isn’t attached but the previous declaration is set correctly. IIRC, the same situation happens with injected class names.

I’m not thrilled with this solution. I think the best solution would be to make setPreviousDeclaration automatically share (forward?) DefinitionData. I just didn’t want to open that can of worms in this patch.

Since you posted the original patch, a better solution has come up. When we start a definition of a CXXRecordDecl, we go back and fix up all of the DefinitionData pointers.

  1. Anonymous namespaces within other namespaces appear to be incorrectly linked in semantic analysis but not PCH serialization. Of course, my interpretation of C++ semantics may also be wrong :slight_smile: Let’s say we have:

// Yes, there are two of them
namesapce N { namespace { class C; } } // 1
namesapce N { namespace { class C; } } // 2
N::C* p; // ERROR ambiguous lookup for C.

My understanding is that the anonymous namespace should be linked so that 1 and 2 refer to the same class. Is that right? I added a test to SemaCXX/namespaces.cpp to check.

Yes, good catch. I ended up fixing this as part of PR6341.

  1. There seems to be a bug in the PCH reader when you explicitly initialize member variables of a base class. Let’s say you have:

struct B : A {
B() : A(), x() { } // ← ASSERT
int x;
};

The assertion occurs in the deserialization (ReadStmt) of the ImplicitValueInit expr for x(). Somehow, after visiting that stmt/expr, the Idx == Record.size() - 1. Curiously, if B has no base classes, this seems to work fine.

I’m not sure what the status of this patch is going to be since a) clang/llvm seems to be in a code freeze and b) this patch introduces an assertion. I have a lot of other things to take care of so I may not be able to address those issues in a timely manner.

Only the 2.7 branch of Clang/LLVM is in code freeze. The trunk is in its usual frenzy of activity.

Overall, this patch isn’t quite ready to go in. I’ve attached an updated version of the patch, which brings your patch into line and fixes some of the issues. It cleans up the handling of DefinitionData (#1), eliminates the code for friend type references (which have changed, and may change yet again), and takes a step further toward getting conversion functions right. However, to get conversion functions 100% correct, the list of conversion functions needs to be loaded lazily: see the FIXME I added in PCHDeclReader::VisitCXXRecordDecl.

I think fixing conversion functions and the ctor-initializers are crucial for this patch to go in.

  • Doug

pch-updated.patch (76 KB)

It would be much easier to handle such additions as separate patches once the main patch has gone in. Big patches take much longer to review. We haven’t quite dealt with all of the issues there yet.

I realized that after I re-posted it. I can try to break it down into more digestible pieces.

Also, friends are in a state of flux, so it’s not worth trying to (de-)serialize them now. Once the AST settles down and access control is turned on by default, it will make sense to implement PCH support with friends.

Sounds good. I’ll try to take out the friend stuff.

The DefinitionData isn’t attached but the previous declaration is set correctly. IIRC, the same situation happens with injected class names.

Since you posted the original patch, a better solution has come up. When we start a definition of a CXXRecordDecl, we go back and fix up all of the DefinitionData pointers.

That is a better solution :slight_smile:

Only the 2.7 branch of Clang/LLVM is in code freeze. The trunk is in its usual frenzy of activity.

Overall, this patch isn’t quite ready to go in.

I think fixing conversion functions and the ctor-initializers are crucial for this patch to go in.

Sounds good. I’ll take another turn on it. Unfortunately, I’m pretty busy for the next week or so. Maybe extending PCH support for clang (expressions, too) would make a good GSoC project :slight_smile:

Andrew Sutton
andrew.n.sutton@gmail.com

It would be much easier to handle such additions as separate patches once the main patch has gone in. Big patches take much longer to review. We haven’t quite dealt with all of the issues there yet.

Also, friends are in a state of flux, so it’s not worth trying to (de-)serialize them now. Once the AST settles down and access control is turned on by default, it will make sense to implement PCH support with friends.

After a couple months of writing papers, a dissertation, making presentations, and interviewing, I’ve finally been able to sit down and work on some code… So I’m resubmitting this patch. Sort of.

I took the previous (big) patch, re-applied it to a fresh checkout, and then stripped out most of the updates that didn’t have anything to do with namespaces, namespace aliases, linkage specs, using directives, etc. I also updated the NamespaceDecl to use the PtrUnion for anon/original namespace pointers.

This might still look like a big patch, but it’s not too large. I’m actually stubbing out most of the other Decls, but will just assert if you try to serialize them–except (C++) classes, constructors, destructors, and conversions. These are stubbed out enough to prevent any new regressions from the test suite, but they don’t actually do anything.

I think fixing conversion functions and the ctor-initializers are crucial for this patch to go in.

Hopefully, since you’ve already reviewed most of this, it can be applied in relatively short order. After its accepted, I can re-patch and update w.r.t. class (and member) serializations. That should be a much smaller patch, and its easier for me to focus on.

Andrew Sutton
andrew.n.sutton@gmail.com

pch-namespaces.patch (52.6 KB)

This patch looks great to me. I committed it in r103301.

I’m also interested in working on C++ PCH support in the next couple days. What can I work on that won’t conflict with your work? If possible, I’d like to shoot for getting C++ PCH support working in the next week or two.

-Chris

This patch looks great to me. I committed it in r103301.

Great!

I’m also interested in working on C++ PCH support in the next couple days. What can I work on that won’t conflict with your work? If possible, I’d like to shoot for getting C++ PCH support working in the next week or two.

The remainder of the patch that Doug reviewed dealt with classes, constructors, destructors, and conversion operators needed a little work (to be loaded lazily I think). The next patch will definitely include read/write support for CXXBaseSpecifier and CXXBaseOrMemberInitializer. I cropped those out of this patch, but I don’t think that there wasn’t anything wrong with them.

I’ll try to get the next patch out Monday.

I don’t plan on touching the serialization of templates and template parameters The last time I tried to do something with it, I kept getting assertions that I shouldn’t serialize dependent types or expressions. That was probably in December, though. Things may have changed since then :slight_smile:

I haven’t looked at any of the CXX expressions either.

Andrew Sutton
andrew.n.sutton@gmail.com

Ok, sounds good. I’ll start simple with the CXX expressions!

-Chris