Serialization bug

$ cat test.c
void foo() {
  int i;
}

$ clang -serialize test.c

$ clang -emit-llvm test.c.ast
clang: CGDecl.cpp:41: void clang::CodeGen::CodeGenFunction::EmitDecl(const clang::Decl&): Assertion `0 && "Should not see file-scope variables inside a function!"' failed.
clang[0x84f5f2a]
clang[0x84f605c]
[0x110420]
/lib/libc.so.6(abort+0x101)[0x578f91]
/lib/libc.so.6(__assert_fail+0xee)[0x57093e]
clang(_ZN5clang7CodeGen15CodeGenFunction8EmitDeclERKNS_4DeclE+0xc8)[0x82ca9d2]
clang(_ZN5clang7CodeGen15CodeGenFunction12EmitDeclStmtERKNS_8DeclStmtE+0x28)[0x82df094]
clang(_ZN5clang7CodeGen15CodeGenFunction8EmitStmtEPKNS_4StmtE+0x274)[0x82df86e]
clang(_ZN5clang7CodeGen15CodeGenFunction16EmitCompoundStmtERKNS_12CompoundStmtEbPN4llvm5ValueEb+0x98)[0x82e0c86]
clang(_ZN5clang7CodeGen15CodeGenFunction8EmitStmtEPKNS_4StmtE+0x179)[0x82df773]
clang(_ZN5clang7CodeGen15CodeGenFunction12GenerateCodeEPKNS_12FunctionDeclE+0x56b)[0x82c7963]
clang(_ZN5clang7CodeGen13CodeGenModule12EmitFunctionEPKNS_12FunctionDeclE+0x120)[0x82aff9e]
clang[0x82ad772]
clang[0x829efc8]
clang(main+0x3c8)[0x82a2206]
/lib/libc.so.6(__libc_start_main+0xe0)[0x564390]
clang[0x82783a1]
Aborted

The assertion fails because the DeclContext* field of the declaration of i becomes null (0x0) after deserialization.

I am investigating the problem. What is the difference between the InRec and OutRec methods used in serialization?

Thanks.
--Zhongxing Xu

It's a low-level detail related to the bitcode file's use of "records". A bitcode file consists of a set of records, much like XML elements; special records called "blocks" can contain other records, giving a tree-like structure.

InRec serializes data that is essentially guaranteed to stay in the current record. OutRec serializes data whose emission ends the current record and creates new ones. Because each record adds extra metadata to the bitcode file, packing as much as possible into one record (as we do in InRec) saves space on disk. It's purely an optimization. For example, OutRec may call EmitOwnedPtr, which writes a persistent pointer ID for the next object to be serialized at the end of the current record, ends that record, and then writes that object's data in a new record.

Here is an example:

1) EmitInt
2) EmitInt
3) EmitOwnedPtr(X)
4) EmitInt

(1) and (2) each write the bits for an integer; EmitInt doesn't terminate the current record, it just appends data to it. (3) writes a single integer (the persistent pointer ID) into the current record, ends that record, then writes the data of X in a new record and ends that one too. (4) is then written into a third record, since no record is active anymore. That gives 3 records (assuming X fits in one record). A more compact encoding moves (4) before (3), which yields only 2 records.

So... EmitInRec and EmitOutRec are just methods that collect the data written into the current record (EmitInRec) and the data that results in the creation of other records (EmitOutRec). EmitOutRec should therefore be called after EmitInRec.

One last thing: the serialization support is incomplete. I stopped fully implementing it when I got to handling EnumDecls and RecordDecls, since we didn't then (and still don't) have a clear ownership model for all of these decls, which the serialization mechanism absolutely depends on. I plan on completing it one day, but it wasn't pressing at the time (and I knew the ASTs would evolve and some of the ownership issues would get cleaned up).