AST Serialization


I have a question about the assumptions behind, and the correct usage of, AST serialization for C and C++ sources.

I have done the following:

  1. I have implemented a ClangTool which builds ASTs from compilation databases.

  2. I have dumped the contents of the ASTs in both textual and binary formats.

  3. Then I have read the serialized binary back in and dumped it again in both formats.

What I have noticed is that the dumps of the different generations differ in size (by up to an order of magnitude). The textual dumps also differ.

I would have expected serialization followed by deserialization to produce an AST that is the same as the original.

Maybe I have done it the wrong way. The following outline gives the gist of the method used:

void textual_dump_to_file(const ASTUnit& unit, StringRef file_path) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  // mkdir -p

  std::error_code EC;
  llvm::raw_fd_ostream out {file_path, EC};
  unit.getASTContext().getTranslationUnitDecl()->dump(out, /*deserialize*/ true);
}

void experiment_with_unit(CompilerInstance& CI, ASTUnit& Unit, StringRef MethodPrefix, StringRef SourcePath) {
  using namespace llvm::sys::fs;
  using namespace llvm::sys::path;

  IntrusiveRefCntPtr<DiagnosticOptions> DiagOpts = new DiagnosticOptions();
  TextDiagnosticPrinter *DiagClient = new TextDiagnosticPrinter(llvm::errs(), &*DiagOpts);
  IntrusiveRefCntPtr<DiagnosticIDs> DiagID(new DiagnosticIDs());
  IntrusiveRefCntPtr<DiagnosticsEngine> Diags(
      new DiagnosticsEngine(DiagID, &*DiagOpts, DiagClient));

  llvm::SmallString<256> TextDumpPath{MethodPrefix};

  llvm::SmallString<256> BinaryDumpPath {TextDumpPath};
  replace_extension(TextDumpPath, ".txt1");
  replace_extension(BinaryDumpPath, ".bin1");

  // first generation: binary dump of Unit into BinaryDumpPath, then the textual dump
  textual_dump_to_file(Unit, TextDumpPath);

  auto Dump1Loaded = ASTUnit::LoadFromASTFile(
      std::string(BinaryDumpPath), CI.getPCHContainerOperations()->getRawReader(),
      ASTUnit::LoadEverything, Diags, CI.getFileSystemOpts());

  replace_extension(TextDumpPath, ".txt2");
  replace_extension(BinaryDumpPath, ".bin2");

  // second generation: binary dump of *Dump1Loaded into BinaryDumpPath, then the textual dump
  textual_dump_to_file(*Dump1Loaded, TextDumpPath);
}

The files with extensions txt1 and txt2 differ, and so do bin1 and bin2.

I would think that if there is a problem with the reproducibility of the AST, it would affect modules and the analyzer as well.
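Part of the textual difference may be harmless noise: Clang's textual AST dump prints the in-memory address of each node (for example "FunctionDecl 0x55d1a2b3 ..."), and there is no reason for those addresses to match between the original AST and a reloaded one. The following is only a sketch, assuming that addresses always appear as "0x" followed by hex digits, of how the two text dumps could be compared with the addresses masked out. The helper names normalizeAddresses and firstDifference are hypothetical and not part of Clang; it uses only the standard library and expects the dump contents to have been read into strings already.

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

// Replace every pointer-like token ("0x" + hex digits) with a fixed
// placeholder so that node addresses no longer affect the comparison.
// Assumption: this is the only address notation used in the dump.
std::string normalizeAddresses(const std::string &line) {
  static const std::regex addr("0x[0-9a-fA-F]+");
  return std::regex_replace(line, addr, "0xADDR");
}

// Compare two dumps line by line after masking addresses.
// Returns the 1-based number of the first differing line,
// or 0 if the dumps are identical apart from addresses.
std::size_t firstDifference(const std::string &dumpA, const std::string &dumpB) {
  std::istringstream inA(dumpA), inB(dumpB);
  std::string lineA, lineB;
  std::size_t lineNo = 0;
  while (true) {
    const bool gotA = static_cast<bool>(std::getline(inA, lineA));
    const bool gotB = static_cast<bool>(std::getline(inB, lineB));
    ++lineNo;
    if (!gotA && !gotB)
      return 0;       // both dumps ended at the same point
    if (gotA != gotB)
      return lineNo;  // one dump has more lines than the other
    if (normalizeAddresses(lineA) != normalizeAddresses(lineB))
      return lineNo;  // a difference that is not just an address
  }
}
```

If firstDifference returns a nonzero line number for the txt1 and txt2 contents, the line it points to is a difference beyond node addresses and is probably the more interesting place to look.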

Any thoughts on this?