Attached are two patches (svn diff format) that correct two issues encountered when loading AST files generated by Clang with the '-emit-ast' option.
Problem 1: Loading an AST file via ASTUnit::LoadFromASTFile() sets ASTUnit::MainFileIsAST to true so that calls to ASTUnit::isMainFileAST() return true. However, SourceManager::MainFileID does not get restored. This is addressed by clang-main-file-id.patch
Problem 2: When the main source file is compiled using a relative path (ie, 'clang -emit-ast file.c'), filesystem stat lookups use the relative path exactly as passed on the command line. This results in the stat cache populated by ASTWriter via MemorizeStatCalls using the relative path as the key to the stat cache for the main file. However, the main source file name is also stored in the AST file using an absolute path as the "original file". When the AST file is later read by ASTReader, the original (absolute) path is used for filesystem stat lookup resulting in a cache miss for the stat cache because the absolute path doesn't match (string comparison) the relative path used as a key in the stat cache. In this case, the real filesystem is queried which may return stat data that does not reflect the stat values from the time of the compilation if the filesystem stat values have changed. This is addressed by clang-main-file-stat-cache.patch which simply converts input file names to absolute paths before passing them on to FrontendAction::BeginSourceFile().
A test program is attached to demonstrate these problems. When compiled and linked with an unpatched version of clang, the test program will abort on a failed assertion:
Assertion `!source_manager.getMainFileID().isInvalid()' failed.
Applying clang-main-file-id.patch will correct the assertion failure. At this point, the second problem can be demonstrated with the following steps:
1) Compile an example .c file with 'clang -emit-ast file.c'. Make
sure to use a relative path to the source file.
2) Run the test program against the .ast file generated in step 1.
Output similar to the following should be displayed:
main file: /path/to/file.c
Size: 15
Time: 1324402591: Tue Dec 20 12:36:31 2011
Mode: 0100644
3) Change the mode of the .c file (ie, 'chmod g+w file.c'), and
rerun the test program. If the stat cache is working properly,
no changes should be seen. However, without the attached patch,
a change to the mode is seen:
main file: /path/to/file.c
Size: 15
Time: 1324402591: Tue Dec 20 12:36:31 2011
Mode: 0100664
The reason the steps above change the file mode is because the AST reader will notice changes to the file size or modification stamp and abort the AST load if either fails to match. Changing the file mode avoids the load failure, but still demonstrates the cache miss.
I suspect that stat cache lookups will remain fragile as long as the stat cache continues to use non-normalized paths as keys. I haven't attempted to address this.
These issues were present in Clang 2.9 and 3.0 as well.
Tom.
clang-main-file-id.patch (1.7 KB)
clang-main-file-stat-cache.patch (1.21 KB)
testcase.cpp (2.15 KB)