token caching ("PCH")

Hi Cedric,

Redirecting to cfe-dev.

These are fair questions. The short answer is that this is a stepping stone towards other forms of PCH support in Clang.

Here is the long answer.

In the long term the leading idea on the table is to implement PCH via lazy deserialization of ASTs. This means you only deserialize the pieces of the header that you actually need. Beyond requiring serialization support to be fully implemented, there are many details to get right. It's a complicated solution that will take some time to implement. In the meantime, we would like to have other forms of "PCH" available today (which also may be compatible with other PCH solutions we implement).

The approach we are investigating right now is "token caching". The idea is to cache un-preprocessed (raw) tokens included from all the files sucked in from the prefix header. The cached tokens can then be used in a manner similar to how precompiled headers are used today: instead of lexing from the headers, we simply read in the tokens. The idea is to drastically reduce lexing time.

Here is the motivation for this approach.

We did some extensive timings of Clang. Clang is 2x faster than GCC without PCH, and GCC with PCH is (on average) 2x faster than Clang. This is on Mac OS X where we have some really huge headers (including Cocoa.h is about 11MB of source code).

The majority of the work (50-60%) at -fsyntax-only is spent lexing. Thus simply speeding up lexing significantly will have a substantial performance win. Moreover, a significant amount of time is spent just stat'ing header files, opening them, reading them. Placing the cached tokens from a prefix header can remove a significant amount of time from lexing as well as remove a significant amount of overhead from system calls. Further, caching the lexed tokens is a simple solution that is trivial to implement, which means we will have something that works in the short term while we investigate other (complementary) solutions.

As to your point about simply deserializing all the ASTs in a header, we (Daniel, Steve, Chris, and I) believe that would give roughly the same performance as the token caching approach. There are three arguments for this.

First, the time spent in Sema itself is largely spent allocating objects, and performing a deserialization of all the ASTs will recover only a fraction of time from Sema as we need to allocate the same number of objects. While it would also recover parse time, only 10% of -fsyntax-only is spent doing parsing. Also, object deserialization isn't free (there is computation involved in reading VBR encoded integers, parsing bitcode records, etc.), so any time recovered from removing parsing would probably be replaced by work doing deserialization.

Second, deserializing all of the ASTs in the prefix header would also be far more complicated than caching tokens; not only would serialization need to be feature complete (which we plan on doing anyway) but more aspects of Clang, e.g. Sema, would need to be aware of our PCH solution. The token caching approach means that only the Preprocessor needs to be aware that it is getting its tokens from another source; Parse and Sema remain unaffected.

Third, token caching is also something we can probably implement in a week or so, so it is something we can quickly evaluate to see if it buys us anything. Further, token caching has some other incidental benefits; for example, we will be able to get better measurements of how much time we spend in preprocessing.

As I mentioned earlier, the ultimate solution (as currently conceptualized) for PCH in Clang is to indeed use AST serialization, but only to deserialize the pieces of the header that are needed. This requires the ability to do lazy deserialization of ASTs in Sema, something which our current infrastructure just doesn't really support right now. Moreover, C++ will bring a whole host of other complexities when considering fast PCH support, so there is some benefit in waiting until C++ support progresses a little further so we can a better idea of the right design tradeoffs for PCH support for C++.

Ted

Ted Kremenek a écrit :

In the long term the leading idea on the table is to implement PCH via lazy deserialization of ASTs. This means you only deserialize the pieces of the header that you actually need.

I can see this is an ambitious project. Implementing this would also help greatly with implementing module in C++ (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2316.pdf). A very interesting proposal, but too complex (to implement) to be accepted in C++0x :(.

> [...]

thanks for your detailed answer.

regards,

Cédric