Using clang to analyze compile times

I have been experimentally migrating a couple of very large projects from gcc to clang 3.4/3.5. Compile times went up, even though on the same systems clang compiles itself 6 times faster than gcc.

My previous attempts at improving compile times have been grueling and never satisfactorily informative (e.g. I found that individual compilation units took gcc 4.6 100ms longer when they #include'd <string>). So naturally my first guesses are poor #include organization, namespace pollution, time spent on template instantiation, or some nuance of the way the code has been written that turns out to be the ultimate degenerate case for the clang/llvm optimizer passes.
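The crudest way I can think of to separate the front-end guesses above (#includes, namespaces, templates) from the optimizer one is to time each file twice: once with -fsyntax-only, which stops after parsing and semantic analysis, and once as a full -O2 compile. Below is a minimal sketch of that, not a real profiling tool. The script name and the helpers time_command and compare_phases are made up for illustration, and it assumes the compiler driver accepts -fsyntax-only, -O2, -c and -o the way both gcc and clang do.

```python
import os
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return its wall-clock time in seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare_phases(compiler, source, extra_flags=()):
    """Return (front_end_seconds, full_seconds) for a single source file."""
    base = [compiler, *extra_flags]
    # Front end only: preprocess, parse and run semantic analysis, no codegen.
    front_end = time_command(base + ["-fsyntax-only", source])
    # Full optimized compile; the object file is written to the null device.
    full = time_command(base + ["-O2", "-c", source, "-o", os.devnull])
    return front_end, full


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage: python time_tu.py clang++ foo.cpp -Iinclude -std=c++11
    compiler, source, *flags = sys.argv[1:]
    front_end, full = compare_phases(compiler, source, flags)
    print("front end only: %.3fs" % front_end)
    print("full -O2 build: %.3fs" % full)
    print("difference:     %.3fs" % (full - front_end))
```

If the two numbers are close, the time is going into headers, parsing and semantic analysis; if the full build is much slower, code generation and optimization are the better suspects. A single run is noisy, so this only gives a rough split, and it says nothing about which header or template is to blame.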

So after reading about 3.5's Pretokenized Headers, I got to wondering whether clang already has some extant mechanism or add-ons for this kind of investigation.

-Oliver