Precompiled headers with libclang

Hi cfe-devs,

I have a particular scenario where there is a single changing C++ source file which depends on a large number of header files (which are constant). What I want libclang to do is to just reparse this source file and extract the rest of the information from a single precompiled header file instead of those header files.

I am able to serialize and deserialize a translation unit, but I am not able to serialize a TU and use it like a library. Do .pch files work with libclang?

For our needs, we have one changing C++ source file while everything else in a very large codebase stays constant, and we want to repeatedly invoke code completion or error checks on it. Is this setup possible with libclang?


Just turn on precompiled preambles in libclang. After the first reparse, libclang will automatically create a PCH file for the headers at the top of your source file and use it for subsequent reparses and code completions.
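
Roughly, that workflow looks like the following with the libclang C API. This is only a minimal, untested sketch: "source.cc", the compiler flags, and the completion location are placeholders, not anything from this thread.

 /* Parse once with the editing options (which include
  * CXTranslationUnit_PrecompiledPreamble), then reparse and code-complete
  * the same translation unit.  The preamble PCH is built on the first
  * reparse and reused afterwards as long as the headers are unchanged. */
 #include <clang-c/Index.h>
 #include <stdio.h>

 int main(void) {
   const char *args[] = { "-std=c++11", "-Iinclude" };   /* placeholder flags */
   CXIndex idx = clang_createIndex(/*excludeDeclarationsFromPCH=*/0,
                                   /*displayDiagnostics=*/1);

   unsigned opts = clang_defaultEditingTranslationUnitOptions();
   CXTranslationUnit tu = clang_parseTranslationUnit(
       idx, "source.cc", args, 2, /*unsaved_files=*/NULL, 0, opts);
   if (!tu)
     return 1;

   /* First reparse: this is where the precompiled preamble gets created. */
   clang_reparseTranslationUnit(tu, 0, NULL, clang_defaultReparseOptions(tu));

   /* Subsequent completions/reparses reuse the preamble PCH. */
   CXCodeCompleteResults *res = clang_codeCompleteAt(
       tu, "source.cc", /*line=*/10, /*column=*/5, NULL, 0,
       clang_defaultCodeCompleteOptions());
   if (res) {
     printf("%u completion results\n", res->NumResults);
     clang_disposeCodeCompleteResults(res);
   }

   clang_disposeTranslationUnit(tu);
   clang_disposeIndex(idx);
   return 0;
 }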

I have actually set it 'on' already. But by running strace, I found that libclang is still accessing quite a few headers (although the number of headers accessed is somewhat reduced).
Is this expected?

Or is libclang expected to access only the preamble file? Is there any case where it can discard the contents of the preamble file?

To be precise, I am using 'clang_defaultEditingTranslationUnitOptions' with clang_parseTranslationUnit():

 unsigned clang_defaultEditingTranslationUnitOptions() {
   // Build a PCH for the preamble (the initial block of #includes) and reuse
   // it across reparses; also cache global code-completion results.
   return CXTranslationUnit_PrecompiledPreamble |
          CXTranslationUnit_CacheCompletionResults;
 }

> I have actually set it 'on' already. But by running strace, I found that libclang is still accessing quite a few headers (although the number of headers accessed is somewhat reduced).
> Is this expected?

You’ll see libclang stat’ing all of the headers in the precompiled preamble, because it needs to determine whether they have changed.

> Or is libclang expected to access only the preamble file? Is there any case where it can discard the contents of the preamble file?

If the underlying headers change, or the #includes of the source file change, the precompiled preamble will be thrown out and regenerated.

  • Doug


Douglas, is there any debugging information I can generate, like knowing when the precompiled preamble is discarded, or which file caused it to be discarded?

[Apart from using strace and figuring it out by looking at the calls.]

Set the environment variable LIBCLANG_TIMING to get some timing data, which will say when (but not why) the precompiled preamble is (re)built. For any more information than that, you’ll have to modify Clang itself.

  • Doug
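
(Assuming LIBCLANG_TIMING only needs to be in the process environment before the translation unit is created, a tool can also set it programmatically instead of exporting it in the shell; this is my assumption, not something verified in this thread.)

 #include <stdlib.h>

 /* Assumption: setting the variable before clang_parseTranslationUnit()
  * is enough for libclang to pick it up. */
 static void enable_libclang_timing(void) {
   setenv("LIBCLANG_TIMING", "1", /*overwrite=*/1);
 }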

Douglas,

I am exploring one more possibility. Is it possible for clang to just use the precompiled header file (without the corresponding source header files available) to do its work?
I don't want clang to check for any changes in the source header files, but simply use the precompiled header file.

If this is possible, I can create a single .pch file for the source file and quickly move the setup across machines. This would be similar to .jar files in the Java world.

> I am exploring one more possibility. Is it possible for clang to just use the precompiled header file (without the corresponding source header files available) to do its work?

I could be wrong, but I would not expect this to work. I think lots of things still need to refer back to the underlying header files for printing out snippets in diagnostics, etc.

> I don't want clang to check for any changes in the source header files, but simply use the precompiled header file.

Disabling the checks might be do-able – but it’s not clear to me why these are a problem?

> If this is possible, I can create a single .pch file for the source file and quickly move the setup across machines. This would be similar to .jar files in the Java world.

Ahh. I wonder if it would help to use digests rather than timestamps for these checks? Potentially as an optional mode?

While computing digests is more expensive than checking timestamps, it is much less expensive than rebuilding a PCH image, etc., so we could potentially enable this in general as a layered system (a rough sketch of the check follows the list):

  1. if timestamp + inode, etc. differ, and
  2. if SHA-1 digests differ, then rebuild
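
Roughly, the layered check could look like the sketch below. This is only an illustration of the idea: the FileRecord bookkeeping is made up, SHA-1 comes from OpenSSL, and this is not clang's actual validation code.

 #include <openssl/sha.h>
 #include <stdio.h>
 #include <string.h>
 #include <sys/stat.h>
 #include <sys/types.h>

 /* What would be remembered about each header when the PCH was built. */
 struct FileRecord {
   off_t size;
   time_t mtime;
   ino_t inode;
   unsigned char sha1[SHA_DIGEST_LENGTH];
 };

 static int compute_sha1(const char *path, unsigned char out[SHA_DIGEST_LENGTH]) {
   unsigned char buf[1 << 16];
   SHA_CTX ctx;
   size_t n;
   FILE *f = fopen(path, "rb");
   if (!f)
     return -1;
   SHA1_Init(&ctx);
   while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
     SHA1_Update(&ctx, buf, n);
   fclose(f);
   SHA1_Final(out, &ctx);
   return 0;
 }

 /* Returns 1 if the PCH must be rebuilt because of this file, 0 otherwise. */
 static int needs_rebuild(const char *path, const struct FileRecord *rec) {
   struct stat st;
   unsigned char digest[SHA_DIGEST_LENGTH];

   if (stat(path, &st) != 0)
     return 1;                        /* file is gone: rebuild */

   /* Layer 1: cheap metadata check; if nothing moved, trust the PCH. */
   if (st.st_size == rec->size && st.st_mtime == rec->mtime &&
       st.st_ino == rec->inode)
     return 0;

   /* Layer 2: metadata differs, so compare content digests before paying
    * for a full PCH rebuild (e.g. the file was merely touched). */
   if (compute_sha1(path, digest) != 0)
     return 1;
   return memcmp(digest, rec->sha1, SHA_DIGEST_LENGTH) != 0;
 }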

> > I don't want clang to check for any changes in the source header files, but simply use the precompiled header file.
>
> Disabling the checks might be do-able – but it's not clear to me why these are a problem?

I am trying to reduce every fraction of a second possible. Since there is a very large number of header files, I guess some time is being wasted in stat'ing those files. I want clang to assume that everything except the source file is constant.

Stat'ing is extremely fast, and its cost does not depend on the size of the header file.

Let me emphasize: extremely fast. ‘git status’ stats every single file in the entire git tree in a tiny fraction of a second.

Do you have a benchmark that clearly shows how much time is spent on ‘stat’ here? Are you using a network filesystem that makes stat system calls slow for some reason?

Yes, I am using a FUSE filesystem. I am not sure how slow stats are in my case; I can check that. But it looks like I have already stretched clang to its maximum for my use.
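
One quick way to check: a tiny standalone program that times a batch of stat() calls, run once over the headers on the FUSE mount and once over a local copy. This is only a rough sketch; the file list is whatever gets passed on the command line.

 #define _POSIX_C_SOURCE 199309L
 #include <stdio.h>
 #include <sys/stat.h>
 #include <time.h>

 int main(int argc, char **argv) {
   struct timespec t0, t1;
   struct stat st;
   int i;

   clock_gettime(CLOCK_MONOTONIC, &t0);
   for (i = 1; i < argc; ++i)
     if (stat(argv[i], &st) != 0)     /* stat every path given on the command line */
       perror(argv[i]);
   clock_gettime(CLOCK_MONOTONIC, &t1);

   printf("stat'ed %d files in %.3f ms\n", argc - 1,
          (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6);
   return 0;
 }

For example (path and program name are placeholders): find /path/to/headers -name '*.h' | xargs ./statbench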

Doug/Chandler,

I am seeing the following improvements. Without the precompiled preamble, I get latencies of 2s to 3s, but when it is enabled, I get latencies around 1s. Is this the best possible optimization available with libclang?

When I reparse the translation unit without 'precompiled preamble' enabled, I see the following latencies. I reparsed the same source.cc five times:
#1. Reparsing source.cc: 0.7100 (100.0%) 0.0700 (100.0%) 0.7800 (100.0%) 2.6748 (100.0%)
#2. Reparsing source.cc: 0.8000 (100.0%) 0.0300 (100.0%) 0.8300 (100.0%) 3.2166 (100.0%)
#3. Reparsing source.cc: 0.7900 (100.0%) 0.0300 (100.0%) 0.8200 (100.0%) 2.4005 (100.0%)
#4. Reparsing source.cc: 0.7300 (100.0%) 0.0200 (100.0%) 0.7500 (100.0%) 2.1998 (100.0%)

#5. Reparsing source.cc: 0.7400 (100.0%) 0.0100 (100.0%) 0.7500 (100.0%) 2.2896 (100.0%)

But when I enable the precompiled preamble, I do see some improvement in general (from around ~2s earlier to ~1s now):
#1.
Precompiling preamble: 0.9200 (100.0%) 0.0200 (100.0%) 0.9400 (100.0%) 1.6569 (100.0%)

Cache global code completions for /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 0.0600 (100.0%) 0.0600 (100.0%) 0.1440 (100.0%)
Reparsing /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 1.0900 (100.0%) 0.0200 (100.0%) 1.1100 (100.0%) 2.5874 (100.0%)

#2. Reparsing /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 0.1300 (100.0%) 0.0200 (100.0%) 0.1500 (100.0%) 3.5431 (100.0%)
#3. Reparsing /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 0.1200 (100.0%) 0.0200 (100.0%) 0.1400 (100.0%) 1.0338 (100.0%)
#4. Reparsing /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 0.1400 (100.0%) 0.1400 (100.0%) 1.0945 (100.0%)
#5. Reparsing /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 0.1000 (100.0%) 0.0300 (100.0%) 0.1300 (100.0%) 0.9824 (100.0%)
#6. Reparsing /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 0.1300 (100.0%) 0.0100 (100.0%) 0.1400 (100.0%) 1.1601 (100.0%)
[Note: as expected, no precompiling happens if no headers are changed.]

And when I modified one of its header files, I see the preamble being precompiled again:
#7. Precompiling preamble: 0.9000 (100.0%) 0.0500 (100.0%) 0.9500 (100.0%) 2.1090 (100.0%)

Reparsing /tmp/cider/abhanshu/63/google3/bigtable/aggregate/aggregate_client.cc: 1.0300 (100.0%) 0.0700 (100.0%) 1.1000 (100.0%) 3.5036 (100.0%)

It might be. Others have experimented with other optimizations, e.g., turning off instantiation of function templates (which trades some functionality for a decent performance win), but those can't directly be enabled via libclang at this point.

  - Doug

Why??

It's an earnest question. I'm having trouble imagining why a
developer interested in speed would choose to keep source code (or
object code) on anything except locally attached storage. Putting
a network and/or userspace between the compiler and the source code is
a recipe for poor productivity.

Modern source-control systems replaced CVS partly because CVS's design
presumes expensive storage (which was true when CVS was young).
Subversion explicitly chose to prefer local storage to network I/O
whenever feasible because disk storage is now cheaper than water. git
took that design one step further, keeping the whole repository local.

But you surely know all that. Whatever could require you to ignore
it?

--jkl

Editing code on a local PC but building and executing it on a server farm (because it’s much faster) is a common pattern I think.

Then you have the choice of either keeping two copies of the code (one local, one on the server) and then synchronizing before compiling/running, or having them share their storage, one way or another.

– Matthieu

> On 6 April 2012 00:49, James K. Lowden <jklowden@schemamania.org> wrote:
> > > Yes I am using FUSE filesystems.
> >
> > I'm having trouble imagining why a developer interested in speed would
> > choose to keep source code (or object code) on anything except locally
> > attached storage.
>
> Editing code on a local PC but building and executing it on a server
> farm (because it's much faster) is a common pattern I think.

Granted, yes.

> Then you have the choice of either keeping two copies of the code (one
> local, one on the server) and then synchronizing before
> compiling/running, or having them share their storage, one way or
> another.

Sure. And your experience (and mine) both point to optimizing for
compilation.

Because the number of files compiled will always equal or exceed the
number edited, and because I/O is a bigger fraction of the compiler's
performance than of the editor's (especially if we include the
keyboard), best results will come from the compiler using local
storage.

You can edit over NFS or FUSE; you can use rsync; you can check in and
have the build script check out. ISTM any of those would be faster
than encumbering the compiler I/O.

Thanks for the explanation. I see your motivation; I'm sure that even
if there are "better" ways, the particular environment you're in may
not be changed all that easily. That said, optimizing away stat(2)
calls would be a limited victory at best. The only way to make
compilation fast, in the end, is to arrange things such that it *can*
be fast.

--jkl

> Editing code on a local PC but building and executing it on a server
> farm (because it's much faster) is a common pattern I think.

This is exactly the case here. However, if I put everything on tmpfs, latencies drop to
~400-600ms for error checking and roughly the same for code completion.
I guess this is the best clang could probably do.