Improving Clang Parse Performance

I am trying to use Clang for parsing some C++ programs and have had good
success. The problem is performance. I can parse about 400,000 lines of C
code scattered in 400 files in about 25 minutes on a really fast machine. I
think that this is not very good, considering that I am doing a lot less
than a compiler. Am I mistaken in my expectations?

My sense is that to get a real bump in performance, I need to use
pre-compiled header files, because the vast majority of parsing is being
done over and over. However, my initial experience with pch has not been
successful. It isn't clear to me how to go about doing this in a practical
situation. Do I need to write a small pre-processing step that identifies
what the included files are for each source file and then pre-compile them
with the same set of pre-processor directives? Is it better to pre-compile
groups of header files together or to pre-compile each header file
separately?

Any suggestions that will get me headed in the right direction will be most
helpful.

> I am trying to use Clang for parsing some C++ programs and have had good
> success. The problem is performance. I can parse about 400,000 lines of C
> code scattered in 400 files in about 25 minutes on a really fast machine.

That is odd. Clang's parser is pretty fast, as C++ parsers go. Have
you tried using an optimized build of Clang to compile your 400
files? 400,000 lines isn't much code, and 25 minutes is a long time.

> I think that this is not very good, considering that I am doing a lot less
> than a compiler. Am I mistaken in my expectations?

A rule of thumb is that, when not optimizing, lexing/parsing takes
about half of the total compiler time. If you do no work beyond
parsing you might be able to be twice as fast as clang++ when it's
compiling.

> My sense is that to get a real bump in performance, I need to use
> pre-compiled header files, because the vast majority of parsing is being
> done over and over.

My guess is that something is strange in your setup, because I'd
expect that Clang is faster than that. How long does it take to
*compile* your 400k LOC?

-- James

>> I am trying to use Clang for parsing some C++ programs and have had good
>> success. The problem is performance. I can parse about 400,000 lines of C
>> code scattered in 400 files in about 25 minutes on a really fast machine.
>
> That is odd. Clang’s parser is pretty fast, as C++ parsers go. Have
> you tried using an optimized build of Clang to compile your 400
> files? 400,000 lines isn’t much code, and 25 minutes is a long time.

If it’s single-threaded, that is actually in line with numbers I have measured (3.75 seconds per TU on average). I also think I saw some regressions here over the past year, but we’ll need to investigate more. My numbers are mostly from C++ code though, so this might be comparing apples and oranges.

Cheers,
/Manuel
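
For reference, the per-TU figure quoted above follows directly from the numbers in the original post (25 minutes, 400 files). A minimal sketch of the arithmetic; the function name is only illustrative:

```python
def seconds_per_tu(total_minutes: float, num_files: int) -> float:
    """Average wall-clock seconds spent per translation unit (TU)."""
    return total_minutes * 60.0 / num_files


# 25 minutes for 400 files, as reported in the original post.
print(seconds_per_tu(25, 400))  # 3.75
```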

Thanks for your suggestion about comparing parse time to compile time. I
will try it out. I can actually compile the target code base using Visual
Studio within 3-4 minutes, but that build makes heavy use of pre-compiled
headers, hence my focus on pre-compiled header files with Clang. Are there
any suggestions on using pre-compiled header files with Clang?
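
As a starting point, here is a rough sketch of what a PCH workflow with the Clang driver could look like: build one PCH from a shared prefix header, then parse each source file against it. The script only assembles and prints the command lines. The file names (`common.h`, `a.cpp`, `b.cpp`), the flags in the example, and the helper names are all hypothetical; `-x c++-header`, `-include-pch` and `-fsyntax-only` are existing Clang driver options. It assumes every source file can share one prefix header and that the PCH is built with the same language and preprocessor flags as the files that use it, since Clang checks a PCH against the current options when loading it.

```python
import shlex
from typing import List

CLANG = "clang"  # assumption: a clang driver is on PATH


def pch_build_cmd(header: str, flags: List[str], lang: str = "c++") -> List[str]:
    """Command that precompiles `header` into `<header>.pch`.

    `lang` is "c++" or "c"; it selects -x c++-header or -x c-header.
    """
    return [CLANG, "-x", f"{lang}-header", *flags, header, "-o", header + ".pch"]


def parse_cmd(source: str, header: str, flags: List[str]) -> List[str]:
    """Command that only parses `source` (no codegen), reusing the PCH."""
    return [CLANG, "-fsyntax-only", *flags, "-include-pch", header + ".pch", source]


if __name__ == "__main__":
    # Hypothetical flags; the same list is used for the PCH and for each file.
    flags = ["-std=c++17", "-DNDEBUG", "-Iinclude"]
    print(shlex.join(pch_build_cmd("common.h", flags)))
    for src in ["a.cpp", "b.cpp"]:
        print(shlex.join(parse_cmd(src, "common.h", flags)))
```

Whether one large prefix header or several smaller groups works better probably depends on how much the include sets of the 400 files overlap, so it seems worth timing both on a subset before committing to either.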