Some clang benchmarking on Windows

Hi,

I did some benchmarking of clang on Windows this morning, just for fun.
I passed LLVM_TARGETS_TO_BUILD:STRING=all to CMake to include all the
targets (to be able to compare with what Rafael Espindola posted on
LLVM-Dev).

I compiled 4 versions of clang:
   win32 and /O1
   win32 and /O2
   win64 and /O1
   win64 and /O2

/O1 is Minimize Size (configuration MinSizeRel)
/O2 is Maximize Speed (configuration Release)

Everything was compiled with Visual Studio 2008 on Windows 7 x64.

clang.exe size:
   win32 /O1:  9.3 MB (24% smaller than /O2)
   win32 /O2: 12.3 MB
   win64 /O1: 15.7 MB (15% smaller than /O2)
   win64 /O2: 18.5 MB

Then I used my huge test.cpp, which includes all the non-templated
Windows headers, to compare compilation speed.
I used -fsyntax-only.

clang.exe speed (in seconds, lower is better):
   win32 /O1: 7.9
   win32 /O2: 5.4 (31% less time than /O1)
   win64 /O1: 5.9
   win64 /O2: 5.4 (8% less time than /O1)
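
The percentages in the two tables above can be reproduced with a little
arithmetic. A minimal sketch, assuming each percentage is the relative
reduction against the other build of the same architecture (the helper
name percent_reduction is made up for illustration):

```python
def percent_reduction(baseline, improved):
    """Return how much smaller `improved` is than `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


# Figures copied from the size and speed tables in this message.
size_mb = {"win32": {"O1": 9.3, "O2": 12.3}, "win64": {"O1": 15.7, "O2": 18.5}}
time_s = {"win32": {"O1": 7.9, "O2": 5.4}, "win64": {"O1": 5.9, "O2": 5.4}}

for arch in ("win32", "win64"):
    # Size: /O1 binary compared against the /O2 binary.
    smaller = percent_reduction(size_mb[arch]["O2"], size_mb[arch]["O1"])
    # Time: /O2 run compared against the /O1 run.
    quicker = percent_reduction(time_s[arch]["O1"], time_s[arch]["O2"])
    print(f"{arch}: /O1 is {smaller:.1f}% smaller, /O2 takes {quicker:.1f}% less time")
```

Truncated to whole percent, the results match the figures quoted in the
tables (24, 15, 31 and 8).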

Now we need a way to create a new configuration using CMake to enable
LTO compilation with MSVC.

If you describe the process involved in using LTO I could help a bit.

(Is that the same LTO based on LLVM? Does it work on Windows at all? Or
is it the LTO provided by VS?)

I am talking about the LTO provided by VS (Microsoft calls it Whole
Program Optimization). The goal is to be able to compile clang using VS
with Whole Program Optimization on.

Here are some docs:

/GL (Whole Program Optimization) | Microsoft Learn
/LTCG (Link-time Code Generation) | Microsoft Learn

That's trivial. Try this patch and see if the results warrant a new
cmake option for supporting LTO on MSVC:

diff --git a/CMakeLists.txt b/CMakeLists.txt
index d473f51..0f1bb24 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -224,6 +224,7 @@ if( MSVC )
   add_llvm_definitions( -D_SCL_SECURE_NO_DEPRECATE )
   add_llvm_definitions( -wd4146 -wd4503 -wd4996 -wd4800 -wd4244 -wd4624 )
   add_llvm_definitions( -wd4355 -wd4715 -wd4180 -wd4345 -wd4224 )
+  add_llvm_definitions( /GL )

   # Suppress 'new behavior: elements of array 'array' will be default initialized'
   add_llvm_definitions( -wd4351 )

OK, but if you give /GL to the compiler you need to give /LTCG to the
linker. How do you do that?

From the second web page you linked:

"The linker invokes link-time code generation if it is passed a module
that was compiled with /GL or an MSIL module (see .netmodule Files as
Linker Input for more information). If you do not explicitly specify
/LTCG when passing /GL or MSIL modules to the linker, the linker will
eventually detect this and restart the link with /LTCG."

Thank you.
I tested it and was disappointed. I don't think it is worth adding a
new configuration for LTO using MSVC:

result:
   size:  win32 /O2 + LTO: 12.1 MB (saves 0.1 MB)
   speed: win32 /O2 + LTO: 5.4 s (saves 0.1 second)

On top of that, the link step took an eternity (something like 20
minutes, could be more) and used close to 1 GB of RAM.

My experience with my projects is similar to yours; that's why I asked
for a test before adding the build option. Unless you do something
really stupid (like not inlining your class accessors), the only effect
of /GL is to increase build time.

[snip]

For people who are curious, I benchmarked the MSVC 2008 and 2010 cl.exe
compilers on the same file. Again I used -c for cl.exe (-fsyntax-only
for clang).

cl.exe 2008 speed => 5.6 sec
cl.exe 2010 speed => 5.7 sec
clang.exe trunk   => 5.4 sec

clang is slightly faster than MSVC at parsing the non-templated Visual
Studio header files.
I ran the test many times on both clang and cl to make sure all header
files were in the cache, because the first time you run it you get a
slower result.
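
The "run it many times so the headers are cached" procedure above could
be scripted roughly as follows. This is only a sketch: the helper
time_command, the number of warm-up and measured runs, and the stand-in
demo command are assumptions for illustration, not what was actually
used to get the numbers above.

```python
import subprocess
import sys
import time


def time_command(cmd, warmups=1, runs=5):
    """Run `cmd` repeatedly and return the fastest wall-clock time in seconds."""
    quiet = {"stdout": subprocess.DEVNULL, "stderr": subprocess.DEVNULL}
    for _ in range(warmups):
        # Discarded runs: they only pull the input files into the OS file cache.
        subprocess.run(cmd, check=True, **quiet)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, **quiet)
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere (hypothetical example).
    demo = [sys.executable, "-c", "pass"]
    print(f"{time_command(demo):.3f} s")
```

To time a compiler, the stand-in command would be replaced by the real
command line, for example ["clang", "-fsyntax-only", "test.cpp"].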

FYI, 18 months later: clang is no longer faster than cl.exe, it's a draw!
I am using a test.cpp file that just includes all the Windows/MSVC SDK headers.

-fsyntax-only parsing benchmarking:

cl.exe 2010 speed => 7.2 sec
clang.exe trunk => 7.2 sec

I ran the test many, many times and I always get 7.2 secs. Strangely,
clang.exe and cl.exe (MSVC) take almost exactly the same time to parse
the test file.

(The reason the test went from 5.x sec to 7.2 sec is because it now
includes a lot more .h files).

Hi Francois,

Is it possible to compare clang -cc1 with vcpp on pre-processed input?