llvm-gcc -O0 compile times

I’ve started investigating -O0 -g compile times with llvm-gcc, which are pretty important for people in development mode (e.g. all debug builds of llvm itself!).

I’ve found some interesting things. I’m testing with mainline as of r52596 in a Release build and with checking disabled in the front-end. My testcase is a large C++ source file: my friend InstructionCombining.cpp. I build it the normal way we build it in debug mode, but with the output redirected to /dev/null:

time llvm-g++ -I/Users/sabre/llvm/include -I/Users/sabre/llvm/lib/Transforms/Scalar -D_DEBUG -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -g -fno-exceptions -Woverloaded-virtual -pedantic -Wall -W -Wwrite-strings -Wno-long-long -Wunused -Wno-unused-parameter -c -MMD -MP -MF "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.d.tmp" -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.lo" -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.o" -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.d" InstructionCombining.cpp -o /dev/null

One thing that is interesting is that we are significantly slower than g++-4.2 on this testcase. I’m seeing these timings:

GCC 4.2 -c: 4.27s
GCC 4.2 -S: 3.59s
LLVM4.2 -c: 9.30s
LLVM4.2 -S: 8.40s
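
To put those four numbers in perspective, here is a small Python sketch that derives the slowdown factor for each mode. It also subtracts the -S time from the -c time as a rough estimate of what the assembler costs; treating that difference as "assembler time" is an assumption of this sketch, not something measured separately. The names `timings`, `slowdown` and `assembler_time` are made up for illustration.

```python
# Wall-clock timings quoted above, in seconds.
timings = {
    "gcc":  {"-c": 4.27, "-S": 3.59},
    "llvm": {"-c": 9.30, "-S": 8.40},
}

def slowdown(mode):
    """How many times slower llvm-gcc is than GCC 4.2 for one mode."""
    return timings["llvm"][mode] / timings["gcc"][mode]

def assembler_time(compiler):
    """Rough assembler cost (assumption): time with -c minus time with -S."""
    return timings[compiler]["-c"] - timings[compiler]["-S"]

for mode in ("-c", "-S"):
    print(f"{mode}: llvm-gcc is {slowdown(mode):.2f}x slower")
for compiler in ("gcc", "llvm"):
    print(f"{compiler}: about {assembler_time(compiler):.2f}s in the assembler")
```

By this estimate llvm-gcc is a bit more than 2x slower in both modes, so most of the gap is inside the compiler proper rather than in the assembler.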

One thing I noticed is that with llvm-gcc, the assembler is taking longer than with gcc 4.2 (.9s vs .68s). This turns out to be because we make much larger output than GCC does:

gcc.s → 8943786
llvm.s → 13424378
gcc.o → 2055892
llvm.o → 3044512

Why is this? Let’s look at the contents:

$ sdiff -w 120 gcc.size llvm.size
Segment : 1495968 | Segment : 2211617
Section (__TEXT, __text): 251661 | Section (__TEXT, __text): 290873
Section (__DWARF, __debug_frame): 82752 | Section (__DWARF, __debug_frame): 80240
Section (__DWARF, __debug_info): 671478 | Section (__DWARF, __debug_info): 1240778
Section (__DWARF, __debug_abbrev): 3241 | Section (__DWARF, __debug_abbrev): 1535
Section (__DWARF, __debug_aranges): 48 | Section (__DWARF, __debug_aranges): 0
Section (__DWARF, __debug_macinfo): 0 Section (__DWARF, __debug_macinfo): 0
Section (__DWARF, __debug_line): 126106 | Section (__DWARF, __debug_line): 149797
Section (__DWARF, __debug_loc): 0 Section (__DWARF, __debug_loc): 0
Section (__DWARF, __debug_pubnames): 168873 | Section (__DWARF, __debug_pubnames): 165104
Section (__DWARF, __debug_pubtypes): 32449 |
Section (__DWARF, __debug_str): 17541 | Section (__DWARF, __debug_str): 0
Section (__DWARF, __debug_ranges): 456 | Section (__DWARF, __debug_ranges): 0
Section (__DATA, __const): 100 | Section (__DATA, __const): 136
Section (__TEXT, __cstring): 11543 | Section (__TEXT, __cstring): 12678
Section (__DATA, __data): 64 | Section (__DATA, __data): 76
Section (__DATA, __const_coal): 48 |
Section (__TEXT, __const_coal): 128 |
Section (__DATA, __mod_init_func): 4 | Section (__DATA, __mod_init_func): 4
Section (__DATA, __bss): 32 | Section (__DATA, __bss): 65
Section (__TEXT, __textcoal_nt): 116324 | Section (__TEXT, __textcoal_nt): 168920
Section (__TEXT, __literal8): 8 | Section (__TEXT, __eh_frame): 88636
Section (__TEXT, __StaticInit): 147 | Section (__TEXT, __StaticInit): 166
Section (__IMPORT, __jump_table): 12790 | Section (__IMPORT, __jump_table): 12410
Section (__IMPORT, __pointers): 136 | Section (__IMPORT, __pointers): 128
total 1495929 | total 2211546
total 1495968 | total 2211617

There are several problems here:

  1. We’re emitting __eh_frame even though it is being built with -fno-exceptions: http://llvm.org/PR2481. Just the excess labels alone give the assembler a lot more work to do.
  2. The __debug_info section is nearly twice as big (1240778 vs. 671478 bytes) and the __debug_line section is a bit bigger: http://llvm.org/PR2482
  3. We aren’t outputting text or data __const_coal sections. I’m not sure what these are, but they seem preferable to __textcoal_nt: http://llvm.org/PR2483

Also, we have no __debug_pubtypes, __debug_aranges, __debug_str, or __debug_ranges sections. I have no idea what these are, but this could be a problem :)

Fixing these is important for a couple of reasons. Generating more output takes more time, both in the assembler and in the compiler, which has to push all this around.
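
To see which sections matter most, the sizes from the sdiff output above can be ranked by how much each one grew. The following Python sketch does that for a hand-picked subset of sections; the numbers are copied from the listing, while the dictionary layout, the `growth` helper, and the use of 0 for a section missing on one side are assumptions made for illustration.

```python
# Section sizes in bytes, copied from the sdiff output above: (gcc, llvm).
# A size of 0 stands for a section that is absent or empty on that side.
sizes = {
    "__text":           (251661, 290873),
    "__debug_info":     (671478, 1240778),
    "__debug_line":     (126106, 149797),
    "__debug_pubnames": (168873, 165104),
    "__debug_pubtypes": (32449, 0),
    "__debug_str":      (17541, 0),
    "__textcoal_nt":    (116324, 168920),
    "__eh_frame":       (0, 88636),
}
SEGMENT_GCC, SEGMENT_LLVM = 1495968, 2211617  # the "Segment" line

def growth(name):
    """Bytes added by llvm-gcc relative to gcc (negative if smaller)."""
    gcc, llvm = sizes[name]
    return llvm - gcc

total_growth = SEGMENT_LLVM - SEGMENT_GCC
ranked = sorted(sizes, key=growth, reverse=True)
for name in ranked:
    delta = growth(name)
    share = 100 * delta / total_growth
    print(f"{name:18} {delta:+8d}  {share:6.1f}% of total growth")
```

With these figures, __debug_info alone accounts for roughly 80% of the 715649-byte growth and __eh_frame for about 12%, which is why PR2481 and PR2482 look like the places to start.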

Moving up from the assembler, according to -ftime-report, our time in cc1plus is basically going into:

LLVM Passes:
2.65s → X86 DAG->DAG Instruction Selection (all selectiondag stuff)
0.54s → X86 AT&T-Style Assembly Printer
0.42s → Live Variable Analysis
0.19s → Local Register Allocator

C++ Front-end time:

  • 2.22s Tree to LLVM translator
  • 1.94s parser
  • 2.07s name lookup
  • 0.66s preprocessor
  • 0.20s gimplify

This doesn’t add up to 8.4s because -ftime-report adds significant overhead. It isn’t to be trusted, but is a decent indicator.
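
A quick sanity check of that claim, as a Python sketch: summing just the phases listed above and comparing against the 8.40s -S timing. The phase times are copied from the report; the variable names are hypothetical, and the sketch assumes the 8.40s figure was measured without -ftime-report.

```python
# Per-phase times in seconds, as listed in the -ftime-report summary above.
llvm_passes = {
    "X86 DAG->DAG Instruction Selection": 2.65,
    "X86 AT&T-Style Assembly Printer": 0.54,
    "Live Variable Analysis": 0.42,
    "Local Register Allocator": 0.19,
}
front_end = {
    "Tree to LLVM translator": 2.22,
    "parser": 1.94,
    "name lookup": 2.07,
    "preprocessor": 0.66,
    "gimplify": 0.20,
}
MEASURED_S = 8.40  # -S wall-clock time (assumed: without -ftime-report)

passes_total = sum(llvm_passes.values())
front_total = sum(front_end.values())
reported = passes_total + front_total

print(f"LLVM passes: {passes_total:.2f}s")
print(f"Front-end:   {front_total:.2f}s")
print(f"Reported:    {reported:.2f}s vs {MEASURED_S:.2f}s measured "
      f"({reported - MEASURED_S:+.2f}s)")
```

The listed phases alone add up to about 10.89s, roughly 2.5s more than the measured run, so the individual numbers are best read as relative weights rather than absolute costs.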

From this, it looks like there is significant room for improvement in many of the LLVM pieces. The two that stick out are the tree to llvm translator and the selection dag related stuff. However, even the asmprinter is taking a significant amount of time. This is partially because it has to output a ton of stuff, but even then it could be improved.

For example, picking on the frontend for a bit, we spend 10% of “-emit-llvm -O0 -g -c” time in DebugInfo::EmitFunctionStart, most of which is spent recursively walking the debug info with DISerializer. We also spend 9.3% of the time in DebugInfo::EmitDeclare, 10% of the time in eraseLocalLLVMValues, 12% of the time writing the .bc file (which isn’t relevant to normal use), and 21% of the time parsing (which we can’t help).

Anyone interested in picking off a piece and tackling it?

-Chris

Hi Chris,

> I've started investigating -O0 -g compile times with llvm-gcc, which
> are pretty important for people in development mode (e.g. all debug
> builds of llvm itself!).

even without -g the -O0 performance is not great. I compared the time
for llvm-gcc to compile to bitcode (-emit-llvm) against the time for
mainline gcc-4.2 to compile to assembler. llvm-gcc was about 15% slower
even though it didn't have to do codegen. Removing all the "tmp" names (as
recently committed on mainline) sped up llvm-gcc enough that it was taking
the same time as mainline. This is still without llvm-gcc doing codegen;
adding codegen in is sure to slow things down noticeably...

Ciao,

Duncan.

Are you sure the gcc numbers are right? I think these are gcc 4.0 numbers. I got:

Section (__TEXT, __text): 254569
Section (__DWARF, __debug_frame): 82612
Section (__DWARF, __debug_info): 841164

Evan

It is entirely possible they are 4.0 numbers, sorry I don't recall...

-Chris

Yep, I have confirmed these are 4.0 numbers. We are now basically even with gcc 4.2 (except for the extra eh_frame).

Evan