I’ve started investigating -O0 -g compile times with llvm-gcc, which are pretty important for people in development mode (e.g. all debug builds of llvm itself!).
I’ve found some interesting things. I’m testing with mainline as of r52596 in a Release build and with checking disabled in the front-end. My testcase is a large C++ source file: my friend InstructionCombining.cpp. I build it the normal way we build it in debug mode, but with the output redirected to /dev/null, which is:
time llvm-g++ -I/Users/sabre/llvm/include -I/Users/sabre/llvm/lib/Transforms/Scalar -D_DEBUG -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -g -fno-exceptions -Woverloaded-virtual -pedantic -Wall -W -Wwrite-strings -Wno-long-long -Wunused -Wno-unused-parameter -c -MMD -MP -MF "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.d.tmp" -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.lo" -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.o" -MT "/Users/sabre/llvm/lib/Transforms/Scalar/Debug/InstructionCombining.d" InstructionCombining.cpp -o /dev/null
One thing that is interesting is that we are significantly slower than g++-4.2 on this testcase. I’m seeing these timings:
GCC 4.2 -c: 4.27s
GCC 4.2 -S: 3.59s
LLVM4.2 -c: 9.30s
LLVM4.2 -S: 8.40s
One thing I noticed is that with llvm-gcc, the assembler is taking longer than with gcc 4.2 (.9s vs .68s). This turns out to be because we make much larger output than GCC does:
gcc.s → 8943786
llvm.s → 13424378
gcc.o → 2055892
llvm.o → 3044512
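To put those numbers side by side, here is a rough Python sketch. It assumes the assembler time can be estimated as the gap between the -c and -S timings (the .9s and .68s figures are consistent with that, but it is my reading, not something measured separately). The dictionary and function names are made up for illustration.

```python
# Wall-clock timings (seconds) and output sizes (bytes) copied from the figures above.
timings = {
    "gcc":  {"-c": 4.27, "-S": 3.59},
    "llvm": {"-c": 9.30, "-S": 8.40},
}
sizes = {
    "gcc":  {".s": 8943786, ".o": 2055892},
    "llvm": {".s": 13424378, ".o": 3044512},
}

def assembler_time(compiler):
    """Estimate assembler time as (-c time) minus (-S time). Assumption, not a measurement."""
    t = timings[compiler]
    return round(t["-c"] - t["-S"], 2)

def ratio(table, key):
    """How many times slower/larger llvm-gcc is than GCC 4.2 for one entry."""
    return round(table["llvm"][key] / table["gcc"][key], 2)

print("assembler time:", assembler_time("gcc"), "s (gcc) vs", assembler_time("llvm"), "s (llvm)")
print("-S slowdown:", ratio(timings, "-S"), "x")
print(".s growth:  ", ratio(sizes, ".s"), "x")
print(".o growth:  ", ratio(sizes, ".o"), "x")
```

The .s file is about 1.5x the size of GCC's, while the -S time is more than 2x, so the extra output can only be part of the story.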
Why is this? Let’s look at the contents:
$ sdiff -w 120 gcc.size llvm.size
Segment : 1495968 | Segment : 2211617
Section (__TEXT, __text): 251661 | Section (__TEXT, __text): 290873
Section (__DWARF, __debug_frame): 82752 | Section (__DWARF, __debug_frame): 80240
Section (__DWARF, __debug_info): 671478 | Section (__DWARF, __debug_info): 1240778
Section (__DWARF, __debug_abbrev): 3241 | Section (__DWARF, __debug_abbrev): 1535
Section (__DWARF, __debug_aranges): 48 | Section (__DWARF, __debug_aranges): 0
Section (__DWARF, __debug_macinfo): 0 Section (__DWARF, __debug_macinfo): 0
Section (__DWARF, __debug_line): 126106 | Section (__DWARF, __debug_line): 149797
Section (__DWARF, __debug_loc): 0 Section (__DWARF, __debug_loc): 0
Section (__DWARF, __debug_pubnames): 168873 | Section (__DWARF, __debug_pubnames): 165104
Section (__DWARF, __debug_pubtypes): 32449 |
Section (__DWARF, __debug_str): 17541 | Section (__DWARF, __debug_str): 0
Section (__DWARF, __debug_ranges): 456 | Section (__DWARF, __debug_ranges): 0
Section (__DATA, __const): 100 | Section (__DATA, __const): 136
Section (__TEXT, __cstring): 11543 | Section (__TEXT, __cstring): 12678
Section (__DATA, __data): 64 | Section (__DATA, __data): 76
Section (__DATA, __const_coal): 48 |
Section (__TEXT, __const_coal): 128 |
Section (__DATA, __mod_init_func): 4 | Section (__DATA, __mod_init_func): 4
Section (__DATA, __bss): 32 | Section (__DATA, __bss): 65
Section (__TEXT, __textcoal_nt): 116324 | Section (__TEXT, __textcoal_nt): 168920
Section (__TEXT, __literal8): 8 | Section (__TEXT, __eh_frame): 88636
Section (__TEXT, __StaticInit): 147 | Section (__TEXT, __StaticInit): 166
Section (__IMPORT, __jump_table): 12790 | Section (__IMPORT, __jump_table): 12410
Section (__IMPORT, __pointers): 136 | Section (__IMPORT, __pointers): 128
total 1495929 | total 2211546
total 1495968 | total 2211617
There are several problems here:
- We’re emitting __eh_frame even though it is being built with -fno-exceptions: http://llvm.org/PR2481. Just the excess labels alone give the assembler a lot more work to do.
- The __debug_info section is nearly twice as big and the __debug_line section is a bit bigger: http://llvm.org/PR2482
- We aren’t outputting text or data __const_coal sections. I’m not sure what these are, but they seem preferable to __textcoal_nt: http://llvm.org/PR2483
Also, we have no __debug_pubtypes, __debug_aranges, __debug_str, or __debug_ranges sections. I have no idea what these are, but this could be a problem.
Fixing these is important for a couple of reasons. Generating more output takes more time, both in the assembler and in the compiler, which has to push all of this around.
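To see which sections account for the extra bytes, here is a small Python sketch over a handful of rows from the sdiff listing. The selection of rows and the helper name are mine; a section that is missing on one side is treated as 0 bytes, which is an assumption about how to count it.

```python
# (gcc_bytes, llvm_bytes) per section, copied from the sdiff listing above.
# A section that is absent on one side is recorded as 0 bytes (assumption).
sections = {
    "__text": (251661, 290873),
    "__debug_info": (671478, 1240778),
    "__debug_line": (126106, 149797),
    "__textcoal_nt": (116324, 168920),
    "__eh_frame": (0, 88636),
    "__debug_pubtypes": (32449, 0),
    "__debug_str": (17541, 0),
}
segment_gcc, segment_llvm = 1495968, 2211617

def growth(name):
    """Bytes llvm-gcc emits beyond GCC 4.2 for one section (negative if smaller)."""
    gcc, llvm = sections[name]
    return llvm - gcc

total_growth = segment_llvm - segment_gcc

# Largest contributors first; shares can exceed 100% in sum because some sections shrink.
for name in sorted(sections, key=growth, reverse=True):
    share = 100.0 * growth(name) / total_growth
    print(f"{name:18} {growth(name):+8d} bytes ({share:+.1f}% of total growth)")
```

By this count __debug_info alone is about 80% of the size difference, with __eh_frame a distant second.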
Moving up from the assembler, according to -ftime-report, our time in cc1plus is basically going into:
LLVM Passes:
2.65s → X86 DAG->DAG Instruction Selection (all selectiondag stuff)
0.54s → X86 AT&T-Style Assembly Printer
0.42s → Live Variable Analysis
0.19s → Local Register Allocator
…
C++ Front-end time:
- 2.22s Tree to LLVM translator
- 1.94s parser
- 2.07s name lookup
- 0.66s preprocessor
- 0.20s gimplify
This doesn’t add up to 8.4s because -ftime-report adds significant overhead. It isn’t to be trusted, but is a decent indicator.
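As a quick sanity check on that, a Python sketch that just adds up the entries listed above. The passes list is elided with "…", so the real -ftime-report total would be higher still; the variable names are mine.

```python
# Times (seconds) copied from the -ftime-report excerpt above.
# The LLVM pass list is incomplete (the original elides the rest), so this is a lower bound.
llvm_passes = {
    "X86 DAG->DAG Instruction Selection": 2.65,
    "X86 AT&T-Style Assembly Printer": 0.54,
    "Live Variable Analysis": 0.42,
    "Local Register Allocator": 0.19,
}
frontend = {
    "Tree to LLVM translator": 2.22,
    "parser": 1.94,
    "name lookup": 2.07,
    "preprocessor": 0.66,
    "gimplify": 0.20,
}
measured_wall_clock = 8.40  # the "LLVM4.2 -S" timing without -ftime-report

passes_total = round(sum(llvm_passes.values()), 2)
frontend_total = round(sum(frontend.values()), 2)
reported_total = round(passes_total + frontend_total, 2)

print("listed LLVM passes:", passes_total)
print("listed front-end:  ", frontend_total)
print("listed total:      ", reported_total, "vs measured", measured_wall_clock)
```

Even the listed entries come to well over the 8.4s measured without -ftime-report, which is why the numbers are best read as relative weights rather than absolute costs.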
From this, it looks like there is significant room for improvement in many of the LLVM pieces. The two that stick out are the tree to llvm translator and the selection dag related stuff. However, even the asmprinter is taking a significant amount of time. This is partially because it has to output a ton of stuff, but even then it could be improved.
For example, picking on the frontend for a bit, we spend 10% of “-emit-llvm -O0 -g -c” time in DebugInfo::EmitFunctionStart, most of which is spent recursively walking the debug info with DISerializer. We also spend 9.3% of the time in DebugInfo::EmitDeclare, 10% of the time in eraseLocalLLVMValues, 12% of the time writing the .bc file (which isn’t relevant to normal use), and 21% of the time parsing (which we can’t help).
Anyone interested in picking off a piece and tackling it?
-Chris