I have a compiler project that uses an LLVM backend (built for 64-bit Windows). Now that it is being used on production-sized source code, compile times have become extreme: around 30 minutes per build.
I'm not sure what the best metric is for judging whether 30 minutes is reasonable. The .ll output file is ~1.5 million lines.
A snippet from the top of the stats output is below:
2495586 asm-printer - Number of machine instrs printed
2 assembler - Number of assembler layout and relaxation steps
32434 assembler - Number of emitted assembler fragments - align
106972 assembler - Number of emitted assembler fragments - data
11179 assembler - Number of emitted assembler fragments - fill
86082 assembler - Number of emitted assembler fragments - relaxable
236667 assembler - Number of emitted assembler fragments - total
22267993 assembler - Number of emitted object file bytes
1083677 assembler - Number of evaluated fixups
648094 assembler - Number of fragment layouts
41654 assembler - Number of relaxed instructions
2345 branchfolding - Number of block tails merged
687 branchfolding - Number of branches optimized
1047 branchfolding - Number of dead blocks removed
281 branchfolding - Number of times common instructions are hoisted
8002 codegen-dce - Number of dead instructions deleted
860 codegenprepare - Number of GEPs converted to casts
8907 codegenprepare - Number of blocks eliminated
591 codegenprepare - Number of memory instructions whose address computations were sunk
801 codegenprepare - Number of uses of Cast expressions replaced with uses of sunken Casts
203 codegenprepare - Number of uses of Cmp expressions replaced with uses of sunken Cmps
315089 dagcombine - Number of dag nodes combined
135493 isel - Number of blocks selected using DAG
695 isel - Number of entry blocks encountered
9853322 isel - Number of times dag isel has to try another path
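For a sense of scale, the counters above can be turned into rough per-function averages. This is only a sketch: it assumes that "entry blocks encountered" corresponds to one entry block per function, which I have not verified, and the variable names are my own.

```python
# Counter values copied from the stats output above.
machine_instrs = 2495586   # asm-printer: machine instrs printed
blocks_selected = 135493   # isel: blocks selected using DAG
entry_blocks = 695         # isel: entry blocks (ASSUMED: one per function)
isel_retries = 9853322     # isel: times dag isel has to try another path


def per(total, count):
    """Average of `total` per `count`, rounded to one decimal place."""
    return round(total / count, 1)


instrs_per_function = per(machine_instrs, entry_blocks)
blocks_per_function = per(blocks_selected, entry_blocks)
retries_per_block = per(isel_retries, blocks_selected)

print("machine instrs per function:", instrs_per_function)
print("blocks per function:        ", blocks_per_function)
print("isel retries per block:     ", retries_per_block)
```

If the one-entry-block-per-function assumption holds, that works out to roughly 3,600 machine instructions and about 195 blocks per function on average, so the functions themselves look quite large.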
The top of the pass execution table is:
Total Execution Time: 2061.1788 seconds (2068.1383 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
682.0364 ( 33.1%) 0.0468 ( 2.6%) 682.0832 ( 33.1%) 684.0131 ( 33.1%) Machine Instruction Scheduler
309.1940 ( 15.0%) 1.0920 ( 60.9%) 310.2860 ( 15.1%) 311.8238 ( 15.1%) X86 DAG->DAG Instruction Selection
295.4971 ( 14.3%) 0.2964 ( 16.5%) 295.7935 ( 14.4%) 296.6090 ( 14.3%) Greedy Register Allocator
231.7395 ( 11.3%) 0.0156 ( 0.9%) 231.7551 ( 11.2%) 232.4293 ( 11.2%) Simple Register Coalescing
196.9981 ( 9.6%) 0.0000 ( 0.0%) 196.9981 ( 9.6%) 197.6663 ( 9.6%) Live Variable Analysis
49.5459 ( 2.4%) 0.0156 ( 0.9%) 49.5615 ( 2.4%) 49.7798 ( 2.4%) Live Interval Analysis
23.6342 ( 1.1%) 0.0000 ( 0.0%) 23.6342 ( 1.1%) 23.8153 ( 1.2%) Machine Copy Propagation Pass
21.9181 ( 1.1%) 0.0000 ( 0.0%) 21.9181 ( 1.1%) 21.8963 ( 1.1%) Machine Loop Invariant Code Motion
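To summarize the table, the rows can be parsed and the wall-clock time of the top passes added up. This is a minimal sketch that assumes every row has exactly four "seconds ( percent%)" pairs followed by the pass name, as in the rows pasted above; the helper name `parse_rows` is my own.

```python
import re

# Rows copied from the pass execution table above.
TABLE = """\
682.0364 ( 33.1%) 0.0468 ( 2.6%) 682.0832 ( 33.1%) 684.0131 ( 33.1%) Machine Instruction Scheduler
309.1940 ( 15.0%) 1.0920 ( 60.9%) 310.2860 ( 15.1%) 311.8238 ( 15.1%) X86 DAG->DAG Instruction Selection
295.4971 ( 14.3%) 0.2964 ( 16.5%) 295.7935 ( 14.4%) 296.6090 ( 14.3%) Greedy Register Allocator
231.7395 ( 11.3%) 0.0156 ( 0.9%) 231.7551 ( 11.2%) 232.4293 ( 11.2%) Simple Register Coalescing
196.9981 ( 9.6%) 0.0000 ( 0.0%) 196.9981 ( 9.6%) 197.6663 ( 9.6%) Live Variable Analysis
49.5459 ( 2.4%) 0.0156 ( 0.9%) 49.5615 ( 2.4%) 49.7798 ( 2.4%) Live Interval Analysis
23.6342 ( 1.1%) 0.0000 ( 0.0%) 23.6342 ( 1.1%) 23.8153 ( 1.2%) Machine Copy Propagation Pass
21.9181 ( 1.1%) 0.0000 ( 0.0%) 21.9181 ( 1.1%) 21.8963 ( 1.1%) Machine Loop Invariant Code Motion
"""

TOTAL_WALL = 2068.1383  # wall clock total from the "Total Execution Time" line

# One "seconds ( percent%)" pair, e.g. "682.0364 ( 33.1%)".
PAIR = re.compile(r"(\d+\.\d+) \(\s*(\d+\.\d+)%\)")


def parse_rows(table):
    """Return (pass name, wall seconds) for each row with four timing pairs."""
    rows = []
    for line in table.splitlines():
        pairs = list(PAIR.finditer(line))
        if len(pairs) != 4:
            continue  # not a timing row
        wall_seconds = float(pairs[3].group(1))
        name = line[pairs[3].end():].strip()
        rows.append((name, wall_seconds))
    return rows


rows = parse_rows(TABLE)
top5_wall = sum(seconds for _, seconds in rows[:5])
top5_share = top5_wall / TOTAL_WALL

print(f"top 5 passes: {top5_wall:.1f} s ({top5_share:.0%} of wall time)")
```

On these numbers the five most expensive passes, all in the backend (scheduling, instruction selection, register allocation and the liveness analyses that feed it), account for roughly 83% of the total wall time.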
I’m looking to find out whether this is something we can affect significantly by changing our IR code generation phase, altering llvm (or opt) parameters, or something else.
Is the compile time likely to be exponential in source file size, or something like that? (The compile above currently combines a number of modules that we could perhaps separate.)
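One way I could check the scaling empirically is to compile inputs of several sizes, record the time for each, and fit a straight line in log-log space. A slope near 1 would suggest roughly linear behaviour, while a slope well above 1 would suggest superlinear growth, in which case splitting the modules should help. The sketch below uses made-up measurements purely to show the calculation; `fit_exponent` and the sample data are hypothetical, not results from my build.

```python
import math


def fit_exponent(sizes, seconds):
    """Least-squares slope of log(seconds) against log(sizes).

    If time ~ c * size**k, the returned value estimates k.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# HYPOTHETICAL data: IR line counts and compile times in seconds.
# Here the time quadruples each time the size doubles.
sizes = [100_000, 200_000, 400_000, 800_000]
seconds = [2.0, 8.0, 32.0, 128.0]

exponent = fit_exponent(sizes, seconds)
print(f"estimated exponent: {exponent:.2f}")
```

With real measurements, points that curve upward instead of lying on a straight line in log-log space would hint at something worse than polynomial growth.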
Also, we are on a pretty old version (3.7), as that was current when the project started in 2016, and we would rather not upgrade right now unless it is likely to improve performance here. I would be interested in opinions on whether an upgrade is likely to have an effect. I was planning to dump an IR file and run it through 6.0 independently once I have time to build that version for Windows.
Any advice appreciated.