Diego, Teresa, David,
Sorry for my delayed reply; I left for vacation right after sending my message about this.
Diego, it wasn't explicit from your message whether LLVM LTO can handle Firefox-scale programs, which you said GCC can handle. I assumed that's what you meant, but could you confirm that? I understand that neither can handle the very large Google applications, but that's probably not a near-term concern for a project like the one Charles is embarking on.
Vikram, LLVM can handle Firefox-size programs. Honza wrote two good
articles about LTO.
The comparison with LLVM is described in the second article. It took
about 40 minutes to finish building Firefox with LLVM using LTO and
-g. The following is a quote:
"This graph shows issues with debug info memory use. LLVM goes up to
35GB. LLVM developers are also working on debug info merging
improvements (equivalent to what GCC's type merging is) and the
situation has improved in last two releases until the current shape.
Older LLVM checkouts happily run out of 60GB memory & 60GB swap on my
I'd be interested to hear more about the LTO design you folks are working on, whenever you're ready to share the details.
We will share the details as soon as we can -- possibly some time in Jan 2015.
I read the GCC design docs on LTO, and I'm curious how similar or different your approach will be. For example, the 3-phase approach of WHOPR is fairly sophisticated (it actually follows closely some research done at Rice U. and IBM on scalable interprocedural analysis, in the same group where Preston did his Ph.D.).
At Google, we care mostly about peak optimization performance. Peak
optimization is basically PGO + CMO. For cross-module optimization
(CMO) to be usable for large applications, a small memory footprint is
just one aspect; fast build time is equally important. Peak
optimization is used not only in release builds but in the developer
workflow too. This means build time with CMO should be as close as
possible to O2 build time. This matters to compiler engineers too
-- you don't want to wait more than 20 minutes to hit a breakpoint
when debugging a compiler problem.
For this reason, GCC LTO is not used at Google. Instead, a much more
scalable solution called LIPO is widely used for CMO:
https://gcc.gnu.org/wiki/LightweightIpo. LIPO by design requires PGO.
While LIPO is scalable, it has its own limitations that prevent the
compiler from maximizing the benefit of CMO. The new design is
intended to solve this problem with much more aggressive objectives.
The new design is pretty simple and shares the basic principles of
LIPO without requiring PGO (though it still works best with PGO). It
still fits in the LTO framework, so that changes to toolchain support
are minimized. For now, without giving details, I can share some of
the objectives of the new design:
* The build should be almost fully parallelizable (at both process
level and build machine node level)
* The build should scale to programs with *any/unlimited* size
(measured in number of TUs). It should handle programs 10x, 100x the
size of Firefox.
* The build time should be very close to that of a non-LTO build, so
that it can be considered for being turned on *by default* for O2 or
at least O3
* When turned on by default, it can eliminate the need for
users to put inline functions in header files (thus greatly helping
to improve parsing time)
* Most of the benefit of CMO comes from cross-module inlining and
cross-module indirect call promotion. By default, the design only
enables these two, but it is still compatible with whole-program
analysis, which can be turned on with an additional option.
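To illustrate the two transformations named in the last bullet, here is a hand-written sketch in C. It is purely illustrative and is not taken from the new design. The names handler_fn, add_one, times_two, call_indirect and call_promoted are hypothetical, and treating add_one as the hot target is an assumption standing in for what a profile or other heuristic would report. Imagine the two handlers are defined in a different module from the callers.

```c
/* Hypothetical handler type: the call target is known only at run time. */
typedef int (*handler_fn)(int);

/* Imagine these two functions live in another translation unit. */
int add_one(int x) { return x + 1; }
int times_two(int x) { return x * 2; }

/* Before: an opaque indirect call. The compiler cannot inline through it. */
int call_indirect(handler_fn fn, int x) {
    return fn(x);
}

/* After indirect call promotion plus cross-module inlining, written out by
 * hand: guard on the assumed hot target, run its body directly, and keep
 * the original indirect call as the fallback for every other target. */
int call_promoted(handler_fn fn, int x) {
    if (fn == add_one) {
        return x + 1;   /* body of add_one, inlined across the module boundary */
    }
    return fn(x);       /* cold path: unchanged indirect call */
}
```

Both versions return the same result for any handler; for example, call_promoted(times_two, 21) takes the fallback path and returns 42. The promoted version only pays off when the guarded target really is the common case.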
For now, I would like to introduce you all to Charles, so that he has access to people working on this issue, which will probably continue to be a concern for his project. I have copied you on my reply to him.
Thanks for the introduction! I am interested in knowing more about