Hi All,
Am I mistaken in concluding that the initial compile for LTO will generate optimized LLVM IR to be merged and fed into the LTO backend?
I was speculating about whether having clang's -flto generate unoptimized LLVM IR would give the LTO backend more opportunities to improve overall performance or reduce code size.
Does anyone have any thoughts to share on this topic?
Thanks and Regards,
Todd Snider
Quite a few people have experimented with this. In general, the optimizations you do at compile time don't hurt or prevent further optimizations during LTO.
The more you can add during LTO the better, of course, but you run into a tradeoff with compile time: in a FullLTO build the compile phase is parallel, while the LTO phase is single-threaded, so any optimization you move from the compile phase to the LTO phase drastically slows down the process.
ThinLTO has the benefit that both phases are parallel, which is why we were able to move some optimizations from the compile phase to the LTO phase: the ThinLTO pass pipeline is shorter during the compile phase than FullLTO's, but much longer during the LTO phase. When I tried to use the same pipeline for FullLTO, the link was twice as slow!
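To make that concrete, here is a rough sketch of how the two modes are typically driven (the file and output names are made up; -flto and -flto=thin are clang's flags for the two modes):

```c
/* lto_modes.c -- hedged sketch only; file and output names are hypothetical.
 *
 * FullLTO: the per-file compile steps can run in parallel, but the link
 * step merges all the bitcode and optimizes it as a single serial job:
 *     clang -O2 -flto -c lto_modes.c other.c
 *     clang -flto lto_modes.o other.o -o app
 *
 * ThinLTO: the compile steps are parallel and the link-time backends also
 * run per module in parallel, which is why more of the pipeline can live
 * in the LTO phase without the link becoming the bottleneck:
 *     clang -O2 -flto=thin -c lto_modes.c other.c
 *     clang -flto=thin lto_modes.o other.o -o app
 */
int app_entry(void) { return 0; } /* placeholder so the file compiles */
```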
It is also always beneficial to run some optimizations early, because they reduce the size of the IR and can even eliminate entire functions (inline functions can be removed once they have no call sites left): this reduces the size of the bitcode on disk and the time it takes to reload and process it at link time.
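To illustrate the point about inline functions, here is a minimal sketch (names are made up) of what the compile-step optimizer can strip before the bitcode ever reaches the link step:

```c
/* shrink.c -- minimal sketch, hypothetical names. Built with, e.g.,
 *   clang -O2 -flto=thin -c shrink.c
 * the compile-step optimizer inlines scale() into both callers and, because
 * it has internal linkage and no call sites left, drops its definition
 * entirely: the bitcode written to shrink.o is smaller, and the LTO phase
 * never has to reload or re-optimize it. If unoptimized IR were emitted
 * instead, the definition and both calls would survive into the bitcode and
 * be processed again at link time.
 */
static inline int scale(int x) { return 3 * x + 1; }

int scaled_twice(int x)  { return scale(scale(x)); }
int scaled_offset(int x) { return scale(x) + 7; }
```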
See also the backup slides we had in this talk: https://llvm.org/devmtg/2016-11/Slides/Amini-Johnson-ThinLTO.pdf
Hi Mehdi,
That’s very useful information, Thanks!
~ Todd