AArch64 Instruction Selection taking a long time with --lto-O0

One of the things that contribute to excessive toolchain build times is the use of LTO on test binaries and unittests. For tests, the runtime benefits of LTO don't outweigh the compile-time costs. Linking the clang Frontend unittest can take 10+ minutes with ThinLTO. Luckily, this change makes it so that unittests are compiled with --lto-O0, which brings me to my next problem.

When targeting Darwin arm64, I found that linking the clang Frontend unittest can still take 4-5 minutes even with --lto-O0. In a recent profile, the pass that took the longest was the AArch64 Instruction Selection pass. Switching to GlobalISel gives similar results, except that roughly the same amount of time is spent in the IRTranslator pass instead. Including debug info in the object files doubles the time, which is unfortunate, since building with debug info makes for a much better developer experience. I'm wondering if anyone has seen this issue or knows of a way to speed this up.

Alternatively, I know -ffat-lto-objects was added recently, but it doesn't support the lld Mach-O port yet. I'm also not sure whether the functionality supports arm64 (though in theory it should?). cc: @petrhosek @smeenai

You're correct that there isn't a fundamental reason why FatLTO couldn't be used for targets other than ELF. In fact, it's something I'd like to see happen. As far as I know, the other linkers just need to enable support.

The only other thing that would need to happen is distinguishing the section names used by the older embedded-bitcode support from the new FatLTO ones. For ELF, I believe @MaskRay added some section flags, in addition to a new naming convention, that we can use to tell them apart. So if there are equivalent approaches we can take for COFF and Mach-O, then I think most of the support would be straightforward to add.

Lastly, there are some improvements to the FatLTO pipeline that I want to prioritize this quarter to avoid naively running two distinct pipelines on separate modules. I think @nikic has a prospective patch to make the Thin and Full LTO pipelines more similar, which should allow us to simplify the FatLTO pipeline a great deal.

Yep, I was thinking of taking that on (if I get around to it), provided the existing clang-side FatLTO support also applies to Mach-O. It sounds like that isn't the case. Do you know if anyone is actively investigating this part?

Have you seen any increase in compile time with the current approach?