We recently discovered a (hidden) performance bug that shows up when you use a Full LTO setup with the BFD linker to build first-stage PGO executables that are expected to generate IR-level profile files. That is, when you build an executable with the following commands:
clang -flto -fprofile-generate -c sample.c -o sample.c.bc.o
clang -flto sample.c.bc.o -o sample_exe -fprofile-generate
The resulting sample_exe executable will always emit frontend-level profile files despite the -fprofile-generate flag, which should make it generate IR-level profile files. The other prerequisites for reproducing this problem are:
• Using (the latest version of) the BFD linker
• Using LLVM’s own LTO linker plugin (i.e. LLVMgold.so)
• Using LLVM’s compiler-rt as the runtime library
• On a Linux / BSD platform
This problem has been confirmed not to happen with the gold and LLD linkers. Whether it affects Windows is still unknown.
The intriguing (or annoying) part of this bug is that you’ll never notice it (since it neither crashes nor causes a significant performance regression) unless you look at the generated profile files. So I just want to post a heads-up here, though I know most folks here are moving away from the BFD linker.
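For anyone who wants to check their own builds, here is a rough sketch of how the profile kind could be inspected. It relies on llvm-profdata show printing an "Instrumentation level:" line in its summary. The helper names profile_level and check_profile are made up for this example, and it assumes llvm-profdata is on your PATH.

```shell
#!/bin/sh
# Sketch only: profile_level and check_profile are hypothetical helper names.
# Assumes llvm-profdata is on PATH.

# Read `llvm-profdata show` output on stdin and print just the
# instrumentation level (e.g. "IR" or "Front-end").
profile_level() {
    sed -n 's/^Instrumentation level: *\([A-Za-z-]*\).*/\1/p'
}

# Print the instrumentation level recorded in one profile file.
check_profile() {
    llvm-profdata show "$1" | profile_level
}
```

A possible way to use it: run ./sample_exe, merge whatever raw profile it wrote with llvm-profdata merge -o sample.profdata <raw profile>, then run check_profile sample.profdata. If the result is "Front-end" for a binary built with -fprofile-generate, you are probably hitting the problem described above.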