Is the fast register allocator O(n^2) in the size of the function?

Apologies if this email is a bit weirdly formatted; I had actually
already unsubscribed from the mailing list when I noticed that I had
received a response.

Here is the stack trace I found for LLVM 11:

#0 hasOneNonDBGUse () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/MachineRegisterInfo.cpp:420
#1 0x00007ffff3e8a478 in defineVirtReg () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/RegAllocFast.cpp:787
#2 0x00007ffff3e883d1 in allocateInstruction () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/RegAllocFast.cpp:1188
#3 allocateBasicBlock () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/RegAllocFast.cpp:1277
#4 runOnMachineFunction () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/RegAllocFast.cpp:1316
#5 0x00007ffff3d8a39e in runOnFunction () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/CodeGen/MachineFunctionPass.cpp:73
#6 0x00007ffff3bc7579 in runOnFunction () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1516
#7 0x00007ffff3bccb23 in runOnModule () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1552
#8 0x00007ffff3bc7b90 in runOnModule () at
/build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:1617
#9 run () at /build/llvm-toolchain-11-9xfOLw/llvm-toolchain-11-11.0.0/llvm/lib/IR/LegacyPassManager.cpp:614
#10 0x000000000040d792 in main ()

It looks like at least as of LLVM 11, RegAllocFast.cpp was indeed
calling hasOneNonDBGUse, which was accounting for some 55% of the self
time reported in my profile on the bitcode file I was testing with.

I did some digging into the more recent history of LLVM and it seems
that this function was refactored sometime between LLVM 11 and LLVM 12
RC1, so I went ahead and built the most recent release candidate from
source to test if the problem still occurred or not. I don't remember
seeing a release candidate when I originally composed the email or I
would have tried it then, although it seems RC1 must have been out by
then. I must have overlooked it by accident.

With that being said, when I run the same bitcode file with LLVM 12
RC2, I see that register allocation only takes 4% of the CPU time in
total. I also see that the time spent in register allocation has
reduced to only 5 seconds from 5 minutes, a 60-fold improvement!

I do notice that 30% of self time is spent in each of
llvm::FastISel::handlePHINodesInSuccessorBlocks and
llvm::SelectionDAGBuilder::HandlePHINodesInSuccessorBlocks, for a
whopping 60% of self time total, but that is a completely separate
issue, and much more acceptable given that wall time has decreased
from 10 minutes for the entire process to 2.5.

In short, it looks like someone else already beat me to the punch.
Sorry for wasting your time. It's good to know that this issue will be
addressed for us simply by waiting for LLVM 12 to be released, though!

Thanks,
Dwight

Glad to hear that your problem will be fixed by LLVM 12!