Strange regalloc behaviour: one more available register causes much worse allocation

Preamble

attachments.zip (51.2 KB)

This has cropped up before in X86 (https://bugs.llvm.org/show_bug.cgi?id=26810 / https://reviews.llvm.org/rL316295), and there's at least a partial mitigation (I recently ran into an eviction chain on X86 when trying variants of a MachineScheduler change, but couldn't find a reproduction after the landed patch).

I suggest you try enabling enableAdvancedRASplitCost() for ARM and seeing if that helps.
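
Something along these lines in ARMSubtarget.h should be enough to try it (an untested sketch from memory, so take the exact placement with a grain of salt):

    // In ARMSubtarget.h, inside the ARMSubtarget class (untested sketch):
    // opt ARM in to the extra split-cost / eviction-chain computation,
    // as X86 already does.
    bool enableAdvancedRASplitCost() const override { return true; }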

-Nirav

enableAdvancedRASplitCost() does the same thing as ConsiderLocalIntervalCost, but as a subtarget option instead of a command-line option, and as I've said it doesn't help because it's a non-local interval causing the eviction chain (RAGreedy::splitCanCauseEvictionChain only considers the local interval for a single block, and it's unclear to me how to make it handle a non-local interval).

John

+ Marina, maybe she would have ideas on how to improve the eviction
chain detection.

Hi John,

What you're seeing is indeed another instance of the eviction chain
madness that Marina improved with splitCanCauseEvictionChain, but it
is still a problem.
The problem with what you're suggesting is that it will improve this
instance, but I am pretty sure it will regress some others.
Unfortunately, that is the case with whatever change we make in that
space, though.

Does anyone know where and how exactly these bundles are decided?

I have to relearn this every time something goes wrong in
there... Needless to say, this is complicated! IIRC the bundles are
chosen using a Hopfield network, which in short is a kind of neural
network that is guaranteed to converge to a local minimum. As you can
imagine, there is little tweaking you can do here, and maybe it's time
to rethink that approach.
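
To give a flavour of the mechanism, here is a toy version of that kind of
network (purely conceptual; the real SpillPlacement code works on edge
bundles with biases and link weights derived from block frequencies, and
none of the names or numbers below come from it):

    #include <cstdio>
    #include <utility>
    #include <vector>

    struct Node {
      double Bias = 0;  // per-node preference, think "pull" from block frequency
      int Value = 0;    // +1 = keep in a register, -1 = on the stack, 0 = undecided
      std::vector<std::pair<int, double>> Links; // (neighbour, symmetric weight)
    };

    // Update nodes one at a time until nothing changes.  With symmetric link
    // weights every flip lowers the network's energy, so it always converges --
    // but only to a *local* minimum, which is why there is so little to tweak.
    static void run(std::vector<Node> &Nodes) {
      bool Changed = true;
      while (Changed) {
        Changed = false;
        for (Node &N : Nodes) {
          double Field = N.Bias;
          for (auto &[Idx, W] : N.Links)
            Field += W * Nodes[Idx].Value;
          int New = Field > 0 ? 1 : (Field < 0 ? -1 : N.Value);
          if (New != N.Value) { N.Value = New; Changed = true; }
        }
      }
    }

    int main() {
      // Three "bundles": one that strongly wants a register, one that mildly
      // prefers the stack, and an undecided one linked to both.
      std::vector<Node> Nodes(3);
      Nodes[0].Bias = 2.0;  Nodes[0].Links = {{2, 2.0}};
      Nodes[1].Bias = -0.5; Nodes[1].Links = {{2, 1.0}};
      Nodes[2].Links = {{0, 2.0}, {1, 1.0}};
      run(Nodes);
      for (unsigned I = 0; I != Nodes.size(); ++I)
        std::printf("node %u -> %d\n", I, Nodes[I].Value);
      return 0;
    }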

Cheers,
-Quentin

A bit of an update on this: I looked into SpillPlacement.cpp, which is
where the choice of which bundles to split the live range at is made, and
I think the choice it's making is correct given the information it has.
It recognises that it's a bad idea to split around bb.17, but that it's
an even worse idea not to split before bb.30, as that is a very hot block
and inserting a spill there would cost far more. In other words, if we
didn't have an eviction chain then the choice would have been correct, so
fixing this looks like it will require some kind of eviction chain
detection.
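
To put rough numbers on the trade-off it is weighing (these are made up,
purely illustrative; the real weights come from the block frequencies):

    #include <cstdio>

    int main() {
      double FreqBB17 = 10.0;    // block we'd rather not split around
      double FreqBB30 = 1000.0;  // very hot block
      double SplitAroundBB17 = 2 /*copies*/ * FreqBB17;         // ~20
      double SpillInsideBB30 = 2 /*spill + reload*/ * FreqBB30; // ~2000
      std::printf("split cost %.0f vs spill cost %.0f\n",
                  SplitAroundBB17, SpillInsideBB30);
      return 0;
    }

Splitting wins by a couple of orders of magnitude, which is the right call
as long as the split doesn't then kick off an eviction chain the model
never sees.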

I had a go at making RAGreedy::calcGlobalSplitCost (and the functions it
calls) consider all intervals, not just local intervals, when looking for
eviction chains, and it looks like that does work. I'll look at it some
more in the new year, as I'm now on holiday, to see if it's generally a
good idea.
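
For concreteness, the shape of the experiment is roughly this (a standalone
toy; none of these types or names are RAGreedy's real interfaces):

    #include <vector>

    struct SplitProduct {
      bool Local;                    // confined to a single basic block?
      bool OverlapsCandidatePhysReg; // competes for the candidate physreg?
      bool ReEvictsOriginalEvictor;  // would it kick out the vreg that evicted us?
    };

    double globalSplitCost(const std::vector<SplitProduct> &Products,
                           double BaseCost, double ChainPenalty,
                           bool ConsiderAllIntervals) {
      double Cost = BaseCost;
      for (const SplitProduct &P : Products) {
        if (!ConsiderAllIntervals && !P.Local)
          continue; // current behaviour: only local intervals are checked
        if (P.OverlapsCandidatePhysReg && P.ReEvictsOriginalEvictor)
          Cost += ChainPenalty; // make chain-forming candidates look expensive
      }
      return Cost;
    }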

John

Hi John,

It indeed looks like the eviction chains I encountered.
In a later message you mentioned you've experimented with extending calcGlobalSplitCost to consider non-local intervals too, which seems like a good idea.
I wonder if there are any cases where the following scenario doesn't cause an eviction chain of several movs and is considered a good split decision:
vreg0 evicts vreg1 from physreg0; vreg1 is split into vreg2 and vreg3; one of vreg2/vreg3 then evicts vreg0 from physreg0 again.
If so, maybe there is something else we are missing when making the cost decisions.

I think the best way to check this is to detect these scenarios, with the ability to allow/prevent them (this seems to be what you did), and then check the global spill cost of several workloads to see if there are any workloads for which allowing these scenarios is better than preventing them.
Last I checked, LLVM had a statistic that counted the number of spills and reloads in loops (reportNumberOfSplillsReloads()), but it didn't account for the cost of those spills (weighted by BB frequency), so this needs to be fixed to get a more accurate evaluation of the global spill cost.
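
Something along these lines, i.e. weight each spill/reload by its block's
frequency relative to the function entry instead of just counting them (a
sketch with made-up types, not the actual reportNumberOfSplillsReloads()
code):

    #include <cstdint>
    #include <vector>

    struct BlockInfo {
      uint64_t Freq;       // block frequency (e.g. from MachineBlockFrequencyInfo)
      unsigned NumSpills;  // spill instructions in the block
      unsigned NumReloads; // reload instructions in the block
    };

    double weightedSpillCost(const std::vector<BlockInfo> &Blocks,
                             uint64_t EntryFreq) {
      double Cost = 0.0;
      for (const BlockInfo &B : Blocks)
        Cost += (double(B.Freq) / double(EntryFreq)) *
                (B.NumSpills + B.NumReloads);
      return Cost; // comparable across allocator variants on the same workload
    }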

Thanks,
Marina