PBQP crash

Hi,

I have run into a test case on an out-of-tree target where PBQP fails to complete register allocation after “Attempting to spill already spilled value” (the triggered assert in InlineSpiller::spill().

First, the original LiveInterval is spilled. It is a load of a symbol into a narrow register class, i.e. a subset of the class of address registers. InlineSpiller decides to rematerialize the load of the symbol to lie right before its only user, which makes good sense. The original def is removed.

The new LiveInterval pushed is thus much smaller in the next PBQP round. The spill cost is marked as ‘inf’ during graph building. This small interval has also a lot of overlapping intervals and thus edges in the PBQP graph. It gets pushed on the node stack to later be popped after 17 others.

Those 17 nodes use up all registers of the narrow reg-class, and the cost vector has become all infinities. Spill option is selected again, and thus the error is a fact of spilling an already spilled value.

I wonder what has gone wrong here, and have some initial thoughts:

  • The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case.

In setup(), the node is first put into not-provably-allocatables. However, one of it’s neigbhour invoked handleDisconnectEdge(), and moves it into conservatively-allocatables, because DeniedOpts had become less than NumOpts (in isConservativelyAllocatable().

  • There are lots of spillable nodes being popped before the one that can’t be spilled. This seems intuitively wrong, as they are intervals that actually could be spilled.

I would really appreciate some help and pointers on what might be going wrong here,

Jonas Paulsson

Hi Jonas,

I think you may be facing a problem with the reduction order. I found out it is important and can affect the quality of the allocation, and with your case, it affects the allocability.

With your example, I think the node should be pushed amongst the last nodes onto the reduction order list: spilling is not an option, so it is more constrained and should be allocated first. It should simply not have made it into the ConservativelyAllocatableNodes. It should have stayed into the NotProvablyAllocatableNodes set, and the SpillCostComparator will ensure it is on the top of the reduction list (because its spill cost is +inf).

The logic using NodeMetadata::isConservativelyAllocatable should be tweaked to detect such nodes. This would take place in RegAllocSolverImpl, because the node costs cannot be accessed in NodeMetadata.

Lang will probably be able to give a more educated answer than mine.

Thanks for using PBQP !

Cheers,

Arnaud

Hi Jonas,

  • The problematic node that was spilled again, was in the ConservativelyAllocatableNodes set during reduce(). The comment in reduce() “Conservatively allocatable nodes will never spill…” indicates that perhaps this is an incorrect insertion, as the regs did in fact run out in this case.

Arnaud is correct: A node should never be put into the conservatively allocatable list if there is a chance of it spilling. Off the top of my head I can imagine 2 things going wrong here:

(1) Conservative allocability is mostly-precomputed, and updated with deltas as nodes are removed. It is possible that there is some subtle bug in this code.

(2) I think the current conservative allocability test bakes in the assumption that all register options have non-infinite cost. If you assign infinite costs to any physical register I would expect this to blow up.

Are you able to share a test case at all? If so that would be great. If not, I can add an option to the allocator to dump abstract PBQP graphs, and I could use these to test the problem on my end.

Regards,
Lang.

A node should never be put into the conservatively allocatable list if there is a chance of it spilling.

I can understand why the logic of NodeMetadata::isConservativelyAllocatable is necessary for the node to be allocatable, but I have not been able to convince myself this is sufficient, especially when the node degree > available registers.

Cheers,

Arnaud

Hi,

Sorry for the delay, it has taken some extra time as more than one bug showed up J

I continued to look into this with your viewpoint that a node that is conservatively allocatable should never be spilled. The first thing I did was therefore to add some extra code with an assert for this.

I believe I then found three bugs and fixed the two:

Bug 1: Incorrect transpositions in handleAddEdge() / hanldeRemoveEdge(). For the heuristic of DeniedOpts, if N runs along the columns (Transpose == true), the interesting dimension is along the rows, and vice versa. In other words, the question is what the neighbour could maximally deny N, and the answer is found in WorstRow, not WorstCol (as it was). This makes a difference if sub/super registers overlap.

Bug 2: When it happens that there are no available physical registers for a node (e.g because of regmasks or calls), the number of rows / columns is just 1, for the spill option. This case must be detected in MatrixMetadata(), or WorstColCountForCurRow will get an undefined value, as std::max_element() is called with an empty vector.

Bug 3: Again a conservatively allocatable node was spilled, and the assert triggered. This is a description of what happened in my out-of-tree test case:

applyR2() called on node N, which is overlapping Y and Z. The edges (N,Y) and (N,Z) are all-zeroes. Y and Z are overlapping and already have an interference edge. Z is just on the limit of not being conservatively allocatable: NumOpts is 8 and DeniedOpts is also 8. It is contained in NotProvablyAllocatableNodes. G.setEdgeCosts() is called and then the call stack grows with Solver->handleSetEdgeCosts(), handleRemoveEdge() into handleDisconnectEdge(), where NMd.handleRemoveEdge() is called, which decreases the DeniedOpts by one. After this, it looks like a bug to me that Z is moved to ConservativelyAllocatableNodes, because eventually handleSetEdgeCosts() will complete, and the edge between Y and Z will have been added again in handleAddEdge(), and Z:DeniedOpts is again 8!

I think this also shows up in a test case for arm. It was found by using the assert mentioned above, and running ‘bin/llvm-stress -size 200 -seed 17761’. The attached file (pbqp_reduced.ll) is the failing test case found reduced with bugpoint. Apply patch 2 (the assert), and then run ‘llc pbqp_reduced.ll -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a57 -mattr=+neon -optimize-regalloc -regalloc=pbqp’. The assert shows a node that is spilled that was conservatively allocatable. The test case itself will not fail without this assert. Add the Verbose-PBQP-dump patch and use ‘-debug-only=regalloc -pbqp-dump-graphs’ to see the more verbose output which shows the progress of the algorithm leading to the assert:

  • Applied R2(NId 18)handleDisconnectEdge(9, 2) : DeniedOpts 10 → 9

NId 9(%vreg15, GPR64common) moved to conservatively-allocatables.

handleDisconnectEdge(2, 9) : DeniedOpts 10 → 9

NId 2(%vreg4, GPR64common) moved to conservatively-allocatables.

Popped NId 2(%vreg4, GPR64common) , all edge costs added:

2.002748e+01 inf inf inf inf inf inf inf inf inf inf ** selection: 0

llc: …/include/llvm/CodeGen/PBQP/ReductionRules.h:214: llvm::PBQP::Solution llvm::PBQP::backpropagate(GraphT&, StackT, llvm::PBQP::NodeSet&) [with GraphT = llvm::PBQP::Gra

phllvm::PBQP::RegAlloc::RegAllocSolverImpl; StackT = std::vector; llvm::PBQP::NodeSet = std::set]: Assertion `PushedAsConservativelyAllocatabl

eNodes.find(NId) == PushedAsConservativelyAllocatableNodes.end() && “Node in conservatively allocatable set spilled!”’ failed.

I am not that familiar with the arm architecture, but it looks like NId 2 is incorrectly moved from not-provens to conservatively-allocables, after R2(NId 18). This looks to me like the same bug, but I could be wrong. Note that it does not show up if the bugfix for the transpositions is applied (for random reasons I beleive).

I am not sure how to fix it myself, as it seems non-trivial relating to temporary edge removal. Does anyone have any idea?

In addition to the reduced test case, I attach four patches, where 1 and 4 are not ready to commit, but I would be happy to modify and commit them if wanted:

  1. 0001-Assert… The assert (with some extra code related to it) that checks that a node that was conservatively allocatable never spills. My quick way to achieve this was to use an extra set of nodes as a means of remembering their origin. This is a very good idea, considering that this should never happen, and that there is some worry that the code might be incorrect.

  2. 0001-Bugfix-in-PBQP-matrix-transpositions… A bugfix for the transposition bug. (Bug 1). Small patch and also sent to llvm-commits for review.

  3. 0001-Bugfix-in-PBQP-regarding-Matrixes-and… A bugfix for the 1-column case (Bug 2). Small patch and also sent to llvm-commits for review.

  4. 0001-Verbose… Increased debug-dump and pbqp-graph output. Quite crude code, because I had a conflict in the ARM backend when I tried to #define DEBUG_TYPE “regalloc”. I think this could help PBQP get a wider audience, perhaps, even if it is still far from perfect. Perhaps this could be cleaned up and a target-independent name for all pbqp-debug-dumps could be found and then commited.

Looking forward to your reply,

Jonas Paulsson

pbqp_reduced.ll (3.08 KB)

0001-Assert-in-PBQP-that-a-node-that-selects-0-spilled-wa.patch (2.73 KB)

0001-Bugfix-in-PBQP-matrix-transpositions.patch (1.85 KB)

0001-Bugfix-in-PBQP-regarding-Matrixes-and-spill-only-nod.patch (1.23 KB)

0001-Verbose-PBQP-dump-output.patch (12.2 KB)

Hi Arnaud,

The conservatively allocatable test is supposed to check two conditions, either of which would be sufficient to make a node allocatable:

(1) There exists some register that is not aliased by any register option for any neighbor. This is the “safe row” test. It is straightforward, but likely to fire only rarely.

(2) The sum of the maximum number of registers aliased by any register for each neighbor is less than the number of available registers for this node. This is the “worst-column” test. More intuitively: for each neighbor you compute the maximum number of registers that could be “taken” from this node by an allocation to that neighbor. If you assume (conservatively) that none of these “taken” registers alias one another then you can sum them to find the maximum number of registers that might be unavailable to this node. If this sum is lower than the number of registers available then you’re safely allocatable.

Cheers,
Lang.

Thanks for the explanations !

  • Re-sending to include llvm-dev.

HI Jonas,

This is great - thank you very much for your analysis!

You’re spot on about Bug 1 - the row/column checks are transposed there. I have fixed this in r227628.

Regarding Bug 2, as discussed on the other thread I’m going to teach the register allocator to prune single-option vregs so that they never make it into the graph.

I haven’t had a chance to dig in to Bug 3 yet, but I hope to soon.

Cheers,
Lang.

The dump of the graph can certainly be improved. I’ll look into it.

The problem you had with an ARM backend is due to the fact that the DEBUG_TYPE macro is supposed to be used cautiously (if at all) in a header, as it will probably collide when it is included in .cpp files: as it did with the AArch64 target: see the comment at the top of include/llvm/Support/Debug.h.

Cheers,