Sorry for the delay, it has taken some extra time as more than one bug showed up J
I continued to look into this with your viewpoint that a node that is conservatively allocatable should never be spilled. The first thing I did was therefore to add some extra code with an assert for this.
I believe I then found three bugs and fixed the two:
Bug 1: Incorrect transpositions in handleAddEdge() / hanldeRemoveEdge(). For the heuristic of DeniedOpts, if N runs along the columns (Transpose == true), the interesting dimension is along the rows, and vice versa. In other words, the question is what the neighbour could maximally deny N, and the answer is found in WorstRow, not WorstCol (as it was). This makes a difference if sub/super registers overlap.
Bug 2: When it happens that there are no available physical registers for a node (e.g because of regmasks or calls), the number of rows / columns is just 1, for the spill option. This case must be detected in MatrixMetadata(), or WorstColCountForCurRow will get an undefined value, as std::max_element() is called with an empty vector.
Bug 3: Again a conservatively allocatable node was spilled, and the assert triggered. This is a description of what happened in my out-of-tree test case:
applyR2() called on node N, which is overlapping Y and Z. The edges (N,Y) and (N,Z) are all-zeroes. Y and Z are overlapping and already have an interference edge. Z is just on the limit of not being conservatively allocatable: NumOpts is 8 and DeniedOpts is also 8. It is contained in NotProvablyAllocatableNodes. G.setEdgeCosts() is called and then the call stack grows with Solver->handleSetEdgeCosts(), handleRemoveEdge() into handleDisconnectEdge(), where NMd.handleRemoveEdge() is called, which decreases the DeniedOpts by one. After this, it looks like a bug to me that Z is moved to ConservativelyAllocatableNodes, because eventually handleSetEdgeCosts() will complete, and the edge between Y and Z will have been added again in handleAddEdge(), and Z:DeniedOpts is again 8!
I think this also shows up in a test case for arm. It was found by using the assert mentioned above, and running ‘bin/llvm-stress -size 200 -seed 17761’. The attached file (pbqp_reduced.ll) is the failing test case found reduced with bugpoint. Apply patch 2 (the assert), and then run ‘llc pbqp_reduced.ll -mtriple=aarch64-none-linux-gnu -mcpu=cortex-a57 -mattr=+neon -optimize-regalloc -regalloc=pbqp’. The assert shows a node that is spilled that was conservatively allocatable. The test case itself will not fail without this assert. Add the Verbose-PBQP-dump patch and use ‘-debug-only=regalloc -pbqp-dump-graphs’ to see the more verbose output which shows the progress of the algorithm leading to the assert:
- Applied R2(NId 18)handleDisconnectEdge(9, 2) : DeniedOpts 10 → 9
NId 9(%vreg15, GPR64common) moved to conservatively-allocatables.
handleDisconnectEdge(2, 9) : DeniedOpts 10 → 9
NId 2(%vreg4, GPR64common) moved to conservatively-allocatables.
Popped NId 2(%vreg4, GPR64common) , all edge costs added:
2.002748e+01 inf inf inf inf inf inf inf inf inf inf ** selection: 0
llc: …/include/llvm/CodeGen/PBQP/ReductionRules.h:214: llvm::PBQP::Solution llvm::PBQP::backpropagate(GraphT&, StackT, llvm::PBQP::NodeSet&) [with GraphT = llvm::PBQP::Gra
phllvm::PBQP::RegAlloc::RegAllocSolverImpl; StackT = std::vector; llvm::PBQP::NodeSet = std::set]: Assertion `PushedAsConservativelyAllocatabl
eNodes.find(NId) == PushedAsConservativelyAllocatableNodes.end() && “Node in conservatively allocatable set spilled!”’ failed.
I am not that familiar with the arm architecture, but it looks like NId 2 is incorrectly moved from not-provens to conservatively-allocables, after R2(NId 18). This looks to me like the same bug, but I could be wrong. Note that it does not show up if the bugfix for the transpositions is applied (for random reasons I beleive).
I am not sure how to fix it myself, as it seems non-trivial relating to temporary edge removal. Does anyone have any idea?
In addition to the reduced test case, I attach four patches, where 1 and 4 are not ready to commit, but I would be happy to modify and commit them if wanted:
0001-Assert… The assert (with some extra code related to it) that checks that a node that was conservatively allocatable never spills. My quick way to achieve this was to use an extra set of nodes as a means of remembering their origin. This is a very good idea, considering that this should never happen, and that there is some worry that the code might be incorrect.
0001-Bugfix-in-PBQP-matrix-transpositions… A bugfix for the transposition bug. (Bug 1). Small patch and also sent to llvm-commits for review.
0001-Bugfix-in-PBQP-regarding-Matrixes-and… A bugfix for the 1-column case (Bug 2). Small patch and also sent to llvm-commits for review.
0001-Verbose… Increased debug-dump and pbqp-graph output. Quite crude code, because I had a conflict in the ARM backend when I tried to #define DEBUG_TYPE “regalloc”. I think this could help PBQP get a wider audience, perhaps, even if it is still far from perfect. Perhaps this could be cleaned up and a target-independent name for all pbqp-debug-dumps could be found and then commited.
Looking forward to your reply,
pbqp_reduced.ll (3.08 KB)
0001-Assert-in-PBQP-that-a-node-that-selects-0-spilled-wa.patch (2.73 KB)
0001-Bugfix-in-PBQP-matrix-transpositions.patch (1.85 KB)
0001-Bugfix-in-PBQP-regarding-Matrixes-and-spill-only-nod.patch (1.23 KB)
0001-Verbose-PBQP-dump-output.patch (12.2 KB)