How to verify correct regalloc for a kernel?

Is there a way to (automatically) verify the correctness of register allocation for a given kernel after it has been compiled to machine code, e.g., by stepping through LLVM IR and comparing virtual register values to physical ones?

We appear to be hitting the well-known register spill bugs in our kernels with very high register pressure and conditional execution, but we’d like to verify whether the problems are in fact being caused by register allocation bugs, or whether certain compiler options and/or kernel fission strategies can verifiably avoid the miscompiles.

As far as I know, there is no way to automatically verify the correctness of a register allocator. I would suggest running llc -print-changed=diff and comparing the output before and after the register allocator you’re using (default: greedy) to check this by hand.
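If the dumps are too large to read directly, a small script can narrow down what to look at. The following is only a minimal sketch, assuming you have already saved the textual machine-code dump from before and after register allocation as two strings. The helper names `diff_dumps` and `added_spill_lines`, the `"SPILL"` substring used as a marker, and the sample dump text are all hypothetical stand-ins, not real compiler output. It lists lines that appear only after allocation and mention a spill, which narrows the manual inspection but does not prove correctness.

```python
import difflib


def diff_dumps(before: str, after: str) -> list[str]:
    """Return unified-diff lines between two textual machine-code dumps."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="before-regalloc",
            tofile="after-regalloc",
            lineterm="",
            n=0,  # no context lines, only the changes themselves
        )
    )


def added_spill_lines(diff_lines: list[str], marker: str = "SPILL") -> list[str]:
    """Keep only lines added by the diff that contain the (assumed) spill marker."""
    return [
        line[1:].strip()
        for line in diff_lines
        if line.startswith("+") and not line.startswith("+++") and marker in line
    ]


# Hypothetical, hand-written stand-ins for real dumps (not actual llc output).
BEFORE = """\
%0:vgpr_32 = COPY $vgpr0
%1:vgpr_32 = V_ADD_U32_e32 %0, %0
"""

AFTER = """\
$vgpr1 = COPY $vgpr0
SI_SPILL_V32_SAVE $vgpr1, %stack.0
$vgpr1 = V_ADD_U32_e32 $vgpr0, $vgpr0
"""

if __name__ == "__main__":
    for line in added_spill_lines(diff_dumps(BEFORE, AFTER)):
        print(line)
```

With real dumps you would read the two strings from files instead of the inline samples, and adjust the marker to whatever spill pseudo-instructions your target actually emits.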

Not really, and it’s quite difficult to spot the WWM (whole wave mode) spilling errors. The primary source of these errors is live ranges that are split during register allocation itself, so the problem is very far removed from the LLVM IR.


Ah, ok. I was afraid of this.

Our kernels that fail in bizarre ways are ones that solve very large chemical reaction networks using O(10^4) registers, so identifying the errors from manual inspection was probably going to be essentially impossible anyway.

Is the pull request "[AMDGPU] Split vgpr regalloc pipeline" (#93526 by cdevadas, llvm/llvm-project on GitHub) expected to fix the spilling errors?

Yes
