Hello,
I’m currently investigating a bug in the AMDGPU backend caused by a pass that updates live interval information incorrectly. It ends up adding a segment to a live range that doesn’t make sense, which eventually crashes the register allocator.
Here is a greatly simplified (and likely broken) example to illustrate the problem:
# We have the following basic blocks, in order:
- bb.0: 0B -> 32B - Value X is defined
- bb.1: 48B -> 70B - Value X is used for the last time
- bb.2: 86B -> 118B
- bb.3: 134B -> 166B - Value X is live-through (unused in this block)
- (etc.)
# The CFG goes through the blocks in a different order, e.g.
bb.0 -> bb.3 -> bb.1 -> bb.2
The original (sane) live interval has two segments:
[16r, 60r) [134B, 182B)
One covers X from its def in bb.0 to its death in bb.1. The other represents the fact that X is live through bb.3 (but has no users inside it), which is the block between bb.0 and bb.1 in the CFG.
(I hope this makes sense, I just recently got into register allocation so the syntax/specifics may be off.)
Our (broken) pass messes with the range, and creates something like:
[50r, 60r) [134B, 182B)
(Note: The first segment is really not important, I gave it as an example with random values.)
Now, the segment for the live-through value X in bb.3, [134B, 182B), seems broken: it’s meant to represent the fact that X is live through bb.3 (phi-def?), but none of the predecessors of bb.3 are covered by any other segment in the interval. The value essentially appears out of thin air.
We were wondering whether the live range or live interval verifier should have caught this. I can’t think of a use case where it would make sense for a value to suddenly be live in a BB when no definition reaches it from any predecessor.
I was thinking of updating the verifier so that, if a segment starts at a block index, it checks that at least one predecessor of the block (or all of them?) is live-out in another segment of the interval. Would that make sense?
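To make the proposed check concrete, here is a toy Python model of it run against the example above. It is only a sketch and not LLVM code: `Block`, `covers` and `find_orphan_live_ins` are hypothetical names, slot kinds (B/r) are dropped in favour of plain integers, and I'm assuming each block's range ends where the next block in layout begins (so bb.3 is [134, 182)). Blocks with no predecessors are skipped, on the assumption that the entry block can have legitimate live-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    start: int  # first slot index of the block
    end: int    # one past the block's last index (assumed: next block's start)


def covers(segments, point):
    """True if any half-open segment [lo, hi) contains the given index."""
    return any(lo <= point < hi for lo, hi in segments)


def find_orphan_live_ins(blocks, preds, segments, require_all=False):
    """Return (block name, segment) pairs for segments that start exactly at
    a block start while the value is not live-out of the block's predecessors.

    require_all=False asks for at least one live-out predecessor,
    require_all=True asks for every predecessor to be live-out.
    """
    by_name = {b.name: b for b in blocks}
    by_start = {b.start: b for b in blocks}
    combine = all if require_all else any
    orphans = []
    for lo, hi in segments:
        block = by_start.get(lo)
        if block is None:
            continue  # segment starts mid-block, i.e. at a regular def
        pred_names = preds.get(block.name, [])
        if not pred_names:
            continue  # assumption: entry-like blocks may have real live-ins
        # A value is live-out of a predecessor if some segment covers the
        # last index inside that predecessor.
        live_out = [covers(segments, by_name[p].end - 1) for p in pred_names]
        if not combine(live_out):
            orphans.append((block.name, (lo, hi)))
    return orphans


# Layout order and CFG from the example: bb.0 -> bb.3 -> bb.1 -> bb.2
BLOCKS = [
    Block("bb.0", 0, 48),
    Block("bb.1", 48, 86),
    Block("bb.2", 86, 134),
    Block("bb.3", 134, 182),
]
PREDS = {"bb.0": [], "bb.3": ["bb.0"], "bb.1": ["bb.3"], "bb.2": ["bb.1"]}

SANE = [(16, 60), (134, 182)]
BROKEN = [(50, 60), (134, 182)]

print(find_orphan_live_ins(BLOCKS, PREDS, SANE))    # []
print(find_orphan_live_ins(BLOCKS, PREDS, BROKEN))  # [('bb.3', (134, 182))]
```

In this model the sane interval passes because bb.0 is live-out under [16, 60), while the broken one is flagged because nothing covers the end of bb.0. The "at least one" versus "all" question only shows up for blocks with several predecessors, which this example does not have.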
Do we already have some code that should have caught this, but didn’t, indicating that something else may be broken in the live interval?
Thanks.