PBQP & register pairing

Arnaud,
another way to look at it, if the description of your register sets includes “pairs”,
is that your assembly language syntax for MPQD is redundant, operand-2 is the second
half of the register-pair in operand-0, so an alternative is to let llvm think this is a two
operand instruction (one of them being a pair) rather than a three operand instruction.

even if you are not currently defining pairs in your register definitions, it might be less work
to do that than to write and add an extra new pass. Many many target machines have some
notion of register pairs, so it should not be too hard to find examples of how to do this.

YMMV,
-Peter Lawrence.

ARM's QQ and QQQQ register classes are examples.

It is currently a bit tedious to specify these constraints; cleaning that up is on my todo list.

/jakob

I also considered this approach, but did not want to dive in the constraint handling for now.

The PBQP path seemed easier at first sight --- and was easy to setup. And I always wanted to give a try to the pbqp :slight_smile:

I will add the hook to the pbqp and propose a patch if this looks clean enough.

Thanks,

Hi Arnaud,

That sounds great. I look forward to seeing a patch.

You may also look forward to big performance improvements in the PBQP allocator: I’m working on updates which will improve compile speeds and massively reduce memory use.

Regards,
Lang.

Attached is a small patch to allow users of the PBQP allocator to optionally insert a custom pass. I believe it can be usefull to other users of the pbqp.

I used it to undo some of the coalescer work, and make sure that I have different virtual registers, inserting a copy if necessary, to build a pair.

I noticed an unexpected — to me at least — behaviour of the allocator.

I have some instructions using 2 pairs of registers, say “mpra R_x, R_x+1, R_y, R_y+1”, and setting the pairing constraints R_x → R_x+1 and R_y → R_y+1 could silently produce wrong code like “mpra %R0, %R2, %R1, %R3”. I add to explicitly add another constraint to describe that y must differ from x, x-1 and x+1 to make the allocator build valid pairs, i.e “mpra %R0, %R1, %R2, %R3”. Is there anything I missed ?

Now, 99% of my users’ codebase is compiling. J

The last issue I have is an assert during regalloc in LiveIntervalAnalysis : “attempt to spill already spilled interval!”, and I do not know where to start looking. Any hint would be welcome.

Regards,

PBQP-customPass.patch (2.2 KB)

Hi Arnaud,

The patch looks good. I’ve committed it in r133249.

I noticed an unexpected — to me at least — behaviour of the allocator.

I have some instructions using 2 pairs of registers, say “mpra R_x, R_x+1, R_y, R_y+1”, and setting the pairing constraints R_x → R_x+1 and R_y → R_y+1 could silently produce wrong code like “mpra %R0, %R2, %R1, %R3”. I add to explicitly add another constraint to describe that y must differ from x, x-1 and x+1 to make the allocator build valid pairs, i.e “mpra %R0, %R1, %R2, %R3”. Is there anything I missed ?

Hmm. Let me make sure I’m reading this right. The constraints are that:
a) All four operands have distinct registers.
b) The first two are in a consecutive pair (second > first)
c) The second two are in a consecutive pair (fourth > third)

If that’s accurate then you will need four constraints to represent it. Two to enforce the pairing, which you’ve already described, and two to enforce the distinction. Without constraint enforcing the R_x != R_y distinction the PBQP allocator is free to allocate mpra %R0, %R1, %R0, %R1. You’ll also need (R_x+1 != R_y) to ensure that you don’t get something like mpra %R0, %R1, %R1, %R2.

That said my understanding of your pairing constraints would definitely prohibit mpra %R0, %R2, %R1, %R3 under any circumstances, even with out the distinction constraint. Either I’ve misunderstood your constraints, or there is a bug somewhere.

Now, 99% of my users’ codebase is compiling. J

The last issue I have is an assert during regalloc in LiveIntervalAnalysis : “attempt to spill already spilled interval!”, and I do not know where to start looking. Any hint would be welcome.

The data structure you want to keep your eye on is the set vregsToAlloc to RegAllocPBQP. This set holds the virtual registers which PBQP must allocate for on its next round. Once a virtual register has been spilled it should be erased from this set (see RegAllocPBQP::mapPBQPToRegAlloc), and it should never re-enter it, and thus never be considered again by the PBQP allocator. At a guess it sounds like one of your vregs may be being added to this set a 2nd time, but I’m not sure how that could be happening.

Is your backend public? Are you able to share a test case with me? I’m very busy with my PhD write-up at the moment, but I will try to find time for a quick look at this.

Regards,
Lang.

Hi Lang,

Hmm. Let me make sure I’m reading this right. The constraints are that:
a) All four operands have distinct registers.

b) The first two are in a consecutive pair (second > first)

c) The second two are in a consecutive pair (fourth > third)

Constraints b & c are OK, but a is too strict : “mpra %R0, %R1, %R0, %R1” is OK. But I though, may be wrongly, that “mpra %vreg1, %vreg2, %vreg3, %vreg4” meant %vreg1 and %vreg3 will be allocated to different physical registers, or they would have been coalesced. For example, the pass I added after the coalescer ensures that instructions like “mpra %vreg1, %vreg2, %vreg1, %vreg4” (impossible for pbqp to solve) is transformed to “mpra %vreg1, %vreg2, %vreg3, %vreg4” with the proper copy of %vreg1 to %vreg3 inserted.

That said my understanding of your pairing constraints would definitely prohibit mpra %R0, %R2, %R1, %R3 under any circumstances, even with out the

distinction constraint. Either I’ve misunderstood your constraints, or there is a bug somewhere.

You got the contraints right. I was surprised to have to add a “safety net” to prevent the impossible case; at first I though pbqp could “backtrack” once it discovers it gets to an impossible case… My backend is private, so I can not share it.

I will trace what’s happening with vregsToAlloc and keep you updated.

Regards,

Hi Arnaud,

Hmm. Let me make sure I’m reading this right. The constraints are that:
a) All four operands have distinct registers.

b) The first two are in a consecutive pair (second > first)

c) The second two are in a consecutive pair (fourth > third)

Constraints b & c are OK, but a is too strict : “mpra %R0, %R1, %R0, %R1” is OK. But I though, may be wrongly, that “mpra %vreg1, %vreg2, %vreg3, %vreg4” meant %vreg1 and %vreg3 will be allocated to different physical registers, or they would have been coalesced. For example, the pass I added after the coalescer ensures that instructions like “mpra %vreg1, %vreg2, %vreg1, %vreg4” (impossible for pbqp to solve) is transformed to “mpra %vreg1, %vreg2, %vreg3, %vreg4” with the proper copy of %vreg1 to %vreg3 inserted.

“mpra %vreg1, %vreg2, %vreg3, %vreg4” does not imply that vreg1 and vreg3 will be allocated different physical registers. It only says that they can (subject to constraints) be allocated different registers.

“mpra %vreg1, %vreg2, %vreg1, %vreg4” is not impossible for PBQP to solve. Provided you have added the pairing constraint between vreg1 & vreg2, and between vreg1 & vreg4, this simply means that the constraints will force vreg2 and vreg4 to be allocated the same physical register. The PBQP allocator will always generate something like “mpra RX, RY, RX, RY” for this set of vreg operands.

By inserting the copy from %vreg1 to %vreg3 you have relaxed the problem for PBQP: It can allocate the same register to %vreg1 and %vreg3, or it can allocate different ones.

That said my understanding of your pairing constraints would definitely prohibit mpra %R0, %R2, %R1, %R3 under any circumstances, even with out the

distinction constraint. Either I’ve misunderstood your constraints, or there is a bug somewhere.

You got the contraints right. I was surprised to have to add a “safety net” to prevent the impossible case; at first I though pbqp could “backtrack” once it discovers it gets to an impossible case… My backend is private, so I can not share it.

The PBQP allocator currently does not do any backtracking. The “backpropagation” phase, which some people mistake for backtracking, is actually just the graph reconstruction/coloring phase (equivalent to the “select” phase of a graph coloring allocator).

I am very surprised to hear that the allocator does generate cases like “mpra %R0, %R2, %R1, %R3”. It is possible that there is a bug in the solver, but in my experience the source of most bugs is the constraint matrices. Have you put infinite costs on the spill option of the pairing constraint? E.g.

SP R0 R1 R2 R3
SP I I I I I
R0 I I 0 I I
R1 I I I 0 I
R2 I I I I 0
R3 I 0 I I I

If that’s the case then an unfortunate combination of vreg uses in more than on mpra instruction could cause the situation you’re seeing. Consider:

mpra %vreg0, %vreg1, %vreg2, %vreg3
mpra %vreg0, %vreg2, %vreg1, %vreg3

With this sequence there is no way the PBQP allocator can construct a correct solution (i.e one with finite costs).

The solution is to remove the infinities from the spill option on the pairing matrix:

SP R0 R1 R2 R3
SP 0 0 0 0 0
R0 0 I 0 I I
R1 0 I I 0 I
R2 0 I I I 0
R3 0 0 I I I

This will allow the allocator to spill as necessary in order to find a correct allocation.

I will trace what’s happening with vregsToAlloc and keep you updated.

Thank you. I will be interested to know what your investigations turn up.

Regards,
Lang.