Subregister coalescing

Hi all,

We are working on a backend for a machine that has 4-wide vector
registers and ops, *but* no vector loads. All the vector register elements
are directly accessible, so the VI1 reg (Vector Integer 1) has I4, I5, I6 and
I7 as its (integer) subregisters. Subregisters of the same reg *never*
overlap.

Therefore, vector loads are lowered to scalar loads followed by a chain
of INSERT_VECTOR_ELTs. Then we select those to INSERT_SUBREG, and
everything is fine up to that point.

The status before live range analysis is (unrelated instrs removed):

36 %reg16388<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
68 %reg16392<def> = INSERT_SUBREG %reg16392<undef>, %reg16388<kill>, 1
76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
116 %reg16400<def> = MOVEV %reg16392<kill>
124 %reg16400<def> = INSERT_SUBREG %reg16400, %reg16394<kill>, 2
132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
164 %reg16404<def> = MOVEV %reg16400<kill>
172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
212 %reg16408<def> = MOVEV %reg16404<kill>
220 %reg16408<def> = INSERT_SUBREG %reg16408, %reg16405<kill>, 4

Which after register coalescing gets transformed into:

36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2
132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
220 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16405<kill>, 4

The code is correct, but not optimal. I would like the loads to go
directly to the subregisters of %reg16404, avoiding the extra copies.
But it seems Live Range Analysis considers %reg16404 to be live over the
whole range, thus preventing coalescing between its subregs and the load
destinations.

Is there a way to solve this?

As an alternate approach, I also tried to do a custom InstrInserter that
ended with the correct code just after MI emission:

68 %reg16392<def> = LDWr %reg16384<kill>, 0; mem:LD4[<unknown>]
76 %reg16393<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
84 %reg16394<def> = LDWr %reg16387<kill>, 0; mem:LD4[<unknown>]
92 %reg16395<def> = LDWr %reg16388<kill>, 0; mem:LD4[<unknown>]
132 %reg16400:1<def,dead> = MOVI32rr %reg16392<kill>
140 %reg16400:2<def> = MOVI32rr %reg16393<kill>
148 %reg16400:3<def> = MOVI32rr %reg16394<kill>
156 %reg16400:4<def> = MOVI32rr %reg16395<kill>

but then Live Range Analysis asserts because of the multiply defined %reg16400.

Can anyone give me a clue on the correct way to handle this situation?

Thanks!

Carlos

The ARM backend uses REG_SEQUENCE to solve this problem for NEON registers. REG_SEQUENCE is basically a parallel INSERT_SUBREG operation that inserts all of the subregs at once so the coalescer can deal with it. During the TwoAddressInstructionPass, the REG_SEQUENCE operations are replaced by direct references to the subregs, which is what you want. Look at the ARM backend to see how this works.
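
To make the idea concrete, here is a toy sketch of what "replaced by direct references to the subregs" amounts to. The types `Part`, `RegSequence`, `SubregDef` and the function `expandRegSequence` are hypothetical stand-ins invented for this illustration. They are not LLVM classes and this is not how the pass is actually implemented; it only shows one parallel definition being expanded into one write per subregister.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy types for illustration only; these are NOT LLVM classes.
struct Part {
    unsigned srcReg;  // scalar virtual register providing the value
    unsigned subIdx;  // subregister index it goes into (1..4 in the dumps)
};

// A REG_SEQUENCE-like node: the wide register is defined exactly once,
// with all of its parts supplied in parallel.
struct RegSequence {
    unsigned dst;
    std::vector<Part> parts;
};

// A direct write to one subregister of a wide register.
struct SubregDef {
    unsigned vreg;
    unsigned subIdx;
    unsigned srcReg;
};

// Expand the single parallel definition into one subregister write per part.
std::vector<SubregDef> expandRegSequence(const RegSequence &rs) {
    std::vector<SubregDef> out;
    for (const Part &p : rs.parts)
        out.push_back({rs.dst, p.subIdx, p.srcReg});
    return out;
}
```

Feeding it `{16404, {{16388, 1}, {16394, 2}, {16401, 3}, {16405, 4}}}` yields four writes to %reg16404:1 through %reg16404:4, with no intermediate whole-register copies.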

Which after register coalescing gets transformed into:

36 %reg16404:1<def> = LDWr %reg16384, 0; mem:LD4[<unknown>]
76 %reg16394<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
124 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16394<kill>, 2
132 %reg16401<def> = LDWr %reg16390<kill>, 0; mem:LD4[<unknown>]
172 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16401<kill>, 3
180 %reg16405<def> = LDWr %reg16398<kill>, 0; mem:LD4[<unknown>]
220 %reg16404<def> = INSERT_SUBREG %reg16404, %reg16405<kill>, 4

The code is correct, but not optimal. I would like the loads to go
directly to the subregisters of %reg16404, avoiding the extra copies.
But it seems Live Range Analysis considers %reg16404 to be live over the
whole range, thus preventing coalescing between its subregs and the load
destinations.

Right. This is a deficiency in the coalescer. It doesn't deal well with multiple values being inserted into a larger register. As you correctly observed, it doesn't understand that a live interval can be partially defined and so may not interfere.
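
As a rough illustration of the difference, the sketch below compares a whole-register interference test with a lane-aware one. Everything in it (`LiveRange`, the `lanes` bitmask, both `interferes...` functions) is a hypothetical model made up for this example, not the coalescer's actual data structures. The slot numbers are borrowed from the dump above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model; NOT the coalescer's real data structures.
struct LiveRange {
    unsigned start;      // first slot index at which the value is live
    unsigned end;        // one past the last live slot (half-open range)
    std::uint8_t lanes;  // bit i set = subregister i+1 holds a live value
};

// Two half-open ranges overlap if each starts before the other ends.
bool overlaps(const LiveRange &a, const LiveRange &b) {
    return a.start < b.end && b.start < a.end;
}

// Whole-register view: any overlap in time counts as interference.
bool interferesWholeReg(const LiveRange &a, const LiveRange &b) {
    return overlaps(a, b);
}

// Lane-aware view: overlap only matters if the same subregister is live.
bool interferesPerLane(const LiveRange &a, const LiveRange &b) {
    return overlaps(a, b) && (a.lanes & b.lanes) != 0;
}
```

With %reg16404 modelled as `{36, 221, 0x1}` (only subreg 1 written so far) and %reg16394 as `{76, 124, 0x2}` (destined for subreg 2), the whole-register test reports interference while the lane-aware test does not, which is the case the coalescer currently misses.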

The opposite direction should be fine - using EXTRACT_SUBREG to get small registers from the larger one. Your stores ought to coalesce properly.

Is there a way to solve this?

What Bob said. Use REG_SEQUENCE. You may have to use LLVM from Subversion to do that. Your machine code looks like you are using 2.7.

As an alternate approach, I also tried to do a custom InstrInserter that
ended with the correct code just after MI emission:

68 %reg16392<def> = LDWr %reg16384<kill>, 0; mem:LD4[<unknown>]
76 %reg16393<def> = LDWr %reg16386<kill>, 0; mem:LD4[<unknown>]
84 %reg16394<def> = LDWr %reg16387<kill>, 0; mem:LD4[<unknown>]
92 %reg16395<def> = LDWr %reg16388<kill>, 0; mem:LD4[<unknown>]
132 %reg16400:1<def,dead> = MOVI32rr %reg16392<kill>
140 %reg16400:2<def> = MOVI32rr %reg16393<kill>
148 %reg16400:3<def> = MOVI32rr %reg16394<kill>
156 %reg16400:4<def> = MOVI32rr %reg16395<kill>

but then Live Range Analysis asserts because of the multiply defined %reg16400.

You can't do it like that because the machine code must be in SSA form until TwoAddressInstructionPass runs. This is why we keep the INSERT_SUBREG instruction around instead of just lowering it to

%reg16404:4<def> = COPY %reg16405<kill>

On the other hand, EXTRACT_SUBREG is translated to a subreg COPY immediately.
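
A small sketch of why the four MOVI32rr instructions trip the assertion: in SSA form each virtual register may have only one defining instruction, and subregister writes to the same register all count as definitions of it. The `Def` struct and `multiplyDefined` helper are hypothetical names for this illustration and have nothing to do with the real verifier code.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical toy record of one definition; NOT an LLVM type.
struct Def {
    unsigned vreg;    // virtual register being defined
    unsigned subIdx;  // 0 = whole register, otherwise a subregister index
};

// Return every virtual register that has more than one definition.
// Subregister writes are counted against the register they belong to.
std::vector<unsigned> multiplyDefined(const std::vector<Def> &defs) {
    std::map<unsigned, unsigned> count;
    for (const Def &d : defs)
        ++count[d.vreg];
    std::vector<unsigned> bad;
    for (const auto &kv : count)
        if (kv.second > 1)
            bad.push_back(kv.first);
    return bad;
}
```

Run over the eight definitions in the custom-inserter output, this flags only %reg16400, which is written four times through subregs 1 to 4.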

/jakob

Hi,

Is there a way to solve this?

What Bob said. Use REG_SEQUENCE. You may have to use LLVM from Subversion to do that. Your machine code looks like you are using 2.7.

OK, so I understand REG_SEQUENCE is to BUILD_VECTOR what INSERT_SUBREG is to INSERT_VECTOR_ELT, in a way. I was actually looking for such a target opcode and could not find it in 2.7 (you are right, that is what I am using). I guess it is time to upgrade.

Thanks Jakob and Bob,

Carlos