"Earlyclobber" but for a subset of the inputs

Hi all,

I’m working on a target whose registers have equal-sized subregisters and all of those subregisters can be named (or the other way round: registers can be grouped into super registers).

So for instance we’ve got 16 registers W (as in wide) W0…W15 and 32 registers N (as in narrow) N0…N31. This way, W0 is made by grouping N0 and N1, W1 is N2 and N3, W2 is N4 and N5, …, W15 is N30 and N31.
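The aliasing can be made concrete with a small Python model of this register file. It is only a sketch: the string encoding of registers ("W1", "N3") and the helper names `units` and `overlaps` are illustrative assumptions, not LLVM APIs. In LLVM itself, TableGen derives this aliasing from the target's sub-register definitions.

```python
# Toy model of the register file described above (illustrative, not LLVM code):
# 16 wide registers W0..W15 and 32 narrow registers N0..N31, where Wi is the
# pair (N(2i), N(2i+1)).
NUM_W, NUM_N = 16, 32


def units(reg):
    """Return the set of narrow-register indices that `reg` occupies."""
    kind, idx = reg[0], int(reg[1:])
    if kind == "W" and 0 <= idx < NUM_W:
        return {2 * idx, 2 * idx + 1}
    if kind == "N" and 0 <= idx < NUM_N:
        return {idx}
    raise ValueError(f"unknown register {reg!r}")


def overlaps(a, b):
    """True if the two registers share at least one narrow register."""
    return bool(units(a) & units(b))


print(sorted(units("W1")))   # [2, 3]
print(overlaps("W1", "N3"))  # True
print(overlaps("W1", "N4"))  # False
```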

The target has some widening instructions that take N registers (and, in one form, a W register) as inputs and output a W register. The possible combinations are:

Wdest = widen-op Nsrc1, Nsrc2

Wdest = widen-op Wsrc1, Nsrc2

The target requires that the output register of these instructions does not physically overlap an input of a different kind (W vs N).

For instance:

W1 = widen-op N4, N5 [this is ok, W1 is (N2, N3), so no overlap]
W1 = widen-op N3, N4 [this is wrong because W1 is (N2, N3), thus overlap]
W1 = widen-op W1, N4 [this is OK, W1 does not overlap with N4]
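The rule can be written as a small legality check on the same toy register model. The function name `widen_op_is_legal` is hypothetical; the check simply encodes the rule as stated: the W destination must not share a narrow register with any source of the other kind, while a W source may coincide with it.

```python
# Sketch of the target's constraint for widen-op (illustrative model, not LLVM
# code): the W destination must not physically overlap any source of a
# *different* kind. A W source may share registers with the W destination.


def units(reg):
    """Narrow-register indices occupied by reg (Wi = N(2i), N(2i+1))."""
    idx = int(reg[1:])
    return {2 * idx, 2 * idx + 1} if reg[0] == "W" else {idx}


def widen_op_is_legal(dest, *srcs):
    """Return True if `dest = widen-op srcs...` satisfies the target rule."""
    for src in srcs:
        if src[0] != dest[0] and units(src) & units(dest):
            return False
    return True


print(widen_op_is_legal("W1", "N4", "N5"))  # True
print(widen_op_is_legal("W1", "N3", "N4"))  # False: W1 is (N2, N3)
print(widen_op_is_legal("W1", "W1", "N4"))  # True: same kind may overlap
```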

I can model these constraints using @earlyclobber and it works great for the Wdest = widen-op Nsrc1, Nsrc2 case. While correct, this is suboptimal for the Wdest = widen-op Wsrc1, Nsrc2 case because RA will never assign registers as in:

W1 = widen-op W1, N4 [RegAlloc would do something like W3 = widen-op W1, N4]
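To see concretely what earlyclobber gives up, the sketch below enumerates the W destinations each rule accepts for the sources W1 and N4. It reuses the toy string-based register model with invented helper names (`earlyclobber_ok`, `target_ok`); it only mirrors the two rules as described in this message and is not how LLVM's register allocator represents constraints.

```python
# Compare which destinations each rule accepts for `Wd = widen-op W1, N4`.
# Toy model: Wi occupies narrow registers N(2i) and N(2i+1).


def units(reg):
    idx = int(reg[1:])
    return {2 * idx, 2 * idx + 1} if reg[0] == "W" else {idx}


def earlyclobber_ok(dest, srcs):
    # @earlyclobber: the def may not share a register with *any* input.
    return all(not (units(dest) & units(s)) for s in srcs)


def target_ok(dest, srcs):
    # Target rule: only inputs of a different kind (N vs W) are excluded.
    return all(s[0] == dest[0] or not (units(dest) & units(s)) for s in srcs)


srcs = ["W1", "N4"]
candidates = [f"W{i}" for i in range(16)]
ec = {d for d in candidates if earlyclobber_ok(d, srcs)}
tgt = {d for d in candidates if target_ok(d, srcs)}
print(sorted(tgt - ec))  # ['W1']: legal for the target, rejected by earlyclobber
```

Both rules reject W2, because W2 is (N4, N5) and overlaps the N input; only earlyclobber also rejects W1.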

Has anyone encountered a similar situation? Perhaps all this can be modelled in a more obvious way?

Thank you very much,

Hi Roger,

It sounds like you only need the earlyclobber description for the N, N variant.
In other words, as long as you use different opcodes for widen-op NN and widen-op WN, you model exactly what you want.

What am I missing?

Cheers,
-Quentin

Hi Quentin,

We are using different opcodes for widen-op NN and widen-op WN.

My understanding is that not setting earlyclobber on the W, N variant would allow RegAlloc to produce an allocation like this:

W1 = widen-op W2, N3

but this is not correct on our target, because W1 and N3 are of different kinds and W1 (which groups N2 and N3) overlaps N3.

If I understand earlyclobber semantics correctly, earlyclobber would assign the destination a register that overlaps neither W2 nor N3. For instance:

W3 = widen-op W2, N3

But because the destination is a W register, the target's constraint only applies to N3, not to W2, so the following should be OK (however, RegAlloc would never make such an assignment under earlyclobber):

W2 = widen-op W2, N3

In principle, earlyclobber will always produce allocations that are correct for the target, but some valid ones will be missed.

Kind regards,

Hi Roger,

Sorry, I mixed up earlyclobber and tied operands. For some reason I thought earlyclobber could be set individually on the source operands to say that they interfere with the definition.
That’s obviously wrong.

So yeah, there isn't anything in LLVM right now that conveys the semantics you want.

If you’re really concerned that you would use too many registers, you can either:

  1. Add this concept to llvm
  2. Repair after regalloc
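One way to picture option 1 is an earlyclobber constraint that names the specific inputs it applies to. The sketch below is purely hypothetical: LLVM has no such per-operand constraint (as noted above), and the opcode names `WIDEN_NN`/`WIDEN_WN`, the `EARLYCLOBBER_AGAINST` table and `allocation_ok` are invented for illustration on the same toy register model.

```python
# Hypothetical "partial earlyclobber": for each opcode, list the indices of the
# source operands the destination must not overlap. Nothing like this exists in
# LLVM today; opcode names and the table below are invented for illustration.


def units(reg):
    """Narrow-register indices occupied by reg (Wi = N(2i), N(2i+1))."""
    idx = int(reg[1:])
    return {2 * idx, 2 * idx + 1} if reg[0] == "W" else {idx}


EARLYCLOBBER_AGAINST = {
    "WIDEN_NN": (0, 1),  # Wdest = widen-op Nsrc1, Nsrc2: both inputs
    "WIDEN_WN": (1,),    # Wdest = widen-op Wsrc1, Nsrc2: only the N input
}


def allocation_ok(opcode, dest, srcs):
    """Check a candidate assignment against the per-operand constraint."""
    return all(not (units(dest) & units(srcs[i]))
               for i in EARLYCLOBBER_AGAINST[opcode])


print(allocation_ok("WIDEN_WN", "W2", ["W2", "N3"]))  # True
print(allocation_ok("WIDEN_WN", "W1", ["W2", "N3"]))  # False: W1 overlaps N3
print(allocation_ok("WIDEN_NN", "W1", ["N4", "N5"]))  # True
```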

For #2, basically you don't set any constraints during regalloc; then, after regalloc (i.e., in expand pseudo), you check whether there is any overlap and, if so, you change the allocation locally.
E.g.,
W1 = widen-op W2, N3
=>
N6 = copy N3
W1 = widen-op W2, N6
or
W3 = widen-op W2, N3
W1 = copy W3

Which one to pick depends on what is cheapest given the registers available (you may have to spill).
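The repair step can be sketched on a toy register model. The function name `repair_widen`, the string encoding of registers and the `free` set are assumptions for illustration; the sketch simply prefers the N-register copy over the W temporary, whereas a real implementation would choose based on cost and register availability, as described above.

```python
# Toy "repair after regalloc" for `dest = widen-op wsrc, nsrc` (illustrative, not
# LLVM code). Wi occupies narrow registers N(2i) and N(2i+1). `free` is the set of
# registers assumed unused at this point; a real pass would take it from liveness
# information and fall back to spilling when nothing suitable is free.


def units(reg):
    """Narrow-register indices occupied by reg."""
    idx = int(reg[1:])
    return {2 * idx, 2 * idx + 1} if reg[0] == "W" else {idx}


def overlaps(a, b):
    return bool(units(a) & units(b))


def repair_widen(dest, wsrc, nsrc, free):
    """Return an instruction sequence that computes dest legally."""
    if not overlaps(dest, nsrc):
        return [f"{dest} = widen-op {wsrc}, {nsrc}"]  # already legal
    candidates = sorted(free, key=lambda r: (r[0], int(r[1:])))
    # Choice 1: copy the N source into a free N register clear of dest and wsrc.
    for r in candidates:
        if r[0] == "N" and not overlaps(r, dest) and not overlaps(r, wsrc):
            return [f"{r} = copy {nsrc}", f"{dest} = widen-op {wsrc}, {r}"]
    # Choice 2: compute into a free W temporary that does not overlap nsrc,
    # then copy the temporary into dest.
    for r in candidates:
        if r[0] == "W" and not overlaps(r, nsrc) and not overlaps(r, wsrc):
            return [f"{r} = widen-op {wsrc}, {nsrc}", f"{dest} = copy {r}"]
    raise RuntimeError("no suitable free register; a spill would be needed")


print(repair_widen("W1", "W2", "N3", {"N6", "N7", "W3"}))
# ['N6 = copy N3', 'W1 = widen-op W2, N6']
print(repair_widen("W1", "W2", "N3", {"W3"}))
# ['W3 = widen-op W2, N3', 'W1 = copy W3']
```

The scratch N register is checked against the live W source as well as the destination, since copying into a narrow half of the W source would clobber it before the widen-op reads it.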

Cheers,
-Quentin

Hi Quentin,

Thanks! I was worried I might be missing something very obvious. I’ll look into your suggestions.

Kind regards,

On Tue, 5 May 2020 at 19:59, Quentin Colombet <qcolombet@apple.com> wrote: