Register based vector insert/extract

How can one let the back end know how to insert and extract elements of a vector through sub-register copies? I’m at a loss how to do this…

You probably want to custom lower the insertelement/extractelement operations for the cases you support. Take a look at X86TargetLowering::LowerEXTRACT_VECTOR_ELT for some examples of how to do this.

-Chris

The issue I'm having is that there is no extract/insert instruction in the ISA; it's simply based on using subregister operands in subsequent/preliminary instructions. At the point of custom lowering, register allocation has not yet been done, so I don't have a way to communicate the dependency.

An example is in order:

If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a DAG that looks like

load v4si <- extract_element 2 <- add -> load i32

I'd like to be able to generate

load v4r0
load r10
add r11, r10, r2 <== subregister 2 of v4r0

I see that Evan has added getSubRegisters()/getSuperRegisters() to MRegisterInfo. This is what's needed in order to implement the register allocation constraint, but there's no way yet to pass the constraint through the operands from the DAG. There would need to be some way to specify that the SDOperand is referencing a subvalue of the produced value (perhaps a subclass of SDOperand?). This would allow the register allocator to try to use the sub/super register sets to perform the insert/extract.

Is any of this kind of work planned? The addition of those MRegisterInfo functions has me curious...

The issue I'm having is that there is no extract/insert
instruction in the ISA, it's simply based on using subregister
operands in subsequent/preliminary instructions. At the point of
custom lowering register allocation has not yet been done, so I
don't have a way to communicate the dependency.

Ok.

If I have a register v4r0 with subregisters {r0, r1, r2, r3} and a
DAG that looks like

load v4si <- extract_element 2 <- add -> load i32

I'd like to be able to generate

load v4r0
load r10
add r11, r10, r2 <== subregister 2 of v4r0

Nice ISA. That is entirely too logical. :)

We have a similar problem on X86. In particular, an integer truncate or an extend (e.g. i16 -> i8) wants to make use of subregisters. Consider code like this:

   t1 = load i16
   t2 = truncate i16 t1 to i8
   t3 = add i8 t2, 42

What we would really want to generate is something like this at the machine instr level:

   r1024 = X86_LOADi16 ... ;; r1024 is i16
   r1026 = ADDi8 r1024[subreg #0], 42

More specifically, we want to be able to define, for each register class, a set of subregister classes. In the X86 world, the 64-bit register classes could have subregclass0 = i8 parts, subregclass1 = i16 parts, subregclass2 = i32 parts. Each <physreg, subreg#> pair should map to another physreg (e.g. <RAX,1> -> AX).
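To make the <physreg, subreg#> mapping concrete, here is a minimal C++ sketch of such a table for a few X86 registers. The `PhysReg` enum, the index numbering (0 = i8 part, 1 = i16 part, 2 = i32 part, following the subregclass order above) and the function name `subRegForIndex` are hypothetical; this is not an existing LLVM interface.

```cpp
#include <array>
#include <cassert>

// Hypothetical physical register numbering for a slice of X86.
enum PhysReg : unsigned {
  NoReg = 0,
  AL, AX, EAX, RAX,
  CL, CX, ECX, RCX,
  DL, DX, EDX, RDX,
};

// Subregister indices, following the subregclass order sketched above:
// 0 = i8 part, 1 = i16 part, 2 = i32 part.
enum SubRegIdx : unsigned { Sub8 = 0, Sub16 = 1, Sub32 = 2, NumSubRegIdx = 3 };

// One row per 64-bit register: its i8, i16 and i32 parts, in index order.
struct SubRegRow {
  PhysReg Super;
  std::array<PhysReg, NumSubRegIdx> Parts;
};

static const SubRegRow SubRegTable[] = {
    {RAX, {AL, AX, EAX}},
    {RCX, {CL, CX, ECX}},
    {RDX, {DL, DX, EDX}},
};

// Maps a <physreg, subreg#> pair to another physreg, e.g. <RAX, 1> -> AX.
// Returns NoReg if the register has no row in the table.
PhysReg subRegForIndex(PhysReg Reg, unsigned Idx) {
  assert(Idx < NumSubRegIdx && "subregister index out of range");
  for (const SubRegRow &Row : SubRegTable)
    if (Row.Super == Reg)
      return Row.Parts[Idx];
  return NoReg;
}
```

In a real backend such a table would presumably be generated from the target description rather than written by hand.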

The idea is that the register allocator allocates registers as normal, but during the rewriting pass, when it replaces vregs with pregs (e.g. r1024 with CX in this example), it rewrites r1024[subreg0] as CL instead of CX. This would give us this code:

   CX = X86_LOADi16 ...
   DL = ADDi8 CL, 42
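The rewriting step can be sketched in C++ as below, assuming a simple operand model. `Operand`, the `VRegToPhys` assignment map and the small `SubRegs` table are hypothetical stand-ins for the allocator's real data structures. Subregister numbers follow the example above (subreg #0 is the i8 part), so -1 is used here to mean "whole register".

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A register operand as the rewriter would see it: a virtual register number
// plus an optional subregister index (-1 here means "whole register").
struct Operand {
  unsigned VReg;
  int SubReg;
};

// Hypothetical table: physreg name -> its parts, indexed by subreg# (0 = i8 part).
static const std::map<std::string, std::vector<std::string>> SubRegs = {
    {"CX", {"CL"}},
    {"DX", {"DL"}},
};

// Replaces a vreg operand with the physreg the allocator picked and then, if
// the operand names a subregister, with the corresponding sub-physreg.
std::string rewriteOperand(const Operand &Op,
                           const std::map<unsigned, std::string> &VRegToPhys) {
  const std::string &Phys = VRegToPhys.at(Op.VReg);
  if (Op.SubReg < 0)
    return Phys;
  const std::vector<std::string> &Parts = SubRegs.at(Phys);
  assert(Op.SubReg < static_cast<int>(Parts.size()) && "bad subreg index");
  return Parts[Op.SubReg];
}
```

With r1024 assigned to CX and r1026 to DL, the operand r1024[subreg #0] rewrites to CL, which gives the `DL = ADDi8 CL, 42` line above.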

In your case, you'd define your vector register class with 4 subregs, one for each piece.
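For the vector case in the original question, here is a hedged C++ sketch of what "four subregs, one per piece" could mean. It assumes (this is not stated in the thread) that vector register v4rN, for N a multiple of 4, overlaps scalar registers rN..rN+3, just as v4r0 overlaps {r0, r1, r2, r3}; `laneSubReg` and `emitAddWithLane` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Assumed register layout: v4rN (N a multiple of 4) overlaps rN..rN+3,
// so lane i of v4rN is simply the scalar register r(N+i).
constexpr unsigned NumLanes = 4;

unsigned laneSubReg(unsigned VecBase, unsigned Lane) {
  assert(VecBase % NumLanes == 0 && "vector registers start on a 4-aligned scalar");
  assert(Lane < NumLanes && "lane out of range");
  return VecBase + Lane;
}

// Folds extract_element(Vec, Lane) directly into the add's operand instead of
// emitting a separate extract instruction.
std::string emitAddWithLane(unsigned Dst, unsigned Src, unsigned VecBase,
                            unsigned Lane) {
  return "add r" + std::to_string(Dst) + ", r" + std::to_string(Src) + ", r" +
         std::to_string(laneSubReg(VecBase, Lane));
}
```

Folding extract_element 2 of v4r0 into the add this way reproduces the `add r11, r10, r2` line from the question.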

Unfortunately, none of this exists yet :(. To handle truncates and extends on X86, we currently emulate this by generating machineinstrs like:

   r1024 = X86_LOADi16 ...
   r1025 = TRUNCATE_i16_to_i8 r1024
   r1026 = ADDi8 r1025, 42

In the asmprinter, we print TRUNCATE_i16_to_i8 as a commented-out noop if the register allocator happens to assign r1024 and r1025 to the same register. If not, an asmprinter hack prints it as a copy instruction. This is horrible and doesn't produce good code. OTOH, before Evan improved this, we always copied into AX and out of AL for each i16->i8 truncate, which was much worse :)
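The workaround described above can be sketched in C++ as follows. This is a hypothetical illustration, not the actual X86 asmprinter code: the function name, the comment text and the `movb` copy it prints (AT&T syntax) are assumptions, and only the four 16-bit registers with an addressable low byte in 32-bit mode are modeled.

```cpp
#include <cassert>
#include <map>
#include <string>

// Low 8-bit part of each 16-bit register that has one in 32-bit mode.
static const std::map<std::string, std::string> Low8 = {
    {"ax", "al"}, {"bx", "bl"}, {"cx", "cl"}, {"dx", "dl"}};

// Prints a TRUNCATE_i16_to_i8 pseudo-instruction after register allocation.
// If the i8 result landed in the low part of the i16 source, the truncate is
// free and is printed as a comment; otherwise it becomes a byte copy.
std::string printTruncate16To8(const std::string &Src16, const std::string &Dst8) {
  auto It = Low8.find(Src16);
  assert(It != Low8.end() && "source has no addressable low byte");
  if (It->second == Dst8)
    return "# TRUNCATE " + Src16 + " -> " + Dst8 + " (noop)";
  return "movb %" + It->second + ", %" + Dst8;  // AT&T syntax: src, dst
}
```

The copy branch is what makes this approach produce poor code: every truncate whose source and result were allocated independently costs an extra move.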

I see that Evan has added getSubRegisters()/getSuperRegisters() to
MRegisterInfo. This is what's needed in order to implement the
register allocation constraint, but there's no way yet to pass the
constraint through the operands from the DAG. There would need to be
some way to specify that the SDOperand is referencing a subvalue of
the produced value (perhaps a subclass of SDOperand?). This would
allow the register allocator to try to use the sub/super register
sets to perform the insert/extract.

Right. Evan is currently focusing on getting the late stages of the code generator (e.g. livevars) to be able to understand arbitrary machine instrs in the face of physreg subregs. This lays the groundwork for handling vreg subregs, but won't solve it directly.

Is any of this kind of work planned? The addition of those
MRegisterInfo functions has me curious...

This is on our mid-term plan, which means we'll probably tackle it over the next year or so, but we don't have any concrete plans in the immediate future. If you are interested, this should be a pretty reasonable project that will give you a chance to become more familiar with various pieces of the early code generator. :)

-Chris

Thanks for the detailed response.

Right. Evan is currently focusing on getting the late stages of the code
generator (e.g. livevars) to be able to understand arbitrary machine
instrs in the face of physreg subregs. This lays the groundwork for
handling vreg subregs, but won't solve it directly.

Is the work Evan doing a prerequisite for supporting vreg subregs?
Is there a PR for the feature Evan is working on?

Is any of this kind of work planned? The addition of those
MRegisterInfo functions has me curious...

This is on our mid-term plan, which means we'll probably tackle it over
the next year or so, but we don't have any concrete plans in the immediate
future. If you are interested, this should be a pretty reasonable project
that will give you a chance to become more familiar with various pieces of
the early code generator. :)

I have other higher priority tasks right now, but I think we'll want to have this in sooner rather than later. If you have any pointers on a good starting point it'd be mighty helpful. If I can get a grasp on it I'll start incremental work in the background.

Probably the place to start would be opening a PR. Does "Support for vreg subregs" capture the essence of the enhancement?


Is the work Evan doing a prerequisite for supporting vreg subregs?

Sort of. Work on vreg subregs can start before I finish physreg subreg support. But unless there are no live-in registers, nothing can possibly work.

Is there a PR for the feature Evan is working on?

You filed it. PR1306. :)


I have other higher priority tasks right now, but I think we'll want
to have this in sooner rather than later. If you have any pointers on
a good starting point it'd be mighty helpful. If I can get a grasp on
it I'll start incremental work in the background.

It's really unclear how we will implement this. I haven't given it much thought because it's not yet important for us. If you have ideas, please share. :)

Probably the place to start would be opening a PR. Does "Support for
vreg subregs" capture the essence of the enhancement?

Sure. Please add the relevant information from the thread to the bug though.

Evan


Is the work Evan doing a prerequisite for supporting vreg subregs?

Sort of. Work on vreg subregs can start before I finish physreg subreg
support. But unless there are no live-in registers, nothing can
possibly work.

Is there a PR for the feature Evan is working on?

You filed it. PR1306. :)

Ah! I didn't realize that the issue would have such far-reaching consequences.


I have other higher priority tasks right now, but I think we'll want
to have this in sooner rather than later. If you have any pointers on
a good starting point it'd be mighty helpful. If I can get a grasp on
it I'll start incremental work in the background.

It's really unclear how we will implement this. I haven't given it
much thought because it's not yet important for us. If you have
ideas, please share. :)

Chris had some suggestions about 2 posts back <http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-April/008834.html>

He mentioned that the place where the constraint would need to be enforced was during the register rewriting pass.

My first productive thoughts were to create a subclass of SDOperand (SDSubOperand) that Lowering could use to communicate the target specific subregister index. The other thought was to have something akin to "getSubRegisterForIndex()" in MRegisterInfo, which would return a sub register of the correct type at the specified index in a target dependent way. I'm not familiar with the register rewriting pass, so I'm not sure what data structures it needs/has access to.


It's really unclear how we will implement this. I haven't given it
much thought because it's not yet important for us. If you have
ideas, please share. :)

hehe, I've been thinking about this for years :)

Chris had some suggestions about 2 posts back <http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-April/008834.html>

He mentioned that the place where the constraint would need to be
enforced was during the register rewriting pass.

My first productive thoughts were to create a subclass of SDOperand
(SDSubOperand) that Lowering could use to communicate the target
specific subregister index. The other thought was to have something
akin to "getSubRegisterForIndex()" in MRegisterInfo, which would
return a sub register of the correct type at the specified index in a
target dependent way. I'm not familiar with the register rewriting
pass, so I'm not sure what data structures it needs/has access to.

Yes, we need those. I think these are the major pieces needed. These are all relatively small and independent pieces, so we can tackle these one at a time.

1. As you say, we need MRegisterInfo::getSubRegisterForIndex that, given a
    preg/subreg# pair, returns a preg.
2. We need tblgen syntax/registerinfoemitter support to generate tables
    for #1.
3. Register MachineOperands need a subregister number. We should probably
    use 0 to denote "no subreg".
4. The DAG scheduler pass (which creates machine instrs from dag nodes)
    currently thinks of register operands as simple unsigned's for vreg
    #'s. This needs to be extended to be vreg+subreg pairs (see
    'CreateVirtualRegisters').
5. We need to decide how to represent subregs in the DAG. Your
    SDSubOperand idea is fine, but I don't think it needs to be an actual
    new subclass of SDOperand. Instead, it could just be a binary SDNode,
    where the LHS is the register input and the RHS is a TargetConstant
    specifying the subreg#.
6. [optional] We would like syntax to create these things for writing
    patterns in the .td file instead of requiring custom matching code.
7. The register allocator needs to rewrite subreg references using
    #1. This should be very simple.
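Pieces 3 and 4 above can be sketched together in C++: a register reference becomes a vreg plus a subreg number (0 = no subreg), and when the scheduler meets a subreg-extraction node of the form described in piece 5, it records a vreg+subreg pair instead of creating a new vreg. `RegRef`, `Node` and `assignRegs` are hypothetical and much simpler than the real scheduler; nodes are assumed to be in topological order. Because 0 is reserved, this numbering would make lane 2 of a 4-lane vector subreg #3, which is likewise an assumption.

```cpp
#include <cassert>
#include <map>
#include <vector>

// A register reference: vreg number plus subregister index, 0 = no subreg.
struct RegRef {
  unsigned VReg;
  unsigned SubReg = 0;
};

// A toy DAG node: either an ordinary value-producing node or a subreg
// extraction whose operands are (register input, constant subreg#).
struct Node {
  enum Kind { Value, ExtractSubReg } K;
  unsigned Input = 0;   // ExtractSubReg only: id of the node being read
  unsigned SubIdx = 0;  // ExtractSubReg only: subregister number (> 0)
};

// Hypothetical analogue of CreateVirtualRegisters. Nodes must be in
// topological order. Ordinary nodes get a fresh vreg; ExtractSubReg nodes
// emit nothing and refer to their input's vreg with a subregister index.
std::map<unsigned, RegRef> assignRegs(const std::vector<Node> &Nodes,
                                      unsigned FirstVReg) {
  std::map<unsigned, RegRef> Result;
  unsigned Next = FirstVReg;
  for (unsigned Id = 0; Id < Nodes.size(); ++Id) {
    const Node &N = Nodes[Id];
    if (N.K == Node::Value) {
      Result[Id] = RegRef{Next++};
    } else {
      const RegRef &In = Result.at(N.Input);
      // This sketch does not handle a subreg of a subreg.
      assert(In.SubReg == 0 && N.SubIdx != 0 && "nested or empty subreg");
      Result[Id] = RegRef{In.VReg, N.SubIdx};
    }
  }
  return Result;
}
```

The register allocator would then only need piece 1's lookup to turn each <preg, subreg#> pair into a physical register during rewriting.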

-Chris

Yes, we need those. I think these are the major pieces needed. These are
all relatively small and independent pieces, so we can tackle these one at
a time.

4. The DAG scheduler pass (which creates machine instrs from dag nodes)
    currently thinks of register operands as simple unsigned's for vreg
    #'s. This needs to be extended to be vreg+subreg pairs (see
    'CreateVirtualRegisters').
5. We need to decide how to represent subregs in the DAG. Your
    SDSubOperand idea is fine, but I don't think it needs to be an actual
    new subclass of SDOperand. Instead, it could just be a binary SDNode,
    where the LHS is the register input and the RHS is a TargetConstant
    specifying the subreg#.
6. [optional] We would like syntax to create these things for writing
    patterns in the .td file instead of requiring custom matching code.
7. The register allocator needs to rewrite subreg references using
    #1. This should be very simple.

For 5, I am currently creating new binary SDNodes, 'from_subreg' and 'to_subreg', in ISD. Is this in line with your thinking for the design, Chris? The issue I ran into is that you essentially need both subreg insert and extract.
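Why both directions are needed can be shown with a small C++ model of the two nodes' semantics on a 4-lane value (lane indices 0-3 here, not the 0 = "no subreg" numbering). `fromSubreg` mirrors the binary form (register input, constant subreg#); `toSubreg` is modeled, as an assumption, with the super-register as an extra operand, since a purely binary (value, subreg#) insert would have no way to say what the other lanes hold.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<int, 4>;

// from_subreg: binary (register input, constant subreg#) -> that piece's value.
int fromSubreg(const V4 &Vec, unsigned Idx) {
  assert(Idx < Vec.size() && "subreg index out of range");
  return Vec[Idx];
}

// to_subreg: writes one piece. Modeled here with the super-register being
// updated as an extra operand, so the untouched lanes are preserved.
V4 toSubreg(V4 Vec, int Val, unsigned Idx) {
  assert(Idx < Vec.size() && "subreg index out of range");
  Vec[Idx] = Val;
  return Vec;
}
```

In this model the insert produces a new value rather than mutating its input, which matches how DAG nodes behave.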

Hi Christopher,

I can send you what I have for that so far; it works pretty well.

Nate

That’d be great.