[GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Hi,

As part of the effort to bring up GlobalISel, I would like your feedbacks on the best way to map LLVM IR values into MachineInstr values (virtual registers), in particular when aggregate types get involved.

I am looking for a long term solution.
Short term is to replicate SDAG solution.

** Context **

The first step of GlobalISel is to translate the LLVM IR into MachineInstr representation. During this process we need to be able to tell where are the different LLVM IR values with respect to their machine representation, such that we can feed the right element to the related machine instructions.

Right now, in SelectionDAG, we have a mapping from one Value* to one SDValue. But this is not the whole story!

** Problem With Aggregate Types **

  • Facts *
  • Aggregate types cannot be expressed with a single EVT (Extended Value Type)
  • SDValues are typed with a single EVT.

Therefore, values with aggregate types need to be break apart to be represented by a SDValue.

  • Implications *
  • Instructions that may accept aggregate type, e.g., load, store, and select, must be split to handle the different component that compose a Value. (Look for the use of ComputeValueVTs in the SelecionDAGBuilder and the related loops this implies).
  • Use of aggregate type generates a bunch of MERGE_VALUE nodes, which sole purpose is to glue all the components that make the aggregate type together.

Therefore, in practice, values with aggregate types are mapped to several SDValue hidden in MERGE_VALUE nodes.

  • Summary *

Values with aggregate type map to a list of SDValue and, consequently, the handling of the instructions is uneven between the ones that need to support aggregate type and the ones that do not.

** Possible Solutions **

  • A. Replicate SDAG Solution *

Not much to add from the title. One can see that in the sketch of patch for the IRTranslator (from the original GlobalISel email http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html), we have a map from a Value to a list of virtual registers.

Pros:

  • We know it worked for SDAG.
  • We know backends would generate reasonable code from that.

Cons:

  • We need to pay (compile time, memory consumption) for the extra logic for every non-aggregate type.
  • Translation is not homogeneous between, say, add and store.
  • Premature splitting that need to be fixed later(?), e.g., merge consecutive stores.
  • B. Map Aggregate Type to One (Big) Virtual Register *

Have one big virtual register for the whole aggregate type. Only the size of the register matters for the instructions that handle aggregate type, so we can still use EVT for virtual registers.

Pros:

  • Translation is fast.
  • Code is easy to understand.
  • No premature splitting for instance for load instructions.

Cons:

  • Useless bits (padding) are carried around and may be hard to eliminate.
  • Expose weird types to legalization more often.
  • Backends may not cope very well with that kind of code.
  • C. Ideas *

Any other ideas?

** Feedback Needed **

I want to have your feedbacks on how we should model aggregate types during the IR translation with respect to the mapping to their machine representation.

This is not a road blocker as, at first, I will replicate SDAG solution, but I want we have a conversation on what is the best long term solution.

I will file a PR so that we remember we would need to come back to this part of the implementation of the prototype when we agreed on what the solution should be and look at productizing the selector.

** Example **

Consider the following LLVM IR input:

%struct.bar = type { i16, i16 }

define void @foo(i16 signext %a, i16 signext %b, %struct.bar* nocapture %addr) #0 {
entry:
%tmp = insertvalue %struct.bar undef, i16 %a, 0
%tmp2 = insertvalue %struct.bar %tmp, i16 %b, 1
store %struct.bar %tmp2, %struct.bar* %addr, align 4
ret void
}

Which is roughly:

struct bar {
short a;
short b;
};

void foo(short a, short b, struct bar *addr) {
struct bar tmp;
tmp.a = a;
tmp.b = b;
*addr = tmp;
}

  • Solution A: Replicate SDAG *

Note: (#) is the size of the virtual register.

  • Translation:

arg1(32) = copy R0
arg2(32) = copy R1
addr(32) = copy R2
a(16) = truncate arg1(32)
b(16) = truncate arg2(32)
tmp5(16), tmp6(16) = merge_value a(16), b(16)
store tmp5(16), addr(32), 0
store tmp6(16), addr(32), 4

  • Legalization:

arg1(32) = copy R0

arg2(32) = copy R1
addr(32) = copy R2
tmp5(16) = extract_subreg a(32), 0

tmp6(16) = extract_subreg b(32), 0
store tmp5(16), addr(32), 0
store tmp6(16), addr(32), 4

  • Solution B: One to one mapping Value to Vreg *
  • Translation:

arg1(32) = copy R0

arg2(32) = copy R1
addr(32) = copy R2
a(16) = truncate arg1(32)
b(16) = truncate arg2(32)
tmp(32) = concat a(16), b(16)
store tmp(32), addr

  • Legalization:

arg1(32) = copy R0

arg2(32) = copy R1
addr(32) = copy R2
a(32) = and arg1(32), 0xFFFF
b(32) = and arg2(32), 0xFFFF
tmp(32) = concat a(32/16), b(32/16) ; <— although the input registers are 32bits we are really only interested in the 16bit low part of both a and b.
store tmp(32), addr ; <— legal

The problem now is how do we legalize that thing with generic code?
I.e., we’ll end up with either load/store sequences (maybe storing i16 is not even legal) or SHIFT and OR and that may be painful to optimize later.

Thanks,
-Quentin

From: "Quentin Colombet via llvm-dev" <llvm-dev@lists.llvm.org>
To: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Thursday, January 14, 2016 6:41:57 PM
Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Hi,

As part of the effort to bring up GlobalISel, I would like your
feedbacks on the best way to map LLVM IR values into MachineInstr
values (virtual registers), in particular when aggregate types get
involved.

I am looking for a long term solution.
Short term is to replicate SDAG solution.

** Context **

The first step of GlobalISel is to translate the LLVM IR into
MachineInstr representation. During this process we need to be able
to tell where are the different LLVM IR values with respect to their
machine representation, such that we can feed the right element to
the related machine instructions.

Right now, in SelectionDAG, we have a mapping from one Value* to one
SDValue. But this is not the whole story!

** Problem With Aggregate Types **

* Facts *

- Aggregate types cannot be expressed with a single EVT (Extended
Value Type)
- SDValues are typed with a single EVT.

Therefore, values with aggregate types need to be break apart to be
represented by a SDValue.

* Implications *

- Instructions that may accept aggregate type, e.g., load, store, and
select, must be split to handle the different component that compose
a Value. (Look for the use of ComputeValueVTs in the
SelecionDAGBuilder and the related loops this implies).
- Use of aggregate type generates a bunch of MERGE_VALUE nodes, which
sole purpose is to glue all the components that make the aggregate
type together.

Therefore, in practice, values with aggregate types are mapped to
several SDValue hidden in MERGE_VALUE nodes.

* Summary *

Values with aggregate type map to a list of SDValue and,
consequently, the handling of the instructions is uneven between the
ones that need to support aggregate type and the ones that do not.

** Possible Solutions **

* A. Replicate SDAG Solution *

Not much to add from the title. One can see that in the sketch of
patch for the IRTranslator (from the original GlobalISel email
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
), we have a map from a Value to a list of virtual registers.

Pros:
- We know it worked for SDAG.
- We know backends would generate reasonable code from that.

Cons:
- We need to pay (compile time, memory consumption) for the extra
logic for every non-aggregate type.
- Translation is not homogeneous between, say, add and store.
- Premature splitting that need to be fixed later(?), e.g., merge
consecutive stores.

* B. Map Aggregate Type to One (Big) Virtual Register *

Have one big virtual register for the whole aggregate type. Only the
size of the register matters for the instructions that handle
aggregate type, so we can still use EVT for virtual registers.

Pros:
- Translation is fast.
- Code is easy to understand.
- No premature splitting for instance for load instructions.

Cons:
- Useless bits (padding) are carried around and may be hard to
eliminate.

We could associate some metadata with the register somehow to represent the padding bits (we already do something like this for memcpys).

- Expose weird types to legalization more often.
- Backends may not cope very well with that kind of code.

Given that every backend is going to need major work to truly make this transition, I'd advise not to worry too much about your last item.

Seems like this second option is better than replicating the SDAG solution, so I recommend we pursue that.

-Hal

Hi Hal,

Thanks for your quick reply.

From: “Quentin Colombet via llvm-dev” <llvm-dev@lists.llvm.org>
To: “llvm-dev” <llvm-dev@lists.llvm.org>
Sent: Thursday, January 14, 2016 6:41:57 PM
Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Hi,

As part of the effort to bring up GlobalISel, I would like your
feedbacks on the best way to map LLVM IR values into MachineInstr
values (virtual registers), in particular when aggregate types get
involved.

I am looking for a long term solution.
Short term is to replicate SDAG solution.

** Context **

The first step of GlobalISel is to translate the LLVM IR into
MachineInstr representation. During this process we need to be able
to tell where are the different LLVM IR values with respect to their
machine representation, such that we can feed the right element to
the related machine instructions.

Right now, in SelectionDAG, we have a mapping from one Value* to one
SDValue. But this is not the whole story!

** Problem With Aggregate Types **

  • Facts *
  • Aggregate types cannot be expressed with a single EVT (Extended
    Value Type)
  • SDValues are typed with a single EVT.

Therefore, values with aggregate types need to be break apart to be
represented by a SDValue.

  • Implications *
  • Instructions that may accept aggregate type, e.g., load, store, and
    select, must be split to handle the different component that compose
    a Value. (Look for the use of ComputeValueVTs in the
    SelecionDAGBuilder and the related loops this implies).
  • Use of aggregate type generates a bunch of MERGE_VALUE nodes, which
    sole purpose is to glue all the components that make the aggregate
    type together.

Therefore, in practice, values with aggregate types are mapped to
several SDValue hidden in MERGE_VALUE nodes.

  • Summary *

Values with aggregate type map to a list of SDValue and,
consequently, the handling of the instructions is uneven between the
ones that need to support aggregate type and the ones that do not.

** Possible Solutions **

  • A. Replicate SDAG Solution *

Not much to add from the title. One can see that in the sketch of
patch for the IRTranslator (from the original GlobalISel email
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
), we have a map from a Value to a list of virtual registers.

Pros:

  • We know it worked for SDAG.
  • We know backends would generate reasonable code from that.

Cons:

  • We need to pay (compile time, memory consumption) for the extra
    logic for every non-aggregate type.
  • Translation is not homogeneous between, say, add and store.
  • Premature splitting that need to be fixed later(?), e.g., merge
    consecutive stores.
  • B. Map Aggregate Type to One (Big) Virtual Register *

Have one big virtual register for the whole aggregate type. Only the
size of the register matters for the instructions that handle
aggregate type, so we can still use EVT for virtual registers.

Pros:

  • Translation is fast.
  • Code is easy to understand.
  • No premature splitting for instance for load instructions.

Cons:

  • Useless bits (padding) are carried around and may be hard to
    eliminate.

We could associate some metadata with the register somehow to represent the padding bits (we already do something like this for memcpy).

The problem I have with metadata is that they can be lost.
That being said, having a mechanism to tell the optimizer what are in a register would be useful. Right now, to get this kind of information we use the computeKnownBits analysis and I wonder how could we generalize that so that the information is cheaper to get and pass along the way.

  • Expose weird types to legalization more often.
  • Backends may not cope very well with that kind of code.

Given that every backend is going to need major work to truly make this transition, I’d advise not to worry too much about your last item.

Seems like this second option is better than replicating the SDAG solution, so I recommend we pursue that.

My guts tell me that indeed the second option is better, but the lack of actual data makes me nervous.
Anyway, we don’t have to support aggregates to lay the basic pipeline. And when the basic pipeline is set, we can make actual measurements!

I’ve filed https://llvm.org/bugs/show_bug.cgi?id=26161 to keep track of that issue.

Thanks,
-Quentin

From: "Quentin Colombet" <qcolombet@apple.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Friday, January 15, 2016 1:37:50 PM
Subject: Re: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to
MachineInstr translation for aggregate type

Hi Hal,

Thanks for your quick reply.

> > From: "Quentin Colombet via llvm-dev" < llvm-dev@lists.llvm.org >
>

> > To: "llvm-dev" < llvm-dev@lists.llvm.org >
>

> > Sent: Thursday, January 14, 2016 6:41:57 PM
>

> > Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to
> > MachineInstr translation for aggregate type
>

> > Hi,
>

> > As part of the effort to bring up GlobalISel, I would like your
>

> > feedbacks on the best way to map LLVM IR values into MachineInstr
>

> > values (virtual registers), in particular when aggregate types
> > get
>

> > involved.
>

> > I am looking for a long term solution.
>

> > Short term is to replicate SDAG solution.
>

> > ** Context **
>

> > The first step of GlobalISel is to translate the LLVM IR into
>

> > MachineInstr representation. During this process we need to be
> > able
>

> > to tell where are the different LLVM IR values with respect to
> > their
>

> > machine representation, such that we can feed the right element
> > to
>

> > the related machine instructions.
>

> > Right now, in SelectionDAG, we have a mapping from one Value* to
> > one
>

> > SDValue. But this is not the whole story!
>

> > ** Problem With Aggregate Types **
>

> > * Facts *
>

> > - Aggregate types cannot be expressed with a single EVT (Extended
>

> > Value Type)
>

> > - SDValues are typed with a single EVT.
>

> > Therefore, values with aggregate types need to be break apart to
> > be
>

> > represented by a SDValue.
>

> > * Implications *
>

> > - Instructions that may accept aggregate type, e.g., load, store,
> > and
>

> > select, must be split to handle the different component that
> > compose
>

> > a Value. (Look for the use of ComputeValueVTs in the
>

> > SelecionDAGBuilder and the related loops this implies).
>

> > - Use of aggregate type generates a bunch of MERGE_VALUE nodes,
> > which
>

> > sole purpose is to glue all the components that make the
> > aggregate
>

> > type together.
>

> > Therefore, in practice, values with aggregate types are mapped to
>

> > several SDValue hidden in MERGE_VALUE nodes.
>

> > * Summary *
>

> > Values with aggregate type map to a list of SDValue and,
>

> > consequently, the handling of the instructions is uneven between
> > the
>

> > ones that need to support aggregate type and the ones that do
> > not.
>

> > ** Possible Solutions **
>

> > * A. Replicate SDAG Solution *
>

> > Not much to add from the title. One can see that in the sketch of
>

> > patch for the IRTranslator (from the original GlobalISel email
>

> > http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
>

> > ), we have a map from a Value to a list of virtual registers.
>

> > Pros:
>

> > - We know it worked for SDAG.
>

> > - We know backends would generate reasonable code from that.
>

> > Cons:
>

> > - We need to pay (compile time, memory consumption) for the extra
>

> > logic for every non-aggregate type.
>

> > - Translation is not homogeneous between, say, add and store.
>

> > - Premature splitting that need to be fixed later(?), e.g., merge
>

> > consecutive stores.
>

> > * B. Map Aggregate Type to One (Big) Virtual Register *
>

> > Have one big virtual register for the whole aggregate type. Only
> > the
>

> > size of the register matters for the instructions that handle
>

> > aggregate type, so we can still use EVT for virtual registers.
>

> > Pros:
>

> > - Translation is fast.
>

> > - Code is easy to understand.
>

> > - No premature splitting for instance for load instructions.
>

> > Cons:
>

> > - Useless bits (padding) are carried around and may be hard to
>

> > eliminate.
>

> We could associate some metadata with the register somehow to
> represent the padding bits (we already do something like this for
> memcpy).

The problem I have with metadata is that they can be lost.

Sorry, I meant metadata in the more-general sense (not actual IR metadata can can be dropped). Something actually associated with the vreg itself.

That being said, having a mechanism to tell the optimizer what are in
a register would be useful. Right now, to get this kind of
information we use the computeKnownBits analysis and I wonder how
could we generalize that so that the information is cheaper to get
and pass along the way.

Agreed. Reducing the computeKnownBits cost would be a good thing. We could use caching (or an actual forward analysis), and that might help.

-Hal


From: “Quentin Colombet” <qcolombet@apple.com>
To: “Hal Finkel” <hfinkel@anl.gov>
Cc: “llvm-dev” <llvm-dev@lists.llvm.org>
Sent: Friday, January 15, 2016 1:37:50 PM
Subject: Re: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Hi Hal,

Thanks for your quick reply.


From: “Quentin Colombet via llvm-dev” <llvm-dev@lists.llvm.org>
To: “llvm-dev” <llvm-dev@lists.llvm.org>
Sent: Thursday, January 14, 2016 6:41:57 PM
Subject: [llvm-dev] [GlobalISel][RFC] Value to vreg during IR to MachineInstr translation for aggregate type

Hi,

As part of the effort to bring up GlobalISel, I would like your
feedbacks on the best way to map LLVM IR values into MachineInstr
values (virtual registers), in particular when aggregate types get
involved.

I am looking for a long term solution.
Short term is to replicate SDAG solution.

** Context **

The first step of GlobalISel is to translate the LLVM IR into
MachineInstr representation. During this process we need to be able
to tell where are the different LLVM IR values with respect to their
machine representation, such that we can feed the right element to
the related machine instructions.

Right now, in SelectionDAG, we have a mapping from one Value* to one
SDValue. But this is not the whole story!

** Problem With Aggregate Types **

  • Facts *
  • Aggregate types cannot be expressed with a single EVT (Extended
    Value Type)
  • SDValues are typed with a single EVT.

Therefore, values with aggregate types need to be break apart to be
represented by a SDValue.

  • Implications *
  • Instructions that may accept aggregate type, e.g., load, store, and
    select, must be split to handle the different component that compose
    a Value. (Look for the use of ComputeValueVTs in the
    SelecionDAGBuilder and the related loops this implies).
  • Use of aggregate type generates a bunch of MERGE_VALUE nodes, which
    sole purpose is to glue all the components that make the aggregate
    type together.

Therefore, in practice, values with aggregate types are mapped to
several SDValue hidden in MERGE_VALUE nodes.

  • Summary *

Values with aggregate type map to a list of SDValue and,
consequently, the handling of the instructions is uneven between the
ones that need to support aggregate type and the ones that do not.

** Possible Solutions **

  • A. Replicate SDAG Solution *

Not much to add from the title. One can see that in the sketch of
patch for the IRTranslator (from the original GlobalISel email
http://lists.llvm.org/pipermail/llvm-dev/2015-November/092566.html
), we have a map from a Value to a list of virtual registers.

Pros:

  • We know it worked for SDAG.
  • We know backends would generate reasonable code from that.

Cons:

  • We need to pay (compile time, memory consumption) for the extra
    logic for every non-aggregate type.
  • Translation is not homogeneous between, say, add and store.
  • Premature splitting that need to be fixed later(?), e.g., merge
    consecutive stores.
  • B. Map Aggregate Type to One (Big) Virtual Register *

Have one big virtual register for the whole aggregate type. Only the
size of the register matters for the instructions that handle
aggregate type, so we can still use EVT for virtual registers.

Pros:

  • Translation is fast.
  • Code is easy to understand.
  • No premature splitting for instance for load instructions.

Cons:

  • Useless bits (padding) are carried around and may be hard to
    eliminate.

We could associate some metadata with the register somehow to represent the padding bits (we already do something like this for memcpy).

The problem I have with metadata is that they can be lost.

Sorry, I meant metadata in the more-general sense (not actual IR metadata can can be dropped). Something actually associated with the vreg itself.

That being said, having a mechanism to tell the optimizer what are in a register would be useful. Right now, to get this kind of information we use the computeKnownBits analysis and I wonder how could we generalize that so that the information is cheaper to get and pass along the way.

Agreed. Reducing the computeKnownBits cost would be a good thing. We could use caching (or an actual forward analysis), and that might help.

Agreed.

Q.