Some backend questions

Ok, I'm now trying to write instruction selector and have some questions

1. The MachineInstrBuilder has methods to add register operand and immediate
operand. However, what would be really nice is a method to add Value*. So, I
would write:

    BuildMI(*BB, NM::add, 1).add(I.getOperand(0), I.getOperand(1));

and depending on whether the passed Value* is contant or instruction, the add
method would either add immediate constant or allocate/get new virtual
register.

2. Why SSARegMap is called this way. As far as I can see, it's does not
implement any mapping, it simply allocates registers given a register class.

3. Maybe, the allocation of virtual registers for Value* should be made more
reusable. The X86 backend has the code for that in getReg method in
InstSelectSimple.cpp which:

   - uses SSARegMap instance
   - keeps internal Value* -> register mapping
   - copies constants into register when needed

At least first two things will be necessary for my backend too. I start to
wonder if it makes sense to make a "BasicInstructionSelector" which will have
such reusable logic?

Sorry if the above is not very clear -- I'm only starting to getting underting
of LLVM codebase.

- Volodya

Ok, I'm now trying to write instruction selector and have some questions

1. The MachineInstrBuilder has methods to add register operand and immediate
operand. However, what would be really nice is a method to add Value*. So, I
would write:

    BuildMI(*BB, NM::add, 1).add(I.getOperand(0), I.getOperand(1));

and depending on whether the passed Value* is contant or instruction, the add
method would either add immediate constant or allocate/get new virtual
register.

Okay, the problem with this is that the instruction selector is the code
that is supposed to know what constraints the instructions have. For
example, on many architectures, immediates are limited to 13 bits or some
other small number. Also, for virtual regsiters, there is no context to
hold the mapping of Value*'s -> vregs: this is what the instruction
selector is about.

I recommend taking a look at the getReg(*) methods in the X86 instruction
selector. The basic code generation stage for an add, boiled down to its
simplest form, basically looks like this:

void visitAdd(BinaryOperator &B) {
  unsigned Op0Reg = getReg(B.getOperand(0));
  unsigned Op1Reg = getReg(B.getOperand(1));
  unsigned DestReg = getReg(B);

  unsigned Opcode = (get the opcode for the size of the add);
  BuildMI(<where>, Opcode, 2, DestReg).addReg(Op0Reg).addReg(Op1Reg);
}

The nice thing about the "getReg" functionality is that it is a member of
the instruction selector class, so it has the context to store the maps
and other things needed in it.

If you do this (which I recommend for the first step), you'll notice that
it produces pretty horrible code, as all immediates are copied into
registers before they are used. In other words, instead of getting:

  R2 = add R1, 17

You'll get:

  R3 = mov 17
  R2 = add R2, R3

IOW, the code will work fine, but will be ugly and slow. As the second
step, when the selector is basically working, you can add cases to handle
these. Taken to a limit you'll get something as complex as the X86
InstSelectSimple (which as someone mentioned, is not really simple
anymore).

2. Why SSARegMap is called this way. As far as I can see, it's does not
implement any mapping, it simply allocates registers given a register class.

You're right, it currently just keeps track of the register class for each
virtual register. Eventually it will keep track of which machine
instruction defines each virtual register as well, at which point it will
live up to its name. :slight_smile:

3. Maybe, the allocation of virtual registers for Value* should be made more
reusable. The X86 backend has the code for that in getReg method in
InstSelectSimple.cpp which:

   - uses SSARegMap instance
   - keeps internal Value* -> register mapping
   - copies constants into register when needed

At least first two things will be necessary for my backend too. I start to
wonder if it makes sense to make a "BasicInstructionSelector" which will have
such reusable logic?

That's a good idea, because basically all instruction selectors require
the same code. The problem is that the code needs to know how to get an
immediate/constant into a register as well, which can be very complex on
some targets.

For now, I recommend just copying the code to get something up and running
quickly. In the future we can talk about creating a target independent
helper class once we understand the requirements and constraints better.

Sorry if the above is not very clear -- I'm only starting to getting underting
of LLVM codebase.

Not at all! Please ask! BTW, what architecture are you targetting?

-Chris

Chris Lattner wrote:

> 1. The MachineInstrBuilder has methods to add register operand and
> immediate operand. However, what would be really nice is a method to add
> Value*. So, I would write:
>
> BuildMI(*BB, NM::add, 1).add(I.getOperand(0), I.getOperand(1));
>
> and depending on whether the passed Value* is contant or instruction, the
> add method would either add immediate constant or allocate/get new
> virtual register.

Okay, the problem with this is that the instruction selector is the code
that is supposed to know what constraints the instructions have. For
example, on many architectures, immediates are limited to 13 bits or some
other small number.

Yes, I realize that this code can't be 100% target-independent.

Also, for virtual regsiters, there is no context to
hold the mapping of Value*'s -> vregs: this is what the instruction
selector is about.

I recommend taking a look at the getReg(*) methods in the X86 instruction
selector. The basic code generation stage for an add, boiled down to its
simplest form, basically looks like this:

void visitAdd(BinaryOperator &B) {
  unsigned Op0Reg = getReg(B.getOperand(0));
  unsigned Op1Reg = getReg(B.getOperand(1));
  unsigned DestReg = getReg(B);

  unsigned Opcode = (get the opcode for the size of the add);
  BuildMI(<where>, Opcode, 2, DestReg).addReg(Op0Reg).addReg(Op1Reg);
}

The nice thing about the "getReg" functionality is that it is a member of
the instruction selector class, so it has the context to store the maps
and other things needed in it.

Yes, I've seen this method.

If you do this (which I recommend for the first step), you'll notice that
it produces pretty horrible code, as all immediates are copied into
registers before they are used. In other words, instead of getting:

  R2 = add R1, 17

You'll get:

  R3 = mov 17
  R2 = add R2, R3

Right... that's what alarmed me in the first place. I though about something
code which create immediate operand from constant and virtual register from
Value* which really points to Instruction*. There should also be some
mechanism to avoid creating two immediate operands, if target does not allow
that.
         

> 2. Why SSARegMap is called this way. As far as I can see, it's does not
> implement any mapping, it simply allocates registers given a register
> class.

You're right, it currently just keeps track of the register class for each
virtual register. Eventually it will keep track of which machine
instruction defines each virtual register as well, at which point it will
live up to its name. :slight_smile:

Ok.

> 3. Maybe, the allocation of virtual registers for Value* should be made
> more reusable. The X86 backend has the code for that in getReg method in
> InstSelectSimple.cpp which:
>
> - uses SSARegMap instance
> - keeps internal Value* -> register mapping
> - copies constants into register when needed
>
> At least first two things will be necessary for my backend too. I start
> to wonder if it makes sense to make a "BasicInstructionSelector" which
> will have such reusable logic?

That's a good idea, because basically all instruction selectors require
the same code. The problem is that the code needs to know how to get an
immediate/constant into a register as well, which can be very complex on
some targets.

Right. I though that a virtual method to copy constant to a virtual register
would work.

For now, I recommend just copying the code to get something up and running
quickly. In the future we can talk about creating a target independent
helper class once we understand the requirements and constraints better.

Ok, I'll try to.

> Sorry if the above is not very clear -- I'm only starting to getting
> underting of LLVM codebase.

Not at all! Please ask!

Thanks.

BTW, what architecture are you targetting?

That's NM6403 -- an DSP produced by one russian company (http://module.ru).
As I've already said, my interest is in my PhD research -- I plan to run some
analysis on LLVM representation and assembler for that processor, so it would
be more convenient if assembler is produced by LLVM, and not the standard
compiler.

- Volodya

> If you do this (which I recommend for the first step), you'll notice that
> it produces pretty horrible code, as all immediates are copied into
> registers before they are used. In other words, instead of getting:
>
> R2 = add R1, 17
>
> You'll get:
>
> R3 = mov 17
> R2 = add R2, R3

Right... that's what alarmed me in the first place. I though about something
code which create immediate operand from constant and virtual register from
Value* which really points to Instruction*. There should also be some
mechanism to avoid creating two immediate operands, if target does not allow
that.

The ultimate solution is to use a pattern matching instruction selector
(which we are working on). In the meantime, depending on how RISCy your
target is, it's pretty easy to get reasonable code with few special cases.
Usually this is enough:

... visitAdd(Instruction &I) {

  if (ConstantInt *C = dyn_cast<Constant>(I.getOperand(1))) {
    // handle add r, i
  } else {
    // handle general 'add r,r' case.
  }
}

In particular, I *strongly* recommend getting a working code generator
first, even if it creates mind boggling ugly code... then make it generate
great code.

> BTW, what architecture are you targetting?

That's NM6403 -- an DSP produced by one russian company (http://module.ru).
As I've already said, my interest is in my PhD research -- I plan to run some
analysis on LLVM representation and assembler for that processor, so it would
be more convenient if assembler is produced by LLVM, and not the standard
compiler.

Ohhh, sounds great. From a brief look through their data sheets, it looks
like it's a pretty RISCy processor, but has some snazzy addressing modes
with pre/post in/decrements. I assume that you're planning on only
targetting the RISC core and not the vector copro?

-Chris

Chris Lattner wrote:

The ultimate solution is to use a pattern matching instruction selector
(which we are working on). In the meantime, depending on how RISCy your
target is, it's pretty easy to get reasonable code with few special cases.
Usually this is enough:

... visitAdd(Instruction &I) {

  if (ConstantInt *C = dyn_cast<Constant>(I.getOperand(1))) {
    // handle add r, i
  } else {
    // handle general 'add r,r' case.
  }
}

In particular, I *strongly* recommend getting a working code generator
first, even if it creates mind boggling ugly code... then make it generate
great code.

Thanks, advice taken.

> > BTW, what architecture are you targetting?
>
> That's NM6403 -- an DSP produced by one russian company
> (http://module.ru). As I've already said, my interest is in my PhD
> research -- I plan to run some analysis on LLVM representation and
> assembler for that processor, so it would be more convenient if assembler
> is produced by LLVM, and not the standard compiler.

Ohhh, sounds great. From a brief look through their data sheets, it looks
like it's a pretty RISCy processor, but has some snazzy addressing modes
with pre/post in/decrements.

Yes. It's my impression that most DSPs have such "snazzy" addressing modes :wink:

I assume that you're planning on only
targetting the RISC core and not the vector copro?

Right. It would be experemely hard to generate efficient code for the vector
unit.

- Volodya

Not extremely hard, just a different set of technology. Depending on what
your goals are, we could support it with some simple intrinsics to begin
with, then eventually have LLVM optimizations that produce the intrinsics
automatically. Again, it totally depends on what your goals are, and if
the vector unit is not important for your work, it can be left to someone
who finds it to be important. In the beginning, leaving the vector unit
completely idle is the right way to go! :slight_smile:

-Chris