Instruction descriptions question

Hi,

I'm trying to implement a new backend for an embedded CISC processor.
Therefore I thought that it makes sense to take X86 target as a basis,
to save some time.

But when I look into the X86InstrInfo.td, I have a very strong feeling
that it is one of the most complex instruction set descriptions
compared to other targets. I can imagine that this is due to the
complexity of X86's instruction set. But still I have a feeling that it
is overcomplicated, but may be it is just an initial impression.

Since I just started, here are my first several questions and
proposals. I hope that some of them make sense for you.

1. Why does X86 instruction set description provide different
descriptions for the same instructions, which differ only in the size
of operands?
E.g.

def MOV8rm : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
                "mov{b} {$src, $dst|$dst, $src}",
                [(set GR8:$dst, (load addr:$src))]>;
def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
                "mov{w} {$src, $dst|$dst, $src}",
                [(set GR16:$dst, (load addr:$src))]>, OpSize;
def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
                "mov{l} {$src, $dst|$dst, $src}",
                [(set GR32:$dst, (load addr:$src))]>;

For my target processor, only the types of operands (e.g. rm) are
really important. The size is encoded into the opcode, but otherwise it
does not affect anything except for constraints on the sizes of
operands. And constraint is basically that the size of both operands
should be the same.

Wouldn't it be possible and even more clean to have just one
description like (I use a pseudo-description here):

def MOVrr : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst,
(i8mem|i16mem|i32mem):$src),
                "mov{b} {$src, $dst|$dst, $src}", >, isSameSize($dst,
$src);

The semantic of such a description would mean that $dst should be one
of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the
additional constraint that the sizes of both operands are the same
(this is checked by isSameSize predicate).

More over, in very many cases, the operation (e.g. OR) itself affects
only the encoding of the operation inside the opcode and of course the
description of the effect and assembler syntax e.g. [(store (or (load
addr:$dst), GR8:$src), addr:$dst)]. For OR insn the opreation should be
or, and for AND insn - obviously and. But operands and operand
combinations have the same limitations and descriptions also for many
other operations, e.g. XOR, AND, etc.
Wouldn't it make sense to provide an easier way to describe it?
For example, one would describe operands combinations (e.g. RR, RM)
separately and then just say which instructions they are applicable
for? May be using something like:

def rm : (GR32 :$dst, i32mem:$src)
{
  "$assm_op{l} {$src, $dst|$dst, $src}",
  [(set GR32:$dst, ($insn_effect_op GR16:$dst, (load addr:$src)))]
         
};

Mapping from instruction name to instruction syntax and/or effect can
be also simplified to something like:

def AND -> assm_op="and", insn_effect_op="and";
def OR -> assm_op"or", insn_effect_op="or";

let rm in {
  def OR : ...;
  def AND : ...;
  def MOV : ...;
  ...
}

The effect would be a set of instructions descriptions with names
ORrm, ANDrm, MOVrm and required type of operands. Effectively, it
produces the same descriptions that would be written by hand (in this
regard it works pretty much as a preprocessor that expands macro
definitions). The advantage is that descriptions become much shorter
and cleaner, because parts common for multiple instructions are
factored out into a single description element.

I ask it because I have used BURG based instruction selectors (e.g. the
one from LCC compiler and some others) before and there is was quite
easy to express something like that (though not everything described
above) in the tree grammar for a code selector,e.g.
gpr -> GR8 | GR16| GR32;
mem -> i8mem | i16mem | i32mem;
insn -> mov(gpr:src, mem:dst) | operand_size_of(src) ==
operand_size_of(src)

Is it possible to express something like that using TableGen
descriptions? Or are they so limited at the moment that it is not
possible and this is the reason for so elaborative descriptions for the
X86 target. If it is not possible yet, are there any plans to implement
something like that for providing the possibilities for even cleaner,
shorter and more expressive and concise descriptions? Of course, all
extensions the I described are just sketches, but what do you think
about this approach as such?

2. Another related question is about instruciton costs. In BURG-based
selectors there is usually a possibility to describe costs for the
instructions so that a least-cost cover can be found during the
instruction selection process. I have not seen any such cost
descriptions in TableGen files. Does it mean that it is not supported?
Why? As far as I understand, LLVM uses BURG/IBURG for instruction
selection and both of these tools support costs, AFAIK.

Best Regards,
  Roman

I'm trying to implement a new backend for an embedded CISC processor.
Therefore I thought that it makes sense to take X86 target as a basis,
to save some time.

Ok. Note that the X86 backend is one of the most complex though, because it supports several subtargets and ABIs, which makes it more complex than some other targets.

But when I look into the X86InstrInfo.td, I have a very strong feeling
that it is one of the most complex instruction set descriptions
compared to other targets.

Yep.

I can imagine that this is due to the
complexity of X86's instruction set. But still I have a feeling that it
is overcomplicated, but may be it is just an initial impression.

Ok.

Since I just started, here are my first several questions and
proposals. I hope that some of them make sense for you.

Ok.

1. Why does X86 instruction set description provide different
descriptions for the same instructions, which differ only in the size
of operands?
E.g.

def MOV8rm : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
               "mov{b} {$src, $dst|$dst, $src}",
               [(set GR8:$dst, (load addr:$src))]>;
def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
               "mov{w} {$src, $dst|$dst, $src}",
               [(set GR16:$dst, (load addr:$src))]>, OpSize;
def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
               "mov{l} {$src, $dst|$dst, $src}",
               [(set GR32:$dst, (load addr:$src))]>;

We must do this, because they perform different operations. Specifically, the loads all read a different number of bytes (1/2/4).

For my target processor, only the types of operands (e.g. rm) are
really important. The size is encoded into the opcode, but otherwise it
does not affect anything except for constraints on the sizes of
operands. And constraint is basically that the size of both operands
should be the same.

Ok.

Wouldn't it be possible and even more clean to have just one
description like (I use a pseudo-description here):

def MOVrr : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst,
(i8mem|i16mem|i32mem):$src),
               "mov{b} {$src, $dst|$dst, $src}", >, isSameSize($dst,
$src);

We already have something like this, but it's a little more general. The X86 backend hasn't been converted to use it. This is the 'multiclass' facility in tblgen:
http://llvm.org/docs/TableGenFundamentals.html#multiclass

Basically this lets you use one definition to implement multiple different instructions. For example, most instructions in the sparc target come in "reg,reg" and "reg,imm" forms. As such, it defines:

multiclass F3_12<string OpcStr, bits<6> Op3Val, SDNode OpNode> {
   def rr : F3_1<2, Op3Val,
                  (ops IntRegs:$dst, IntRegs:$b, IntRegs:$c),
                  !strconcat(OpcStr, " $b, $c, $dst"),
                  [(set IntRegs:$dst, (OpNode IntRegs:$b, IntRegs:$c))]>;
   def ri : F3_2<2, Op3Val,
                  (ops IntRegs:$dst, IntRegs:$b, i32imm:$c),
                  !strconcat(OpcStr, " $b, $c, $dst"),
                  [(set IntRegs:$dst, (OpNode IntRegs:$b, simm13:$c))]>;
}

which allows it to use instructions like:

defm AND : F3_12<"and" , 0b000001, and>;
defm OR : F3_12<"or" , 0b000010, or>;
defm XOR : F3_12<"xor" , 0b000011, xor>;
defm SLL : F3_12<"sll" , 0b100101, shl>;
defm SRL : F3_12<"srl" , 0b100110, srl>;
defm SRA : F3_12<"sra" , 0b100111, sra>;
defm ADD : F3_12<"add" , 0b000000, add>;
defm ADDCC : F3_12<"addcc", 0b010000, addc>;
defm ADDX : F3_12<"addx" , 0b001000, adde>;
defm SUB : F3_12<"sub" , 0b000100, sub>;
defm SUBX : F3_12<"subx" , 0b001100, sube>;
defm SUBCC : F3_12<"subcc", 0b010100, SPcmpicc>;
...

Each of these 'defm's expand into two instructions.

The semantic of such a description would mean that $dst should be one
of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the
additional constraint that the sizes of both operands are the same
(this is checked by isSameSize predicate).

Sure. It would be straight-forward to define a multiclass for your arch, in which each use of defm makes three instructions: one for each width.

More over, in very many cases, the operation (e.g. OR) itself affects
only the encoding of the operation inside the opcode and of course the
description of the effect and assembler syntax e.g. [(store (or (load
addr:$dst), GR8:$src), addr:$dst)]. For OR insn the opreation should be
or, and for AND insn - obviously and. But operands and operand
combinations have the same limitations and descriptions also for many
other operations, e.g. XOR, AND, etc.

Yep.

Wouldn't it make sense to provide an easier way to describe it?
For example, one would describe operands combinations (e.g. RR, RM)
separately and then just say which instructions they are applicable
for? May be using something like:

def rm : (GR32 :$dst, i32mem:$src)
{
"$assm_op{l} {$src, $dst|$dst, $src}",
[(set GR32:$dst, ($insn_effect_op GR16:$dst, (load addr:$src)))]

};

Yep. tblgen gives you lots of tools to factor your instructions.

2. Another related question is about instruciton costs. In BURG-based
selectors there is usually a possibility to describe costs for the
instructions so that a least-cost cover can be found during the
instruction selection process. I have not seen any such cost
descriptions in TableGen files. Does it mean that it is not supported?

LLVM currently uses assumes that all instructions have cost 1, unless told otherwise ('Pat' patterns which produce multiple instructions have cost equal to the sum of their result). Given this, it attempts to match the largest patterns possible, which reduces cost.

The 'largest' metric is somewhat intricate, but you can directly affect it in your .td files with the 'AddedComplexity'. If you increase it, it will make an instruction more likely to be matched. However, you really shouldn't need to do this, except in extreme cases. Are you seeing cases where the selector is missing an optimal match, or are you just used to other tools which aren't as smart at inferring cost as tblgen?

Why? As far as I understand, LLVM uses BURG/IBURG for instruction
selection and both of these tools support costs, AFAIK.

Nope, it uses neither. It uses custom code built into tblgen.

-Chris

Hi Chris,

Thanks a lot for your answer!

Chris Lattner wrote:

1. Why does X86 instruction set description provide different
descriptions for the same instructions, which differ only in the

size

of operands?
E.g.

def MOV8rm : I<0x8A, MRMSrcMem, (ops GR8 :$dst, i8mem :$src),
               "mov{b} {$src, $dst|$dst, $src}",
               [(set GR8:$dst, (load addr:$src))]>;
def MOV16rm : I<0x8B, MRMSrcMem, (ops GR16:$dst, i16mem:$src),
               "mov{w} {$src, $dst|$dst, $src}",
               [(set GR16:$dst, (load addr:$src))]>, OpSize;
def MOV32rm : I<0x8B, MRMSrcMem, (ops GR32:$dst, i32mem:$src),
               "mov{l} {$src, $dst|$dst, $src}",
               [(set GR32:$dst, (load addr:$src))]>;

We must do this, because they perform different operations.

Specifically,

the loads all read a different number of bytes (1/2/4).

OK.

For my target processor, only the types of operands (e.g. rm) are
really important. The size is encoded into the opcode, but otherwise

it

does not affect anything except for constraints on the sizes of
operands. And constraint is basically that the size of both operands
should be the same.

Ok.

Wouldn't it be possible and even more clean to have just one
description like (I use a pseudo-description here):

def MOVrr : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst,
(i8mem|i16mem|i32mem):$src),
               "mov{b} {$src, $dst|$dst, $src}", >,

isSameSize($dst,

$src);

We already have something like this, but it's a little more general.

The

X86 backend hasn't been converted to use it.

Any plans to do it, to make things simpler?

This is the 'multiclass'
facility in tblgen:
http://llvm.org/docs/TableGenFundamentals.html#multiclass

This is very interesting.

Basically this lets you use one definition to implement multiple

different

instructions. For example, most instructions in the sparc target

come in

"reg,reg" and "reg,imm" forms. As such, it defines:

multiclass F3_12<string OpcStr, bits<6> Op3Val, SDNode OpNode> {
   def rr : F3_1<2, Op3Val,
                  (ops IntRegs:$dst, IntRegs:$b, IntRegs:$c),
                  !strconcat(OpcStr, " $b, $c, $dst"),
                  [(set IntRegs:$dst, (OpNode IntRegs:$b,

IntRegs:$c))]>;

   def ri : F3_2<2, Op3Val,
                  (ops IntRegs:$dst, IntRegs:$b, i32imm:$c),
                  !strconcat(OpcStr, " $b, $c, $dst"),
                  [(set IntRegs:$dst, (OpNode IntRegs:$b,

simm13:$c))]>;

}

which allows it to use instructions like:

defm AND : F3_12<"and" , 0b000001, and>;
defm OR : F3_12<"or" , 0b000010, or>;
defm XOR : F3_12<"xor" , 0b000011, xor>;
defm SLL : F3_12<"sll" , 0b100101, shl>;
defm SRL : F3_12<"srl" , 0b100110, srl>;
defm SRA : F3_12<"sra" , 0b100111, sra>;
defm ADD : F3_12<"add" , 0b000000, add>;
defm ADDCC : F3_12<"addcc", 0b010000, addc>;
defm ADDX : F3_12<"addx" , 0b001000, adde>;
defm SUB : F3_12<"sub" , 0b000100, sub>;
defm SUBX : F3_12<"subx" , 0b001100, sube>;
defm SUBCC : F3_12<"subcc", 0b010100, SPcmpicc>;
...

Each of these 'defm's expand into two instructions.

The semantic of such a description would mean that $dst should be

one

of GR8, GR16, GR32 and $dst is one of i8mem, i16mem, i32mem with the
additional constraint that the sizes of both operands are the same
(this is checked by isSameSize predicate).

Sure. It would be straight-forward to define a multiclass for your

arch,

in which each use of defm makes three instructions: one for each

width.

Beautiful! This saved my day! Multiclass is an extremely useful
construct. Thanks a lot for explanations. Multiclasses seem to provide
almost everything (if not all) of the features that I was asking for.
I'll try to use it for writing a short and concise definition of
instructions for my target.

2. Another related question is about instruciton costs. In

BURG-based

selectors there is usually a possibility to describe costs for the
instructions so that a least-cost cover can be found during the
instruction selection process. I have not seen any such cost
descriptions in TableGen files. Does it mean that it is not

supported?

LLVM currently uses assumes that all instructions have cost 1, unless

told

otherwise ('Pat' patterns which produce multiple instructions have

cost

equal to the sum of their result). Given this, it attempts to match

the

largest patterns possible, which reduces cost.

The 'largest' metric is somewhat intricate, but you can directly

affect it

in your .td files with the 'AddedComplexity'. If you increase it, it

will

make an instruction more likely to be matched. However, you really
shouldn't need to do this, except in extreme cases. Are you seeing

cases

where the selector is missing an optimal match, or are you just used

to

other tools which aren't as smart at inferring cost as tblgen?

I guess, I'm just used to other more BURG-like tools. And yes, they are
not too intelligent compared to tblgen;)

So far, I haven't seen a real case where an optimal match is missed by
tblgen. But I have not completely converted the old BURG-spec into
tblgen yet. When I'm ready, I'll try to compare the results and check
some corner cases (for example, I was using tree grammar to
automatically select a best addressing mode, which tblgen doesn't
support yet as far as I understand from the documentation; there were
also interesting tree patterns for C operations of the form X OP= Y and
increment operators) - let's see how tblgen will perform.

Why? As far as I understand, LLVM uses BURG/IBURG for instruction
selection and both of these tools support costs, AFAIK.

Nope, it uses neither. It uses custom code built into tblgen.

OK. Now I understand. Probably I thought that tblgen is BURG-based
because there is a Burg subdirectory in the llvm/utils. And in older
versions of llvm it was even non-empty, giving the impression that it
is used.

BTW, does it use the same theory as BURG/IBURG when it generates the
selector? I.e. dynamic programming + precomputed costs,etc? Does it
implicitly build a tree grammar or something like that? Does it try to
merge/share patterns as much as possible to speedup the recognition
phase as it is done by BURMs? I have the impression (by looking at the
produced C++ selector code) that tblgen-based selectors are not as
"precomputed" and optimal as BURG-based ones. Do you have an idea or
figures about how it compares to BURG/IBURG selectors?

Thanks again,
  Roman

Wouldn't it be possible and even more clean to have just one
description like (I use a pseudo-description here):

def MOVrr : I<0x88, MRMDestReg, (ops (GR8|GR16|GR32) :$dst,
(i8mem|i16mem|i32mem):$src),
               "mov{b} {$src, $dst|$dst, $src}", >,

isSameSize($dst,

$src);

We already have something like this, but it's a little more general.
The X86 backend hasn't been converted to use it.

Any plans to do it, to make things simpler?

It would be nice, but noone has stepped up to do it yet. It's a mostly mechanical change, so it should be straight-forward.

Sure. It would be straight-forward to define a multiclass for your

arch,

in which each use of defm makes three instructions: one for each

width.

Beautiful! This saved my day! Multiclass is an extremely useful construct. Thanks a lot for explanations. Multiclasses seem to provide almost everything (if not all) of the features that I was asking for. I'll try to use it for writing a short and concise definition of instructions for my target.

Yay :slight_smile:

shouldn't need to do this, except in extreme cases. Are you seeing

cases

where the selector is missing an optimal match, or are you just used

to

other tools which aren't as smart at inferring cost as tblgen?

I guess, I'm just used to other more BURG-like tools. And yes, they are
not too intelligent compared to tblgen;)

ok.

So far, I haven't seen a real case where an optimal match is missed by
tblgen. But I have not completely converted the old BURG-spec into
tblgen yet. When I'm ready, I'll try to compare the results and check
some corner cases (for example, I was using tree grammar to
automatically select a best addressing mode, which tblgen doesn't
support yet as far as I understand from the documentation;

Right. Currently we end up handling this with ComplexPattern matchers, which are written in C++. In practice, I found that this was simpler than writing patterns in the .td files anyway, at least for architectures with really complex addressing modes (x86). For example, on x86, we want to match "(A+1)*8" as "A*8 + 8" and many other similar things. Handling these with patterns is quite difficult when there are many orthogonal cases that need to be handled.

there were also interesting tree patterns for C operations of the form X OP= Y and increment operators) - let's see how tblgen will perform.

tblgen treats "updating" or "two-address" instructions as if they were three-address instructions, then uses the register allocator to pin the first two operands to the same register. See use of 'isTwoAddress' in the X86 or PPC backends for examples.

Why? As far as I understand, LLVM uses BURG/IBURG for instruction
selection and both of these tools support costs, AFAIK.

Nope, it uses neither. It uses custom code built into tblgen.

OK. Now I understand. Probably I thought that tblgen is BURG-based
because there is a Burg subdirectory in the llvm/utils. And in older
versions of llvm it was even non-empty, giving the impression that it
is used.

Yep, it used to be used for the SparcV9 backend.

BTW, does it use the same theory as BURG/IBURG when it generates the
selector? I.e. dynamic programming + precomputed costs,etc? Does it
implicitly build a tree grammar or something like that?

Currently it uses a very simple greedy bottom-up iterative matcher. We've found that most of the complexity of instruction selection is providing the DAG in a form that can easily be matched, rather than having really aggressive matching techniques. That said, we hope to move to a dynamic programming two pass matcher at some point.

Does it try to merge/share patterns as much as possible to speedup the recognition phase as it is done by BURMs?

Yes. This is also important to reduce code size.

I have the impression (by looking at the produced C++ selector code) that tblgen-based selectors are not as "precomputed" and optimal as BURG-based ones. Do you have an idea or figures about how it compares to BURG/IBURG selectors?

No measurements have been made, because noone has adapted burg/iburg to work with the LLVM DAGs. It would be an interesting project if you're interested though! Given a choice, I'd prefer that the matcher allow dynamic costs to be associated with the pattern. This is really important when subtargets need to be supported (e.g. pentium vs athlon vs core vs ...)

-Chris

Hi,

Few more questions that I found while trying to develop a new backend.
And sorry if I ask too many questions.

1) My target (embedded processor, which is a "not so direct" successor
of Z80 family of processors) does not support SELECT, so I was looking
for a workaround.

First I was thinking about expanding it into conditional flow with
branching, but then I have found that there exists a pass called
LowerSelect already.

I added in the getAnalysisUsage of my DAGtoDAGSel class (derived from
SelectionDAGSel) as a required analysis. After that, there are no more
SELECT instructions passed to my code selector, which is what I wanted.
Is it a correct to do it this way?

Anyhow, I have a very strong impression that after I did it, something
goes wrong with the code generations of copies from PHI nodes.
Sometimes this code is not generated at all. Can it be that LowerSelect
breaks some information like PHIs or some CFG information and this is
the reason why I see these side-effects? May be I have to call some
other pass after LowerSelect to recompute some properties and to bring
the code again into the correct form? Any ideas?

BTW, which pass(es) does transformation of PHIs into copies during code
selection?

2) In the X86 target, I've seen the special register classes GR16_ and
GR32_, which are subsclasses of GR16 and GR32. These two classes are
used just in a few places inside the instructions definition file.
There they are only used to define the fact that normal regs from GR32
can be moved into GR32_ regs. This is not quite obvious for me why this
classes are introduced at all, if they are almost not used anywhere in
the descriptions ...

On my target, there are many instructions that have certain constraints
using register subclasses, for example:
  loads from memory are only allowed to the GR_ regs
  copies between GR and GR_ regs are allowed
  stores to memory are only possible from GR_ regs

How this can be expressed in the definitions of instructions? I tried
to use GR_ register classes at appropriate places, when describing
instructions operands. Is it a right method? So far I was not very
successful, since after I did it as I described, tblgen started to
produce errors like "Pattern x is impossible to select". Why do I have
it and what should it mean? May be I'm doing something wrong.

Besides what I described, my target has additionally support for banks
if registers. Access to the registers in the non-current bank is more
expensive than access to regs from the current bank. And many
operations cannot have such regs from other banks as first operand. How
something like this can be expressed? Is there any support for such
kind of constraints. I can roughly imagine how instruction descriptions
could look like. But I guess also a register allocator should take this
into account. Otherwise too many copies between regs from current bank
and other banks will be generated (e.g. virtual reg was allocated to a
physical register from another bank, but later it is required in an
instruction that cannot take this register as operand. So, it must be
copied into a register in the current bank...). And decisions of
register allocator need to be based on minimization of total costs,
probably. Sounds very complex. I just wonder if LLVM already supports
something like this in any form.

3) My target has only 2 addr instructions, i.e. each instruction has at
most 2 operands. Do I still need to use 3 operands for such operations
like OR, AND, ADD, SUB in my instruction descriptions, but mark them as
isTwoAddrress=1??? Or do I need to call a special pass that converts
everything into 2-operand instructions?

4) Following Chris advice regarding multiclasses, I defined several
multiclasses, each containing 6 defs. But I noticed that I cannot use
"let isTwoAddress=1" or "let isCommutable=1" inside the multiclass
definition, which would be very handy. Is it really impossible or may
be there is a special syntax to be used? As a workaround, I can use
"let ..." with defm definitions using this multiclass.

5) Can definitionss inside a multiclass use another multiclass? I.e. is
it possible to use defm inside multiclass definitions?

Thanks,
Roman

Hi,

Few more questions that I found while trying to develop a new backend.
And sorry if I ask too many questions.

I only have answers to some of them:

1) My target (embedded processor, which is a "not so direct" successor
of Z80 family of processors) does not support SELECT, so I was looking
for a workaround.

First I was thinking about expanding it into conditional flow with
branching, but then I have found that there exists a pass called
LowerSelect already.

I added in the getAnalysisUsage of my DAGtoDAGSel class (derived from
SelectionDAGSel) as a required analysis. After that, there are no more
SELECT instructions passed to my code selector, which is what I wanted.
Is it a correct to do it this way?

You can add the line
setOperationAction(ISD::SELECT, MVT::i32, Expand);
to the constructor of you TargetLowering class. See the current
backend for an example.

....

On my target, there are many instructions that have certain constraints
using register subclasses, for example:
  loads from memory are only allowed to the GR_ regs
  copies between GR and GR_ regs are allowed
  stores to memory are only possible from GR_ regs

How this can be expressed in the definitions of instructions? I tried
to use GR_ register classes at appropriate places, when describing
instructions operands. Is it a right method? So far I was not very
successful, since after I did it as I described, tblgen started to
produce errors like "Pattern x is impossible to select". Why do I have
it and what should it mean? May be I'm doing something wrong.

I think that using register classes is the right solution. I don't
know what is causing the problem.

Besides what I described, my target has additionally support for banks
if registers. Access to the registers in the non-current bank is more
expensive than access to regs from the current bank. And many
operations cannot have such regs from other banks as first operand. How
something like this can be expressed? Is there any support for such
kind of constraints. I can roughly imagine how instruction descriptions
could look like. But I guess also a register allocator should take this
into account. Otherwise too many copies between regs from current bank
and other banks will be generated (e.g. virtual reg was allocated to a
physical register from another bank, but later it is required in an
instruction that cannot take this register as operand. So, it must be
copied into a register in the current bank...). And decisions of
register allocator need to be based on minimization of total costs,
probably. Sounds very complex. I just wonder if LLVM already supports
something like this in any form.

I don't know, but you can implement a register allocator that is
specific to this problem.

3) My target has only 2 addr instructions, i.e. each instruction has at
most 2 operands. Do I still need to use 3 operands for such operations
like OR, AND, ADD, SUB in my instruction descriptions, but mark them as
isTwoAddrress=1??? Or do I need to call a special pass that converts
everything into 2-operand instructions?

Using isTwoAddrress=1 should do it.

Thanks,
Roman

Best Regards,
Rafael

Hi Rafael,

Thanks for the answers.

> 1) My target (embedded processor, which is a "not so direct"
successor
> of Z80 family of processors) does not support SELECT, so I was
looking
> for a workaround.
>
> First I was thinking about expanding it into conditional flow with
> branching, but then I have found that there exists a pass called
> LowerSelect already.
>
> I added in the getAnalysisUsage of my DAGtoDAGSel class (derived
from
> SelectionDAGSel) as a required analysis. After that, there are no
more
> SELECT instructions passed to my code selector, which is what I
wanted.
> Is it a correct to do it this way?
You can add the line
setOperationAction(ISD::SELECT, MVT::i32, Expand);
to the constructor of you TargetLowering class. See the current
backend for an example.

I actually tried it first. But then if, I remember correctly, SELECT
nodes were expanded into something using SELECT_CC, which is also not
supported on my target. Basically, only conditional branches are
supported. Therefore I thought about using the LowerSelect pass. But
I'll try again this evening. May be I was doing something wrong.

What should be the result of expanding SELECT? Some sort of
IF-THEN-ELSE flow?

> On my target, there are many instructions that have certain
constraints
> using register subclasses, for example:
> loads from memory are only allowed to the GR_ regs
> copies between GR and GR_ regs are allowed
> stores to memory are only possible from GR_ regs
>
> How this can be expressed in the definitions of instructions? I
tried
> to use GR_ register classes at appropriate places, when describing
> instructions operands. Is it a right method? So far I was not very
> successful, since after I did it as I described, tblgen started to
> produce errors like "Pattern x is impossible to select". Why do I
have
> it and what should it mean? May be I'm doing something wrong.

I think that using register classes is the right solution. I don't
know what is causing the problem.

I also think so. But when you get such a message like described above,
it is rather difficult to guess the reason. I even extended the tblgen
tool to print a whole set of conflicting patterns in this case. But
they all looked OK for me, i.e. I could not see any clear reason why a
given pattern would not be selected. I'll try to dig deeper into this
issue.

> Besides what I described, my target has additionally support for
banks
> if registers. Access to the registers in the non-current bank is
more
> expensive than access to regs from the current bank. And many
> operations cannot have such regs from other banks as first operand.
How
> something like this can be expressed? Is there any support for such
> kind of constraints. I can roughly imagine how instruction
descriptions
> could look like. But I guess also a register allocator should take
this
> into account. Otherwise too many copies between regs from current
bank
> and other banks will be generated (e.g. virtual reg was allocated
to a
> physical register from another bank, but later it is required in an
> instruction that cannot take this register as operand. So, it must
be
> copied into a register in the current bank...). And decisions of
> register allocator need to be based on minimization of total costs,
> probably. Sounds very complex. I just wonder if LLVM already
supports
> something like this in any form.
I don't know, but you can implement a register allocator that is
specific to this problem.

Sure. I just thought that there may be some sort of initial support for
that kind of issues.

Best regards,
Roman

Check out how the sparc or powerpc backends handle this. They lower to a select_cc pseudo-op that expands to an if/then/else control flow.

-Chris

Thanks! The hint about a pseudo-op was really good. After I realized
how it works, I started the implementation of SELECT_CC using this
approach. Hopefully, I can finish it today.

Chris, may be you could provide some feedback about other questions
from the original mail? In particular about the register classes and
subclasses and about restrictions related to multiclasses? I'd be very
grateful.

-Roman

Check out how the sparc or powerpc backends handle this. They lower
to a
select_cc pseudo-op that expands to an if/then/else control flow.

Thanks! The hint about a pseudo-op was really good. After I realized how it works, I started the implementation of SELECT_CC using this approach. Hopefully, I can finish it today.

Great.

Chris, may be you could provide some feedback about other questions
from the original mail? In particular about the register classes and
subclasses and about restrictions related to multiclasses? I'd be very
grateful.

Can you resend it? I apparently didn't get it or deleted it or something. I just saw the response to it. Thanks,

-Chris

Hi,

What does it mean if a custom Node in the instructions description file
is declared to have a Chain?
Looking at different backends, I have the impression that it describes
some sort of side effect and usually used for nodes affecting the
control flow. But I'm not quite sure. Can someone describe the
semantics of this property and also what is a typical usage of it?

In particular, I have found that CMP nodes for different targets are
described differently with regard to this property. ARM backend defines
armcmp without this property. PCC defines PCCvcmp and PCCvcmp_o also
without this property. In Sparc backend SPcmpicc is also not using it.
But X86cmp does for some reason. I'm trying to understand if I need it
for my backend or not.

It would be also interesting to get some information about other SDNP-*
SelectionDAG node properties, e.g. SDNPOutFlag, SDNPInFlag,
SDNPOptInFlag and their purpose.

-Roman

Hi,

What does it mean if a custom Node in the instructions description file
is declared to have a Chain?
Looking at different backends, I have the impression that it describes
some sort of side effect and usually used for nodes affecting the
control flow. But I'm not quite sure. Can someone describe the
semantics of this property and also what is a typical usage of it?

Right, llvm models control flow dep. as chain operands. We use it to model relative ordering of memory operations. SDNPHasChain is defined in TargetSelectionDAG.td as a node property. It tells tblegen that specific node read / write chains so tblgen can emit the correct selection code for patterns that use these SDNode's.

In particular, I have found that CMP nodes for different targets are
described differently with regard to this property. ARM backend defines
armcmp without this property. PCC defines PCCvcmp and PCCvcmp_o also
without this property. In Sparc backend SPcmpicc is also not using it.
But X86cmp does for some reason. I'm trying to understand if I need it
for my backend or not.

X86ISD::CMP is a X86 specific target node. Other targets have similar nodes but there are subtle differences. Initially X86ISD::CMP was not defined to have a chain. It was added later as an "optimization" to force ordering between a load operand, cmp, and its CMOV or BRCOND use to allow load folding. X86 cmp / test can fold load, but it may cause cycles in the DAG if the instruction selector is not careful. I doubt any other current targets needs this.

It would be also interesting to get some information about other SDNP-*
SelectionDAG node properties, e.g. SDNPOutFlag, SDNPInFlag,
SDNPOptInFlag and their purpose.

TargetSelectionDAG.td describes them briefly. Basically a "flag" is a way to glue two nodes together. That is, the node which produces a flag and its use has to be scheduled together. So SDNPOutFlag means the node produces a flag, SDNPInFlag means it reads a flag, and SDNPOptInFlag has an optional flag operand (which always comes last).

Evan