Instruction Encodings in TableGen

I’m starting to look into binary instruction encodings in TableGen, and I’m a bit confused about how the instruction fields are populated. Perhaps I’m just being dense, but I cannot see how SDAG operands are translated into the encoding fields. Can someone please explain the following snippet from the PPC back-end?

The AND instruction in PPC is defined as:

    def AND : XForm_6<31, 28, (outs GPRC:$rA), (ins GPRC:$rS, GPRC:$rB),
                      "and $rA, $rS, $rB", IntSimple,
                      [(set GPRC:$rA, (and GPRC:$rS, GPRC:$rB))]>;

Okay, so rA, rS, and rB are register operands.

The TableGen classes are defined as:

    class XForm_base_r3xo_swapped
            <bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
             InstrItinClass itin>
      : I<opcode, OOL, IOL, asmstr, itin> {
      bits<5> A;
      bits<5> RST;
      bits<5> B;

      bit RC = 0;    // set by isDOT

      let Inst{6-10}  = RST;
      let Inst{11-15} = A;
      let Inst{16-20} = B;
      let Inst{21-30} = xo;
      let Inst{31}    = RC;
    }

    class XForm_6<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
                  InstrItinClass itin, list<dag> pattern>
      : XForm_base_r3xo_swapped<opcode, xo, OOL, IOL, asmstr, itin> {
      let Pattern = pattern;
    }

Okay, so A, RST, and B are the operand fields in the instruction encoding (I assume). But where are A, RST, and B given values? When the instruction is encoded (and the physical registers are known), where do these values come from? A grep for RST doesn’t come up with anything useful. Is there C++ code somewhere that scans the operands of all instructions and performs the actual encoding?

The getMachineOpValue() function does the encoding for the non-MC code
emitter. The MC code emitter might be different though.

-Tom
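
To make the bit layout concrete, here is a rough Python model of what the encoding step has to compute for this format. It is only a sketch, not the generated emitter code. It assumes that each register operand has already been turned into its 5-bit hardware register number (the job getMachineOpValue() does for a register operand), that the primary opcode sits in Inst{0-5} (class I is not shown in the snippet above), and that bits are numbered PowerPC-style with Inst{0} as the most significant bit. The helper names place and encode_xform are made up for illustration.

```python
def place(value, first_bit, last_bit):
    """Position value at Inst{first_bit-last_bit}, where bit 0 is the MSB of 32."""
    width = last_bit - first_bit + 1
    mask = (1 << width) - 1
    return (value & mask) << (31 - last_bit)


def encode_xform(opcode, rst, a, b, xo, rc=0):
    """Pack the fields of XForm_base_r3xo_swapped into one 32-bit word."""
    word = place(opcode, 0, 5)      # assumption: class I puts opcode in Inst{0-5}
    word |= place(rst, 6, 10)       # let Inst{6-10}  = RST;
    word |= place(a, 11, 15)        # let Inst{11-15} = A;
    word |= place(b, 16, 20)        # let Inst{16-20} = B;
    word |= place(xo, 21, 30)       # let Inst{21-30} = xo;
    word |= place(rc, 31, 31)       # let Inst{31}    = RC;
    return word


# and r3, r4, r5  ->  rA = 3, rS = 4, rB = 5, with opcode 31 and xo 28
word = encode_xform(opcode=31, rst=4, a=3, b=5, xo=28)
print(hex(word))  # 0x7c832838
```

Note that the source register rS lands in the RST field ahead of the destination rA, which is presumably what the "swapped" in the class name refers to.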

Yeah, I see the calls to getMachineOpValue() in the generated code, but it seems like it just processes the fields of the TableGen class in the order that they appear, e.g. A, RST, B in the PPC example, which seems very fragile. This makes me believe I’m missing something here.

If the bitfields are named the same as the operands in the (ins) and (outs) lists, TableGen will match them that way rather than positionally. ARM makes extensive use of that, for example.

-Jim
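
The difference between the two matching modes can be modelled in a few lines of Python. This is a toy model of the behaviour described in this thread, not TableGen's actual implementation. The function names, and the reordered and renamed field lists, are hypothetical.

```python
def bind_positionally(fields, operands):
    """Pair the i-th declared encoding field with the i-th operand."""
    return {field: value for field, (_, value) in zip(fields, operands)}


def bind_by_name(fields, operands):
    """Pair each encoding field with the operand that has the same name."""
    by_name = dict(operands)
    return {field: by_name[field] for field in fields}


# Operands of "and r3, r4, r5", listed in (outs) then (ins) order.
operands = [("rA", 3), ("rS", 4), ("rB", 5)]

# Fields as declared in XForm_base_r3xo_swapped. None is named after an
# operand, so the declaration order A, RST, B is what makes this come out right.
positional = bind_positionally(["A", "RST", "B"], operands)
print(positional)   # {'A': 3, 'RST': 4, 'B': 5}

# Hypothetical: declaring RST before A would silently put rA into RST.
reordered = bind_positionally(["RST", "A", "B"], operands)
print(reordered)    # {'RST': 3, 'A': 4, 'B': 5}

# Hypothetical: fields named after the operands bind correctly in any order.
named = bind_by_name(["rS", "rA", "rB"], operands)
print(named)        # {'rS': 4, 'rA': 3, 'rB': 5}
```

In TableGen terms, the by-name case corresponds to declaring the fields as bits<5> rA, rS and rB and writing let Inst{6-10} = rS; and so on.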

Awesome! I was hoping for something like that. Thanks!

FYI: The built-in assembler for PPC is buggy, and it is possible
that this is because some of the TableGen patterns are wrong.
If you happen to spot a mistake, please send a patch :)

Once we have a PPC assembly parser, then we should be able to really
test the patterns.

-Hal

If I find anything I'll let the list know, but I was really just using PPC as a (somewhat) simple example of what I was talking about. :)