TableGen related question for the Hexagon backend

Hi,

I'm looking for some suggestions on a problem related to the Hexagon
backend.

Hexagon architecture allows instructions in various formats. For example, we
have 3 variations of the add instruction as defined below:

ADDrr : r1 = add(r2, r3) --> add 2 32-bit registers ADDrr_p : if(p0) r1 =
add(r2, r3) --> predicated version of ADDrr instruction, executed when p0 is
true ADDrr_np : if(!p0) r1 = add(r2, r3) --> predicated version of ADDrr
instruction, executed when p0 is false

Currently, we rely on switch tables to transform between formats. However,
we would like to have a different mechanism to represent these relationships
instead of switch tables. I am thinking of modeling these relations in
HexagonInstrInfo.td file and use TableGen to generate a table with the
information. Idea is to have a new class, say Relations, with some members
of type 'Instruction' each representing a specific relation between itself
and 'ThisInstr'.

For example:
class Relations {
                Instruction ThisInstr;
                Instruction BaseForm;
                Instruction TruePred;
                Instruction FalsePred;
}

def Rel_ADDrr : Relations<ADDrr, ADDrr, ADDrr_p, ADDrr_np>; def Rel_ADDrr_p
: Relations<ADDrr_p, ADDrr, , >; def Rel_ADDrr_np : Relations<ADDrr_np,
ADDrr, , >;

I wonder if TableGen can parse Relations defs and emit the information into
tabular form. I expect to index the resulting table using the instruction
opcode and find opcode of the new instruction of the desired format. Any
suggestions will be much appreciated.

Relationship table layout:

HexagonInstrRelations[] = {
.
{HexagonISD::ADDrr, HexagonISD::ADDrr, HexagonISD:: ADDrr_p,
HexagonISD::ADDrr_np}, {HexagonISD::ADDrr_p, HexagonISD::ADDrr, -1, -1},
{HexagonISD::ADDrr_np, HexagonISD::ADDrr, -1, -1}, .
};

Each column represents a specific relationship, -1 means
'invalid/nonexistent relation'.
  
Thanks,
Jyotsna

[mailto:llvm-commits-bounces@cs.uiuc.edu] On Behalf Of Tony Linthicum

Currently, we rely on switch tables to transform between formats. However,
we would like to have a different mechanism to represent these relationships
instead of switch tables. I am thinking of modeling these relations in
HexagonInstrInfo.td file and use TableGen to generate a table with the
information.

That would be a good idea. X86 could also use some help with opcode mapping tables.

Idea is to have a new class, say Relations, with some members
of type 'Instruction' each representing a specific relation between itself
and 'ThisInstr'.

For example:
class Relations {
               Instruction ThisInstr;
               Instruction BaseForm;
               Instruction TruePred;
               Instruction FalsePred;
}

def Rel_ADDrr : Relations<ADDrr, ADDrr, ADDrr_p, ADDrr_np>;
def Rel_ADDrr_p: Relations<ADDrr_p, ADDrr, , >;
def Rel_ADDrr_np : Relations<ADDrr_np,ADDrr, , >;

The problem is, this isn't really any better than having a large switch statement. You just moved the table into the .td file.

You should be taking advantage of the instruction multiclasses so you don't have to maintain a full table of opcodes.

/jakob

We wouldn't need to "maintain" this table of opcodes, as this table is
extracted from a higher level representation of the instruction set.
In particular, this table would not need to be modified at all for the
existing supported processors, and any new version of processors will
have its table automatically generated by us.

Why wouldn't it be acceptable for the *.td files defining the instructions
and the relations between them to be generated and not written by hand?

In my opinion, generating parts of the td files is much less error prone and
more convenient for us than having to write the insns definitions by hand
(that would indeed need "maintenance".)

Thanks,
Sebastian

Because it defeats the purpose of open source.

It would be impossible to improve the way Hexagon models instructions without changing these tables, and if they are just intermediate build products generated from secret 'real' sources, it is much more difficult.

This is supposed to be a collaboration that makes LLVM better for everybody. If you are just working around deficiencies in LLVM's tools instead of trying to improve them, there is really no point in sharing that work.

/jakob

From: llvmdev-bounces@cs.uiuc.edu [mailto:llvmdev-bounces@cs.uiuc.edu]
On Behalf Of Jakob Stoklund Olesen
Sent: Friday, August 03, 2012 10:02 AM
To: Sebastian Pop
Cc: llvmdev@cs.uiuc.edu
Subject: Re: [LLVMdev] TableGen related question for the Hexagon
backend

>> The problem is, this isn't really any better than having a large
switch statement. You just moved the table into the .td file.
>>
>> You should be taking advantage of the instruction multiclasses so
you don't have to maintain a full table of opcodes.
>
> We wouldn't need to "maintain" this table of opcodes, as this table
is
> extracted from a higher level representation of the instruction set.
> In particular, this table would not need to be modified at all for
the
> existing supported processors, and any new version of processors will
> have its table automatically generated by us.
>
> Why wouldn't it be acceptable for the *.td files defining the
instructions
> and the relations between them to be generated and not written by
hand?

Because it defeats the purpose of open source.

It would be impossible to improve the way Hexagon models instructions
without changing these tables, and if they are just intermediate build
products generated from secret 'real' sources, it is much more
difficult.

This is supposed to be a collaboration that makes LLVM better for
everybody. If you are just working around deficiencies in LLVM's tools
instead of trying to improve them, there is really no point in sharing
that work.

/Jakob

[Villmow, Micah] Just curious as we have a similar approach for some code in our backed, but if the higher level source and the scripts to produce the intermediates were provided, would that be sufficiently acceptable to using generated td files?

Why wouldn't it be acceptable for the *.td files defining the instructions
and the relations between them to be generated and not written by hand?

Because it defeats the purpose of open source.

It would be impossible to improve the way Hexagon models instructions without
changing these tables, and if they are just intermediate build products
generated from secret 'real' sources, it is much more difficult.

This is supposed to be a collaboration that makes LLVM better for
everybody.

I am only speaking about the Hexagon*.td files: I fail to see how generating
these from our a higher level representation of the Hexagon Instruction Set
manuals is "defeat[ing] the purpose of open source". Let's not turn this
technical review into an "what is open source" thread.

If you are just working around deficiencies in LLVM's tools instead
of trying to improve them, there is really no point in sharing that work.

Just curious: what deficiencies in LLVM do you think we working around here?

Thanks,
Sebastian

You tell me. The td files are supposed to be the primary source for a target description. You chose to use a different source language, and auto-generate td files from that. Why?

If you have an authoritative ISA description in machine-readable form, you can use it to verify that (parts of) your td files are correct. That would be very useful. But the td files should be the primary source.

I am not claiming the TableGen language is awesome, but it is never going to get better if lib/Target evolves into 17 different ways of auto-generating td files.

Most parts of LLVM are constantly improving, including the target descriptions. The goal is to have the best possible way of describing a target for assemblers, disassemblers and code generators. Auto-generated code prevents the language from evolving.

/jakob

No.

It's perfectly fine if the initial version of the td files are generated by a script, but td files in the LLVM tree need to be the primary source you work on going forward.

It think you would be better served by having a way of verifying the correctness of td files using your canonical architecture description. See for example Richard Barton's talk from the April developer meeting.

/jakob

[Villmow, Micah] Just curious as we have a similar approach for some code in our backed, but if the higher level source and the scripts to produce the intermediates were provided, would that be sufficiently acceptable to using generated td files?

No.

It's perfectly fine if the initial version of the td files are generated by a script, but td files in the LLVM tree need to be the primary source you work on going forward.

Ok, I think we are reaching here a middle ground. I think
that our scripts can detect redundancies and can zip the zillion
classes into multiclass hierarchies that would enable us to move
forward: evolving the code by hand and the result would also be
more readable for humans.

It think you would be better served by having a way of verifying the correctness of td files using your canonical architecture description. See for example Richard Barton's talk from the April developer meeting.

Very good point: we can still use our scripts for verification purposes.
Thanks for your useful remarks and guidance.

Sebastian

That would be a great start to good target description files.

Thanks,
/jakob

Hi Everyone,

After some more thoughts to the Jacob's suggestion of using multiclasses for
Opcode mapping, this is what I have come up with. Please take a look at the
design below and let me know if you have any suggestions/questions.

I have tried to keep the design target independent so that other targets
could benefit from it.

1) The idea is to add 3 new classes into include/llvm/Target/Target.td
which provide the basic
infrastructure for relating instructions with each other.

Class Relations {} // any target interested in mapping relations, need to
define a
                    // class of its own as a subclass of 'Relations'.

class IFormat<bits<4> value> { // Used to define basic instruction formats,
Ex: register-register, register-immediate
  bits<4> Value = value;
}

// class RelationMap is actually used to related instructions with each
other.
class RelationMap<IFormat pFormat, IFormat iFormat, list<string> ancestors =
[] > {
  IFormat ParentFormat = pFormat;
  IFormat InstrFormat = iFormat;
  list <string> Ancestors = ancestors;
}

2) Include some basic setup in the InstrInfo.td file. Any target interested
in mapping relation needs to
define a class as a subclass of 'Relations'. This class is primarily used
for the recordkeeping and has one
string variable for each basic format required for relation modeling. All
instructions that want to define
relationship mapping, should inherit from this class in addition to
RelationMap class. This is explained below
using a prototype model for Hexagon.

Example from Hexagon backend:

class RelHexagon : Relations {
  string InstFormat0; //prev form
  string InstFormat1; //rr (register register)
  string InstFormat2; //ri (register immediate)
  string InstFormat3; //pred true
  string InstFormat4; //pred false
}

Define Instruction formats which are IFormat objects and have a unique
integer value
associated with them. It is used to access appropriate field within
RelHexagon Class.

def Format_prev : IFormat<0>; // previous form
def Format_rr : IFormat<1>; // register register
def Format_ri : IFormat<2>; // register immediate
def Format_predt : IFormat<3>; // pred true def Format_predf : IFormat<4>;
// pred false

4) Prototype relationship model for a simple 'add' instruction with 6
variants:

                                            /---- if (p0) R1 = add(R2, R3)
                                           / (Predicate true)
Add register register - R1 = add(R2, R3) --
                                           \
                                            \---- if (!p0) R1 = add(R2, R3)
                                                  (predicate false)

                                               /---- if (p0) R1 = add(R2,
#12)
                                              / (Predicate true)
Add register immediate - R1 = add(R2, #12) --
                                              \
                                               \---- if (!p0) R1 = add(R2,
#12)
                                                    (predicate false)

multiclass Add_rr < IFormat TransformsFrom> {
  def #NAME# : V2_A2_add, RelationMap < TransformsFrom, Format_rr>;
  defm _pt : V2_A2_padd, RelationMap < Format_rr, Format_predt, ["rr"]>;
  defm _pf : V2_A2_padd, RelationMap < Format_rr, Format_predf, ["rr"] >; }

multiclass Add_ri < IFormat TransformsFrom> {
  def #NAME# : V2_A2_addi, RelationMap < TransformsFrom, Format_ri >;
  defm _pt : V2_A2_paddi, RelationMap < Format_ri, Format_predt, ["ri"] >;
  defm _pf : V2_A2_paddi, RelationMap < Format_ri, Format_predf, ["ri"] >;
}

multiclass Add_base {
  defm rr : Add_rr < Format_rr>;
  defm ri : Add_ri < Format_rr>;
}

defm ADD : Add_base, RelHexagon; <- defm needs to inherit from RelHexagon
Class here

5) We need some changes in the TGParser.cpp so that it can use the
information specified
through the RelationMap and populate relevant fields in the RelHexagon
class.
This eventually becomes part of the instruction definition (record).

bool TGParser::ParseDefm(MultiClass *CurMultiClass) { ...

  if (InheritFromClass) {
    // Process all the classes to inherit as if they were part of a
    // regular 'def' and inherit all record values.
    SubClassReference SubClass = ParseSubClassReference(0, false);
    while (1) {
      // Check for error.
      if (SubClass.Rec == 0) return true;

      // Get the expanded definition prototypes and teach them about
      // the record values the current class to inherit has
      for (unsigned i = 0, e = NewRecDefs.size(); i != e; ++i) {
        Record *CurRec = NewRecDefs[i];
**** proposed changes -begin *****
        if (SubClass.Rec->isSubClassOf("Relations")) {
          std::vector<Init*> InstrFormats(SubClass.Rec->getValues().size());
         1) Traverse thru all the records to see if CurRec's InstrFormat is
same as the ParentFormat for any of the
          records and they both share same ancestors. If so, insert that
record at the appropriate place in the
         vector using the integer value of the InstrFormat of the children
instructions.
         2) Use Vector to populate fields of the SubClass.Rec (RelHexagon in
our case) Class.
        }
**** proposed changes -end *****
     // All the fields from a subclass are eventually added into the
instruction record -- existing functionality

6) Add a new command line option into TableGen that will emit this
information as a table in a .inc file.

7) Target is required to add its own API to read the table and extract
relevant information.

Thanks,
Jyotsna

Hi Everyone,

After some more thoughts to the Jacob's suggestion of using multiclasses for
Opcode mapping, this is what I have come up with. Please take a look at the
design below and let me know if you have any suggestions/questions.

Hi Jyotsna,

You are on to something here, but you don't need to define a 'Relations' class on top of the tablegen records. They are already relations, you just need the proper query language to match the instructions you want.

You simply use the existing fields in your instructions, or add new ones as needed. You don't want to be limited to a single 'IFormat' as a column identifier, there can be many different types of relationships between instructions.

Do something like this:

def getPredicatedOpcode : InstrMapping {

  // Only include instructions form the PredRel class.
  let FilterClass = "PredRel";

  // Instructions with the same BaseOpcode field form a row.
  let RowFields = ["BaseOpcode"];

  // Instructions with the same predicate sense form a column.
  let ColFields = ["PredSense"];

  // The key column is the unpredicated instructions.
  let KeyCol = ["nopred"];

  // Value columns are predicate=true and predicate=false
  let ValueCols = [["true"], ["false"]];
};

That should be enough to generate a table:

// key , PredSense=true, PredSense=false
{ ADD , ADDtrue, ADDfalse, // BaseOpcode="ADD"
{ SUB , SUBtrue, SUBfalse, // BaseOpcode="SUB"

5) We need some changes in the TGParser.cpp so that it can use the
information specified
through the RelationMap and populate relevant fields in the RelHexagon
class.

The tablegen parser is definitely not the right place to implement this.

/jakob

Hi Jacob,

Thanks for the suggestions. I have a few questions here.

You are on to something here, but you don't need to define a 'Relations'

class

on top of the tablegen records. They are already relations, you just need

the

proper query language to match the instructions you want.

Are you saying that the mechanism is already present which allows us to
relate instructions with each other? What do you mean by a proper query
language?

You don't want to be limited to a single 'IFormat' as a column
identifier, there can be many different types of relationships between
instructions.

We do have different type of relationships between instructions. I define
multiple IFormat objects one per relationship which finally translates into
a unique column into the mapping table.

def Format_rr : IFormat<1>;
def Format_ri : IFormat<2>;
def Format_predt : IFormat<3>;
def Format_predf : IFormat<4>;

Addrr : { Addrr, Addri, Addrr_pt, Addrr_pf, .. , ..}
Addri : { Addrr, Addri, Addri_pt, Addri_pf,..

Do something like this:

def getPredicatedOpcode : InstrMapping {
  // Only include instructions form the PredRel class.
  let FilterClass = "PredRel";

  // Instructions with the same BaseOpcode field form a row.
  let RowFields = ["BaseOpcode"];

  // Instructions with the same predicate sense form a column.
  let ColFields = ["PredSense"];

  // The key column is the unpredicated instructions.
  let KeyCol = ["nopred"];

  // Value columns are predicate=true and predicate=false
  let ValueCols = [["true"], ["false"]]; };

Can you please elaborate it more? It seems interesting but I coundn't
understand it completely.
Also, how do I get the table from the definition above? For the table, I
need to know the name of the predicated-true and false instructions.

That should be enough to generate a table:

// key , PredSense=true, PredSense=false
{ ADD , ADDtrue, ADDfalse, // BaseOpcode="ADD"
{ SUB , SUBtrue, SUBfalse, // BaseOpcode="SUB"

> 5) We need some changes in the TGParser.cpp so that it can use the
> information specified through the RelationMap and populate relevant
> fields in the RelHexagon class.

The tablegen parser is definitely not the right place to implement this.

I didn't want to modify the tablegen parser either. But, I couldn't think of
a way around it. Once instructions are expended by the TableGen parser,
there is no way to relate them. Since I wanted to encode this formation
within the instruction itself while they are being expanded, parser appeared
to be the only place to do it.

Thanks,
Jyotsna

Hi Jacob,

Thanks for the suggestions. I have a few questions here.

You are on to something here, but you don't need to define a 'Relations'

class

on top of the tablegen records. They are already relations, you just need

the

proper query language to match the instructions you want.

Are you saying that the mechanism is already present which allows us to
relate instructions with each other? What do you mean by a proper query
language?

Yes, in the very simple sense that you can relate instructions that have the same value in a field:

def ADD {
  let BaseOpcode = "ADD";
  let PredSense = "nopred";
}

def ADDtrue {
  let BaseOpcode = "ADD";
  let PredSense = "true";
}

Inside a multiclass, the NAME variable is set to the base name of the defm. You can use that to relate your instructions.

You don't want to be limited to a single 'IFormat' as a column
identifier, there can be many different types of relationships between
instructions.

We do have different type of relationships between instructions. I define
multiple IFormat objects one per relationship which finally translates into
a unique column into the mapping table.

My point is that you don't need to define additional structure when you can just use the record fields.

def getPredicatedOpcode : InstrMapping {
// Only include instructions form the PredRel class.
let FilterClass = "PredRel";

// Instructions with the same BaseOpcode field form a row.
let RowFields = ["BaseOpcode"];

// Instructions with the same predicate sense form a column.
let ColFields = ["PredSense"];

// The key column is the unpredicated instructions.
let KeyCol = ["nopred"];

// Value columns are predicate=true and predicate=false
let ValueCols = [["true"], ["false"]]; };

Can you please elaborate it more? It seems interesting but I coundn't
understand it completely.
Also, how do I get the table from the definition above? For the table, I
need to know the name of the predicated-true and false instructions.

It's similar to an SQL self join:

select * from PredRel as Key
left outer join PredRel as Val1 on Val1.BaseOpcode = Key.BaseOpcode and Val1.PredSense = 'true'
left outer join PredRel as Val2 on Val2.BaseOpcode = Key.BaseOpcode and Val2.PredSense = 'false'
where Key.PredSense = 'nopred'

Basically, RowFields is a list of record fields that are used to identify a row in your table. All the instructions in a row has identical row fields.

Similarly, ColFields identifies instructions in a column of your table. All instructions in a column have identical column fields.

KeyCol specifies the value of the column fields in your key column. ValueCols identifies the other columns in the table.

It should be an error if there are multiple instructions that fit a table entry because both row and column fields are identical.

You don't need to change the parser. Simply start from RecordKeeper::getAllDerivedDefinitions(FilterClass). Identify the row and column (if any) of each instruction, and build your table from that.

/jakob

Hi Jacob,

Your suggestion worked for the simple relations between instructions as
you've included in your example. With one small change, I am able to
represent more complex relations as well.
In the Hexagon backend, a predicated instruction can translate into another
form called 'predicate new'. So, in our example of 'ADD', we can have
another transformation like this -

ADD--- ---> ADDtrue -----> ADDtru_new (predicate new form of true)
          \-----> ADDfalse -----> ADDfalse_new (predicate new form of false)

// Define Predicate New relation
def getPredNewOpcode : InstrMapping {
  let FilterClass = "PredNewRel";

  let RowFields = ["BaseOpcode"];

// ColFields is a list of flags/attributes of the instructions.
  let ColFields = ["DotNewType", "PredSense"];

// Here 'DotNewType' of the KeyCol is "" and Predsense can be either 'true'
or 'false'
  let KeyCol = ["", "-"];

// Value Column has DotNewType= "new" and predsense same as KeyCol.
// '-' is used to indicate the "PredSense" value to be same as KeyCol.
  let ValueCols = ["new", "-"];
}

def ADDtrue_new {
  let BaseOpcode = "ADD";
  let PredSense = "true";
  let DotNewType = "new";
}

This allows me to list all the attributes that must remain same between the
Key column and the related instructions. Let me know what you think about
this.

Thanks,
Jyotsna

> Are you saying that the mechanism is already present which allows us
> to relate instructions with each other? What do you mean by a proper
> query language?

Yes, in the very simple sense that you can relate instructions that have

the

same value in a field:

def ADD {
  let BaseOpcode = "ADD";
  let PredSense = "nopred";
}

def ADDtrue {
  let BaseOpcode = "ADD";
  let PredSense = "true";
}

Inside a multiclass, the NAME variable is set to the base name of the

defm.

You can use that to relate your instructions.

I found 'NAME' variable very difficult to use.

>> You don't want to be limited to a single 'IFormat' as a column
>> identifier, there can be many different types of relationships
>> between instructions.
>
> We do have different type of relationships between instructions. I
> define multiple IFormat objects one per relationship which finally
> translates into a unique column into the mapping table.

My point is that you don't need to define additional structure when you

can

just use the record fields.

>> def getPredicatedOpcode : InstrMapping { // Only include
>> instructions form the PredRel class.
>> let FilterClass = "PredRel";
>>
>> // Instructions with the same BaseOpcode field form a row.
>> let RowFields = ["BaseOpcode"];
>>
>> // Instructions with the same predicate sense form a column.
>> let ColFields = ["PredSense"];
>>
>> // The key column is the unpredicated instructions.
>> let KeyCol = ["nopred"];
>>
>> // Value columns are predicate=true and predicate=false let
>> ValueCols = [["true"], ["false"]]; };
>
> Can you please elaborate it more? It seems interesting but I coundn't
> understand it completely.
> Also, how do I get the table from the definition above? For the table,
> I need to know the name of the predicated-true and false instructions.

It's similar to an SQL self join:

select * from PredRel as Key
left outer join PredRel as Val1 on Val1.BaseOpcode = Key.BaseOpcode and
Val1.PredSense = 'true'
left outer join PredRel as Val2 on Val2.BaseOpcode = Key.BaseOpcode and
Val2.PredSense = 'false'
where Key.PredSense = 'nopred'

Basically, RowFields is a list of record fields that are used to identify

a row in

your table. All the instructions in a row has identical row fields.

Similarly, ColFields identifies instructions in a column of your table.

All

instructions in a column have identical column fields.

KeyCol specifies the value of the column fields in your key column.

ValueCols

identifies the other columns in the table.

It should be an error if there are multiple instructions that fit a table

entry

I am not sure I understand what you are suggesting. What would your table look like?

If you have multiple fields that must be identical in a row, you can add multiple RowFields entries.

/jakob

You're right. I can have use RowFields for that purpose.

Thanks,
Jyotsna

Jakob,

One more question. You had suggested 'ValueCols' as of type
list<list<string> >. Does the TableGen know how to extract it? It appears to
me that we may have to add support for that.

Thanks,
Jyotsna

You just start from getValueAsListInit() and go from there.

/jakob

Sounds good. I've started adding TableGen backend support to emit
relationship table into a .inc file. Hoping to finish it soon.

Thanks for all your help.

-Jyotsna