Issue describing instructions with tablegen

Hi,

I’m writing a backend for a fairly odd architecture. It has many different register banks, which are identified by a few bits in the instruction. These identifier bits are variable length whilst the instruction length stays constant. An example instruction would be:

ASM: mov R0, A1
BIN: 0b0100011100 | 111 | 00 | 0101 | 1

The first “section” is the mov opcode. The next section (111) says that the src register bank is R and 00 identifies R0 in that bank (the R bank has 4 registers). Likewise, 0101 identifies register bank A and 1 identifies A1 (the A bank only has 2 registers, so 1 bit is enough). I hope that explains the variable length properties of the instructions/identifiers.

There are 9 banks. With my naive approach of one instruction description per permutation, this results in 81 different instructions (and 810 lines). I wrote a simple Perl script to auto-generate all of these instructions; the result looks like:

def MOV_r_r : MOVrr <(outs RRegs:$dst), (ins RRegs:$src),
                     [/* No pattern */]> {
  bits<2> dst;
  bits<2> src;

}

Is there a better way of doing this? This operand “style” is used by almost every instruction. Ending up with a ~800 line .td file for every single instruction seems wrong. Did I miss some TableGen functionality that would be perfect for this?

If this is the best way of approaching the problem, would I need to have 81 if statements in my XXXXInstrInfo::copyPhysReg to identify which instruction to use when copying between registers?

Thanks,

Johnny

From: "Johnny Val" <johnnydval@gmail.com>
To: llvmdev@cs.uiuc.edu
Sent: Tuesday, July 8, 2014 4:45:59 AM
Subject: [LLVMdev] Issue describing instructions with tablegen

Hi,

I'm writing a backend for a fairly odd architecture. It has many
different register banks which are identified by a few bits in
the instruction. These identifier bits are variable length whilst
the instruction length stays constant. An example instruction would
be:

ASM: mov R0, A1
BIN: 0b0100011100 | 111 | 00 | 0101 | 1

The first "section" is the mov opcode. The next section (111) says
that the src register bank is R and 00 identifies R0 in that bank
(the R bank has 4 registers). Likewise, 0101 identifies register
bank A and 1 identifies A1 (the A bank only has 2 registers, so 1
bit is enough). I hope that explains the variable length properties
of the instructions/identifiers.

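I think that you're looking at this the wrong way: the register encoding should be the bank identifier plus the register id, taken together as one value, and so you only need one mov instruction. In your example both operand fields are 5 bits wide (3+2 for R0, 4+1 for A1), so every register can carry a single fixed-width encoding even though the split between bank bits and id bits varies. If you have instructions that are specific to a bank, and thus don't encode the full register value, then only take the necessary bits in the encoding specification for those particular instructions.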
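
A rough, untested sketch of that idea follows. Only the R and A banks from the example are shown. The target namespace XX, the names XXReg and AnyReg, the i32 value type, the 20-bit Inst field and the operand order in the asm string are placeholders assumed for illustration, not taken from the actual backend:

```
include "llvm/Target/Target.td"

// Hypothetical register class for a placeholder target "XX".
// The hardware encoding is the bank prefix followed by the index
// within the bank, which is 5 bits in total for both banks shown.
class XXReg<bits<5> enc, string n> : Register<n> {
  let Namespace = "XX";
  let HWEncoding{4-0} = enc;
}

// R bank: 3-bit prefix 111, 2-bit index.
def R0 : XXReg<0b11100, "R0">;
def R1 : XXReg<0b11101, "R1">;
def R2 : XXReg<0b11110, "R2">;
def R3 : XXReg<0b11111, "R3">;

// A bank: 4-bit prefix 0101, 1-bit index.
def A0 : XXReg<0b01010, "A0">;
def A1 : XXReg<0b01011, "A1">;

// One register class spanning every bank (i32 is an assumed value type).
def AnyReg : RegisterClass<"XX", [i32], 32,
                           (add R0, R1, R2, R3, A0, A1)>;

// A single mov covering all bank combinations.
def MOV : Instruction {
  let Namespace = "XX";
  field bits<20> Inst;             // 10 opcode bits + two 5-bit operands
  bits<5> src;
  bits<5> dst;
  let Inst{19-10} = 0b0100011100;  // mov opcode from the example
  let Inst{9-5} = src;             // bank prefix + index of the source
  let Inst{4-0} = dst;             // bank prefix + index of the destination
  let OutOperandList = (outs AnyReg:$dst);
  let InOperandList = (ins AnyReg:$src);
  let AsmString = "mov $src, $dst";
  let Pattern = [];                // no selection pattern
}
```

With a single MOV like this, copyPhysReg would presumably need only one BuildMI call instead of a chain of 81 conditionals. Bank-specific instructions could use a narrower operand field and encode only the low index bits.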

-Hal