How to define a complicated instruction in TableGen (Direct3D shader instruction)

Each register is a 4-component (namely, r, g, b, a) vector register.
They are actually defined as llvm packed [4xfloat].

The instruction:

  add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz

Explanation:

'.a' is a writemask: only the specified component will be updated.

'.xxyy' and '.zzzz' are swizzle masks. They specify the component
permutation, similar to the Intel SSE permutation instruction SHUFPD.

'_bias' and '_x2' are source modifiers. They modify the value of the
source operands and send the modified values to the adder: '_bias' =
source - 0.5, '_x2' = source * 2.

'_sat' is an instruction modifier. When specified, it saturates (or
clamps) the instruction result to the range [0, 1] before writing to
the destination register.

The 'writemask', 'swizzle', 'source modifier', and 'instruction
modifier' are all optional.
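To make the semantics above concrete, here is a minimal C++ reference sketch of what the example instruction computes. It is only an illustration: the names (Vec4, SrcMod, applyModifier, swizzle, addInst, runExample) are hypothetical, and it assumes the source modifier is applied before the swizzle (for component-wise modifiers the order does not change the result).

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // components x/r, y/g, z/b, w/a

// Source modifiers described in the post (enum names are hypothetical).
enum class SrcMod { None, Bias, X2 };

// Apply a source modifier to every component of a source operand.
Vec4 applyModifier(Vec4 v, SrcMod m) {
  for (float &c : v) {
    if (m == SrcMod::Bias) c -= 0.5f;      // '_bias' = source - 0.5
    else if (m == SrcMod::X2) c *= 2.0f;   // '_x2'   = source * 2
  }
  return v;
}

// Swizzle: result component i is read from source component sel[i].
Vec4 swizzle(const Vec4 &v, const std::array<int, 4> &sel) {
  for (int s : sel) assert(s >= 0 && s < 4);
  return {v[sel[0]], v[sel[1]], v[sel[2]], v[sel[3]]};
}

// Component-wise add with optional saturation and a 4-bit writemask
// (bit i set means component i of dst is written).
void addInst(Vec4 &dst, const Vec4 &a, const Vec4 &b, bool sat,
             unsigned writeMask) {
  for (int i = 0; i < 4; ++i) {
    if (!(writeMask & (1u << i))) continue;  // masked-out lanes keep old value
    float r = a[i] + b[i];
    if (sat) r = std::min(1.0f, std::max(0.0f, r));  // clamp to [0, 1]
    dst[i] = r;
  }
}

// Emulates: add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz
Vec4 runExample(Vec4 r0, const Vec4 &r1, const Vec4 &r3) {
  Vec4 s0 = swizzle(applyModifier(r1, SrcMod::Bias), {0, 0, 1, 1});  // .xxyy
  Vec4 s1 = swizzle(applyModifier(r3, SrcMod::X2), {2, 2, 2, 2});    // .zzzz
  addInst(r0, s0, s1, /*sat=*/true, /*writeMask=*/1u << 3);          // .a
  return r0;
}
```

Calling runExample with an arbitrary r0 leaves its first three components untouched and overwrites only the last one, which is the effect of the '.a' writemask.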

How should I define the instruction in a TableGen .td file?

I have two alternatives:

1.
  // each mask/modifier is passed as an 8-bit operand
  class WriteMask : Operand<i8> {}
  def WM : WriteMask;

  class Swizzle : Operand<i8> {}
  def SW : Swizzle;

  class InstructionModifier : Operand<i8> {}
  def IM : InstructionModifier;

  class SourceModifier : Operand<i8> {}
  def SM : SourceModifier;

  def ADD<0x01, (ops
    GPR:$dest, WM:$wm, IM:$im,
    GPR:$src0, SW:$sw0, SM:$sm0,
    GPR:$src1, SW:$sw1, SM:$sm1), ... >

2. Add LLVM intrinsics:

  ; add_sat r0.a, r1_bias.xxyy, r3_x2.zzzz
  r1_1 = llvm.bias( r1_0 )
  r1_2 = llvm.shuffle( r1_1, xxyy )
  r3_1 = llvm.x2( r3_0 )
  r3_2 = llvm.shuffle( r3_1, zzzz )
  r0_0 = add r1_2, r3_2
  r0_1 = llvm.saturate( r0_0 )
  r0_2 = llvm.select( r0_1, a )

But this makes implementing the instruction selector very difficult.
In this example, llvm.select() and llvm.saturate() are encountered first
(bottom-up), but they must be 'remembered', and the instruction cannot
be generated (BuildMI) until the opcode is known.
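The 'remembering' described here can be sketched as a small tree walk that peels the wrapper operations off before emitting anything. The C++ below is a toy model only, not LLVM's selector API: Node, Op, MachineAdd, foldSource, selectAdd and toString are all hypothetical names, and the tree stands in for the intrinsic chain of alternative 2.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy expression tree mirroring the intrinsic chain (all names hypothetical).
enum class Op { Reg, Bias, X2, Shuffle, Add, Saturate, Select };

struct Node {
  Op op;
  std::string arg;  // register name, swizzle text, or writemask text
  std::shared_ptr<Node> lhs, rhs;
};
using NodePtr = std::shared_ptr<Node>;

NodePtr reg(const std::string &name) {
  return std::make_shared<Node>(Node{Op::Reg, name, nullptr, nullptr});
}
NodePtr unary(Op op, NodePtr x, const std::string &arg = "") {
  return std::make_shared<Node>(Node{op, arg, x, nullptr});
}
NodePtr add(NodePtr a, NodePtr b) {
  return std::make_shared<Node>(Node{Op::Add, "", a, b});
}

struct SrcOperand {
  std::string reg;
  std::string swizzle = "xyzw";  // identity swizzle when none is given
  std::string modifier;          // "", "_bias" or "_x2"
};

struct MachineAdd {
  bool sat = false;
  std::string writeMask = "xyzw";  // write all components by default
  SrcOperand src0, src1;
};

// Peel an optional shuffle and an optional modifier off one source operand.
SrcOperand foldSource(NodePtr n) {
  SrcOperand s;
  if (n->op == Op::Shuffle) { s.swizzle = n->arg; n = n->lhs; }
  if (n->op == Op::Bias) { s.modifier = "_bias"; n = n->lhs; }
  else if (n->op == Op::X2) { s.modifier = "_x2"; n = n->lhs; }
  assert(n->op == Op::Reg && "source must bottom out in a register");
  s.reg = n->arg;
  return s;
}

// Start at the root, remember select/saturate, and only build the machine
// instruction once the real opcode (add) has been reached.
MachineAdd selectAdd(NodePtr root) {
  MachineAdd mi;
  if (root->op == Op::Select) { mi.writeMask = root->arg; root = root->lhs; }
  if (root->op == Op::Saturate) { mi.sat = true; root = root->lhs; }
  assert(root->op == Op::Add && "this sketch only handles add");
  mi.src0 = foldSource(root->lhs);
  mi.src1 = foldSource(root->rhs);
  return mi;
}

// Render the folded instruction in the assembly syntax used in the post.
std::string toString(const MachineAdd &mi, const std::string &dst) {
  auto src = [](const SrcOperand &s) { return s.reg + s.modifier + "." + s.swizzle; };
  return std::string("add") + (mi.sat ? "_sat" : "") + " " + dst + "." +
         mi.writeMask + ", " + src(mi.src0) + ", " + src(mi.src1);
}
```

Feeding it the tree for the example chain should reproduce the original one-line instruction, which suggests the wrapper operations can be folded into operand fields rather than emitted separately.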

Which one should I do?
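For alternative 1, each swizzle and writemask has to fit in the i8 operand. One possible packing, shown as a C++ sketch, uses 2 bits per lane for the swizzle and 1 bit per component for the writemask. The bit layout and the function names (componentIndex, encodeSwizzle, decodeSwizzleLane, encodeWriteMask) are assumptions for illustration, not something the post specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Map a component letter to its index; accepts either xyzw or rgba naming.
int componentIndex(char c) {
  switch (c) {
    case 'x': case 'r': return 0;
    case 'y': case 'g': return 1;
    case 'z': case 'b': return 2;
    case 'w': case 'a': return 3;
    default: return -1;
  }
}

// Pack a 4-letter swizzle such as "xxyy" into 8 bits:
// 2 bits per lane, lane 0 in the lowest bits (assumed layout).
uint8_t encodeSwizzle(const std::string &s) {
  assert(s.size() == 4);
  unsigned enc = 0;
  for (int lane = 0; lane < 4; ++lane) {
    int idx = componentIndex(s[lane]);
    assert(idx >= 0);
    enc |= static_cast<unsigned>(idx) << (2 * lane);
  }
  return static_cast<uint8_t>(enc);
}

// Recover the source component index selected for one lane.
int decodeSwizzleLane(uint8_t enc, int lane) {
  assert(lane >= 0 && lane < 4);
  return (enc >> (2 * lane)) & 3;
}

// Pack a writemask such as "a" or "xyz" into the low 4 bits:
// bit i set means component i is written (assumed layout).
uint8_t encodeWriteMask(const std::string &s) {
  unsigned mask = 0;
  for (char c : s) {
    int idx = componentIndex(c);
    assert(idx >= 0);
    mask |= 1u << idx;
  }
  return static_cast<uint8_t>(mask);
}
```

With this layout a full 4-lane swizzle uses exactly 8 bits, so it fits the Operand<i8> definitions, and an omitted swizzle or writemask can default to the identity encodings of "xyzw".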

Actually, the problems that Tzu-Chien Chiu is encountering are similar to what should be done for generating SSE code in the X86 backend and also for other SIMD instruction sets. I think LLVM needs to add instructions for permuting components and for extracting and injecting elements in packed types. If the architecture has instructions that can perform a permutation as part of each instruction (for example, 'add' with permutation), it should be the role of the pattern instruction selector to recognise the shuffle+add combination and emit a single instruction.

m.

Tzu-Chien Chiu wrote:

Hi,

I am working on this. Part of my Ph.D. thesis work involves extending the LLVM instruction set to express vector parallelism, including but not limited to subword SIMD-style parallelism. We already have extract and inject (we call it combine) instructions. Permutation is something we are going to add. All of this will be checked into LLVM at some point, but I'm not sure when. If you would like to discuss this or have suggestions, your input would be welcome.

Rob

Actually, the problems that Tzu-Chien Chiu is encountering are similar to what should be done for generating SSE code in the X86 backend and also for other SIMD instruction sets. I think LLVM needs to add instructions for permuting components and for extracting and injecting elements in packed types. If the architecture has instructions that can perform a permutation as part of each instruction (for example, 'add' with permutation), it should be the role of the pattern instruction selector to recognise the shuffle+add combination and emit a single instruction.

Agreed 100%.

-Chris

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev