Add not instruction to PTX backend

Hi, all

  I am trying to add "not" instruction support to the PTX backend.
I added the line below to PTXInstrInfo.td:

defm NOT : PTX_LOGIC<"not", not>;

  But I get the errors below:

Included from PTX.td:75:
PTXInstrInfo.td:732:10: error: Value 'PTX_LOGIC::opnode' of type 'SDNode' is incompatible with initializer 'not'
defm NOT : PTX_LOGIC<"not", not>;
^
llvm[3]: Building PTX.td subtarget information with tblgen
Included from PTX.td:75:
PTXInstrInfo.td:732:10: error: Value 'PTX_LOGIC::opnode' of type 'SDNode' is incompatible with initializer 'not'
defm NOT : PTX_LOGIC<"not", not>;
^
llvm[3]: Building PTX.td assembly writer with tblgen
Included from PTX.td:75:
PTXInstrInfo.td:732:10: error: Value 'PTX_LOGIC::opnode' of type 'SDNode' is incompatible with initializer 'not'
defm NOT : PTX_LOGIC<"not", not>;
^

How do I fix these errors?

I also see that the PTX_LOGIC multiclass only handles the 3-operand form,
e.g., xor.pred d, a, b;. Should I add something to make PTX_LOGIC
support the 2-operand form?

We've been writing multiclasses for each unique type of instruction. The current PTX_LOGIC version is for 3-operand instructions. A new multiclass needs to be created for 2-operand logic instructions.

Hi, Justin

  I am trying to add a multiclass for 2-operand logic instructions. For
example,

multiclass PTX_LOGIC_2OP<string opcstr, SDNode opnode> {
  def ripreds : InstPTX<(outs Preds:$d),
                     (ins Preds:$a),
                     !strconcat(opcstr, ".pred\t$d, $a"),
                     [(set Preds:$d, (opnode Preds:$a))]>;

  ...
}

  But the error is still the same. Where else should I look?

  Thanks.

Regards,
chenwj

The error here is due to the fact that the 'not' dag is defined as a pattern fragment (PatFrag) rather than being an explicit dag node (SDNode) in itself.

As a result, you'd need to define your multiclass as:

multiclass PTX_LOGIC_2OP<string opcstr, PatFrag opnode> { ... }

This will correctly match the opnode, though whether this is how it should be done depends on the other 2-operand logic instructions. If you look at the definitions in include/llvm/Target/TargetSelectionDAG.td, you'll see which operations are defined as a PatFrag and which are plain SDNodes.

Dan
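
To make the SDNode/PatFrag distinction concrete, here is a toy Python model (not TableGen, and not how tblgen is implemented). It relies on one fact from TargetSelectionDAG.td: 'not' is declared there as a PatFrag whose body is (xor node:$in, -1). The class names mirror the TableGen ones; the expand helper and the tuple encoding of a dag are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SDNode:
    """A plain selection-DAG operator such as xor."""
    name: str


@dataclass(frozen=True)
class PatFrag:
    """A named fragment that expands to a small dag of SDNodes."""
    operands: tuple
    fragment: tuple  # (SDNode, operand name or constant, ...)


XOR = SDNode("xor")
# Mirrors: def not : PatFrag<(ops node:$in), (xor node:$in, -1)>;
NOT = PatFrag(operands=("in",), fragment=(XOR, "in", -1))


def expand(op, *args):
    """Return the dag that (op args...) stands for (hypothetical helper)."""
    if isinstance(op, SDNode):
        return (op.name, *args)
    # Substitute the actual arguments for the fragment's formal operands.
    binding = dict(zip(op.operands, args))
    node, *rest = op.fragment
    return (node.name, *(binding.get(r, r) for r in rest))


print(isinstance(NOT, SDNode))  # False
print(expand(NOT, "a"))         # ('xor', 'a', -1)
```

NOT fails the isinstance check against SDNode, which is the analogue of the tblgen type error in this thread. Once expanded, the fragment is simply an xor against an all-ones constant.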

Thanks, Dan.

Hi, Dan

  I added "not" instruction support to PTXInstrInfo.td as
you suggested earlier.

multiclass PTX_LOGIC_2OP<string opcstr,PatFrag opnode> {
  ...
}

  Now I am trying to write test cases for the logic and shift
operations, but I am having trouble mapping LLVM IR to PTX
for the "not" instruction. The test case I wrote is:

define ptx_device i16 @t4_u16(i16 %x) {
; CHECK: not.b16 rh0, rh1, rh2;
; CHECK-NEXT: ret;
  %z = xor i16 %x, 1
  ret i16 %z
}

Since LLVM IR doesn't have a logical not instruction, I use "xor i16 %x, 1"
to represent it. It turns out the IR is mapped to PTX "xor" rather than
PTX "not". Any idea how to get the mapping right?

  Thanks.

Regards,
chenwj

hi,

define ptx_device i16 @t4_u16(i16 %x) {
; CHECK: not.b16 rh0, rh1, rh2;
; CHECK-NEXT: ret;
%z = xor i16 %x, 1

it should be %z = xor i16 %x, 0xffff?

ret i16 %z
}

best regards
ether
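
A quick check of the arithmetic behind this suggestion, as a Python sketch with i16 values modelled as integers masked to 16 bits. The helper names are hypothetical. In LLVM IR the all-ones constant is normally written as -1, so the usual spelling of a bitwise not is xor i16 %x, -1.

```python
MASK16 = 0xFFFF  # an i16 value keeps only the low 16 bits


def xor16(x, c):
    """xor i16 x, c"""
    return (x ^ c) & MASK16


def not16(x):
    """Bitwise complement of an i16 value."""
    return ~x & MASK16


x = 0x1234
print(hex(xor16(x, 1)))       # 0x1235: only bit 0 is flipped
print(hex(xor16(x, 0xFFFF)))  # 0xedcb: every bit is flipped
print(hex(xor16(x, -1)))      # 0xedcb: -1 is all ones in 16 bits
print(hex(not16(x)))          # 0xedcb
```

So xor with 1 is not a complement, and a pattern that expects an all-ones operand has no reason to match it.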

Hi,

Hope you've got this to work. LLVM IR is pretty low-level, much more so than PTX. As there are quite a few different ways to achieve the same thing in PTX, it's likely that we won't need to handle all of the different instructions for PTX as the IR will only support a subset of them.

In the future, I would suggest putting together the test case first, and only looking to add support to the backend when it is incapable of lowering a particular IR instruction. Your suggested implementation looks completely valid, it's just that it may not actually be needed! (Note I haven't investigated whether or not that's the case here though.) :)

Dan

陳韋任 wrote:

Hi, Dan

  Someone on IRC suggested that I use custom lowering to get the mapping
right, but I am still trying to figure out how to do that.

Regards,
chenwj

Custom lowering means implementing the SelectionDAG instruction selection in the C++ class instead of the TableGen file. See PTXISelLowering.{h,cpp} for some examples. This just allows arbitrary matching and code generation logic instead of just pattern matching.
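
The idea of a hand-written hook that runs alongside table-driven patterns can be sketched as follows. This is a Python stand-in only: it uses no LLVM API, the real code would be C++ in PTXISelLowering.cpp, and every name here (PATTERNS, custom_select, select, the tuple results) is hypothetical.

```python
# Stand-in for the patterns that would live in the TableGen file.
PATTERNS = {"and": "and", "or": "or", "xor": "xor"}


def custom_select(opcode, operands, bits):
    """Hand-written matching logic; returns None to fall back to PATTERNS."""
    all_ones = (1 << bits) - 1
    if opcode == "xor" and len(operands) == 2:
        lhs, rhs = operands
        # An xor against an all-ones constant is a bitwise not.
        if isinstance(rhs, int) and (rhs & all_ones) == all_ones:
            return ("not", lhs)
    return None


def select(opcode, operands, bits=16):
    """Try the custom hook first, then the table-driven patterns."""
    custom = custom_select(opcode, operands, bits)
    if custom is not None:
        return custom
    return (PATTERNS[opcode], *operands)


print(select("xor", ("%x", -1)))  # ('not', '%x')
print(select("xor", ("%x", 1)))   # ('xor', '%x', 1)
```

The hook can inspect operands in ways a fixed pattern table cannot, which is the flexibility described above. Whether that flexibility is needed for 'not' is a separate question.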

Though, I have to agree with Dan on assessing whether the selection logic is needed. Do you have an example where the PTX back-end cannot generate code for some piece of LLVM IR because of the lack of 'not' selection?

  Honestly, I don't have such an example yet. I just wanted to try
implementing some instructions myself. :p

Regards,
chenwj

Great. I've just been looking through the backend and I believe we can now handle all the standard IR instructions. For many of them, though, it's really just the simplest possible implementation, so there's still loads more to do in handling all the modifiers and various options for each instruction. That might be a good starting point if you still wish to contribute. Here's a quick ideas list (Justin, Che-Liang, feel free to comment on any of this):

  • .address_size - New addition to PTX 2.3, which I saw you added as a target in the backend, so it would be good to explicitly add this option (much like .target and .version).

  • ftz - We need a new attribute (-mattr=ftz) that matches the nvcc flag which switches on ftz for all supporting fp operations (add.ftz.f32, etc.).

  • rcp - We could also start matching more complex patterns to support a larger subset of PTX instructions. I'd use 'mad' as a guide and try to match 1.0f / value → rcp(value) to handle reciprocals.
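
The reciprocal idea in the last item amounts to a small tree rewrite. Here is a minimal Python sketch of it, assuming expressions are encoded as nested tuples; the encoding, the "fdiv" and "rcp" tags, and the function name are all hypothetical, and in the backend this would be a TableGen pattern rather than a function. The sketch ignores rounding and approximation modifiers, which would decide whether the rewrite is numerically acceptable.

```python
def fold_reciprocal(expr):
    """Rewrite ("fdiv", 1.0, x) into ("rcp", x), recursing into operands."""
    if not isinstance(expr, tuple):
        return expr  # leaf: a register name or a constant
    op, *args = expr
    args = [fold_reciprocal(a) for a in args]
    if op == "fdiv" and len(args) == 2 and args[0] == 1.0:
        return ("rcp", args[1])
    return (op, *args)


print(fold_reciprocal(("fdiv", 1.0, "%v")))  # ('rcp', '%v')
print(fold_reciprocal(("fdiv", 2.0, "%v")))  # ('fdiv', 2.0, '%v')
```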

There are loads more opportunities along these lines; these are just a few examples that might be a relatively easy starting point for you or anyone else who's keen on helping out. Beyond this, I'd look at the PTX spec and nvcc flags for more ideas. :)

Dan