Canonical way to handle zero registers?

I looked around the codebase and didn’t see anything that obviously looked like the natural place to turn constant zero immediates into zero-registers (i.e. registers that always return zero when read). Right now we are expanding them in ISelLowering::LowerOperation but that seems too early.

The specific issue I’m hitting is that we have a register that reads as -1 and so when we replace -1 too early with this register, the standard “not” pattern (xor x, -1) will fail to match to “not”.
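To make the failure mode concrete, here is a toy sketch in plain C++ (not LLVM code). The `Node` type, the `Kind` enum, `AllOnesReg`, and `matchesNot` are all hypothetical stand-ins for DAG nodes and for a pattern written as (xor x, -1). It only illustrates that once the -1 constant has been rewritten into a copy from the hardwired register, a matcher that looks for a literal -1 operand has nothing left to match.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins for DAG nodes; none of these types are LLVM's.
enum class Kind { Value, Constant, CopyFromReg, Xor };

struct Node {
  Kind kind;
  std::int64_t imm;  // payload for Constant nodes
  unsigned reg;      // payload for CopyFromReg nodes (physical register id)
  const Node *lhs;   // operands, used by Xor nodes
  const Node *rhs;
};

// Hypothetical id of the register that always reads as -1.
constexpr unsigned AllOnesReg = 1;

// Models a "not" pattern written as (xor x, -1): it fires only while the
// right-hand operand is still a Constant node holding -1.
bool matchesNot(const Node &n) {
  if (n.kind != Kind::Xor)
    return false;
  assert(n.lhs && n.rhs && "xor needs two operands");
  return n.rhs->kind == Kind::Constant && n.rhs->imm == -1;
}
```

With `x` as an arbitrary value, an xor whose right operand is `Constant(-1)` is accepted by `matchesNot`, while the same xor whose right operand is already `CopyFromReg(AllOnesReg)` is rejected. That is the situation we end up in when the replacement happens during lowering.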

Thanks,
Sean Silva

Hi Sean,

Have you looked at inheriting from llvm::SelectionDAGISel for your target, invoking runOnMachineFunction to perform
ISEL, then post-processing the output by finding the cases where -1 is synthesized and then used, and replacing the uses
of the synthesized -1 with the register wired to -1?

The MIPS backend takes this approach for dealing with the zero register; see MipsSEISelDAGToDAG.cpp for reference.
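As a rough illustration of the idea only (this is not the MIPS code), the post-processing step can be sketched on a toy instruction list in plain C++. The `Inst` struct, the `li` opcode name, and `replaceWithHardwiredReg` are hypothetical, and the sketch assumes SSA form, i.e. each virtual register has exactly one definition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Toy machine instruction: one optional def, register uses, one immediate.
// This is not LLVM's MachineInstr.
struct Inst {
  std::string opcode;
  std::string def;                // empty if the instruction has no result
  std::vector<std::string> uses;  // registers read by the instruction
  long imm;                       // only meaningful for "li"
};

// Redirects every use of a register defined by `li <def>, wiredValue` to
// hardwiredReg, then deletes the now-dead `li`. Returns the number of
// rewritten uses. Assumes each virtual register is defined exactly once.
std::size_t replaceWithHardwiredReg(std::vector<Inst> &block,
                                    const std::string &hardwiredReg,
                                    long wiredValue) {
  assert(!hardwiredReg.empty() && "need a register name to substitute");

  // Pass 1: collect registers that only ever hold the wired value.
  std::unordered_set<std::string> synthesized;
  for (const Inst &I : block)
    if (I.opcode == "li" && I.imm == wiredValue && !I.def.empty())
      synthesized.insert(I.def);

  // Pass 2: point their uses at the hardwired register instead.
  std::size_t rewritten = 0;
  for (Inst &I : block)
    for (std::string &U : I.uses)
      if (synthesized.count(U) != 0) {
        U = hardwiredReg;
        ++rewritten;
      }

  // Pass 3: drop the materializations, which no longer have any users.
  block.erase(std::remove_if(block.begin(), block.end(),
                             [&](const Inst &I) {
                               return synthesized.count(I.def) != 0;
                             }),
              block.end());
  return rewritten;
}
```

Because this runs after instruction selection, patterns such as (xor x, -1) have already had their chance to match before any -1 is swapped for the hardwired register.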

Thanks,
Simon

Thanks, that sounds like it would work. Was this based on what any other target did? Or do any other targets take this approach?

I just want to make sure that we don’t already have a hook suitable for this. Overriding runOnMachineFunction to run what could be described as just a “late SelectionDAG pass” sounds pretty intrusive. Do you remember other approaches that didn’t work?

– Sean Silva

An obvious approach that doesn't work: just writing a pattern. This
causes assertions, seemingly as some code paths don't like the
introduction of a physical register.

At least AArch64, Lanai, and RISC-V handle the zero register in
TgtDAGToDAGISel::Select. Lanai also has a "-1" register and handles
that case in the same place.

Copying from LanaiDAGToDAGISel::Select:

  EVT VT = Node->getValueType(0);
  switch (Opcode) {
  case ISD::Constant:
    if (VT == MVT::i32) {
      ConstantSDNode *ConstNode = cast<ConstantSDNode>(Node);
      // Materialize zero constants as copies from R0. This allows the coalescer
      // to propagate these into other instructions.
      if (ConstNode->isNullValue()) {
        SDValue New = CurDAG->getCopyFromReg(CurDAG->getEntryNode(),
                                             SDLoc(Node), Lanai::R0, MVT::i32);
        return ReplaceNode(Node, New.getNode());
      }
      // Materialize all ones constants as copies from R1. This allows the
      // coalescer to propagate these into other instructions.
      if (ConstNode->isAllOnesValue()) {
        SDValue New = CurDAG->getCopyFromReg(CurDAG->getEntryNode(),
                                             SDLoc(Node), Lanai::R1, MVT::i32);
        return ReplaceNode(Node, New.getNode());
      }
    }
    break;

Best,

Alex

Hi Sean,

I didn’t implement that particular functionality, as it was before my time at MIPS. Akira (+cc) may recall the
specifics of why he took that approach.

As far as I can see, the MIPS approach pre-dates the AArch64-style approach, which is also used by Lanai and
RISC-V as Alex highlights, so I believe it was a novel approach. It appears no other targets take this approach
of a late SelectionDAG pass.

Thanks,
Simon

Thanks! That looks like a winning approach.

I swear I grepped around for ISD::Constant but for some reason never found this code. I think maybe I was searching for ISD::Constant with setOperationAction, which in hindsight was narrowing down my search to just lowering, which is exactly what I didn’t want! (I was looking for other approaches). I also tried looking in depth at PowerPC but it looks like it doesn’t use this approach either.

– Sean Silva

The function looks for “addiu $dst, $zero, 0” and tries to replace it with $zero. I don’t remember whether there was a reason this had to be done after isel. It seems that you can just do it in DAGToDAGISel::Select.

What’s the reason for trying to handle this in SelectionDAG at all? I would just materialize zero like any other constant, and treat replacing that with the zero register as an immediate folding optimization (e.g. FoldImmediate or another peephole pass).

-Matt

I thought about doing that, but I wasn’t sure I could make it work.

The issue is that the hardwired registers are actually the only way to write immediates of this register class (the registers are very small, obviously). I’ve been phrasing this as integer 0 (and -1) to keep the discussion closer to other architectures, but the fact that these hardwired registers are the only way to reference immediates of this register class is one important difference.

One thing I’ve been curious about is how immediates interact with register classes. Could we use ordinary immediate MachineOperands (of the appropriate bit width) and just print the immediate MOs of this register class as the corresponding hardwired register? Does MIR have any constraints on using an immediate MO instead of a register?

– Sean Silva

You can construct an instruction that has an immediate operand in place of a register, but that won't work well. For one, the MachineVerifier will complain about having an invalid operand, plus any code that tries to use operand information for that instruction may end up "surprised" to see an immediate where a register was expected.

There is an assumption in MIR that physical registers should not be used as explicit operands before RA, except in COPY instructions, so the most "canonical" way of having it in MIR is something like
   %vreg2 = COPY %PHYSICAL_ZERO
   ... = %vreg2

If using these special registers is the only way to put an immediate in a register of that class, then that suggests that immediates are generally not "legal" for that register class (except for the special cases). It means that after legalization, the selection DAG should not contain immediates of the type associated with that register class except for the values that are explicitly allowed. You could then simply write custom selection code for ISD::Constant, and turn it into "CopyFromReg".

-Krzysztof

You could do this, but you would have to define a custom operand type with custom verification for the allowed operand types. This is how AMDGPU handles most operands, which can be registers or specific immediates.
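A minimal sketch of what such a check might look like, in plain C++ rather than against any real LLVM or AMDGPU interface. The `Operand` struct, the function names, and the register names `rzero` and `rones` are hypothetical, and the rule "any register, or the immediates 0 and -1" is just the case discussed in this thread.

```cpp
#include <cassert>
#include <string>

// Toy operand: either an immediate or a register number. Not LLVM's
// MachineOperand.
struct Operand {
  bool isImm;
  long imm;      // valid when isImm is true
  unsigned reg;  // valid when isImm is false
};

// Hypothetical verification rule for the custom operand type: any register
// is fine, but the only immediates allowed are those with a hardwired
// register behind them (0 and -1 here).
bool isLegalRegOrHardwiredImm(const Operand &op) {
  if (!op.isImm)
    return true;
  return op.imm == 0 || op.imm == -1;
}

// Prints an allowed immediate as the (hypothetical) hardwired register that
// holds it, and an ordinary register as r<N>.
std::string printOperand(const Operand &op) {
  assert(isLegalRegOrHardwiredImm(op) && "operand failed verification");
  if (op.isImm)
    return op.imm == 0 ? "rzero" : "rones";
  return "r" + std::to_string(op.reg);
}
```

The point of the separate predicate is that a verifier can reject something like an immediate 7 in this operand slot, instead of being surprised by it at print time.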

-Matt

Thanks! That’s exactly the explanation I was looking for!

Hi Sean,

Just to give the GlobalISel perspective on this, GlobalISel supports the declaration of a zero register in the register class like so:
  def GPR32z : RegisterOperand<GPR32> {
    let GIZeroRegister = WZR;
  }
With that definition, the tablegen-erated ISel code will try to replace 'G_CONSTANT s32 0' with WZR whenever the operand is specified as GPR32z.

Thanks for chiming in!

Is this method extensible to the case of other hardwired register values?
Tracing through the code, I noticed that it seems to boil down to a
GIR_CopyOrAddZeroReg opcode, which seems like a pretty deep embedding of
the specialness of zero.

Also, IIUC, GPR32z is defining a special reg class that enables the
zero-register transformation. Defining a new reg class seems pretty
heavyweight for this; is there ever a situation where you wouldn't want the
zero-register transformation to fire so you could just put this on GPR32
itself? I noticed that there actually aren't very many uses of GPR32z which
seems strange, as I would expect most instructions could make use of the
zero register.

Thanks,

-- Sean Silva

At the moment, zero is hardwired, since losing an AArch64 optimization was the motivating case behind adding it and the only other hardwired register I knew about was Mips’s $0, but I see no reason we couldn’t expand on it with a target hook of some kind.

I haven’t got around to rolling it out to GPR32 yet. We think it’s safe to do that, but there are a couple of instructions where wzr/xzr aren’t permitted. At the moment, it’s on the instructions that lost the optimization when tablegen took over from the C++.

Makes sense, thanks for the explanation!

-- Sean Silva