Implementing llvm.atomic.cmp.swap.i32 on PowerPC

Hi all,

I'm trying to figure out how to add the instructions required for
llvm.atomic.cmp.swap.i32 on PowerPC. I figured out LWARX (patch
attached) but the other two (CMP_UNRESw and STWCX) require multiple
instructions:

  let Defs = [CR0] in {
  def STWCX : Pseudo<(outs), (ins GPRC:$rS, memrr:$dst, i32imm:$label),
                    "stwcx. $rS, $dst\n\tbne- La${label}_entry\nLa${label}_exit:",
                     [(PPCstcx GPRC:$rS, xoaddr:$dst, imm:$label)]>;
  
  def CMP_UNRESw : Pseudo<(outs), (ins GPRC:$rA, GPRC:$rB, i32imm:$label),
                           "cmpw $rA, $rB\n\tbne- La${label}_exit",
                           [(PPCcmp_unres GPRC:$rA, GPRC:$rB, imm:$label)]>;
  }

...and I can't figure out the syntax for that. Any suggestions?

Cheers,
Gary

lwarx.patch (694 Bytes)

Hi Gary,

You have to write custom encoding logic in C++ for this. This should go in PPCCodeEmitter::emitBasicBlock. I admit this isn't very elegant. A better solution would be to fix the lowering to produce two instructions instead of this pseudo instruction.

-Chris

Chris Lattner wrote:

> def CMP_UNRESw : Pseudo<(outs), (ins GPRC:$rA, GPRC:$rB, i32imm:
> $label),
> "cmpw $rA, $rB\n\tbne- La${label}_exit",
> [(PPCcmp_unres GPRC:$rA, GPRC:$rB, imm:
> $label)]>;
> }
>
> ...and I can't figure out the syntax for that. Any suggestions?

You have to write custom encoding logic in C++ for this. This
should go in PPCCodeEmitter::emitBasicBlock. I admit this isn't
very elegant. A better solution would be to fix the lowering to
produce two instructions instead of this pseudo instruction.

Hi Chris,

I'd prefer to fix the lowering if possible; the pseudo instructions
are only used in three places, so it shouldn't be a huge change.
I need to generate labels in PPCTargetLowering::LowerAtomicCMP_SWAP
however: how do I do that? FWIW the code it needs to emit is:

  ; inputs: ptr, oldval, newval
  loop:
        lwarx $tmp, 0, $ptr
        cmpw $oldval, $tmp
        bne- exit
        stwcx. $newval, 0, $ptr
        bne- loop
  exit:
        ...

Cheers,
Gary

You need to insert new basic blocks and update CFG to accomplish this. There is a hackish way to do this right now. Add a pseudo instruction to represent this operation and mark it usesCustomDAGSchedInserter. This means the intrinsic is mapped to a single (pseudo) node. But it is then expanded into instructions that can span multiple basic blocks. See PPCTargetLowering::EmitInstrWithCustomInserter().

Evan

Evan Cheng wrote:

You need to insert new basic blocks and update CFG to accomplish this.
There is a hackish way to do this right now. Add a pseudo instruction
to represent this operation and mark it usesCustomDAGSchedInserter.
This means the intrinsic is mapped to a single (pseudo) node. But it
is then expanded into instructions that can span multiple basic
blocks. See PPCTargetLowering::EmitInstrWithCustomInserter().

How does this look? It's a big patch, but it basically does this:

- Adds ATOMIC_LOAD_ADD, ATOMIC_CMP_SWAP and ATOMIC_SWAP nodes,
   and ATOMIC_LOAD_ADD_I{32,64}, ATOMIC_CMP_SWAP_I{32,64} and
   ATOMIC_SWAP_I{32,64} pseudo-instructions with custom inserters.

- Replaces L[WD]ARX and ST[WD]CX pseudo-instructions with the
   actual PPC instructions they represent.

- Removes CMP_UNRESERVE nodes and CMP_UNRES[wd]{,i} pseudo-
   instructions.

Cheers,
Gary

new-atomics.patch (26.7 KB)

Hi Gary,

The patch looks great. But I do have one comment:

+let usesCustomDAGSchedInserter = 1 in {
+ let Uses = [CR0] in {
+ let Uses = [R0] in
+ def ATOMIC_LOAD_ADD_I32 : Pseudo<

The "let Uses = [R0]" is not needed. The pseudo instruction will be expanded like this later:

+ BuildMI(BB, TII->get(is64bit ? PPC::LDARX : PPC::LWARX), dest)
+ .addReg(ptrA).addReg(ptrB);
+ BuildMI(BB, TII->get(is64bit ? PPC::ADD4 : PPC::ADD8), PPC::R0)
+ .addReg(incr).addReg(dest);
+ BuildMI(BB, TII->get(is64bit ? PPC::STDCX : PPC::STWCX))
+ .addReg(PPC::R0).addReg(ptrA).addReg(ptrB);

The second instruction defines R0 and the 3rd reads R0 which is enough to tell the register allocator what to do.

I do have a question, must it use R0? If it's not fixed, it's probably better to create a new virtual register and use that instead.

Thanks,

Evan

Hi Evan,

Evan Cheng wrote:

The patch looks great. But I do have one comment:

+let usesCustomDAGSchedInserter = 1 in {
+ let Uses = [CR0] in {
+ let Uses = [R0] in
+ def ATOMIC_LOAD_ADD_I32 : Pseudo<

The "let Uses = [R0]" is not needed. The pseudo instruction will be
expanded like this later:

+ BuildMI(BB, TII->get(is64bit ? PPC::LDARX : PPC::LWARX), dest)
+ .addReg(ptrA).addReg(ptrB);
+ BuildMI(BB, TII->get(is64bit ? PPC::ADD4 : PPC::ADD8), PPC::R0)
+ .addReg(incr).addReg(dest);
+ BuildMI(BB, TII->get(is64bit ? PPC::STDCX : PPC::STWCX))
+ .addReg(PPC::R0).addReg(ptrA).addReg(ptrB);

The second instruction defines R0 and the 3rd reads R0 which is
enough to tell the register allocator what to do.

I do have a question, must it use R0? If it's not fixed, it's
probably better to create a new virtual register and use that
instead.

It's not fixed. How do I create a new virtual register? This is my
first real go at adding to LLVM, so I'm kind of figuring it out as I
go along :slight_smile:

Cheers,
Gary

Look for createVirtualRegister. These are examples in PPCISelLowering.cpp.

Evan

Cool, thanks, I'll check that out.

Cheers,
Gary

Evan Cheng wrote:

Would it be acceptable to change MachineInstr::getRegInfo from private
to public so I can use it from PPCTargetLowering::EmitInstrWithCustomInserter?

Cheers,
Gary

Evan Cheng wrote:

PPCTargetLowering::EmitInstrWithCustomInserter has a reference
to the current MachineFunction for other purposes. Can you use
MachineFunction::getRegInfo instead?

Dan

Ah, didn't see that, that's what comes of trying to do something at
5pm :slight_smile: I attached an updated patch which creates a virtual register
instead of using R0. How does this look?

Cheers,
Gary

Dan Gohman wrote:

ppc-atomics-take2.patch (26.9 KB)

Looks good.

+ unsigned temp;
+ if (is64bit)
+ temp = RegInfo.createVirtualRegister(&PPC::GPRCRegClass);
+ else
+ temp = RegInfo.createVirtualRegister(&PPC::G8RCRegClass);

How about?

const TargetRegisterClass *RC = is64Bit ? &PPC:GPRCRegClass : &PPC:G8RCRegClass;
unsigned TmpReg = RegInfo.createVirtualRegister(RC);

Evan

Evan Cheng wrote:

How about?

const TargetRegisterClass *RC = is64Bit ? &PPC:GPRCRegClass :
&PPC:G8RCRegClass;
unsigned TmpReg = RegInfo.createVirtualRegister(RC);

I tried something like that yesterday:

  const TargetRegisterClass *RC =
    is64bit ? &PPC::GPRCRegClass : &PPC::G8RCRegClass;

but I kept getting this error no matter how I arranged it:

  error: conditional expression between distinct pointer types
  ‘llvm::PPC::GPRCClass*’ and ‘llvm::PPC::G8RCClass*’ lacks a cast

Any suggestions?

Cheers,
Gary

Just cast both values to const TargetRegisterClass*.

Evan

Cool, that worked. New patch attached...

Cheers,
Gary

Evan Cheng wrote:

ppc-atomics-take3.patch (26.9 KB)

Hi Gary,

This does not patch cleanly for me (PPCISelLowering.cpp). Can you prepare a updated patch?

Thanks,

Evan

Hi Evan,

Evan Cheng wrote:

This does not patch cleanly for me (PPCISelLowering.cpp). Can you
prepare a updated patch?

This should work, though I won't have access to my test box now until
next Thursday so no guarantees :slight_smile:

Cheers,
Gary

ppc-atomics-take4.patch (26.5 KB)

Committed. Thanks!

Evan

Thank you

Cheers,
Gary

Evan Cheng wrote: