[SPARC]: leon2 and leon3: not respecting delayed-write to Y-register

Hi,

In section B.29 (Write State Register Instructions) of 'The SPARC
Architecture Manual Version 8' it says: "The write state register
instructions are delayed-write instructions."

The Y-register is a state-register.

Furthermore, section B.29 contains a programming note saying:

  MULScc, RDY, SDIV, SDIVcc, UDIV, and UDIVcc implicitly read the Y
  register. If any of these instructions execute within three
  instructions after a WRY which changed the contents of the Y
  register, its results are undefined.

LLVM currently does not respect this. I'm using 3.9 and checked
whether any commit on the master branch handles this case; I haven't
seen anything, but I might be wrong.

Unfortunately I'm not (yet) qualified enough to implement a solution
for this problem.

IMHO the best solution would be to add a pass which checks for
read hazards after any state-register write (Y, PSR, ASR, WIM, TBR)
and adds NOPs where necessary, or even reschedules other
instructions. (Some years ago there was a HazardRecognition class in
the scheduler which could be used for that.)

The easy solution would be to simply add three NOPs at the point
where the WRY instruction is created:

https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Sparc/SparcISelDAGToDAG.cpp#L357
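For illustration only, the easy variant can be modelled outside LLVM
as a rewrite over already-encoded SPARC V8 words: every `wr %reg, %y`
gets three NOPs (`0x01000000`, i.e. `sethi 0, %g0`) appended, whether
or not a Y reader follows. The helper name `pad_every_wry` is
hypothetical, and patching words after layout like this would break
branch offsets; inside the compiler the NOPs would instead be emitted
as instructions before layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPARC_NOP 0x01000000u /* sethi 0, %g0 */

/* WRY is WRASR (op = 2, op3 = 0x30) with rd = 0. Every WRY is padded,
   even one that does not actually change the contents of Y. */
static int is_wry(uint32_t w)
{
    return (w >> 30) == 2 && ((w >> 19) & 0x3f) == 0x30 && ((w >> 25) & 0x1f) == 0;
}

/* Hypothetical helper: copies `in` to `out`, appending three NOPs after
   every WRY regardless of what follows. Returns the number of words
   written, or 0 if `cap` is too small for the padded sequence. */
size_t pad_every_wry(const uint32_t *in, size_t n, uint32_t *out, size_t cap)
{
    size_t len = 0;

    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        size_t need = is_wry(in[i]) ? 4 : 1; /* the word plus any NOPs */

        if (cap - len < need)
            return 0;
        out[len++] = in[i];
        for (size_t k = 1; k < need; k++)
            out[len++] = SPARC_NOP;
    }
    return len;
}
```

Applied to the four clang words shown further down (`sra`, `wr`,
`sdiv`, `ret`) it produces seven words, with the three NOPs between
`wr` and `sdiv`. The price of this simplicity is that NOPs are also
inserted where nothing reads Y.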

I'd appreciate any help and guidance on how to fix this problem,
starting with adding NOPs and maybe adding a pass to the LEONPasses.
(There is the Filler pass; is this the right one?)

--- Code example:

clang compiles the following C code:

    int main(void)
    {
      int *a = (int *) 0x80000000;
      int *b = (int *) 0x80000004;
      return *a / *b;
    }

to

    [...]
    49c: b5 3e 60 1f sra %i1, 0x1f, %i2
    4a0: 81 80 00 1a wr %i2, %y
    4a4: b0 7e 40 18 sdiv %i1, %i0, %i0
    4a8: 81 c7 e0 08 ret

gcc does:

    [...]
    4a0: 87 38 60 1f sra %g1, 0x1f, %g3
    4a4: 81 80 e0 00 wr %g3, %y
    4a8: 01 00 00 00 nop
    4ac: 01 00 00 00 nop
    4b0: 01 00 00 00 nop
    4b4: 82 78 40 02 sdiv %g1, %g2, %g1
    4b8: b0 10 00 01 mov %g1, %i0
    4bc: 81 e8 00 00 restore
    4c0: 81 c3 e0 08 retl
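To check a binary for this hazard, the words from listings like the
two above can be decoded directly. The following is a minimal sketch,
assuming straight-line code (branches, delay slots and branch targets
are ignored) and conservatively treating every WRY as changing Y; the
function names are hypothetical. Field positions follow the SPARC V8
format-3 encoding: `op` in bits 31..30, `rd` in 29..25, `op3` in
24..19, `rs1` in 18..14.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field extraction for SPARC V8 format-3 instructions. */
static unsigned op_of(uint32_t w)  { return w >> 30; }
static unsigned rd_of(uint32_t w)  { return (w >> 25) & 0x1f; }
static unsigned op3_of(uint32_t w) { return (w >> 19) & 0x3f; }
static unsigned rs1_of(uint32_t w) { return (w >> 14) & 0x1f; }

/* WRY is WRASR (op3 = 0x30) with rd = 0. */
static int is_wry(uint32_t w)
{
    return op_of(w) == 2 && op3_of(w) == 0x30 && rd_of(w) == 0;
}

/* The implicit Y readers listed in the B.29 programming note. */
static int reads_y(uint32_t w)
{
    if (op_of(w) != 2)
        return 0;
    switch (op3_of(w)) {
    case 0x0e: /* UDIV   */
    case 0x0f: /* SDIV   */
    case 0x1e: /* UDIVcc */
    case 0x1f: /* SDIVcc */
    case 0x24: /* MULScc */
        return 1;
    case 0x28: /* RDASR with rs1 = 0 is RDY */
        return rs1_of(w) == 0;
    default:
        return 0;
    }
}

/* Counts (WRY, reader) pairs where a Y reader is among the three
   instructions following a WRY. Straight-line code only. */
size_t count_wry_hazards(const uint32_t *words, size_t n)
{
    size_t hazards = 0;

    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!is_wry(words[i]))
            continue;
        for (size_t k = i + 1; k < n && k <= i + 3; k++)
            if (reads_y(words[k]))
                hazards++;
    }
    return hazards;
}
```

Fed the clang words `b53e601f 8180001a b07e4018 81c7e008`, it reports
one hazard (the `sdiv` directly after the `wr`); the gcc sequence
reports none, because there the `sdiv` is the fourth instruction after
the `wr`.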

Hi Patrick,

For MIPS we have a similar situation with compact branch instructions, in that they cannot be
scheduled back to back; doing so triggers a reserved instruction exception. For my
implementation of compact branches, I extended the instruction definition to tag the relevant
instructions, then had a simple pass which iterated over the instructions very late in the
compilation pipeline to insert nops.
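Applied to the WRY case, the tag-then-pad approach could look roughly
like the plain-C sketch below. The flag names, `struct insn` and
`insert_y_nops` are hypothetical stand-ins for tagged instruction
definitions and a late pass; it assumes a single basic block with no
delayed Y write pending on entry. Unlike unconditional padding, it
only inserts as many NOPs as the distance to the next Y reader
requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction tags, standing in for flags set on the
   instruction definitions themselves. */
enum {
    F_WRY     = 1u << 0, /* delayed write to Y (WRY) */
    F_READS_Y = 1u << 1  /* MULScc, RDY, SDIV, SDIVcc, UDIV, UDIVcc */
};

struct insn {
    const char *mnemonic;
    unsigned flags;
};

static const struct insn NOP = { "nop", 0 };

/* Instructions after a WRY during which Y must not be read (B.29). */
#define WRY_SHADOW 3

/* Late pass over one basic block: copies `in` to `out`, inserting only
   as many NOPs as needed so that no Y reader lands within WRY_SHADOW
   instructions of a WRY. Returns the output length, or 0 if `cap` is
   too small. */
size_t insert_y_nops(const struct insn *in, size_t n, struct insn *out, size_t cap)
{
    size_t len = 0;
    size_t since_wry = WRY_SHADOW; /* instructions emitted since the last WRY */

    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (in[i].flags & F_READS_Y) {
            for (; since_wry < WRY_SHADOW; since_wry++) {
                if (len == cap)
                    return 0;
                out[len++] = NOP;
            }
        }
        if (len == cap)
            return 0;
        out[len++] = in[i];
        if (in[i].flags & F_WRY)
            since_wry = 0;
        else if (since_wry < WRY_SHADOW)
            since_wry++;
    }
    return len;
}
```

For `sra; wr %y; add; sdiv; ret` it inserts two NOPs between `add` and
`sdiv`, since `add` already occupies one of the three slots; an `sdiv`
with no preceding WRY gets no padding at all.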

If you're going down the same route, I strongly suggest avoiding extending the delay slot filler,
as it could end up moving the instruction you're trying to guard.

Using a separate pass may give you better results than blindly inserting nops during ISel.

https://reviews.llvm.org/rL263444 is the commit where I implemented this for reference.

Another thing you can try is to coax the scheduler to pick something else if possible by increasing
the latency of instructions that write to State registers, though you'd still need some other
mechanism to ensure that those arch constraints are met.

Thanks,
Simon

For this we specialise a target-specific hazard checker with constraints. During scheduling we maintain a map describing the resources used in each cycle, and the hazard checker inspects this map to determine whether the instruction can be inserted into the selected cycle. This handles obscure situations such as "instruction X cannot be in the 3rd cycle after instruction Y unless instruction Z is in the 8th cycle before it". It works pretty well, and we manage many strange instruction relationships this way, arising from the interaction of multiple functional units in our VLIW instructions.
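A much-simplified version of such a cycle-map hazard check, restricted
to the WRY rule, might look like the sketch below. It rests on strong
assumptions: single issue (one instruction per cycle), and every entry
in the ready list already has its operands available and is
independent of the others, so any of them may be picked next; a real
scheduler would derive that from its dependency graph. The scheduler
prefers a hazard-free candidate and falls back to a NOP only when none
exists, which is the effect of coaxing the scheduler into picking
something else. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction classes for this sketch. */
enum kind { K_NOP, K_ALU, K_WRY, K_READS_Y };

#define MAX_CYCLES 64
#define WRY_SHADOW 3

/* Resource map: what occupies each issue cycle (single issue assumed). */
struct cycle_map {
    enum kind slot[MAX_CYCLES];
    size_t filled; /* cycles 0 .. filled-1 are occupied */
};

/* Hazard check: a Y reader may not issue within WRY_SHADOW cycles after a WRY. */
static bool can_issue(const struct cycle_map *m, enum kind k, size_t cycle)
{
    if (k != K_READS_Y)
        return true;
    for (size_t d = 1; d <= WRY_SHADOW && d <= cycle; d++)
        if (m->slot[cycle - d] == K_WRY)
            return false;
    return true;
}

/* Greedy scheduler over `ready` (assumed independent; consumed in place):
   each cycle takes the first candidate that passes the hazard check, or
   a NOP if none does. Returns the number of NOPs emitted. */
size_t schedule_ready(struct cycle_map *m, enum kind *ready, size_t n)
{
    size_t nops = 0;

    while (n > 0) {
        size_t pick = n;

        assert(m->filled < MAX_CYCLES);
        for (size_t i = 0; i < n; i++)
            if (can_issue(m, ready[i], m->filled)) {
                pick = i;
                break;
            }
        if (pick == n) { /* every candidate is blocked: stall */
            m->slot[m->filled++] = K_NOP;
            nops++;
            continue;
        }
        m->slot[m->filled++] = ready[pick];
        for (size_t i = pick; i + 1 < n; i++) /* drop the picked entry */
            ready[i] = ready[i + 1];
        n--;
    }
    return nops;
}
```

With a WRY in cycle 0 and a ready list of one Y reader plus two
independent ALU instructions, the ALU instructions fill cycles 1 and 2,
a single NOP fills cycle 3 and the reader issues in cycle 4; with only
the reader available, three NOPs are needed.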

  MartinO

The existing DelaySlotFiller logic might be extensible for this. I
think it currently handles the fcmp variants and restore. Depending on
the specific logic you want to model, that would be the best place to
start.

Joerg

There is also the InsertNOPLoad pass in the Leon-specific part.

Joerg

Yep, this definitely isn't done. However, on the hardware I've looked
at it didn't appear to actually be necessary (as one might expect from
sane hardware...), so I'd not gotten around to fixing it.

Are you looking at this because you have a program which is mis-executing
on some hardware you have, or just because you were looking at the assembly
output and noticed the spec non-compliance?

Yes, my hardware (Leon2, a SPARC V8) is (unfortunately)
standard-compliant and has this problem.