Does Mips resolve hazards in pre-RA-sched or post-RA-sched?

Hi, LLVM,

I found LLVM codegen has 3 passes for instruction scheduling:

  1. pre-ra sched

  2. post-ra sched

  3. mi sched.

For RISC machines, some data hazards appear only after register allocation (RA). For example, $t0 is read immediately after it is written (RAW):

ld $t0, MEM

add $t2, $t0, $0

There may be one or more stalls in the pipeline. The instruction scheduler can detect this kind of conflict and place other instructions in between to avoid a pipeline bubble. I think this work can only be done after RA. If so, what is the purpose of 1)? I found that 1) is mandatory and 2) and 3) are optional. Furthermore, at least one target enables pre-RA-sched with a HazardRecognizer. Does that really work? Can you resolve data hazards using pre-RA-sched only?
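
To make the load-use stall concrete, here is a minimal sketch of a single-issue, in-order pipeline model. It is not LLVM code: the two-cycle load latency and the independent sub instruction are assumptions made purely for illustration.

```python
# Toy in-order pipeline model (illustrative assumption, not LLVM code).
# Each instruction is (name, dest_reg, source_regs, latency), where latency
# is the number of cycles before a later instruction can read the result.

def count_stalls(instrs):
    """Return the number of stall cycles on a single-issue in-order pipeline."""
    ready = {}   # register -> first cycle at which its value can be read
    cycle = 0    # cycle at which the next instruction would like to issue
    stalls = 0
    for name, dest, srcs, latency in instrs:
        # An instruction waits until all of its source registers are ready.
        issue = max([cycle] + [ready.get(r, 0) for r in srcs])
        stalls += issue - cycle
        if dest is not None:
            ready[dest] = issue + latency
        cycle = issue + 1
    return stalls

LOAD_LATENCY = 2  # hypothetical: the loaded value is usable two cycles later

back_to_back = [
    ("ld",  "$t0", [],             LOAD_LATENCY),
    ("add", "$t2", ["$t0", "$0"],  1),
    ("sub", "$t3", ["$t4", "$t5"], 1),  # hypothetical independent instruction
]
# Same instructions, with the independent sub moved between ld and add.
reordered = [back_to_back[0], back_to_back[2], back_to_back[1]]

print(count_stalls(back_to_back), count_stalls(reordered))
```

Under these assumptions the back-to-back order stalls for one cycle, and moving the independent instruction between the load and its use removes the stall. That is the kind of reordering the question is about.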

thanks,

–lx

Mips invokes the post-RA scheduler only when OptLevel >= Aggressive, so you will have to compile with -O3.

You can also invoke the MI (pre-RA) scheduler with the llc option "-enable-misched". As you have pointed out, the post-isel (SelectionDAG) scheduler, your 1), is mandatory, so you don't have to give any command-line option for it.

Currently, mips has only one generic scheduling itinerary model in MipsSchedule.td that is not tailored to any specific core, so you might have to tweak it to have the scheduler generate efficient code for your target.

Akira,

Thank you for the response.

I understand that the post-RA scheduler makes use of ScoreboardHazardRecognizer. But I found that the mips code is good enough by default. Basically, I cannot easily spot any bubbles by eye.

I don't understand how that is possible without post-RA-sched. The pre-RA scheduler (e.g. SelectionDAG/ScheduleDAGRRList.cpp) has little information and can only schedule nodes in topological order. It assumes every SU has a one-cycle latency. I don't think pre-RA-sched considers any pipeline details.
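
Scheduling "in topological order" can be sketched with Kahn's algorithm. This only illustrates a latency-blind ordering; it is not the algorithm in ScheduleDAGRRList.cpp, and the node names and dependency table are hypothetical.

```python
from collections import deque

def topological_schedule(nodes, deps):
    """Order nodes so that each node follows everything it depends on.

    nodes: list of node names in source order.
    deps:  dict mapping a node to the list of nodes it depends on.
    """
    pending = {n: len(deps.get(n, [])) for n in nodes}  # unscheduled preds
    users = {n: [] for n in nodes}
    for n in nodes:
        for p in deps.get(n, []):
            users[p].append(n)
    ready = deque(n for n in nodes if pending[n] == 0)
    order = []
    while ready:
        n = ready.popleft()      # FIFO tie-break; no latency is consulted
        order.append(n)
        for u in users[n]:
            pending[u] -= 1
            if pending[u] == 0:
                ready.append(u)
    return order

nodes = ["load", "add", "sub"]   # hypothetical nodes
deps = {"add": ["load"]}         # add consumes the result of load
order = topological_schedule(nodes, deps)
print(order)
```

Both load, add, sub and load, sub, add satisfy the dependency. Nothing in this loop looks at latencies, so which order comes out depends only on how nodes enter the ready queue.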

thanks,

–lx

Hi, Akira,

I see that you maintain the mips MipsSchedule.td. Is that correct? In MipsSchedule.td, every InstrItinData uses only one InstrStage, and there is no bypass info.

Are you sure this reflects the real R4xxx/R5xxx processors?

Why does IILoad use the functional unit ALU?
InstrItinData<IILoad , [InstrStage<3, [ALU]>]>

Regarding my previous question, I have new input after reading the code. pre-RA-sched is derived from ScheduleDAGSDNodes, but post-RA-sched and mi-sched are both derived from ScheduleDAGInstrs. That means pre-RA-sched schedules SDNodes, while post-RA-sched and mi-sched schedule MIs.

From -debug-pass=Structure, we can see that the order is "mi-sched" ==> RegisterAllocation ==> post-RA TD.

Simple Register Coalescing
Machine Instruction Scheduler
Machine Block Frequency Analysis

Greedy Register Allocator
Virtual Register Rewriter

Post RA top-down list latency scheduler
Analyze Machine Code For Garbage Collection

For my testcase, I found that -enable-misched helps on ARM: it reduces the number of stalls from 205 to 160. On mips, however, it has an adverse effect: the number of stalls increases from 554 to 560. This doesn't make any sense to me.

thanks

–lx

> Hi, Akira,
>
> I see that you maintain the mips MipsSchedule.td. Is that correct? In
> MipsSchedule.td, every InstrItinData uses only one InstrStage, and there
> is no bypass info.
> Are you sure this reflects the real R4xxx/R5xxx processors?
>
> Why does IILoad use the functional unit ALU?
> InstrItinData<IILoad , [InstrStage<3, [ALU]>]>

This means IILoad instructions use resource ALU for three cycles. I don't
remember why only two functional units (ALU and IMULDIV) are defined and
used in this .td file, but this would be incorrect if load instructions did
not have any resource conflicts with other ALU instructions on your target.
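
The effect of that entry can be sketched with a small resource-reservation model. This is a toy, not the scoreboard code in LLVM: the IILoad line mirrors the entry quoted above, while the one-cycle IIAlu entry and the separate "LSU" unit are assumptions made for illustration.

```python
# Toy itinerary: instruction class -> (cycles the unit is held, unit name).
# IILoad mirrors InstrStage<3, [ALU]>; the IIAlu value is an assumption.
SHARED_ALU = {
    "IILoad": (3, "ALU"),
    "IIAlu":  (1, "ALU"),
}
# Hypothetical variant in which loads occupy their own unit instead.
SEPARATE_LSU = {
    "IILoad": (3, "LSU"),
    "IIAlu":  (1, "ALU"),
}

def issue_cycles(classes, itinerary):
    """Return the cycle at which each instruction can start on its unit."""
    free_at = {}   # unit -> first cycle at which the unit is free again
    cycle = 0      # earliest cycle for the next in-order issue
    starts = []
    for cls in classes:
        held, unit = itinerary[cls]
        start = max(cycle, free_at.get(unit, 0))
        free_at[unit] = start + held   # reserve the unit for `held` cycles
        starts.append(start)
        cycle = start + 1
    return starts

seq = ["IILoad", "IIAlu", "IIAlu"]
print(issue_cycles(seq, SHARED_ALU), issue_cycles(seq, SEPARATE_LSU))
```

With the shared ALU the model delays the first ALU instruction until the load releases the unit, even when there is no data dependency. With a separate load unit the three instructions issue in consecutive cycles.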

> Regarding my previous question, I have new input after reading the code.
> pre-RA-sched is derived from ScheduleDAGSDNodes, but post-RA-sched and
> mi-sched are both derived from ScheduleDAGInstrs. That means pre-RA-sched
> schedules SDNodes, while post-RA-sched and mi-sched schedule MIs.
>
> From -debug-pass=Structure, we can see that the order is "mi-sched" ==>
> RegisterAllocation ==> post-RA TD.
>
>       Simple Register Coalescing
>       Machine Instruction Scheduler
>       Machine Block Frequency Analysis
> ...
>       Greedy Register Allocator
>       Virtual Register Rewriter
> ...
>       Post RA top-down list latency scheduler
>       Analyze Machine Code For Garbage Collection
> ...
>
> For my testcase, I found that -enable-misched helps on ARM: it reduces
> the number of stalls from 205 to 160. On mips, however, it has an adverse
> effect: the number of stalls increases from 554 to 560. This doesn't make
> any sense to me.

I don't know for sure why the stalls increase. In my experience, both the
pre-RA and post-RA schedulers improve performance when I run the compiled
executables on a mips board. But as I said in my previous email, the model
in MipsSchedule.td is just a generic one, so it's possible that it isn't
generating efficient code for your target.

At some point, it would be worth the parties interested in MIPS having a discussion about the correct way to support different variants. We are going to be open sourcing our MIPS IV implementation soon[1], which is written in a high-level HDL (BlueSpec) and is designed for teaching and research, so we anticipate a lot of universities using it as a base for experimentation. In the FreeBSD world, we also support a number of weird MIPS variants that use the CP2 instruction prefixes for interesting things.

In GCC, the traditional approach has been for each vendor to fork GCC and binutils, apply patches that add support for their architecture and break all of the others, and then fail to update them for any newer release, so you end up with the vendor's compiler (which is never updated, contains bugs, and misses support for newer language features) being the only one you can use when targeting a specific chip.

It would be great if we could structure the MIPS back end in such a way that it's easy to add processor variants that have very different interpretations of the CPx opcodes and instruction schedules, so that upstreaming vendor-specific modifications becomes easy (and out-of-tree implementations can be kept relatively isolated from changes upstream). This is something I will have time to work on over the next few months, as we already have 3 derivatives of our implementation with different extensions and are likely to add a lot more over coming years.

David

[1] I have patches to add MIPS IV support almost ready to upstream. Mostly it's a matter of changing the IsMIPS64 check to IsGPR64, and then adding some IsMIPS64 checks around a few instructions that are specific to MIPS64.