Mischeduler: Unknown reason for peak register pressure increase

I am working on a project where we are integrating an existing pre-RA scheduler into LLVM, and we are trying to match our peak register pressure values with the machine instruction scheduler's values on X86. I am finding some mismatches in test cases like the one attached. The registers “AH” and “AL” are live-out but not live-in, and I don’t see them defined in the block when walking through the operands of its instructions. The peak pressure printouts from ScheduleDAGMILive appear to account for AH and AL being live, because the pressure sets corresponding to these registers are increased. Is there a way in the mischeduler to discover that these two registers may be contributing to peak pressure in the block?

Thanks,

Austin Kerbow

rp_testcase.txt (2.54 KB)

AL and AH are subregisters of RAX. -Eli

Some notes, though I am not sure I completely understand your expectations yet:

  • Pressure and the live-out sets are measured in terms of register units, which let us factor out register aliases and hierarchies. In this case I assume at least one of RAX, EAX, AX is used somewhere, so we report the two underlying register units. The debug print may be a bit confusing because we typically name each register unit after the lowest register in the hierarchy. You can find some information about register units in MCRegisterInfo.h or in my dev meeting talk about register hierarchies.

  • The register pressure estimates in the scheduler are not absolute; they only capture registers used inside the scheduling region. There may be more values that are live throughout the scheduling region but not used inside it. The idea is that these are not essential and will probably be spilled outside the loop/region, so they shouldn’t influence our scheduling decisions. [1]

- Matthias

[1] While this is a good assumption for loops, I have seen it go wrong in situations in which an inner loop is composed of more than one scheduling region. I plan to change LLVM's strategy at some point so that the tracking captures all registers used inside the current loop.
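
To illustrate the register-unit point above, here is a minimal, self-contained sketch of how a live register can be expanded into its units. It does not use the LLVM API: the table and the names `UnitTable`, `makeToyUnitTable` and `liveUnits` are hypothetical, and the mapping of RAX/EAX/AX onto the two units AL and AH is an assumption taken from the "two underlying register units" remark and the "Live Out: AH AL" printout in this thread.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical hand-written table: register name -> names of its register units.
// Units are named after the lowest registers in the hierarchy, as in the debug print.
using UnitTable = std::map<std::string, std::vector<std::string>>;

UnitTable makeToyUnitTable() {
  return {
      {"AL", {"AL"}},
      {"AH", {"AH"}},
      {"AX", {"AL", "AH"}},
      {"EAX", {"AL", "AH"}}, // assumption: same two units as AX
      {"RAX", {"AL", "AH"}}, // assumption: same two units as AX
  };
}

// Expand a list of live registers into the set of live register units.
// Registers missing from the toy table are ignored.
std::set<std::string> liveUnits(const UnitTable &Table,
                                const std::vector<std::string> &LiveRegs) {
  std::set<std::string> Units;
  for (const std::string &Reg : LiveRegs) {
    auto It = Table.find(Reg);
    if (It == Table.end())
      continue;
    Units.insert(It->second.begin(), It->second.end());
  }
  return Units;
}
```

Under these assumptions, a live RAX and a live AX produce the same unit set, which is why a printout in terms of units can show AH and AL even when neither name appears as an operand.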

Thank you for your answers! I am essentially trying to mimic the behavior of LLVM's register pressure tracker within our combinatorial scheduler. To do this I am collecting register Def/Use information using llvm::RegisterOperands, and finding LiveIn/LiveOut registers using llvm::ScheduleDAGMILive.RegPressure.LiveIn/LiveOut. When a register of some register class is live in our scheduler, we increase the current pressure in the corresponding pressure sets by the weight of the register class. With this method, for most scheduling regions the peak register pressure for each pressure set matches between our scheduler and the mischeduler. However, in approximately 1 in 10 scheduling regions we see a mismatch between our per-PSet peak pressure values and the values from the mischeduler. Below is a region with this type of mismatch. As an example, the PSet GR8_ABCD_L has a peak value of 1. However, I don't see that this PSet is associated with any register class that is an operand of any instruction in the region, nor do I see anything about this PSet in the Pressure Diff printouts. There are also no live-in registers that I can see. Do you have any ideas about what could be missing in our implementation?
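
For reference, the accounting described above can be sketched roughly as follows. This is a simplified, self-contained model and not our actual code or LLVM's tracker: `ToyRegClass` and `ToyPressureTracker` are hypothetical names, and the only mapping taken from the dump below is that an FR64 register affects the FR32 and FR32X pressure sets with weight 1.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a register class: its weight and the
// pressure sets (PSets) it contributes to.
struct ToyRegClass {
  unsigned Weight;
  std::vector<std::string> PSets;
};

// Tracks current and peak pressure per PSet as registers become live or dead.
class ToyPressureTracker {
  std::map<std::string, unsigned> Cur;
  std::map<std::string, unsigned> Peak;

public:
  // A register of class RC becomes live: raise every PSet it belongs to.
  void makeLive(const ToyRegClass &RC) {
    for (const std::string &PS : RC.PSets) {
      unsigned &C = Cur[PS];
      C += RC.Weight;
      Peak[PS] = std::max(Peak[PS], C);
    }
  }

  // A register of class RC dies: lower every PSet it belongs to (clamped at 0).
  void makeDead(const ToyRegClass &RC) {
    for (const std::string &PS : RC.PSets) {
      unsigned &C = Cur[PS];
      C -= std::min(C, RC.Weight);
    }
  }

  // Peak pressure seen so far for a PSet; 0 if it was never touched.
  unsigned peak(const std::string &PS) const {
    auto It = Peak.find(PS);
    return It == Peak.end() ? 0 : It->second;
  }
};
```

In this model a PSet only rises when a register that maps to it is explicitly made live, so anything live across the region that never shows up as an operand (for example a live-out entry) leaves its PSets at 0 unless the tracker is seeded with it first.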

These are some of the PSets for which our implementation is missing pressure increases in this region.

SU(0): %vreg2<def> = MOVSDrm %RIP, 1, %noreg, <cp#0>, %noreg; mem:LD8[ConstantPool] FR64:%vreg2
  # preds left : 0
  # succs left : 1
  # rdefs left : 0
  Latency : 4
  Depth : 0
  Height : 4
  Successors:
   data SU(2): Latency=4 Reg=%vreg2
  Pressure Diff : FR32 -1 FR32X -1

SU(1): %EDI<def,dead> = MOV32ri64 <ga:@_ZSt4cout>, %RDI<imp-def>
  # preds left : 0
  # succs left : 1
  # rdefs left : 0
  Latency : 1
  Depth : 0
  Height : 1
  Successors:
   ord SU(4294967295) *: Latency=1
  Pressure Diff : GR64_NOREX_and_GR64_TC -1 LOW32_ADDR_ACCESS_with_sub_32bit+GR64_NOREX_and_GR64_TC -1 GR64_NOREX -1 GR64_TC -1 LOW32_ADDR_ACCESS_with_sub_32bit+GR64_TC -1 GR64_TC+GR64_TCW64 -1 GR8 -1 GR8+GR64_NOREX -1 GR8+GR64_TCW64 -1 GR64_NOREX+GR64_TC -1 GR8+GR64_TC -1 GR16 -1

SU(2): %XMM0<def> = COPY %vreg2; FR64:%vreg2
  # preds left : 1
  # succs left : 1
  # rdefs left : 0
  Latency : 0
  Depth : 4
  Height : 0
  Predecessors:
   data SU(0): Latency=4 Reg=%vreg2
  Successors:
   ord SU(4294967295) *: Latency=0
  Pressure Diff : VR128L -1

Max Pressure: GR8_ABCD_H=1
GR8_ABCD_L=1
VR128L=1
GR32_TC=2
LOW32_ADDR_ACCESS_with_sub_32bit+GR64_NOREX_and_GR64_TCW64=2
GR64_NOREX_and_GR64_TC=2
LOW32_ADDR_ACCESS_with_sub_32bit+GR64_NOREX_and_GR64_TC=2
FR32=1
GR64_NOREX=2
GR64_TCW64=2
LOW32_ADDR_ACCESS_with_sub_32bit+GR64_TCW64=2
GR64_TC=2
LOW32_ADDR_ACCESS_with_sub_32bit+GR64_TC=2
GR64_TC+GR64_TCW64=2
GR8=2
GR8+GR64_NOREX=2
GR8+GR64_TCW64=2
GR64_NOREX+GR64_TC=2
GR8+GR64_TC=2
FR32X=1
GR16=2
Live In:
Live Out: AH AL
Live Thru:

BB#0: derived from LLVM BB %entry
  ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>