scheduling across calls to avoid spills

I have some machine code that looks something like:

  flag = true
  c = call()
  if (c)
    flag = false
    ...
  ...
  if (flag) ...

The 'flag' variable is a Boolean value of type i1. The target has 1-bit condition registers but none are preserved across calls. So flag is spilled to memory, resulting in terrible code.
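
To make the problem concrete, here is a minimal sketch of the liveness argument on a toy straight-line model. Everything in it is hypothetical: the (defs, uses, is_call) tuples and the live_across_calls name are illustration only and are not an LLVM API. It models only the path where c is false, so the conditional "flag = false" is left out.

```python
# Toy model, not LLVM: each instruction is (defs, uses, is_call).
def live_across_calls(instrs):
    """Return the values that are live across some call in a straight-line list."""
    live = set()
    crossing = set()
    # Standard backward liveness walk over straight-line code.
    for defs, uses, is_call in reversed(instrs):
        live -= set(defs)          # a definition ends the live range (walking backward)
        if is_call:
            crossing |= live       # live after the call and not defined by it
        live |= set(uses)
    return crossing

# Original order: flag is defined before the call and used after it.
before = [
    (["flag"], [], False),   # flag = true
    (["c"], [], True),       # c = call()
    ([], ["c"], False),      # if (c)
    ([], ["flag"], False),   # if (flag)
]

# Desired order: flag is defined after the call.
after = [before[1], before[0], before[2], before[3]]
```

With the original order, live_across_calls(before) contains flag, which is what forces the spill when no condition register survives a call. With the reordered list the result is empty.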

There is no dependency between the flag=true instruction and the call, so I'd like to move it after the call. I was hoping the front end would do it via TargetTransformInfo::getCostOfKeepingLiveOverCall. But apparently that only applies to vectorization transforms.

The pre-RA MachineScheduler won't do it either, because scheduling regions are delimited by calls. (From MachineScheduler.cpp: "MachineScheduler does not currently support scheduling across calls.")

Is there some standard way in the front end or codegen to make this happen?

-Alan

Well, hopefully someone else will chime in with ideas; maybe some target-specific code already does this. I’ll just confirm that this case was not handled at the time the LLVM backend pipeline was designed. The intention was that global code motion to reduce register pressure (across blocks, across calls, and even call reordering) would eventually be handled by an earlier pass in the machine pipeline, while the code is still in SSA form.

It is certainly possible to add MachineScheduler support to schedule across calls. Jonas prototyped it here: https://reviews.llvm.org/D15667
But I think it’s overkill just to fix register liveness issues that can easily be detected earlier.

Since you already have the TTI hook, the fastest way to fix this might be with an IR pass (addIRPass) that just identifies these basic patterns. If the pattern isn’t obvious in IR, then a machine pass shouldn’t be that much harder.
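
As a rough illustration of what "identifies these basic patterns" could mean, here is a toy sketch that sinks an operand-free definition (such as flag = true) below an adjacent call when the two are independent. The Instr class and sink_past_calls are hypothetical names for this sketch, not LLVM classes, and the sketch assumes the toy instructions have no side effects.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """Hypothetical toy instruction, not an LLVM type."""
    text: str
    defs: tuple = ()
    uses: tuple = ()
    is_call: bool = False

def sink_past_calls(block):
    """Swap an operand-free definition with the call that immediately follows it."""
    out = list(block)
    # Walk backward so several definitions in front of one call keep their order.
    for i in range(len(out) - 2, -1, -1):
        cur, nxt = out[i], out[i + 1]
        if cur.is_call or not nxt.is_call:
            continue
        if cur.uses:
            continue  # only constant-like definitions; moving uses may stretch other live ranges
        if set(cur.defs) & (set(nxt.defs) | set(nxt.uses)):
            continue  # the call reads or redefines the value, so the order must stay
        out[i], out[i + 1] = nxt, cur
    return out

block = [
    Instr("flag = true", defs=("flag",)),
    Instr("c = call()", defs=("c",), is_call=True),
    Instr("if (c)", uses=("c",)),
]
```

Running sink_past_calls(block) places "c = call()" first and "flag = true" second. A real pass would also need a cost check, for example via the TTI hook mentioned above, and would have to respect memory and side-effect dependencies that this toy ignores.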

-Andy

I have some machine code that looks something like:

  flag = true
  c = call()
  if (c)
    flag = false
    ...
  ...
  if (flag) ...

The ‘flag’ variable is a Boolean value of type i1. The target has 1-bit condition registers but none are preserved across calls. So flag is spilled to memory, resulting in terrible code.

If the code looks like this at the IR level (in SSA):

bb1:
  c = call()
  if (c)
bb2:
  ...
bb3:
  flag = phi [ true, bb1 ], [ false, bb2 ]
  if (flag) …

and copy instructions are inserted when going out of SSA, wouldn’t “flag = true” typically be inserted at the end of bb1 (i.e., after the call)?
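
The reasoning behind that question can be sketched with a toy phi-lowering step that puts one copy per incoming value just before the terminator of the corresponding predecessor. This is an illustration of the textbook approach only; lower_phis and the string-based blocks are hypothetical, and the sketch makes no claim about what LLVM's PHI elimination, coalescing, or rematerialization end up doing for a given target.

```python
def lower_phis(blocks, phis):
    """Toy out-of-SSA step.

    blocks: dict mapping block name -> list of instruction strings,
            where the last string is the block's terminator.
    phis:   list of (dest, [(incoming value, predecessor block), ...]).
    """
    lowered = {name: list(body) for name, body in blocks.items()}
    for dest, incoming in phis:
        for value, pred in incoming:
            body = lowered[pred]
            # Place the copy just before the predecessor's terminator.
            body.insert(len(body) - 1, f"{dest} = {value}")
    return lowered

blocks = {
    "bb1": ["c = call()", "br c, bb2, bb3"],
    "bb2": ["br bb3"],
    "bb3": ["br flag"],
}
phis = [("flag", [("true", "bb1"), ("false", "bb2")])]
lowered = lower_phis(blocks, phis)
```

In this toy model the copy "flag = true" lands in bb1 between the call and the branch, so flag is not live across the call.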