Problem with cross class joins in the RegisterCoalescer

Hi,

Is it intended that in some cases it is necessary to use
"-disable-cross-class-join" to be sure the resulting code is ok?

I have several cases where cross class joins are carried out that makes
the code turn out illegal, because the "new" register class is not
allowed in all instructions where it is now used.

For example, by joining %vreg4, %vreg7 and %vreg9 the following code

      %vreg7<def> = COPY %vreg4:lo16; aNl_0_7:%vreg7 aN32_0_7:%vreg4
      %vreg9<def> = COPY %vreg7; rN:%vreg9 aNl_0_7:%vreg7
      %vreg17<def> = load %vreg9<kill>; aN40_0_7:%vreg17 rN:%vreg9

is turned into

      %vreg17<def> = load %vreg4:lo16<kill>; aN40_0_7:%vreg17 aN32_0_7:%vreg4

The load instruction however, can only use registers from the rN class,
but the coalescer has changed so it now uses the lo16 part of a aN32
register (which it cannot).

Moves can be carried out from a aN32:lo16 part to an rN register, but
aN32:lo16 and rN cannot be replaced with eachother in all instructions.

The "-disable-cross-class-join" flag prevents this problem from
happening but I'm afraid it will also prevent other (non-problematic)
cross class joins from happening so I'm hesitant to use it.

So, should this not happen, or is the flag needed, or is this just a
sign that we have a really weird or buggy register model?

Hi Patrik,

The "-disable-cross-class-join" flag prevents this problem from
happening but I'm afraid it will also prevent other (non-problematic)
cross class joins from happening so I'm hesitant to use it.

So, should this not happen, or is the flag needed, or is this just a
sign that we have a really weird or buggy register model?

Is this some recent problem? Or it's seen with LLVM 3.0 as well?

Sorry, our backend is currently based on LLVM 3.0. (We are currently in the process of rebasing.)

/Patrik Hägglund

Sorry, our backend is currently based on LLVM 3.0. (We are currently in the process of rebasing.)

Ah, ok. If it would be the recent problem then it might be the new
copy propagation pass which inserts such stuff...
I'm out of ideas off-hand, unfortunately. Maybe Jacob can comment.

Is it intended that in some cases it is necessary to use
"-disable-cross-class-join" to be sure the resulting code is ok?

No.

I have several cases where cross class joins are carried out that makes
the code turn out illegal, because the "new" register class is not
allowed in all instructions where it is now used.

For example, by joining %vreg4, %vreg7 and %vreg9 the following code

     %vreg7<def> = COPY %vreg4:lo16; aNl_0_7:%vreg7 aN32_0_7:%vreg4
     %vreg9<def> = COPY %vreg7; rN:%vreg9 aNl_0_7:%vreg7
     %vreg17<def> = load %vreg9<kill>; aN40_0_7:%vreg17 rN:%vreg9

is turned into

     %vreg17<def> = load %vreg4:lo16<kill>; aN40_0_7:%vreg17
aN32_0_7:%vreg4

The load instruction however, can only use registers from the rN class,
but the coalescer has changed so it now uses the lo16 part of a aN32
register (which it cannot).

When sub-registers are involved, the coalescer uses the getMatchingSuperRegClass hook to determine if the cross class join is possible, and what the resulting register class should be. In this case it would call

  getMatchingSuperRegClass(aN32_0_7, aNl_0_7, lo16)

Check your implementation of that hook, and reread the comment describing what it does. It is quite tricky to get it right. Or you could just rebase. On trunk, TableGen writes this difficult function for you.

So, should this not happen, or is the flag needed, or is this just a
sign that we have a really weird or buggy register model?

We can handle weird, but not buggy. :slight_smile:

Thanks!

Our bug is now fixed. Our getMatchingSuperRegClass is huge (more than 300 lines), messy, and incomplete.

Or you could just rebase. On trunk, TableGen writes this difficult function for you.

That in itself would be a compelling reason to get the rebase to trunk done. I just curious how large the generated version will be. :slight_smile:

/Patrik Hägglund

Our bug is now fixed.

Great!

Our getMatchingSuperRegClass is huge (more than 300 lines), messy, and incomplete.

The versions I deleted from ARM and X86 weren't pretty either.

Or you could just rebase. On trunk, TableGen writes this difficult function for you.

That in itself would be a compelling reason to get the rebase to trunk done. I just curious how large the generated version will be. :slight_smile:

Probably not that bad. TableGen uses a table that takes advantage of the topological register class ordering.

The X86 version looks like this:

const TargetRegisterClass *X86GenRegisterInfo::getMatchingSuperRegClass(const TargetRegisterClass *A, const TargetRegisterClass *B, unsigned Idx) const {
  static const unsigned Table[58][7][2] = {
    ...
  };
  assert(A && B && "Missing regclass");
  --Idx;
  assert(Idx < 7 && "Bad subreg");
  const unsigned *TV = Table[B->getID()][Idx];
  const unsigned *SC = A->getSubClassMask();
  for (unsigned i = 0; i != 2; ++i)
    if (unsigned Common = TV[i] & SC[i])
      return getRegClass(32*i + CountTrailingZeros_32(Common));
  return 0;
}

The table is just over 3kB. It is quite sparse and could probably be compressed. Patches welcome!

/jakob