Possible bug in ExpandShiftWithUnknownAmountBit

Hello,

I'm working in adding support for 64-bit integers to my target. I'm using
LLVM to decompose the 64-bit integer operations by using 32-bit registers
wherever possible and emulating support where not. When looking at the bit
shift decomposition I saw what seems to be a bug in the implementation. The
affected function is ExpandShiftWithUnknownAmountBit in
LegalizeIntegerTypes.cpp. Below is the original code and the proposed fix.
Could someone please review the changes? If they are correct how do I go
about submitting a patch?

Thanks,
Javier

[Original]
/// ExpandShiftWithUnknownAmountBit - Fully general expansion of integer
shift
/// of any size.
bool DAGTypeLegalizer::
ExpandShiftWithUnknownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi) {
  SDValue Amt = N->getOperand(1);
  EVT NVT = TLI.getTypeToTransformTo(*DAG.getContext(),
N->getValueType(0));
  EVT ShTy = Amt.getValueType();
  unsigned NVTBits = NVT.getSizeInBits();
  assert(isPowerOf2_32(NVTBits) &&
         "Expanded integer type size not a power of two!");
  DebugLoc dl = N->getDebugLoc();

  // Get the incoming operand to be shifted.
  SDValue InL, InH;
  GetExpandedInteger(N->getOperand(0), InL, InH);

  SDValue NVBitsNode = DAG.getConstant(NVTBits, ShTy);
  SDValue Amt2 = DAG.getNode(ISD::SUB, dl, ShTy, NVBitsNode, Amt);
  SDValue Cmp = DAG.getSetCC(dl, TLI.getSetCCResultType(ShTy),
                             Amt, NVBitsNode, ISD::SETULT);

  SDValue Lo1, Hi1, Lo2, Hi2;
  switch (N->getOpcode()) {
  default: llvm_unreachable("Unknown shift");
  case ISD::SHL:
    // ShAmt < NVTBits
    Lo1 = DAG.getConstant(0, NVT); // Low part is zero.
    Hi1 = DAG.getNode(ISD::SHL, dl, NVT, InL, Amt); // High part from Lo
part.

    // ShAmt >= NVTBits
    Lo2 = DAG.getNode(ISD::SHL, dl, NVT, InL, Amt);
    Hi2 = DAG.getNode(ISD::OR, dl, NVT,
                      DAG.getNode(ISD::SHL, dl, NVT, InH, Amt),
                      DAG.getNode(ISD::SRL, dl, NVT, InL, Amt2));

    Lo = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Lo1, Lo2);
    Hi = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Hi1, Hi2);
    return true;
  case ISD::SRL:
    // ShAmt < NVTBits
    Hi1 = DAG.getConstant(0, NVT); // Hi part is zero.
    Lo1 = DAG.getNode(ISD::SRL, dl, NVT, InH, Amt); // Lo part from Hi
part.

    // ShAmt >= NVTBits
    Hi2 = DAG.getNode(ISD::SRL, dl, NVT, InH, Amt);
    Lo2 = DAG.getNode(ISD::OR, dl, NVT,
                     DAG.getNode(ISD::SRL, dl, NVT, InL, Amt),
                     DAG.getNode(ISD::SHL, dl, NVT, InH, Amt2));

    Lo = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Lo1, Lo2);
    Hi = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Hi1, Hi2);
    return true;
  case ISD::SRA:
    // ShAmt < NVTBits
    Hi1 = DAG.getNode(ISD::SRA, dl, NVT, InH, // Sign extend high
part.
                       DAG.getConstant(NVTBits-1, ShTy));
    Lo1 = DAG.getNode(ISD::SRA, dl, NVT, InH, Amt); // Lo part from Hi
part.

    // ShAmt >= NVTBits
    Hi2 = DAG.getNode(ISD::SRA, dl, NVT, InH, Amt);
    Lo2 = DAG.getNode(ISD::OR, dl, NVT,
                      DAG.getNode(ISD::SRL, dl, NVT, InL, Amt),
                      DAG.getNode(ISD::SHL, dl, NVT, InH, Amt2));

    Lo = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Lo1, Lo2);
    Hi = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Hi1, Hi2);
    return true;
  }

  return false;
}

[Proposed]
/// ExpandShiftWithUnknownAmountBit - Fully general expansion of integer
shift
/// of any size.
bool DAGTypeLegalizer::
ExpandShiftWithUnknownAmountBit(SDNode *N, SDValue &Lo, SDValue &Hi) {
  SDValue Amt = N->getOperand(1);
  MVT NVT = TLI.getTypeToTransformTo(N->getValueType(0));
  MVT ShTy = Amt.getValueType();
  unsigned NVTBits = NVT.getSizeInBits();
  LLVM_ASSERT(isPowerOf2_32(NVTBits) &&
         "Expanded integer type size not a power of two!");
  DebugLoc dl = N->getDebugLoc();

  // Get the incoming operand to be shifted.
  SDValue InL, InH;
  GetExpandedInteger(N->getOperand(0), InL, InH);

  SDValue NVBitsNode = DAG.getConstant(NVTBits, ShTy);
  SDValue NVSizeSubAmt = DAG.getNode(ISD::SUB, dl, ShTy, NVBitsNode, Amt);
  SDValue AmtSubNVSize = DAG.getNode(ISD::SUB, dl, ShTy, Amt, NVBitsNode);
  SDValue Cmp = DAG.getSetCC(dl, TLI.getSetCCResultType(ShTy),
                             Amt, NVBitsNode, ISD::SETULT);

  SDValue Lo1, Hi1, Lo2, Hi2;
  switch (N->getOpcode()) {
  default: LLVM_ASSERT(0 && "Unknown shift");
  case ISD::SHL:
    // ShAmt < NVTBits
    Lo1 = DAG.getNode(ISD::SHL, dl, NVT, InL, Amt);
    Hi1 = DAG.getNode(ISD::OR, dl, NVT,
                      DAG.getNode(ISD::SHL, dl, NVT, InH, Amt),
                      DAG.getNode(ISD::SRL, dl, NVT, InL, NVSizeSubAmt));

    // ShAmt >= NVTBits
    Lo2 = DAG.getConstant(0, NVT); // Low part is
zero.
    Hi2 = DAG.getNode(ISD::SHL, dl, NVT, InL, AmtSubNVSize); // High part
from Lo part.

    Lo = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Lo1, Lo2);
    Hi = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Hi1, Hi2);
    return true;
  case ISD::SRL:
    // ShAmt < NVTBits
    Lo1 = DAG.getNode(ISD::OR, dl, NVT,
                     DAG.getNode(ISD::SRL, dl, NVT, InL, Amt),
                     DAG.getNode(ISD::SHL, dl, NVT, InH, NVSizeSubAmt));
    Hi1 = DAG.getNode(ISD::SRL, dl, NVT, InH, Amt);

    // ShAmt >= NVTBits
    Lo2 = DAG.getNode(ISD::SRL, dl, NVT, InH, AmtSubNVSize); // Lo part
from Hi part.
    Hi2 = DAG.getConstant(0, NVT); // Hi part is
zero.

    Lo = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Lo1, Lo2);
    Hi = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Hi1, Hi2);
    return true;
  case ISD::SRA:
    // ShAmt < NVTBits
    Lo1 = DAG.getNode(ISD::OR, dl, NVT,
                     DAG.getNode(ISD::SRL, dl, NVT, InL, Amt),
                     DAG.getNode(ISD::SHL, dl, NVT, InH, NVSizeSubAmt));
    Hi1 = DAG.getNode(ISD::SRA, dl, NVT, InH, Amt);

    // ShAmt >= NVTBits
    Lo2 = DAG.getNode(ISD::SRA, dl, NVT, InH, AmtSubNVSize); // Lo part
from Hi part.
    Hi2 = DAG.getConstant(0, NVT); // Hi part is
zero.

    Lo = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Lo1, Lo2);
    Hi = DAG.getNode(ISD::SELECT, dl, NVT, Cmp, Hi1, Hi2);
    return true;
  }

  return false;
}

[snip]

Please use "svn diff" to generate proposed patches; the form you used
is extremely difficult to read.

-Eli

Hi Eli,

I'm not familiar with how to use SVN that way plus my code is based off 2.4
so I can't compare with TOT. Attached are Unix diff and HTML reports of the
changes to ExpandShiftWithUnknownAmountBit using a different tool (Araxis
Merge). I hope this suffices.

Thanks,
Javier

Hello,

I'm working in adding support for 64-bit integers to my target. I'm

using

LLVM to decompose the 64-bit integer operations by using 32-bit

registers

ExpandShiftDiffReport.html (63 KB)

ExpandShiftDiffReport.txt (2.78 KB)

Hi,

I'm working in adding support for 64-bit integers to my target. I'm using
LLVM to decompose the 64-bit integer operations by using 32-bit registers
wherever possible and emulating support where not. When looking at the bit
shift decomposition I saw what seems to be a bug in the implementation. The
affected function is ExpandShiftWithUnknownAmountBit in
LegalizeIntegerTypes.cpp. Below is the original code and the proposed fix.
Could someone please review the changes? If they are correct how do I go
about submitting a patch?

can you please describe the problem and how you fix it (in words, not code).

Thanks,

Duncan.

Hi Duncan,

The problem is the implementation of the expansion. Perhaps an example can help illustrate better. Take the case of a 64-bit integer shifted left by say 6 bits and is decomposed using 32-bit registers. Because 6 is less than the 32 (the register size) the resulting low part should be equal to the source low part shifted left by 6 bits. The current implementation places a zero instead. All other values have similar errors. Below is the current and proposed expansion in pseudo C++ for all shift functions. Please let me know if I something is unclear.

- SHL:
[Current]
dst.lo = (shift < regsize) ? 0 : src.lo << shift;
dst.hi = (shift < regsize) ? src.lo << shift : ((src.hi << shift) | (src.lo >> regsize - shift));

[Proposed]
dst.lo = (shift < regsize) ? src.lo << shift : 0;
dst.hi = (shift < regsize) ? ((src.hi << shift) | (src.lo >> shift - regsize)) : src.lo << shift - regsize;

- SRL/SRA
[Current]
dst.lo = (shift < regsize) ? src.hi >> shift : ((src.lo >> shift) | (src.hi << regsize - shift));
dst.hi = (shift < regsize) ? 0 : src.hi >> shift;

[Proposed]
dst.lo = (shift < regsize) ? ((src.lo >> shift) | (src.hi << regsize - shift)) : src.hi << shift - regsize;
dst.hi = (shift < regsize) ? src.hi << shift : 0;

Thanks,
Javier

Hi Javier,

The problem is the implementation of the expansion. Perhaps an example can help illustrate better. Take the case of a 64-bit integer shifted left by say 6 bits and is decomposed using 32-bit registers. Because 6 is less than the 32 (the register size) the resulting low part should be equal to the source low part shifted left by 6 bits. The current implementation places a zero instead. All other values have similar errors. Below is the current and proposed expansion in pseudo C++ for all shift functions. Please let me know if I something is unclear.

I'm not sure what you are saying, here is the code for shift-left. In
the case of your example, Amt = 6, VTBits = 64 and NVTBits = 32. I've
added comments at the end of various lines, like this: <== Comment.

   if (N->getOpcode() == ISD::SHL) { <== This branch is taken
     if (Amt > VTBits) { <== False
... not executed ...
     } else if (Amt > NVTBits) { <== False
... not executed ...
     } else if (Amt == NVTBits) { <== False
... not executed ...
     } else if (Amt == 1 &&
                TLI.isOperationLegalOrCustom(ISD::ADDC, TLI.getTypeToExpandTo(*DAG.getContext(), NVT))) { <== False
... not executed ...
     } else { <== This branch is taken
       Lo = DAG.getNode(ISD::SHL, dl, NVT, InL, DAG.getConstant(Amt, ShTy)); <== Source low part is shifted left by 6 bits
       Hi = DAG.getNode(ISD::OR, dl, NVT,
                        DAG.getNode(ISD::SHL, dl, NVT, InH,
                                    DAG.getConstant(Amt, ShTy)),
                        DAG.getNode(ISD::SRL, dl, NVT, InL,
                                    DAG.getConstant(NVTBits-Amt, ShTy))); <== Source high part is shifted left by 6 bits, and the top 6 bits of the low part is or'd in
     }
     return;
   }

In short, the current code seems to be correct.

Ciao,

Duncan.

Duncan,

It seems that the code you pasted came from the function
ExpandShiftByConstant and indeed it looks correct. In my example I used 6
as the shift amount but forgot to mention that it's stored in a register.
Otherwise ExpandShiftWithUnknownAmountBit wouldn't get called. Below is the
execution of DAGTypeLegalizer::ExpandIntRes_Shift() using my example
showing how ExpandShiftWithUnknownAmountBit gets called. My email is about
the logic of this function.

if (ConstantSDNode *CN = dyn_cast<ConstantSDNode>(N->getOperand(1))) { <==
False
  ...
}

if (ExpandShiftWithKnownAmountBit(N, Lo, Hi)) { <== The call returns False
  ...
}

if (N->getOpcode() == ISD::SHL) { <== Branch taken
    PartsOpc = ISD::SHL_PARTS;
} else if (N->getOpcode() == ISD::SRL) {
  ...
} else {
  ...
}

if ((Action == TargetLowering::Legal && TLI.isTypeLegal(NVT)) ||
      Action == TargetLowering::Custom) { <== False
  ...
}

if (N->getOpcode() == ISD::SHL) { <== Branch taken
} else if (N->getOpcode() == ISD::SRL) {
  ...
} else {
  ...
}

if (LC != RTLIB::UNKNOWN_LIBCALL && TLI.getLibcallName(LC)) { <== Returns
False as I set the lib call name to NULL from the target selection lowering
constructor
  ...
}

if (!ExpandShiftWithUnknownAmountBit(N, Lo, Hi)) { <== Expand returns true
  ...
}

Thanks,
Javier

Hi Javier,

It seems that the code you pasted came from the function
ExpandShiftByConstant and indeed it looks correct. In my example I used 6
as the shift amount but forgot to mention that it's stored in a register.

I see, sorry I didn't read your email more carefully. It does look like
ExpandShiftWithUnknownAmountBit is bogus - at a glance it looks like the
Cmp condition is inverted. I may have time to take a closer look tomorrow.

Best wishes,

Duncan.

Hi Duncan,

Just wondering if you had a chance to look at the expansion.

Thanks,
Javier

Hi Javier,

It seems that the code you pasted came from the function
ExpandShiftByConstant and indeed it looks correct. In my example I used

6

as the shift amount but forgot to mention that it's stored in a

register.

I see, sorry I didn't read your email more carefully. It does look like
ExpandShiftWithUnknownAmountBit is bogus - at a glance it looks like the
Cmp condition is inverted. I may have time to take a closer look

tomorrow.

Hi Javier,

Just wondering if you had a chance to look at the expansion.

I wrote my own fix and committed it in revision 90482. Do you
have a testcase I can add to the testsuite (preferably not an
execution test, but rather something that checks the assembler
being generated)?

Ciao,

Duncan.

Duncan,

Thanks for committing a fix. I see that you fixed a bug in my patch with
the HiL in the case of an SRA.

The issue with a test case is that it will depend on the target to expose
it. Like you noted in the check-in comment x86 doesn't expose the bug. The
target I'm working on isn't public. Short of writing a new one for the
purpose of a test case I don't know what else to do. Do you have a
suggestion?

Thanks,
Javier

Hi Javier,

The issue with a test case is that it will depend on the target to expose
it. Like you noted in the check-in comment x86 doesn't expose the bug. The
target I'm working on isn't public. Short of writing a new one for the
purpose of a test case I don't know what else to do. Do you have a
suggestion?

in that case I guess we will have to live without a testcase :slight_smile:
Also, I noticed a subtle bug. I added a comment about it in commit
90564. Does this impact your target? It's not clear how best to
fix it (it is easy to fix inefficiently).

Ciao,

Duncan.

PS: For a small optimization, in the case where Amt is bigger than 32
(or whatever NVTBits is) you might want to use an "and" to mask off
the top bits of Amt rather than subtracting 32 (if Amt is 64 or greater
then the result of the shift was undefined anyway, so it is ok to mask
off all the upper bits).

Hi Duncan,

I don't know if the optimization would help us much. Our architecture
performs integer shifts and subtractions at the same speed. I think the
optimization is limited to the case where NVBits is a power of two. That is
the case for all current value types but there's a separate thread where
someone is proposing adding i20 as a native type.

In your previous email you mentioned some comments added to check-in 90564.
In our architecture shifting by 0 doesn't cause any problems but the
comment might be valid for others.

Thanks,
Javier

Duncan Sands wrote:

Hi Javier,

The issue with a test case is that it will depend on the target to expose
it. Like you noted in the check-in comment x86 doesn't expose the bug. The
target I'm working on isn't public. Short of writing a new one for the
purpose of a test case I don't know what else to do. Do you have a
suggestion?

in that case I guess we will have to live without a testcase :slight_smile:

It would be fantastic if someone could take the lead and create our first codegen or target unittest. You could write a small C++ program under unittests/ which #include "../../lib/Target/whatever" as needed to call directly into the code with the bug.

Nick

Hi Javier,

I don't know if the optimization would help us much. Our architecture
performs integer shifts and subtractions at the same speed. I think the
optimization is limited to the case where NVBits is a power of two.

it is irrelevant whether it is a power of two or not. Anyway, since you
are the only user, and this is not helpful for you, I guess we can forget
it :slight_smile:

In your previous email you mentioned some comments added to check-in 90564.
In our architecture shifting by 0 doesn't cause any problems but the
comment might be valid for others.

Actually it's the shifting by 32 (if NVBits is 32) which is a problem.
If Amt is zero then a shift by 32 is generated at commented line.

Ciao,

Duncan.

Hi Duncan,

Maybe I'm beating a dead horse but I wanted to see if understood correctly
your optimization to avoid doing a subtraction.

The way I understand it the idea is to mask out the high bits of Amt which
in some cases is equivalent to subtracting a value from Amt. The cases
where this holds is when the value subtracted from Amt is a power of two.
This value is half the original register size. For 64-bit integers 32 is
subtracted from Amt which is equivalent to masking out the top 59 bits
leaving the bottom 5. For a 40-bit operation decomposed using two 20-bit
registers I don't know if a mask that can be ANDed to subtract 20 from Amt.

It's no biggie. I'm just curious as to why you thought that the register
size didn't matter. I's always good to learn something new :slight_smile:

Thanks,
Javier

Hi Javier,

Maybe I'm beating a dead horse but I wanted to see if understood correctly
your optimization to avoid doing a subtraction.

I was just wrong to think it didn't require a power of two size. I blame
it on a lack of coffee :slight_smile:

Ciao,

Duncan.