Bug in visitSIGN_EXTEND in DAGCombiner.cpp?

visitSIGN_EXTEND() in DAGCombiner.cpp generates an ISD::SELECT even if VT is a vector, which causes ExpandSELECT() to assert during legalization.
I think what's required is to have visitSIGN_EXTEND generate a VSELECT if VT is a vector…

I've tried a local change that cures this particular assert, but uncovers another assert later, so I'm a bit uncertain if I'm heading off in the wrong direction.
The subsequent assert comes in ExpandVSELECT, which expects (for no good reason as far as I can tell) that the "mask" (really the 'selector') and the operands will be the same width, rather than trying to make them the same width (in our problem cases, the mask is an i32 produced from getSetCCResultType() and the ops are i8 or i16 - all vectors of the same length, of course.)

I don't really want to fix that, as the generated XOR/AND/AND/OR sequence isn't desirable at all for our target, which has an instruction that I think does precisely VSELECT…

Richard

Hi Richard,

visitSIGN_EXTEND() in DAGCombiner.cpp generates an ISD::SELECT even if VT is a vector, which causes ExpandSELECT() to assert during legalization.
I think what’s required is to have visitSIGN_EXTEND generate a VSELECT if VT is a vector…

ISD::SELECT should be used for cases where the selector is a scalar, even if the operands are vector. If you found a case where SELECT is used with a vector operand then this is a bug. We should probably add an assert in SelectionDAG::getNode().

I’ve tried a local change that cures this particular assert, but uncovers another assert later, so I’m a bit uncertain if I’m heading off in the wrong direction.

I think that replacing SELECT with VSELECT is the right thing to do.

The subsequent assert comes in ExpandVSELECT, which expects (for no good reason as far as I can tell) that the “mask” (really the ‘selector’) and the operands will be the same width, rather than trying to make them the same width (in our problem cases, the mask is an i32 produced from getSetCCResultType() and the ops are i8 or i16 - all vectors of the same length, of course.)

If I remember correctly the trick of expanding VSELECT into a sequence of XOR/AND/OR only works if the mask is known to be all-one or all-zero.

Hi Richard,

visitSIGN_EXTEND() in DAGCombiner.cpp generates an ISD::SELECT even if VT is a vector, which causes ExpandSELECT() to assert during legalization.
I think what’s required is to have visitSIGN_EXTEND generate a VSELECT if VT is a vector…

ISD::SELECT should be used for cases where the selector is a scalar, even if the operands are vector. If you found a case where SELECT is used with a vector operand then this is a bug.

I did… It originates from an icmp ne <2x i8>, zero initializer followed by a sext of the result 2x i1 to 2x i8. When we visit the SIGN_EXTEND, we generate the ISD::SELECT even though the selector and both operands are vectors.

We should probably add an assert in SelectionDAG::getNode().

I’ve tried a local change that cures this particular assert, but uncovers another assert later, so I’m a bit uncertain if I’m heading off in the wrong direction.

I think that replacing SELECT with VSELECT is the right thing to do.

VT.isVector() ? ISD::VSELECT : ISD::SELECT in lieu of the fixed ISD::SELECT seems to do the trick for me. Except that it sometimes hits the assert in ExpandVSELECT…

The subsequent assert comes in ExpandVSELECT, which expects (for no good reason as far as I can tell) that the “mask” (really the ‘selector’) and the operands will be the same width, rather than trying to make them the same width (in our problem cases, the mask is an i32 produced from getSetCCResultType() and the ops are i8 or i16 - all vectors of the same length, of course.)

If I remember correctly the trick of expanding VSELECT into a sequence of XOR/AND/OR only works if the mask is known to be all-one or all-zero.

Well, that’s not a problem since the mask was generated from a sign extend from an i1… :wink: The problem comes about because the i32 mask and i8 or i16 operands are different widths and expandVSELECT is not equipped to deal with that case.

I don’t really want to fix that, as the generated XOR/AND/AND/OR sequence isn’t desirable at all for our target, which has an instruction that I think does precisely VSELECT…

It turns out that for i8/i16 types, we’d have to promote them to i32 types to take advantage of our instruction… so the sequence isn’t as bad as I thought at first.

Hi Richard,

I did… It originates from an icmp ne <2x i8>, zero initializer followed by a sext of the result 2x i1 to 2x i8. When we visit the SIGN_EXTEND, we generate the ISD::SELECT even though the selector and both operands are vectors.

It sounds like a bug in the dag combine optimization. If you send me the line number I will take a look.

We should probably add an assert in SelectionDAG::getNode().

I’ve tried a local change that cures this particular assert, but uncovers another assert later, so I’m a bit uncertain if I’m heading off in the wrong direction.

Adding assertions to getNode is the right thing to do. I am not sure that it will solve your problem, but it will help us some some problems.

I think that replacing SELECT with VSELECT is the right thing to do.

VT.isVector() ? ISD::VSELECT : ISD::SELECT in lieu of the fixed ISD::SELECT seems to do the trick for me. Except that it sometimes hits the assert in ExpandVSELECT…

Okay. Which assertion in ExpandVSELECT fail ? Maybe our assumptions there are incorrect.

The subsequent assert comes in ExpandVSELECT, which expects (for no good reason as far as I can tell) that the “mask” (really the ‘selector’) and the operands will be the same width, rather than trying to make them the same width (in our problem cases, the mask is an i32 produced from getSetCCResultType() and the ops are i8 or i16 - all vectors of the same length, of course.)

If I remember correctly the trick of expanding VSELECT into a sequence of XOR/AND/OR only works if the mask is known to be all-one or all-zero.

Well, that’s not a problem since the mask was generated from a sign extend from an i1… :wink: The problem comes about because the i32 mask and i8 or i16 operands are different widths and expandVSELECT is not equipped to deal with that case.

Can you write down the input SDNode ? What types are inputs ?

Thanks,
Nadav

Hi Richard,

I did… It originates from an icmp ne <2x i8>, zero initializer followed by a sext of the result 2x i1 to 2x i8. When we visit the SIGN_EXTEND, we generate the ISD::SELECT even though the selector and both operands are vectors.

It sounds like a bug in the dag combine optimization. If you send me the line number I will take a look.

Line 4501 in trunk DAGCombiner.cpp… I changed the ISD::SELECT to the VT.isVector() ? ISD::VSELECT : ISD::SELECT…

We should probably add an assert in SelectionDAG::getNode().

I’ve tried a local change that cures this particular assert, but uncovers another assert later, so I’m a bit uncertain if I’m heading off in the wrong direction.

Adding assertions to getNode is the right thing to do. I am not sure that it will solve your problem, but it will help us some some problems.

I doubt adding an assertion will “solve” a problem. :wink: But I agree it is the right thing to do and it will help detect the problem sooner.

I think that replacing SELECT with VSELECT is the right thing to do.

VT.isVector() ? ISD::VSELECT : ISD::SELECT in lieu of the fixed ISD::SELECT seems to do the trick for me. Except that it sometimes hits the assert in ExpandVSELECT…

Okay. Which assertion in ExpandVSELECT fail ? Maybe our assumptions there are incorrect.

The assert at line 676 of LegalizeVectorOps.cpp… the Mask (operand 0) type is an i32, but Op1 is an i8.

The subsequent assert comes in ExpandVSELECT, which expects (for no good reason as far as I can tell) that the “mask” (really the ‘selector’) and the operands will be the same width, rather than trying to make them the same width (in our problem cases, the mask is an i32 produced from getSetCCResultType() and the ops are i8 or i16 - all vectors of the same length, of course.)

If I remember correctly the trick of expanding VSELECT into a sequence of XOR/AND/OR only works if the mask is known to be all-one or all-zero.

Well, that’s not a problem since the mask was generated from a sign extend from an i1… :wink: The problem comes about because the i32 mask and i8 or i16 operands are different widths and expandVSELECT is not equipped to deal with that case.

Can you write down the input SDNode ? What types are inputs ?

0x107046d10: v2i8 = vselect 0x107046c10, 0x107046b10, 0x107045e10 [ID=-3]

0x107046c10: v2i32 = setcc 0x107045c10, 0x107045e10, 0x107045f10 [ID=-3]

0x107046b10: v2i8 = BUILD_VECTOR 0x107046a10, 0x107046a10 [ID=-3]

0x107046a10: i8 = Constant<-1> [ID=-3]

0x107045e10: v2i8 = BUILD_VECTOR 0x107045d10, 0x107045d10 [ORD=10] [ID=-3]

0x107045d10: i8 = Constant<0> [ORD=10] [ID=-3]

Richard

Line 4501 in trunk DAGCombiner.cpp… I changed the ISD::SELECT to the VT.isVector() ? ISD::VSELECT : ISD::SELECT…

Thanks. From the commit message I think that we should only run this optimization on scalars.

Can you write down the input SDNode ? What types are inputs ?

0x107046d10: v2i8 = vselect 0x107046c10, 0x107046b10, 0x107045e10 [ID=-3]

0x107046c10: v2i32 = setcc 0x107045c10, 0x107045e10, 0x107045f10 [ID=-3]
0x107046b10: v2i8 = BUILD_VECTOR 0x107046a10, 0x107046a10 [ID=-3]
0x107046a10: i8 = Constant<-1> [ID=-3]
0x107045e10: v2i8 = BUILD_VECTOR 0x107045d10, 0x107045d10 [ORD=10] [ID=-3]
0x107045d10: i8 = Constant<0> [ORD=10] [ID=-3]

It looks like a type-legalization problem. Is v2i8 a legal type on your architecture ? In any case, we can’t XOR the mask with the operands because they are not the same width. You can custom-lower VSELECTS to work around this problem.

Thanks,
Nadav

Line 4501 in trunk DAGCombiner.cpp… I changed the ISD::SELECT to the VT.isVector() ? ISD::VSELECT : ISD::SELECT…

Thanks. From the commit message I think that we should only run this optimization on scalars.

I

Can you write down the input SDNode ? What types are inputs ?

0x107046d10: v2i8 = vselect 0x107046c10, 0x107046b10, 0x107045e10 [ID=-3]

0x107046c10: v2i32 = setcc 0x107045c10, 0x107045e10, 0x107045f10 [ID=-3]
0x107046b10: v2i8 = BUILD_VECTOR 0x107046a10, 0x107046a10 [ID=-3]
0x107046a10: i8 = Constant<-1> [ID=-3]
0x107045e10: v2i8 = BUILD_VECTOR 0x107045d10, 0x107045d10 [ORD=10] [ID=-3]
0x107045d10: i8 = Constant<0> [ORD=10] [ID=-3]

It looks like a type-legalization problem. Is v2i8 a legal type on your architecture ?

Sort of…
v2 certainly is. But we have no actual i8 (or i16) registers and I think our current description (via .td files) says we do. We also don’t have an i8 or i16 and/or/xor… but again I think we say we do.
I tweaked ExpandVSELECT() to generate a TRUNCATE if the mask has more bits than the operands… That works for me.

But then I also had to tweak SplitRes_SELECT() in LegalizeTypesGeneric.cpp to avoid the assert there that tests that the SELECT has a MVT::i1 condition. I use the Cond.getValueType().getVectorElementType() to generate the split vector type instead of using a hard-coded MVT::i1. (This is necessary for v8 and v16 types, since our target only natively supports v4 types.)

With the whole suite of 3 minor (? :wink: changes, I’m able to get through my test suite (a large subset of the OpenCL conformance suite) with my target (AMDIL), a target that worked with LLVM 3.1…

In any case, we can’t XOR the mask with the operands because they are not the same width. You can custom-lower VSELECTS to work around this problem.

Thanks. I was hoping to avoid that… :wink:

I have tried making just that change… it definitely helps quite a bit. I submitted the simple patch to llvm-commits for approval.

It doesn’t quite clear up ALL the asserts I see in my test runs… I’m still getting the assert in ExpandVSELECT in a few cases, but I suppose the custom lowering is the “easy” way to solve those, though I’m sure the TRUNCATE would do the trick as well.

Thanks for your help.

Richard