Unsigned int displaying as negative

Why is ‘unsigned short w = 0x8000’ displayed as -32768 in the IR?

This propagates to the DAG and hence to TableGen, where some immediates fail to match, apparently because the immediate is not being treated as unsigned. For example, if we use 0x7FFF instead of 0x8000, everything is fine and the pattern matches.

I understand that it’s just being printed as signed because of the bit pattern (two’s complement), even though it’s unsigned, but it seems to affect matching too.

We’d like the following:

unsigned short x, y;
int main() {
    unsigned short w = 0x8000;
    y = w - x;
    return 0;
}

To be selected as something like ‘sub16u $0x8000, x, y’. With w = 0x7FFF we do get ‘sub16u $0x7FFF, x, y’, but with 0x8000 we don’t.

We have some code in TableGen to determine whether an operation is signed or unsigned. Can anyone suggest a good way around this?

Thanks,
Ryan

LLVM IR integers have no sign. You can’t reliably tell whether an add or subtract was signed or unsigned when generating code.

Right, I understand that.

So why does 0x7FFF match fine but 0x8000 doesn’t, when both fit in 16 bits?

Thanks.

Where do in-tree targets handle the difference in this example, i.e. whether the signed or the unsigned value gets printed?

Thanks.

Hi Ryan,

It is important to be clear that LLVM IR integers (and operations, unless they have to) have no sign. But IR integers have to be printed somehow, and it was decided to print them as signed.

I'm not really a SelectionDAG or TableGen expert, but I'm sure it is the same in the code generator. Sometimes signedness matters for an instruction because flags are affected, but I'm ignoring that for now, since those cases would be represented in the IR as llvm.*.with.overflow intrinsics with explicit signedness.

In cases where flags don't matter, just select the best instruction. I'd advise against trying to reconstruct the signedness of an operation. That's impossible to do in general and there's no good reason to do that.

-Manuel
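To see that only the printing differs, here is a minimal C sketch assuming a 16-bit (i16) constant. The helper names are hypothetical: as_signed16 mimics the signed rendering the IR printer uses, and as_unsigned16 reads the same bits as unsigned.

```c
#include <stdint.h>

/* Read a 16-bit pattern as a signed value, the way i16 constants are
   printed in the IR. Done arithmetically so it does not rely on
   implementation-defined narrowing conversions. */
static int32_t as_signed16(uint16_t bits) {
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* The very same bits read as an unsigned value. */
static uint32_t as_unsigned16(uint16_t bits) {
    return bits;
}
```

For 0x8000 the first gives -32768 and the second 32768; for 0x7FFF both give 32767, which is why only 0x8000 looks "wrong" in the IR.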

Thanks for your reply.

We currently propagate sign info to TableGen using BinaryWithFlagsSDNode.Flags.hasNoSignedWrap.

My guess (I have not looked) is that they are printed according to the instruction, somewhere in AsmPrinter.cpp.

I’m still confused as to why 0x7FFF matches as a 16-bit int but 0x8000 does not.

Thanks.

It’s hard to say why your pattern isn’t matching without actually seeing the pattern. See imm0_4095 in ARMInstrThumb2.td for an example of the sort of pattern you want. -Eli

Thanks for your reply.

We are propagating sign info to tablegen currently using BinaryWithFlagsSDNode.Flags.hasNoSignedWrap atm.

Note that this flag doesn't indicate the signedness of the operation. It just means that the optimizer or code generator can assume that no signed overflow will happen during the operation. To understand why this flag is not suitable for reconstructing the signedness of an operation (which is inherently signedness-agnostic), imagine an instruction that has both the NoSignedWrap and NoUnsignedWrap flags set. What would the "signedness" of this instruction be? The question has no answer, because adds don't have "signedness" when using two's complement.
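To make the point about the flags concrete, here is a minimal C sketch for a 16-bit add. The function names are hypothetical; the two predicates only check the conditions that LLVM's nsw and nuw flags promise (the exact sum fits in 16 bits when the operands are read as signed, respectively as unsigned).

```c
#include <stdbool.h>
#include <stdint.h>

/* Result bits of a 16-bit add: identical whether the operands are
   considered signed or unsigned (two's complement). */
static uint16_t add16(uint16_t a, uint16_t b) {
    return (uint16_t)(a + b);
}

/* Sign-extend a 16-bit pattern (arithmetic, no implementation-defined casts). */
static int32_t sext16(uint16_t bits) {
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Condition behind "nsw": the exact signed sum fits in 16 bits. */
static bool add16_no_signed_wrap(uint16_t a, uint16_t b) {
    int32_t sum = sext16(a) + sext16(b);
    return sum >= INT16_MIN && sum <= INT16_MAX;
}

/* Condition behind "nuw": the exact unsigned sum fits in 16 bits. */
static bool add16_no_unsigned_wrap(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return sum <= UINT16_MAX;
}
```

For 1 + 2 both predicates hold, while 0xFFFF + 1 wraps only when read as unsigned and 0x7FFF + 1 only when read as signed. The bits from add16 are the same in every case, so neither flag says which reading the source intended.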

I imagine (I have not looked) they are printed according to instruction in AsmPrinter.cpp (pure speculation).

I'm not quite sure what you're referring to.

I'm still confused as to why 0x7FFF is ok to match 16 bit int but not 0x8000?

I can't answer this question without knowing exactly what your patterns look like, but possibly this happens precisely because you are trying to propagate sign info (which doesn't really work, as explained above).

I see. If I put simm16 and immSExt16x in place of uimm16 and immZExt16x respectively, the immediate matches, but it prints as -32768 (which is invalid for sub16u). We use uimm16 not to match unsigned values but for its PrintMethod; effectively, uimm16 and simm16 are both just an Operand. I’m still unclear why the simm16 version matches and the uimm16 version does not. Here is the pattern, in case it helps.

So just as a reference:

def simm16 : Operand<i16> {
  let DecoderMethod = "DecodeSimm16";
  let OperandType = "OPERAND_IMMEDIATE";
}

def uimm16 : Operand<i16> {
  let PrintMethod = "printUnsignedImm";
  let OperandType = "OPERAND_IMMEDIATE";
}

def immSExt16x : ImmLeaf<i16, [{ return isInt<16>(Imm); }]>;

def immZExt16x : ImmLeaf<i16, [{ return isUInt<16>(Imm); }]>;

defm SUB16u_ : ABD_NonCommutative<"sub16u", unsignedSub, LOADRegs, GPRRegs, DSTRegs, i16, i16, i16, simm16, immZExt16x>;

multiclass ABD_NonCommutative<string asmstr, SDPatternOperator OpNode, RegisterClass srcAReg, RegisterClass srcBReg,
                              RegisterClass dstReg, ValueType srcAType, ValueType srcBType, ValueType dstType,
                              Operand ImmOd, ImmLeaf imm_type>
{
  def IMM_MEM_MEM : SetABDIn<asmstr, ImmOd, memhx, memhx,
                    [(directStore (dstType (OpNode imm_type:$srcA, (srcBType (load addr16:$srcB)))), addr16:$dstD)]>;
}

class SetABDIn<string asmstr, DAGOperand srcA, DAGOperand srcB, DAGOperand dstD, list<dag> pattern>
  : A_B_D<(outs), (ins srcA:$srcA, srcB:$srcB, dstD:$dstD),
          !strconcat(asmstr, "\t$srcA, $srcB, $dstD"), pattern, IIAlu>
{
  let mayStore = 1;
  let mayLoad = 1;
}

Sorry, it should be:

defm SUB16u_ : ABD_NonCommutative<"sub16u", unsignedSub, LOADRegs, GPRRegs, DSTRegs, i16, i16, i16, uimm16, immZExt16x>;

Where does the unsignedSub come from?

"Imm" in ImmLeaf is an int64_t, sign-extended from your immediate type (in this case, int16_t). You'd need to insert an explicit cast to uint16_t to get the behavior you want.

I'm not sure why you're doing this, though; every 16-bit integer immediate fits into a 16-bit integer, so a correctly implemented "immZExt16x" is just equivalent to "imm".

-Eli
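Eli's explanation also answers the 0x7FFF vs 0x8000 question. Below is a minimal C model of it, assuming that is_int16/is_uint16 behave like LLVM's isInt<16>/isUInt<16> and that the predicate sees Imm as the i16 constant sign-extended to int64_t. All names are hypothetical stand-ins for the TableGen predicates.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for isInt<16>() and isUInt<16>(): does the value fit in a
   signed / unsigned 16-bit field? */
static bool is_int16(int64_t x) { return x >= INT16_MIN && x <= INT16_MAX; }
static bool is_uint16(uint64_t x) { return x <= UINT16_MAX; }

/* Model of "Imm" inside an ImmLeaf<i16, ...> predicate: the 16-bit
   constant sign-extended to int64_t. */
static int64_t immleaf_imm(uint16_t bits) {
    return (bits & 0x8000u) ? (int64_t)bits - 0x10000 : (int64_t)bits;
}

/* immSExt16x as written: isInt<16>(Imm). */
static bool imm_sext16(int64_t imm) { return is_int16(imm); }

/* immZExt16x as written: isUInt<16>(Imm). A negative Imm becomes a huge
   uint64_t here, so it is rejected. */
static bool imm_zext16_original(int64_t imm) { return is_uint16((uint64_t)imm); }

/* With the explicit uint16_t cast Eli suggests: every i16 constant passes,
   so this is effectively the same as plain "imm". */
static bool imm_zext16_with_cast(int64_t imm) { return is_uint16((uint16_t)imm); }
```

For 0x7FFF, Imm is 32767 and every predicate passes. For 0x8000, Imm is -32768: the isInt<16> check accepts it, but the isUInt<16> check rejects it, which is why only the simm16/immSExt16x variant matched. Adding the cast accepts every i16 constant, which is also why such a leaf is equivalent to imm.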

I believe that the idea of simm16 and uimm16 was taken over from MipsInstrInfo.td. It’s entirely possible that the concepts were misunderstood then.

So, a broader question: what is the best way to map down to an unsigned or signed sub/add (if this is even possible)? And how do we add signed/unsigned immediates?

Thanks.

Can you please give me an example (C code) where nsw is absent from a signed operation?

Thanks.

Or some C code where nsw shows up on an unsigned operation?

In every example we’ve seen, nsw lines up very consistently with signed operations. I understand it’s a ‘potential for signed overflow’ flag, but it seems to track signedness very consistently.

If you compile with -fwrapv, that's what you get. Earlier passes could also quite easily add or remove those flags for whatever reason.

Tim.
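Integer promotion gives a case in the other direction, arguably in the example at the top of this thread. A minimal C sketch, assuming int is wider than 16 bits (as on typical hosts; on a target with 16-bit int, unsigned short promotes to unsigned int instead): both operands are unsigned short, yet the subtraction is performed in signed int, so clang typically emits it as a sub nsw on i32 values zero-extended from i16. The function names are hypothetical.

```c
/* Both parameters are unsigned short, but C's integer promotions convert
   them to int first, so "w - x" is a signed int subtraction. */
static int sub_promoted(unsigned short w, unsigned short x) {
    return w - x;
}

/* Storing the result back into an unsigned short truncates it to 16 bits,
   giving the same bits an "unsigned" 16-bit subtract would produce. */
static unsigned short sub_stored(unsigned short w, unsigned short x) {
    return (unsigned short)(w - x);
}
```

Here sub_promoted(0, 1) is -1, not 65535; only the conversion back to unsigned short turns it into 0xFFFF.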

If you want to see what happens, the transforms/* tests have examples where nsw disappears.

I still think this is the basic problem. Unless you're on a really weird architecture, it really doesn't matter whether an originally-written operation was signed or unsigned (with the already-represented exception of sdiv/udiv). It's probably best to take a step back and reassess this rather than ploughing on trying to preserve long-departed information.

Tim.
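Tim's sdiv/udiv exception can be seen with a minimal C sketch on 16-bit values (helper names are hypothetical): subtraction produces the same bits whether the operands are read as signed or unsigned, while division does not.

```c
#include <stdint.h>

/* Sign-extend a 16-bit pattern (arithmetic, no implementation-defined casts). */
static int32_t sext16(uint16_t bits) {
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Subtraction: result bits don't depend on how the operands are read. */
static uint16_t sub16_unsigned(uint16_t a, uint16_t b) {
    return (uint16_t)(a - b);                 /* modular conversion to 16 bits */
}
static uint16_t sub16_signed(uint16_t a, uint16_t b) {
    return (uint16_t)(sext16(a) - sext16(b)); /* same modular conversion */
}

/* Division: result bits DO depend on it. Callers must avoid b == 0. */
static uint16_t udiv16(uint16_t a, uint16_t b) {
    return (uint16_t)(a / b);
}
static uint16_t sdiv16(uint16_t a, uint16_t b) {
    return (uint16_t)(sext16(a) / sext16(b));
}
```

For example, 0xFFFE / 2 is 0x7FFF as udiv but 0xFFFF (that is, -1) as sdiv, which is why the IR keeps separate udiv and sdiv instructions but a single sub.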

Tim, yes, I am on quite an unusual architecture: just about every instruction has a signed and an unsigned variant (i.e., adds, addu, subs, subu, etc.), and we handle signed and unsigned somewhat differently.

I’m not sure how we’ll handle this yet; the very worst-case scenario is to propagate the info from clang, but that’s obviously not ideal.

Thanks for all the replies!