Signedness of scalars built from APInt(s)

While adding support for 512-bit integers in `Scalar`, I figured I
could add some coverage.

TEST(ScalarTest, Signedness) {
auto s1 = Scalar(APInt(32, 12, false /* isSigned */));
auto s2 = Scalar(APInt(32, 12, true /* isSigned */ ));
ASSERT_EQ(s1.GetType(), Scalar::e_uint); // fails
ASSERT_EQ(s2.GetType(), Scalar::e_sint); // pass
}

The result of `s1.GetType()` is Scalar::e_sint.
This is because an APInt can't distinguish between "int patatino = 12"
and "uint patatino = 12".
The correct class in `llvm` to do that is `APSInt`.

I think there are at least a couple of possibilities to fix this:
1) Change the constructor in Scalar to always get an APSInt. This
would be fairly invasive but we could always query isSigned() to make
sure we instantiate the correct scalar type.
2) Change the implementation of Scalar(APInt v) to look at the sign
bit, and if that's set, treat the value as signed (and unsigned
otherwise). This will be definitely more correct than the current
implementation which always construct signed types from APInt(s), but
I'm not entirely sure of all the implications.

What do you think?

Please note that this is also broken in the case where you have
APInt(32 /* bitWidth */, -323);
because of the way the constructor is implemented.

If I understand the situation correctly I think we should do both. I’d start by doing (2) to improve the current behavior and add a constructor for APSInt. We can then audit the call sites and migrate to APSInt where it’s obvious that the type is signed. That should match the semantics of both classes?

I don’t think #2 is a correct change. Just because the sign bit is set doesn’t mean it’s signed. Is the 4-byte value 0x10000000 signed or unsigned? It’s a trick question, because there’s not enough information! If it was written “int x = 0x10000000” then it’s signed (and negative). If it was written “unsigned x = 0x10000000” then it’s unsigned (and positive). What about the 4-byte value 0x1? Still a trick! If it was written “int x = 1” then it’s signed (and positive), and if it was written “unsigned x = 1” then it’s unsigned (and positive).

My point is that signedness of the type does not necessarly imply signedness of the value, and vice versa.

APInt is purely a bit-representation and a size, there is no information whatsoever about whether type type is signed. It doesn’t make sense to say “is this APInt negative?” without additional information.

With APSInt, on the other hand, it does make sense to ask that question. If you have an APSInt where isSigned() is true, then you can use the sign bit to determine whether it’s negative. And if you have an APSInt where isSigned() is false, then the “sign bit” is not actually a sign bit at all, it is just an extra power of 2 for the unsigned value.

This is my understanding of the classes, someone correct me if I’m wrong.

IIUC though, the way to fix this is by using APSInt throughout the class, and delete all references to APInt.

I don’t think #2 is a correct change. Just because the sign bit is set doesn’t mean it’s signed. Is the 4-byte value 0x10000000 signed or unsigned? It’s a trick question, because there’s not enough information! If it was written “int x = 0x10000000” then it’s signed (and negative). If it was written “unsigned x = 0x10000000” then it’s unsigned (and positive). What about the 4-byte value 0x1? Still a trick! If it was written “int x = 1” then it’s signed (and positive), and if it was written “unsigned x = 1” then it’s unsigned (and positive).

My point is that signedness of the type does not necessarly imply signedness of the value, and vice versa.

APInt is purely a bit-representation and a size, there is no information whatsoever about whether type type is signed. It doesn’t make sense to say “is this APInt negative?” without additional information.

With APSInt, on the other hand, it does make sense to ask that question. If you have an APSInt where isSigned() is true, then you can use the sign bit to determine whether it’s negative. And if you have an APSInt where isSigned() is false, then the “sign bit” is not actually a sign bit at all, it is just an extra power of 2 for the unsigned value.

This is my understanding of the classes, someone correct me if I’m wrong.

IIUC though, the way to fix this is by using APSInt throughout the class, and delete all references to APInt.

I think we share the same understanding. If we know at every call site whether the type is signed or not then I totally agree, we should only use APSInt. The reason I propose doing (2) first is for the first scenario you described, where you don’t know. Turning it into an explicit APSInt is as bad as using an APInt and looking at the value. The latter has the advantage that it conveys that you don’t know, while the other may or may not be a lie.

Option 3? Add APSInt as a new member?

Current:

Scalar::Type m_type;
llvm::APInt m_integer;
llvm::APFloat m_float;
bool m_ieee_quad = false;

Option #3:

Scalar::Type m_type;
llvm::APInt m_uint;
llvm::APSInt m_sint;
llvm::APFloat m_float;
bool m_ieee_quad = false;

I don’t know enough about APInt and APSInt to know if one or the other can correctly be used for mixed operations.

Do we ever not know though? And if so, then why don’t we know whether the type is supposed to be signed or unsigned? Because guessing is always going to be wrong sometimes.

It seems like we have 3 uses of this constructor in LLDB.

IRInterpreter.cpp: Constructs a Scalar for an llvm::Constant.
IRForTarget.cpp: Constructs a Scalar for an llvm::Constant.
ClangASTContext.cpp: bitcasts an APFloat to an APInt.

The first two we should just treat constants in LLVM IR as signed, so we could construct an APSInt at the call-sites with signed=true.

The third seems like a bug, we should just have a Scalar constructor that takes an APFloat directly. We already have one that takes a float and it just sets the APFloat m_float member variable, I don’t know why we’re jumping through this bitcast hoop (which is probably wrong for negative floats anyway).

OK, I think there might be subtleties we're missing right now but the
best way is just try and see what happens.
I'll send patches early next week.