Signless "semantics"; `IntegerAttr` vs. `SignlessIntegerAttr`

cue Seinfled voice: what’s the deal with how IntegerAttr walks/talks/works:

// mlir/lib/IR/BuiltinAttributes.cpp:352
int64_t IntegerAttr::getInt() const

// mlir/include/mlir/IR/Builders.h:126
IntegerAttr getI8IntegerAttr(int8_t value);
IntegerAttr getI16IntegerAttr(int16_t value);
IntegerAttr getI32IntegerAttr(int32_t value);
IntegerAttr getI64IntegerAttr(int64_t value);

vs. what SignlessIntegerAttr codifies

// mlir/include/mlir/IR/CommonAttrConstraints.td:240
class TypedSignlessIntegerAttrBase<I attrValType, string retType, string descr>
    : SignlessIntegerAttrBase<attrValType, descr> {
  let returnType = retType;
  let convertFromStorage = "$_self.getValue().getZExtValue()";
}

def I1Attr  : TypedSignlessIntegerAttrBase<
    I1,  "bool",     "1-bit signless integer attribute">;
def I8Attr  : TypedSignlessIntegerAttrBase<
    I8,  "uint8_t",  "8-bit signless integer attribute">;
def I16Attr : TypedSignlessIntegerAttrBase<
    I16, "uint16_t", "16-bit signless integer attribute">;
def I32Attr : TypedSignlessIntegerAttrBase<
    I32, "uint32_t", "32-bit signless integer attribute">;
def I64Attr : TypedSignlessIntegerAttrBase<
    I64, "uint64_t", "64-bit signless integer attribute">;

In particular I’m pointing out that the generated getters for signless attributes all return unsigned int types but IntegerAttr getters, even for signless attributes, take and return int types.

Is this intentional? or legacy and propagated forward to prevent BC? A little experiment: changing $_self.getValue().getZExtValue()$_self.getValue().getSExtValue() and I*Attr to return int*_t basically works fine i.e., surprisingly no in-tree test fails after only two minor changes.

Now there is this TODO for Builtin_IntegerAttr:

// mlir/include/mlir/IR/BuiltinAttributes.td:707
// TODO: Change callers to use getValue instead.
int64_t getInt() const;

which suggests that the getters on IntegerAttr are legacy but I’m not sure :person_shrugging:.

Code pointers:

BuiltinAttributes.cpp:352

CommonAttrConstraints.td:L240

BuiltinAttributes.td:L707

There are two problems: truncation, and unspecified signedness behavior.

We’ve previously talked about removing the corresponding getSInt() and getUInt() because they have the same truncation problem, but haven’t done that. The issue is that IntegerAttr has a bit of a mixed role: it needs to provide a universal /represention/ for various integer IR nodes, but the various clients often have external knowledge about what they mean.

For example, in a dialect that uses 32-bit integers only can use these methods safely, and having to work with an AP[S]Int directly would be a pain. I think it has worked out fine to have these, and just require any dialect that uses unknown width code to use getValue()/getAPSInt() methods.

That said, I think renaming all clients of getInt() → getSInt() would be strictly better, and we could resolve the TODO at the same time.

-Chris

I don’t think there is any practical effect here though, right? Changing the creation methods to take unsigned integers instead of signed ones wouldn’t change the clients. What is your concrete suggestion?

Well I’m not proposing changing the creation methods but changing the returns of the generated getters for all users of signless attribute constraints, i.e., all getters be made to return int64_t instead of uint64_t to more closely match what IntegerAttr suggestively represents (that signless integer attributes are built from ints).

That said, I think renaming all clients of getInt() → getSInt() would be strictly better, and we could resolve the TODO at the same time.

That’s a change in the other direction (making IntegerAttr agree more closely with the generated getters), not the one I would prefer1, but it’s a reasonable choice/approach for reconciliation.

1 Because I would prefer to continue being sloppy in my own project :slight_smile: and misusing signless.

1 Like

Oh ok, I see. Yes, I’d support moving to return unsigned types, I don’t see a downside to that. I’d still love to see getInt() go away, and have all callers replaced with getUInt or getSInt at their preference.

Just to be crystal clear: they’re already returning unsigned (uint16_t, uint32_t, uint64_t, etc) e.g., the I32Attr:$k for AMDGPU_MFMAOp:

uint32_t MFMAOp::getK() {
  auto attr = getKAttr();
  return attr.getValue().getZExtValue();
}

and I’m (weakly) advocating for changing them to return signed, i.e., int16_t, int32_t, int64_t, etc. And just underscoring - I’m not insisting or anything - it’s just an idea for how to reconcile the disparity.

1 Like

My two cents.

The way I understand the issue, the fundamental problem is that there are no signless integers in C++ so, whenever we need to access a signless value, we have to make a choice that is not safe.

To make things slightly more confusing, we have two classes, a tablegen one (TypedSignlessIntegerAttrBase) and the C++ equivalent (IntegerAttr) that seem to be making the opposite choice, which leads to inconsistencies in how signless cases are handled.

E.g.: class AllocLikeOp in MemRef Dialect has a parameter constraint in the form:

    ConfinedAttr<OptionalAttr<I64Attr>, [IntMinValue<0>]>

Because I64Attr will return unsigned types when accessing the attribute value, this will generate an “always false” error check. This is not common at all, but it shows up here and there. Not very concerned about it either because it provides extra documentation value at practically zero cost, but this is the sort of consequence that comes from hiding default signlessness handling.

A slightly more concerning implication comes from things like:

%0 = linalg.matmul(%a, %b : tensor<4x8xi8>, tensor<8x4xi8>) outs(%c : tensor<4x4xi32>) -> tensor<4x4xi32>

when we run linalg-generalize-named-op, we obtain:

%0 = linalg.generic {
            indexing_maps = [#map, #map1, #map2],
            iterator_types = ["parallel", "parallel", "reduction"]}
        ins(%a, %b : tensor<4x8xi8>, tensor<8x4xi8>)
        outs(%c : tensor<4x4xi32>) {
    ^bb0(%ai: i8, %bi: i8, %ci: i32):
      %1 = arith.extsi %ai : i8 to i32
      %2 = arith.extsi %bi : i8 to i32
      %3 = arith.muli %1, %2 : i32
      %4 = arith.addi %ci, %3 : i32
      linalg.yield %4 : i32
    } -> tensor<4x4xi32>

Making another implicit choice, in this case treating singless as singed. I don’t think there’s a solution to this because the underlying issue is a fundamental one, but the best second thing we can do is picking one option consistently.

And just to be clear, I understand that treating it one way or the other is context dependent and users should be allowed the choice to pick one or the other, I’m talking about the cases when we want to be sign agnostic. If the answer is “we should never be sign agnostic” then I’d argue that “then we don’t need signless integers”.

If anybody is wondering why we’ve been thinking (waaaay too much) about this, it’s because when assuming an I32Attr would return an int, we’re getting warnings about “narrowing” in many places, but not all places. It’s hard to know what’s the right thing to do when we seem to be choosing it randomly.