Explicitly signed / unsigned integers in std

Continuing the discussion from D76136 [1] and D76137 [2]

D72533 [3] introduced signed / unsigned integer types to MLIR with the goal of providing common infrastructure for dialects that wish to model signedness explicitly.

D76136 [1] and D76137 [2] started adding support for such types for certain ops of std, based on the misinterpretation of D72533 [3] as the introduction of signed / unsigned integer types in general. The MLIR Rationale explicitly states that integer types in std are signless [4].

However, for projects that wish to model signedness and that rely heavily on std, such as the Teckyl Tensor Expression Frontend [5], from which the patches originate, this means that they either need to re-implement a certain number of operations from std or they need to convert operands to signless integers and back. The former is quite redundant, while the latter probably requires some work on integer types, as there does not seem to be any infrastructure for the conversion between signless and signed / unsigned integers.

I understand the motivations to keep signless integers in std wherever possible, but I would like to open the discussion on the above-mentioned issues. Any hints on how to work around this are highly appreciated.



[1] https://reviews.llvm.org/D76136
[2] https://reviews.llvm.org/D76137
[3] https://reviews.llvm.org/D72533
[4] Rationale - MLIR
[5] GitHub - andidr/teckyl: An MLIR frontend for tensor expressions

We ran into this problem too when trying to implement a dialect that translates into Rust code, which distinguishes between signed and unsigned integers. The solutions we considered were:

  • Introduce a Signed-attribute to our operations which signifies that the values they produce are signed (not sure if this works).
  • Re-implement all StdOps and types to support signed values.
  • Include type casts between signed and unsigned in the generated Rust code.

I’m wondering, should this problem be addressed by user-defined dialects or the standard dialect, and what could be possible solutions for both cases?

I have implemented signed and unsigned types in my dialect as well and re-implemented all the standard ops on this dialect. I even did this before the signedness was added to the std dialect. In contrast to @segeljakt I put the signedness on the type, so I have myIR.Int<32u> and myIR.Int<32s> and custom operations on these types. Lowering to the standard dialect is a 1-to-1 replacement for e.g. myIR.add and a select on the signedness for myIR.div et al.
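The lowering described above could be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: `IntType`, `lower`, and the two tables are hypothetical names, and the target op names are plain strings that should be taken as illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntType:
    """Hypothetical stand-in for a dialect type such as myIR.Int<32s>."""
    width: int
    signed: bool


# Ops whose bit-level result does not depend on signedness: 1-to-1 replacement.
SIGN_AGNOSTIC = {
    "add": "std.addi",
    "sub": "std.subi",
    "mul": "std.muli",
}

# Ops where the lowering has to select on the signedness of the type.
# Each entry is (signed variant, unsigned variant); the names are illustrative.
SIGN_DEPENDENT = {
    "div": ("std.divi_signed", "std.divi_unsigned"),
    "rem": ("std.remi_signed", "std.remi_unsigned"),
}


def lower(op: str, ty: IntType) -> str:
    """Return the name of the signless op that `op` on type `ty` lowers to."""
    if op in SIGN_AGNOSTIC:
        return SIGN_AGNOSTIC[op]
    signed_op, unsigned_op = SIGN_DEPENDENT[op]
    return signed_op if ty.signed else unsigned_op
```

With this sketch, `lower("add", ...)` gives the same op for both signednesses, while `lower("div", ...)` picks a different op depending on `ty.signed`.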

I have already considered switching to std integer types now that they carry a sign, but what hinders me is that I do not have sign-aware std.div and std.cmp ops. Personally I do not like this design a lot, since I can have e.g. divi operating on an unsigned integer (I admit that I have not checked whether the verifier produces an error in that case).

Does it really bring an advantage to have a signless integer type in MLIR? This seems strange to me since I think that most frontends producing MLIR will have to know something about the sign. As soon as divisions, bit-extensions or comparisons are needed, the sign has to be chosen somehow. One can only neglect the sign if a subset of integer operations is needed (maybe for indices if remi/remu is not needed).
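To make the point about sign-dependent operations concrete, here is a small Python sketch that works on raw bit patterns of a fixed width. All function names are made up for illustration, and the signed division assumes truncation toward zero.

```python
def mask(width: int) -> int:
    """All-ones bit mask for the given width."""
    return (1 << width) - 1


def as_signed(bits: int, width: int) -> int:
    """Interpret the low `width` bits as a two's-complement value."""
    bits &= mask(width)
    return bits - (1 << width) if bits >> (width - 1) else bits


def add(a: int, b: int, width: int) -> int:
    """Wrapping addition: the result bits are the same for either signedness."""
    return (a + b) & mask(width)


def div_unsigned(a: int, b: int, width: int) -> int:
    """Division with both operands read as unsigned."""
    return (a & mask(width)) // (b & mask(width))


def div_signed(a: int, b: int, width: int) -> int:
    """Division with both operands read as signed, truncating toward zero
    (an assumption of this sketch)."""
    sa, sb = as_signed(a, width), as_signed(b, width)
    q = abs(sa) // abs(sb)
    if (sa < 0) != (sb < 0):
        q = -q
    return q & mask(width)


def less_unsigned(a: int, b: int, width: int) -> bool:
    return (a & mask(width)) < (b & mask(width))


def less_signed(a: int, b: int, width: int) -> bool:
    return as_signed(a, width) < as_signed(b, width)
```

For the 8-bit patterns 0xF8 and 0x02, `add` yields 0xFA regardless of interpretation, whereas `div_signed` yields 0xFC (-8 / 2 = -4) and `div_unsigned` yields 0x7C (248 / 2 = 124). The two comparisons disagree as well.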

Some ideas to float:

  • The signless integers in std could be removed altogether. divi/divu should be merged into a sign-aware div (same with zero-extend/sign-extend, right-shift, remi/remu, the comparisons and possibly others that I forgot now). Lowering these operations to LLVM selects the corresponding low-level op depending on the signedness of the input types.
  • A sign-cast op can be added to the std dialect which just converts between signed and unsigned types of the same bit-width.
  • If a frontend really does not care about the sign, it could just decide to use always-signed or always-unsigned (it actually does not matter). This should be equivalent to using a signless integer, IMO.
  • If a frontend really needs signless integers, it can add them in a custom dialect (just the reverse of what we are doing now). How to correctly lower e.g. divisions has to be decided per dialect then.
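
The sign-cast idea from the list above could look roughly like this. It is a Python sketch under the assumption that such a cast only changes the type and never the bits; `TypedBits`, `Signedness`, and `sign_cast` are hypothetical names, not an existing API.

```python
from dataclasses import dataclass
from enum import Enum


class Signedness(Enum):
    SIGNLESS = "i"    # e.g. i8
    SIGNED = "si"     # e.g. si8
    UNSIGNED = "ui"   # e.g. ui8


@dataclass(frozen=True)
class TypedBits:
    """A bit pattern of fixed width tagged with a signedness."""
    bits: int
    width: int
    signedness: Signedness

    def interpret(self) -> int:
        """Numeric value of the bits under the tagged signedness."""
        if self.signedness is Signedness.SIGNLESS:
            # Choice made for this sketch: signless bits carry no numeric meaning.
            raise TypeError("a signless value has no numeric interpretation")
        if self.signedness is Signedness.SIGNED and self.bits >> (self.width - 1):
            return self.bits - (1 << self.width)
        return self.bits


def sign_cast(value: TypedBits, target: Signedness) -> TypedBits:
    """Change only the signedness tag; bits and width stay untouched."""
    return TypedBits(value.bits, value.width, target)
```

Casting the 8-bit unsigned pattern 0xFF to signed keeps the bits but changes `interpret()` from 255 to -1, and casting back restores the original value.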

The important part for the standard dialect is not to make it easier to model your frontend language (you create your own dialect for this) but to make it easier to manipulate. Not having to handle all the possible combinations of types and the associated casts (explicit or implicit) is important.
The MLIR Rationale has a section on this, including the history of LLVM on this topic.

@mehdi_amini Thanks for clarifying this a bit further.

My main issue is that using explicitly signed / unsigned types basically locks you out of std. Would it make sense to you to at least provide conversions to / from signless types? This would allow dialects to do whatever they want with explicitly signed / unsigned types, while still being able to take advantage of operations from std. Apart from the conversions, std could stay completely signless.

What would these conversions be exactly? If you can provide some motivating example it’d help.

There is a previous discussion on no-op casts like this.