LLVM dialect constant on unsigned/signed ints

I noticed that LLVM dialect allows the following constant declaration:

%1 = llvm.mlir.constant(42 : si32) : !llvm.i32
%2 = llvm.mlir.constant(42 : ui32) : !llvm.i32

Since LLVM doesn’t make a distinction between signed and unsigned types, I am wondering if having signed/unsigned integer in attribute is the intended behaviour?

Following the post by @bondhugula on unsigned int constantOp in standard dialect, I assume that is the issue with the verifier?
This bug propagates from standard dialect as well: being unnoticed in standard its absolutely fine to do this:

$ cat bin/std.mlir

func @test() {
	%2 = constant 42 : si32
	%3 = constant 42 : ui32
	return
}


$ bin/mlir-opt bin/std.mlir -convert-std-to-llvm

module {
  llvm.func @test() {
    %1 = llvm.mlir.constant(42 : si32) : !llvm.i32
    %2 = llvm.mlir.constant(42 : ui32) : !llvm.i32
    llvm.return
  }
}

This seems like a bug, the verifier for llvm.mlir.constant should require a signless attribute.

There are very few checks in the LLVM dialect verifiers. The initial thinking was that LLVM itself would complain if one tries to construct wrong IR, and we did not want to replicate LLVM’s work. You can, for example, roundtrip llvm.mlir.constant("foo") : !llvm.i32. The translation to LLVM IR will likely complain about that, but not the verifier.

Thanks! However, I am not really sure I know how the roundtrip can complain about these cases? As far as I understand, MLIR constants are not translated to actual LLVM?
For example I have this

$ cat bin/llvm.mlir

module {
  llvm.func @const() {
    %0 = llvm.mlir.constant(0 : i32) : !llvm.i32
    %1 = llvm.mlir.constant("foo") : !llvm.i32
    llvm.return
  }
}


$ bin/mlir-translate -mlir-to-llvmir bin/llvm.mlir

; ModuleID = 'LLVMDialectModule'
source_filename = "LLVMDialectModule"

declare i8* @malloc(i64)

declare void @free(i8*)

define void @const() !dbg !3 {
  ret void, !dbg !7
}

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!2}

!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "mlir", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!1 = !DIFile(filename: "LLVMDialectModule", directory: "/")
!2 = !{i32 2, !"Debug Info Version", i32 3}
!3 = distinct !DISubprogram(name: "const", linkageName: "const", scope: null, file: !4, line: 2, type: !5, scopeLine: 2, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !6)
!4 = !DIFile(filename: "...", directory: "...")
!5 = !DISubroutineType(types: !6)
!6 = !{}
!7 = !DILocation(line: 5, column: 6, scope: !8)
!8 = !DILexicalBlockFile(scope: !3, file: !4, discriminator: 0)

This seems like a case where we may want to revise the original thinking? Since MLIR’s integer types are tri-state (signed, unsigned and signless) and LLVM can only map signless (by design), I feel that I more expect the verifier to complain vs the converter. This is different than letting verification be open for types wholly unknown to the LLVM dialect, imo, where making legality be defined in terms of convertibility replicates fewer checks.

In either case, wherever the check should happen, we need to make sure it does.

Yeah, I agree - early verification is better than last verification for a lot of reasons. I interpret Alex’s comment as saying “this has never been a priority so it hasn’t been done” – @ftynse I assume you’d be ok with others improving verification here over time?

By roundtrip, we usually understand mlir-opt filename.mlir that doesn’t do any transformation. Just parsing, verification and printing. The verifier should complain about invalid IR.

In LLVM IR, the constants are a separate class of values that are created and uniqued in the context (unlike MLIR, where they are just values created by a special operation). They won’t be printed unless there is an operation that uses them, and unused constants are essentially dropped if you print and re-parse the module. If you return the constant from the function, you’ll get

define i32 @const() !dbg !3 {
  ret [3 x i8] c"foo", !dbg !7
}

which LLVM (quite surprisingly for me) allows to be constructed, but which is rejected by LLVM’s opt with value doesn't match function result type 'i32.

Indeed, I’m providing the context for why we are were we are today. I am very happy if somebody can work on this! The first-party LLVM types RFC is actually a step in that direction, I expect most operations will have their type constraints encoded in ODS, which is easier to change. (Well, to be honest, this can be done even with the existing type system, but will be more brittle because of locking, in case one compares types by creating a new type, and verbose). Constants are a special case, I don’t think they will interfere much with the proposed type change.

Interestingly, we have the exact opposite of this. Types outside of the LLVM dialect will fail the verifier on any op in the dialect.

This is actually also an issue with the std.constant verifier which isn’t supposed to allow ui32. This bug is open: https://bugs.llvm.org/show_bug.cgi?id=46222

In addition, I also posted the following on the other thread:

"For TF models coming into xla_hlo, it’s a common pattern in nearly every model to obtain the dimension size of a tensor as an unsigned int and then use that as a divisor later. The divisor folds to an unsigned int constant. When lowering to std , one could use diviu , but there has to be a zero extension to i64 which is available for an i32 but not ui32 - so, are we missing a cast from ui32 to i32 and it’s not clear to me where it should live."