Bugs in --convert-std-to-llvm

The first bug is related to atan2. The following code is rejected by mlir-opt --convert-std-to-llvm:

func @f()->(f32) {
  %r = constant 1.0 : f32 
  %a = atan2 %r, %r : f32
  return %a:f32
}

The error is like:

> mlir-opt --convert-std-to-llvm complex.mlir
complex.mlir:51:8: error: 'std.atan2' op requires the same type for all operands and results
  %a = atan2 %r, %r : f32
       ^
complex.mlir:51:8: note: see current operation: %1 = "std.atan2"(%0, %0) : (!llvm.float, !llvm.float) -> f32

The second bug goes deeper. The following function will have a similar typing error on mulf.

func @h(%size:index)->tensor<?xf32>{
  %pi = constant 3.14159265358979 : f32
  %o = dynamic_tensor_from_elements %size {
  ^bb0(%i : index):
    %itmp = std.index_cast %i:index to i32
    %if = std.sitofp %itmp:i32 to f32
    // %if = constant 0.0 : f32
    %x1 = mulf %pi, %if : f32
    std.yield %x1:f32
  } : tensor<?xf32>
  return %o:tensor<?xf32>
}

Even weirder, if you replace the definition of %if with the one that’s commented, you can see a more informative error message, which shows a half-lowered body of dynamic_tensor_from_elements.

Another example, which seems to indicate a broken symbol table. The error disappears if I remove dynamic_tensor_from_elements. I get no error on the tensors…

func @cosf(f32)->f32
func @h(%size:index,%x:f32)->tensor<?xf32>{
  %z = call @cosf(%x) : (f32)->f32
  %o = dynamic_tensor_from_elements %size {
  ^bb0(%i:index):
    std.yield %x:f32
  } : tensor<?xf32>
  return %o : tensor<?xf32>
}

The error here says @cosf is not a valid function.

It would seem that this last error is related to the incomplete bufferization. If a tensor subsists in the function, then I will get it. But if I compile with mlir-opt --std-bufferize --func-bufferize --convert-scf-to-std --convert-std-to-llvm it disappears. Pretty weird, however, that the symbol table is corrupted.

FYI: I have updated and recompiled llvm/mlir yesterday from the main branch.

Dumitru

BTW: I could put all these bugs in the same place. The problem is that I don’t have time right now to make full bug reports.

Conversion rejecting an op is not a bug. It rejects any op it is cannot handle.

In this particular case, std.constant could be converted but std.atan2 could not, since there is no corresponding LLVM operation. The conversion is set up as partial, so it does not immediately fail when an op fails to convert, but does eventually fail on type mismatch between converted and unconverted ops.

Ideally, we should be able to inject an llvm.mlir.cast before and after unconverted ops, but that had been only partially supported.

Ok, so for std.atan2 the problem is not the absence of a definition, but the totally misleading error message. I guess it’s the same for the second problem, because dynamic_tensor_from_elements does not have a lowering if it still uses tensors.

Dumitru

Conversion to llvm does not support tensors and does not intend to.

The error message is not that misleading if you understand how the conversion process works. A partial conversion applies patterns for ops that are not legal in the target state. If there are no illegal ops as a result, it succeeds. (Note that legality is tri-state). So this produces

%0 = llvm.mlir.constant(...) : !llvm.float
"std.atan2"(%0, %0) : (!llvm.float, !llvm.float) -> f32

The pass manager then runs the verifier, which detects the type mismatch between operands and results of atan2 and reports it.

The conversion does not know anything about the consumers of the operation it converted, and it does not have to because otherwise it won’t be able to handle unknown ops. There may be an op that would be perfectly valid in place of atan2. The verifiers are not necessarily local, so it’s dubious to run them on individual ops. In this particular case, we could consider inserting casts like I mentioned above.

This is similar to the behavior of any pass, based on the conversion infra or not, that produces invalid IR. The verifier in the pass manager will complain about the new state of the IR. Propositions on how to make error messages better in this general case are very welcome.

The @h function is not converted because it has an unconvertible type and contains unconvertible ops. So the call @cosf persists. The @cosf function is converted. The verifier complains that @cosf is not a valid standard function because call does not expect llvm functions. The symbol table is not corrupted.

I can understand you core MLIR dev perspective. From my perspective, which is that of a user, there’s however a usability problem. And, indeed, I don’t see an easy solution. One could be to systematically add to this sort of error message an indication of where it could come frome (either from mis-specification, or from incomplete lowerings).

In any case, I do appreciate the support of all people here. Thanks.

Well, given my background and principal contribution areas, I wouldn’t qualify myself as somebody insensitive to usability problems, including developer’s experience. So I am really supportive of any effort to improve it, provided a good justification and reasonable complexity trade-offs.

mlir-opt is not intended for the end users. It is a testing tool for compiler developers. There is an implicit expectation of the level of understanding that compiler developers have of the compiler’s inner workings. Yes, we can provide better hints, but it would be unreasonable to, for example, halve the compilation speed for this reason. Core MLIR does not have a frontend. If it had had one, it would have been the responsibility of the frontend to set up the pass pipeline so as to shield the user from IR validity concerns.

It can be also due to any IR manipulation that rendered the IR invalid. This manipulation is not necessarily a lowering, a legalization or even a pass. It can be as simple as atanop.setOperand(0, builder.create<ConstantOp>(loc, builder.getIndexType(), 42)). So the invalid operation could come from either construction/parsing or from an IR manipulation. This describes ~90% of the codebase (the remaining 10% being analyses that are not supposed to modify the IR anyway). Stating that an invalid operation is due to its construction or modification sounds like stating an obvious truth.

That being said, we can teach the pass manager to report which pass produced invalid IR, unless it can already be achieved with debug flags.

I have an idea on how to make the behavior of this pass less surprising, but it requires reconsidering what the standard dialect is, and that has been an extremely contentious topic during the last year.

1 Like

I’m curious why the conversion is set up as being “partial”. I seem to remember it being “full”? If it was “full”, then the diagnostic would be a lot better in this case.

For bufferization, we are very careful that when we do partial conversions, we set up the proper materializations so that the dialect conversion framework will never create invalid IR (it will insert appropriate materializations to bridge source and target types if any source types remain after conversion).

It has always been partial, introduced by Refactor DialectConversion to support different conversion modes. · llvm/llvm-project@2b9855b · GitHub a year and a half ago.

We (well, not everybody, to be honest) have always wanted to be more progressive in lowerings, in a sense that we could call linalg-to-llvm, then vector-to-llvm, then std-to-llvm without having invalid IR in between, instead of calling linalg-and-vector-and-some-scf-and-maybe-affine-to-std-cfg-then-llvm that is currently called linalg-to-llvm. The cast injection mechanism I described above, together with partial conversions and cast canonicalizations, is the way to achieve this. They are work in progress though, see e.g. https://github.com/llvm/llvm-project/blob/master/mlir/lib/Transforms/Utils/DialectConversion.cpp#L2302. I reckon this is an extension of the mechanism bufferization uses.

We can fix the specific case of reporting an non-lowered atan2, or any standard op for that matter, by declaring the entire standard dialect illegal. This would make even the partial conversion fail because it would not be able to convert an illegal op. However, this wouldn’t address the equivalent of this case with an op from a dialect other than standard that has similar requirements.

Great that we are aligned like this! Sounds like the Nov 19 ODM where I’m presenting “Type conversions the not-so-hard way: MLIR’s new composable Bufferize passes” will have some good discussions :slight_smile: