Failure converting `scf.yield` operation

I have implemented a custom dialect that is ultimately targeting LLVM. It contains a LLVM lowering pass that is a identical copy of the LLVMLoweringPass from the toy tutorial.

My dialect contains an operation to allocate memory and fill it with a user-defined value that is most often zero. The actual allocation lowers to memref.alloca and the overriding of the memory will be implemented by an scf.parallel. A minimal reproduction case is

func.func @main() -> i64 {
  %tensor = memref.alloca() : memref<2x2x2xf64>

  %lower = arith.constant 0 : index
  %upper = arith.constant 2 : index
  %step = arith.constant 1 : index
  scf.parallel (%arg0, %arg1, %arg2) = (%lower, %lower, %lower) to (%upper, %upper, %upper) step (%step, %step, %step) {
    %zero = arith.constant 2.0 : f64 %zero, %tensor[%arg0, %arg1, %arg2] : memref<2x2x2xf64>

  %i1 = arith.constant 1 : index
  %i2 = arith.constant 0 : index
  %i3 = arith.constant 1 : index
  %element = memref.load %tensor[%i1, %i2, %i3] : memref<2x2x2xf64>

  %r = arith.fptosi %element : f64 to i64
  return %r : i64

I invoke my equivalent of mlir-opt and instruct it to lower the code above to LLVM. The added passes are

  RewritePatternSet patterns(&getContext());
  mlir::index::populateIndexToLLVMConversionPatterns(typeConverter, patterns);
  mlir::arith::populateArithToLLVMConversionPatterns(typeConverter, patterns);
  mlir::populateMemRefToLLVMConversionPatterns(typeConverter, patterns);
  mlir::populateFuncToLLVMConversionPatterns(typeConverter, patterns);

Unfortunately, the conversion of the scf.parallel fails. On the contrary, mlir-opt --convert-scf-to-cf repeat.mlir manages to replace the scf.parallel operation.

To see what is going on, I added the -debug-only=dialect-conversion option to pin down the issues.

Legalizing operation : 'scf.parallel'(0x7fb74270abf0) {
  * Fold {
  } -> FAILURE : unable to fold

  * Pattern : 'scf.parallel -> ()' {
    ** Insert  : 'scf.yield'(0x7fb743305990)
    ** Insert  : 'scf.for'(0x7fb743305ee0)
    ** Insert  : 'scf.yield'(0x7fb743306000)
    ** Insert  : 'scf.for'(0x7fb743306050)
    ** Insert  : 'scf.yield'(0x7fb7433061a0)
    ** Insert  : 'scf.for'(0x7fb7433061f0)
    ** Erase   : 'scf.yield'(0x7fb74270a980)
    ** Replace Argument : '<block argument> of type 'index' at index: 0'(in region of 'scf.parallel'(0x7fb74270abf0)
    ** Replace Argument : '<block argument> of type 'index' at index: 1'(in region of 'scf.parallel'(0x7fb74270abf0)
    ** Replace Argument : '<block argument> of type 'index' at index: 2'(in region of 'scf.parallel'(0x7fb74270abf0)
    ** Replace : 'scf.parallel'(0x7fb74270abf0)

    Legalizing operation : 'scf.yield'(0x7fb743305990) {
      "scf.yield"() : () -> ()

      * Fold {
      } -> FAILURE : unable to fold
    } -> FAILURE : no matched legalization pattern
  } -> FAILURE : failed to legalize generated operation 'scf.yield'(0x00007FB743305990)
} -> FAILURE : pattern failed to match
} -> FAILURE : no matched legalization pattern

It seems that it fails to legalize scf.yield. But does this even mean? How can such a conversion fail with the conversion patterns I mentioned above? And why is mlir-opt able to lower successfully?

I think that the issue here is that lowering needs to be done in multiple steps because scf.yield does not have conversion pattern: it is handled by the conversion of the parent op instead.

Alternatively, it is possible that just declaring scf.yield as legal would do the trick here though?

I see… But the parent op of scf.yield is scf.parallel in my case and it is the operation that is currently converted :face_with_monocle:

Nevertheless, I will try to declare scf.yield as legal and give feedback if it works. However, this approach would be very counterintuitive. The LLVMConversionPass should be a full conversion pass that allows only ops from the llvm dialect in the end.

Furthermore, even if it works, why does the lowering succeed in the tutorial?

Yes, but it is converted to a scf.for, which itself requires a yield as well.

Yes, but the input to this conversion can’t be just “anything”, in particular you are assembling patterns of conversion including populateSCFToControlFlowConversionPatterns.
This why I mentioned the need to it “by steps” instead.

Okay, got it. Even though I’m still not sure why it works in the tutorial though.

I tried adding the scf.yield operation as legal to the pass, but the error message remains the same.

What do you mean by “in steps” exactly? Should the LLVM pass be split into multiple passes?

Are there scf.parallel operations in the tutorial?

Ah, you’re completely correct. In the tutorial, the custom toy.print operation is lowered to “only” scf.for. I will check if the lowering works when only scf.for is used instead of scf.parallel.

If you’ve already got the scf.parallel operation, why do you even need to convert that into scf.for. As @mehdi_amini said, you finally will convert it into cf dialect using the ConvertSCFToControlFLow Pass. So for me it looks like it’s not a concern at all :slight_smile:

Conversion infra works by applying a pattern to legalize the operation, and then attempting to legalize any operations created by that pattern “recursively”. scf.yield is created by the pattern and needs to be legalized, hence the initial failure. Could you post the output of -debug-only=dialect-conversion when scf.yield is declared as legal?

Because there is no direct conversion from scf.parallel to cf. Converting to scf.for first is the intended way. I could also argue that scf.parallelscf.for is a semantics-changing transformation, and not just a lowering (imagine a loop where iteration 1 writes a value to a memory location and iteration 0 spinlocks until that same location has the expected value; it works when iterations are executed in parallel and doesn’t when they are sequentialized). So it must be performed as a separate, explicit step.

I’d argue that this wouldn’t be a valid uses of scf.parallel though: I believe we should be allowed to execute the iteration with a single thread and in any order.

I meant you could do a first step to handle the scf.parallel operation in isolation:

  RewritePatternSet patterns(&getContext());
  // Configure conversion to lower out SCF operations.
  ConversionTarget target(getContext());
  if (failed(
          applyPartialConversion(getOperation(), target, std::move(patterns))))

Before running your conversion.

1 Like

Oh, I thought there is one working like this llvm-project/mlir/test/Conversion/SCFToControlFlow/convert-to-cfg.mlir at main · llvm/llvm-project · GitHub

The pass will perform the conversion, but it won’t be a direct conversion, as Alex mentioned it’ll be scf.parallel → scf.for -> .... You can figure it out if you run the pass with -debug-only=dialect-conversion.
The way this pass operates is by marking a specific set of op illegal, which does not make scf.yield illegal.

I agree about the order, but I’ve seen downstreams (incorrectly) interpreting parallel as “must be concurrent”.

Nope, it does create an scf.for first, llvm-project/mlir/lib/Conversion/SCFToControlFlow/SCFToControlFlow.cpp at c5ed93f975830b4ed52f1899bfc9d8c89bf81c38 · llvm/llvm-project · GitHub. The two patterns compose to produce the result seen in the test.

Oh I see! I didn’t quite look deep into the ConversionPattern.