Extending `tileConsumerAndFuseProducersUsingSCF` to handle more patterns

To get the above tiling to work with `transform.structured.fuse`, I had to make the following modifications to `tileConsumerAndFuseProducersUsingSCF`:

--- a/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
+++ b/mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
@@ -1045,7 +1045,9 @@ mlir::scf::tileConsumerAndFuseProducersUsingSCF(
   };

   std::deque<tensor::ExtractSliceOp> candidates;
-  addCandidateSlices(tiledAndFusedOps.back(), candidates);
+  for (auto *op : tiledAndFusedOps)
+    addCandidateSlices(op, candidates);
+
   OpBuilder::InsertionGuard g(rewriter);
   while (!candidates.empty()) {
     // Traverse the slices in BFS fashion.
@@ -1087,7 +1089,8 @@ mlir::scf::tileConsumerAndFuseProducersUsingSCF(
             fusedResult->tiledAndFusedProducer.getDefiningOp()) {
       fusedProducers.insert(fusedResult->origProducer.getDefiningOp());
       tiledAndFusedOps.insert(tiledAndFusedOp);
-      addCandidateSlices(tiledAndFusedOp, candidates);
+      for (auto *op : fusedResult->tiledOps)
+        addCandidateSlices(op, candidates);
     }
   }

While this does not seem to cause any failures in the existing lit test suite, @qed mentioned that it could be a foot-gun.
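To make the effect of the first hunk concrete, here is a minimal toy model of how the candidate worklist is seeded before and after the change. It is only a sketch: `ToyOp`, `seedFromLastOp`, and `seedFromAllOps` are hypothetical names that do not exist in MLIR, and slices are modelled as plain strings rather than `tensor::ExtractSliceOp`.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Toy stand-in for a tiled operation; this is not a real MLIR type.
struct ToyOp {
  std::string name;
  // Names of the extract_slice-like operands whose producers could be fused.
  std::vector<std::string> sliceOperands;
};

// Plays the role of the addCandidateSlices lambda: enqueue every slice
// operand of `op` onto the worklist.
inline void addCandidateSlices(const ToyOp &op,
                               std::deque<std::string> &candidates) {
  for (const std::string &slice : op.sliceOperands)
    candidates.push_back(slice);
}

// Before the patch: only the last tiled op seeds the worklist.
inline std::deque<std::string>
seedFromLastOp(const std::vector<ToyOp> &tiledOps) {
  assert(!tiledOps.empty() && "expected at least one tiled op");
  std::deque<std::string> candidates;
  addCandidateSlices(tiledOps.back(), candidates);
  return candidates;
}

// After the patch: every tiled op seeds the worklist.
inline std::deque<std::string>
seedFromAllOps(const std::vector<ToyOp> &tiledOps) {
  std::deque<std::string> candidates;
  for (const ToyOp &op : tiledOps)
    addCandidateSlices(op, candidates);
  return candidates;
}
```

In this model, if tiling yields two ops where the first reads one slice and the second reads two, `seedFromLastOp` enqueues two candidates while `seedFromAllOps` enqueues three. The extra candidates are what let fusion reach producers feeding the earlier ops, and they are also where unintended fusion could come from.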

@MaheshRavishankar mentioned that we might need to extend `SCFTilingResult` to also contain a list of the `tensor.extract_slice` ops created by tiling.
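One possible shape for that extension, sketched with toy types rather than the real MLIR classes. Everything here is an assumption made for illustration: `ToyTilingResult`, the `generatedSlices` field name, and `seedCandidates` are hypothetical and do not reflect the actual `SCFTilingResult` definition.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy stand-ins; not the real mlir::Operation or tensor::ExtractSliceOp.
struct ToyOperation {
  int id;
};
struct ToySliceOp {
  int id;
};

// Hypothetical extended tiling result: besides the tiled ops, it records
// the slices that tiling itself created.
struct ToyTilingResult {
  std::vector<ToyOperation *> tiledOps;
  std::vector<ToySliceOp *> generatedSlices; // assumed field name
};

// With the slices recorded explicitly, the fusion worklist can be seeded
// from the result directly instead of scanning the operands of tiled ops.
inline std::deque<ToySliceOp *> seedCandidates(const ToyTilingResult &result) {
  std::deque<ToySliceOp *> candidates;
  for (ToySliceOp *slice : result.generatedSlices) {
    assert(slice && "tiling result should not record null slices");
    candidates.push_back(slice);
  }
  return candidates;
}
```

The point of such a design would be that only slices known to come from tiling become fusion candidates, rather than every slice that happens to feed a tiled op.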