Hi all,

We are trying to implement your suggestions, but we got stuck.

To make this discussion more concrete, this is the IR we are trying to simplify:

```
#map = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>
#set0 = affine_set<(d0, d1) : (d0 >= 0, -d0 + 3 >= 0, d1 - d0 >= 0, -d1 + 3 >= 0)>
#set1 = affine_set<(d0, d1) : (d0 >= 0, -d0 + 3 >= 0, d0 - d1 >= 0, -d1 + 3 >= 0)>
module {
  func.func @test_structure(%arg0: memref<4x4xf32, #map>, %arg1: memref<4x4xf32, #map>, %arg2: memref<4x4xf32, #map>) -> memref<4x4xf32, #map> {
    %cst = arith.constant 0.000000e+00 : f32
    affine.for %arg3 = 0 to 4 {
      affine.for %arg4 = 0 to 4 {
        %0 = affine.if #set0(%arg3, %arg4) -> f32 {
          %3 = affine.load %arg0[%arg3, %arg4] : memref<4x4xf32, #map>
          affine.yield %3 : f32
        } else {
          affine.yield %cst : f32
        }
        %1 = affine.if #set1(%arg3, %arg4) -> f32 {
          %3 = affine.load %arg1[%arg3, %arg4] : memref<4x4xf32, #map>
          affine.yield %3 : f32
        } else {
          affine.yield %cst : f32
        }
        %2 = arith.addf %0, %1 : f32
        affine.store %2, %arg2[%arg3, %arg4] : memref<4x4xf32, #map>
      }
    }
    return %arg2 : memref<4x4xf32, #map>
  }
}
```

It is the sum of an upper triangular and a lower triangular matrix. I expect the inner loop to be split in three:

- One for copying `arg0` into `arg2`
- One for copying `arg1` into `arg2`
- One for computing `arg0` + `arg1` and storing it into `arg2`. This would be the loop along the diagonal.
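To spell out the split I have in mind, here is a minimal sketch in plain C++ (not MLIR). The names `Matrix`, `sumGuarded` and `sumSplit` are hypothetical and only serve the illustration. `sumGuarded` mirrors the IR above, with the two `affine.if` guards evaluated at every point, and `sumSplit` is the form I would like to end up with.

```cpp
#include <array>
#include <cassert>

constexpr int N = 4;
using Matrix = std::array<std::array<float, N>, N>;

// Mirrors the IR above: both guards are evaluated at every (i, j).
Matrix sumGuarded(const Matrix &a, const Matrix &b) {
  Matrix c{};
  for (int i = 0; i < N; ++i) {
    for (int j = 0; j < N; ++j) {
      float x = (j >= i) ? a[i][j] : 0.0f; // #set0: d1 - d0 >= 0
      float y = (i >= j) ? b[i][j] : 0.0f; // #set1: d0 - d1 >= 0
      c[i][j] = x + y;
    }
  }
  return c;
}

// Desired form: the inner loop is split in three and no guard remains.
Matrix sumSplit(const Matrix &a, const Matrix &b) {
  Matrix c{};
  for (int i = 0; i < N; ++i) {
    // Below the diagonal only #set1 holds: copy from b.
    for (int j = 0; j < i; ++j)
      c[i][j] = b[i][j];
    // On the diagonal both sets hold: this "loop" has a single iteration.
    c[i][i] = a[i][i] + b[i][i];
    // Above the diagonal only #set0 holds: copy from a.
    for (int j = i + 1; j < N; ++j)
      c[i][j] = a[i][j];
  }
  return c;
}
```

Both functions should produce the same matrix for any input; the second one simply never evaluates a condition inside the loop body.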

My idea was to `intersect` the loop iteration space with the set of each `if` and then rewrite the loop. But this is where I have an issue. To be even more concrete, in my pass I do something like:

```
LogicalResult matchAndRewrite(AffineIfOp ifOp,
                              PatternRewriter &rewriter) const override {
  FlatAffineValueConstraints cst;
  // Step 1 - get the presburger set for the surrounding for loops
  Operation *curOp = ifOp;
  while (isa<AffineForOp>(curOp->getParentOp())){
    AffineForOp forOp = dyn_cast<AffineForOp>(curOp->getParentOp());
    cst.appendDimVar(forOp.getInductionVar());
    curOp = forOp;
  }
  cst.addAffineForOpDomain(dyn_cast<AffineForOp>(curOp));
  // Step 2
  // newset = cst.intersect(ifOp->getSet());
  // Step 3
  // generate_loop_for(new_set)
}
```

My issue is in `Step 3`. How can I go from a Presburger set (or any polyhedral object) to a loop? Can this be done in FPL (or in any other way in MLIR)?
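To illustrate what I would expect `Step 3` to produce, here is a minimal sketch in plain C++. It does not use MLIR or FPL: `Constraint`, `Range`, `floorDiv`, `ceilDiv` and `innerBounds` are hypothetical names, and the sketch assumes only two dimensions, with the outer index `d0` fixed to a concrete value. Each inequality `a0*d0 + a1*d1 + c >= 0` with `a1 != 0` tightens either the lower or the upper bound of the inner loop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One inequality a0*d0 + a1*d1 + c >= 0 (hypothetical type, not an FPL class).
struct Constraint { int a0, a1, c; };

// Inclusive bounds of the inner loop; the range is empty when lb > ub.
struct Range { int lb, ub; };

// Division rounding toward -inf / +inf, for a positive divisor d.
int floorDiv(int n, int d) { return n >= 0 ? n / d : -((-n + d - 1) / d); }
int ceilDiv(int n, int d) { return -floorDiv(-n, d); }

// Intersects the inner loop range with `set` for a fixed outer index d0.
Range innerBounds(const std::vector<Constraint> &set, int d0, Range loop) {
  Range r = loop;
  for (const Constraint &k : set) {
    int rest = k.a0 * d0 + k.c; // constraint reads: a1*d1 + rest >= 0
    if (k.a1 > 0)
      r.lb = std::max(r.lb, ceilDiv(-rest, k.a1)); // d1 >= -rest / a1
    else if (k.a1 < 0)
      r.ub = std::min(r.ub, floorDiv(rest, -k.a1)); // d1 <= rest / -a1
    else if (rest < 0)
      return {1, 0}; // constraint on d0 alone is violated: empty range
  }
  return r;
}

// The two sets from the IR above, one row per inequality.
const std::vector<Constraint> set0 = {{1, 0, 0}, {-1, 0, 3}, {-1, 1, 0}, {0, -1, 3}};
const std::vector<Constraint> set1 = {{1, 0, 0}, {-1, 0, 3}, {1, -1, 0}, {0, -1, 3}};
```

With `d0 = 1` and the original loop range `[0, 3]`, this gives `[1, 3]` for `#set0`, `[0, 1]` for `#set1`, and `[1, 1]` when both sets are applied, which is the diagonal. What I am missing is the symbolic version of this, where the bounds come out as affine expressions of `d0` that can be used to build new `affine.for` ops.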

Thanks,

Giuseppe

cc @JoeyYe @bondhugula @Groverkss