Should linalg.indexed_generic allow for affine operations on its body?

Fair enough, looked deeper into the doc of the trait, and this makes sense. Is it accurate to say an AffineScope is roughly the inverse/complement of the traditional polyhedral SCoP (static control part, section 2.3.1)?

I am unclear why this needs to leak out of Affine though.

From what I see, you seem to prefer to define the unit of applicability of affine transformations in a subtractive fashion rather than additive (traditionally used with polyhedral SCoPs). I imagine there are good reasons for going contrary to established terminology. I imagine there could be some advantage when mixing “static-control parts” and non-constrained parts: do you have a motivating example?

More concretely, is there something that would be more difficult if we used an additive process?
affine.for, affine.if and other future ops would have a trait “X”. Consecutive nested regions with “X” would form bands of “X” that form the unit of affine applicability. I am calling this trait “X” and not “static control part” because I am assuming there are good reasons to not use standard polyhedral terminology (SCoP).

With the above, region-carrying ops in other dialects would not have to worry about declaring to the world that it is in fact ok to hold an affine op. AFAICT only FuncOp and shape.function_library have AffineScope. This seems to suggest every new op with a region in every dialect should have AffineScope, which does not sound desirable from a SW-design perspective (i.e. what prevents this from blowing up with more traits and more ops).

The current design seems to break every notion of separation of concerns…

Let’s not define simpler things in terms of vastly more complex things! :slight_smile: AffineScope is just what its doc says: it defines a new “top-level” for MLIR symbols.

Well I am trying not to redefine the concept here, I am asking how AffineScope relates to polyhedral SCoP (if at all). I don’t think it’s too far fetched for people versed in the polyhedral literature to see AffineScope and think of a polyhedral SCoP. They would be quite mistaken indeed, since these seems to be “inverse” concepts :slight_smile: .

I am unclear why this needs to leak out of Affine though.

I didn’t understand. If it makes sense for an op, it can use the trait. The trait is in turn used/looked at by affine/polyhedral utilities - consider other traits or operation interfaces as an analogy. Please see further below for an example.

Again, I can’t tell what you mean by “additive” and “subtractive” here.

I can only guess what “additive” is. In general, it’s convenient from a compiler dev experience standpoint if you can see in the IR what the units of optimization are and no in-memory inferences are being made.

But that’s how MLIR’s op traits work - AutomaticAllocationScope, IsolatedFromAbove, etc. AffineScope doesn’t say whether it’s okay to hold an affine op or not, nor does it restrict what you can hold. It creates a new scope for the purpose of definition of symbols for affine or other purposes. It allows the use of SSA values as valid symbols; as a result of that, you can have more affine ops instead of unrestricted forms of those ops which also and by design exist since that’s how the affine ops would have to be lowered.

To provide an analogy, if you use the IsolatedFromAbove trait, you can’t refer to SSA symbols from the “top outside” of the op. So, the trait would prevent the use of operands that are defined outside in inside ops just like AffineScope would allow the use of values defined at its top level as symbolic operands. In both cases, a violation of the trait would lead to verification failures.

Right, that’s why I meant by “reverse the trait for affine operations”.
I.e. we could say that all regions are AffineScope unless a trait is on the operation (and the trait would be set for adding.for, affine.if, etc.). This would have the merit of having affine ops work in all regions by default I think.
(and I think I’m saying the same thing as Nicolas here, when he mentions “additive”/“subtractive”)

I now see what you (and Nicolas also perhaps) meant! Note that “reverse the trait for affine operations” really didn’t convey the same to me because there is no trait currently attached to affine operations to reverse! Instead, what you meant was to use the reverse of the AffineScope trait (perhaps call it “ExtendsAffineScope” for expanding or extending an existing affine scope) for affine operations and so the remaining region-holding operations would by default have the AffineScope trait. This definitely makes sense to me.

The original thinking was that we would just go ahead and add the AffineScope trait to the few region holding operations where it makes to hold affine dialect operations (sort of as needed). Because if something isn’t depending on the affine dialect (say the TF or HLO region holding ops) having that property is good as a dead trait. But we can negate this trait for sure - the names I can think of are “ExtendsAffineScope” or “ExpandsAffineScope”, i.e., extending/continuing an existing affine scope that started “above” (in one of the parents) and continued up until.

Yes, thanks for bringing this one up. Assuming 1 single region to simplify the discussion, I think what I find counterintuitive is that:

  • SSA values defined outside can be used within the region unless it has the IsolatedFromAbove trait.
  • SSA values defined outside cannot be used as symbols in affine.for/affine.if/affine.load/affine.store within the region unless it has the AffineScope trait.

… Picking this up after a few days, I can now make this more concise given @bondhugula’s last answer.

The additive / subtractive discussion relates to the determination of what is the unit of polyhedral / affine analyses and transformations:

  • SCoPs have additive semantics: roughly, (contiguous) unions of SCoPs result in bigger SCoPs and more global optimization opportunities.
  • the current AffineScope trait intuitively feels “subtractive”:
The polyhedral scope defined by an operation
with this trait includes all operations in its region excluding operations that
are nested inside of other operations that themselves have this trait.

In the current implementation, what could/should be an analysis on a certain number of affine dialect ops with traits is instead specified as a verified IR constraint.

Where I see opportunities to relax traditional polyhedral limitations (e.g. put a * in the dependence analysis), the current implementation enforces more stringent structural constraints: the IR is plainly invalid.

Not going to dig deeper into this for now, it’s great there seems to be agreement to invert the trait, I think it will be a first steps towards making affine composable with the rest (see the laundry list here, mostly related to canonicalizations) and go beyond traditional polyhedral.

Thanks!

Just to clarify for the trait part - inverting the trait by itself doesn’t bring any more representational power or composing than what’s there - it only changes the default behavior in the absence of a trait as we agree. Right now, if you add AffineScope to ops where you see affine dialect ops being used in conjunction, you will get the same result. (linalg.* region ops are the only ones upstream I’ve come across - yet to see scf.for and scf.if being mixed with affine, but those are the other two candidates.) I also see no harm at this point for all region holding ops except affine.if, affine.for, affine.execute_region to have the reverse trait (ExtendsAffineScope). So +1 to change the default.

For the original post on this thread, adding the AffineScope trait to linalg.indexed_generic is indeed the desired fix for now and I don’t think anything more than a single line patch is needed to make that composing work.

1 Like

Thank you for defending my point of view. I am new to polyhedral theory and was unsure if this request would be valid.

As suggested, I added AffineScope trait to class GenericOpBase here:

https://github.com/llvm/llvm-project/blob/7479a2e00bc41f399942e5106fbdf9b4b0c11506/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td#L508

And this allowed me to lower linalg.indexed_generic operations into the affine dialect

Before:

#transpose_accesses = [
  affine_map<(H,W)->(H,W)>,
  affine_map<(H,W)->(W,H)>
]

#trans_trait = {
  indexing_maps = #transpose_accesses,
  iterator_types = ["parallel","parallel"]
}

// Transpose Matrix `out` inplace
#set0 = affine_set<(d0,d1) : (d0-d1>=0)>
func @transpose(%out: memref<?x?xf32>){

  linalg.indexed_generic #trans_trait
    outs(%out, %out : memref<?x?xf32>, memref<?x?xf32>) {
    ^bb0(%i: index, %j: index, %a: f32, %b:f32):

    %r1, %r2 = affine.if #set0(%i,%j) -> (f32,f32) {
         affine.yield %b,%a : f32, f32
    } else {
         affine.yield %a,%b : f32, f32
    }

    linalg.yield %r1, %r2 : f32 ,f32
  }
  return
}

After:

  #set0 = affine_set<(d0,d1) : (d0-d1>=0)>

  func @transpose(%arg0: memref<?x?xf32>) {
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %0 = dim %arg0, %c0 : memref<?x?xf32>
    %1 = dim %arg0, %c1 : memref<?x?xf32>
    affine.for %arg1 = 0 to %0 {
      affine.for %arg2 = 0 to %1 {
        %2 = affine.load %arg0[%arg1, %arg2] : memref<?x?xf32>
        %3 = affine.load %arg0[%arg2, %arg1] : memref<?x?xf32>
        %4:2 = affine.if #set(%arg1, %arg2) -> (f32, f32) {
          affine.yield %3, %2 : f32, f32
        } else {
          affine.yield %2, %3 : f32, f32
        }
        affine.store %4#0, %arg0[%arg1, %arg2] : memref<?x?xf32>
        affine.store %4#1, %arg0[%arg2, %arg1] : memref<?x?xf32>
      }
    }
    return
  }

However, I am facing issues with -lower-affine

// mlir-opt tmp2.mlir -lower-affine -o tmp3.mlir
tmp2.mlir:15:16: error: failed to legalize operation 'affine.if' marked as erased
        %4:2 = affine.if #set(%arg1, %arg2) -> (f32, f32) {
               ^
tmp2.mlir:15:16: note: see current operation: %9:2 = "affine.if"(%arg1, %arg2) ( {
},  {
}) {condition = affine_set<(d0, d1) : (d0 - d1 >= 0)>} : (index, index) -> (f32, f32)
tmp2.mlir:20:9: note: found live user of result #0: store %9#0, %arg0[%arg1, %arg2] : memref<?x?xf32>
        affine.store %4#0, %arg0[%arg1, %arg2] : memref<?x?xf32>

Looking into solutions, I found this thread that suggested changing affine.yield by load/stores as a workaround, however I dont have this flexibility due to the constraints to the linalg.indexed_generic op.

Is this a bug with -lower-affine pass and how it handles affine.if that affine.yield results? Or is it a restriction?

I could not find any tests exemplifying affine.if/yield being used together in the same way as scf.if/yield are used.

Being creative, the above could be implemented as follows, but it is not as intuitive to write.

// Transpose Matrix `out` inplace
#set0 = affine_set<(d0,d1) : (d0-d1>=0)>
func @transpose4(%out: memref<?x?xf32>){

  // Transpose inplace - with AffineScope trait added in the compiler
  linalg.indexed_generic #trans_trait
    outs(%out, %out : memref<?x?xf32>, memref<?x?xf32>) {
    ^bb0(%i: index, %j: index, %a: f32, %b:f32):
    %tmp_r1 = alloc() : memref<1xf32>
    %tmp_r2 = alloc() : memref<1xf32>
    affine.if #set0(%i,%j) -> () {
        affine.store %b, %tmp_r1[0] : memref<1xf32>
        affine.store %a, %tmp_r2[0] : memref<1xf32>
    } else {
        affine.store %a, %tmp_r1[0] : memref<1xf32>
        affine.store %b, %tmp_r2[0] : memref<1xf32>
    }

    %r1 = affine.load %tmp_r1[0] :memref<1xf32>
    %r2 = affine.load %tmp_r2[0] :memref<1xf32>

    linalg.yield %r1, %r2 : f32 ,f32
  }
  return
}

Well, I misunderstood the true meaning of AffineScope myself, sorry about that :slight_smile:
For the short-term, I think it is fine to have the trait to unblock you, the trait inversion is preferred as soon as it becomes available.

It may be that the lowering of affine.if does not yet support yields. I dont remember implementing it and I don’t think @ftynse did either?

I haven’t implemented that either. Should be straightforward to do.

affine.if → scf.if in the presence of yield values remains unimplemented - it’s really straightforward and was I guess missed. affine.for → scf.for exists in all cases for example.

@bondhugula and @nicolasvasilache,
I finally had the time to implement the change and test it :slight_smile:
I have created a patch in Phabricator and added you as reviewers: https://reviews.llvm.org/D98756

Thank you for the suggestion on how to implement the affine.if lowering. It worked as expected.

What is currently failing is a test that uses the -linalg-fusion-for-tensor-ops pass:

It started happening after I added the AffineScope trait to the GenericOpBase class.

This causes mlir/test/Dialect/Linalg/reshape_fusion.mlir to fail, by generating an affine_map with symbols instead of dims.

Expected:

#map7 = affine_map<(d0, d1) -> (d0 + d1 * 4)
//...
%5 = affine.apply #map7(%arg3, %arg2)

What is generated:

#map7 = affine_map<()[s0, s1] -> (s0 * 4 + s1)
//...
%5 = affine.apply #map7()[%arg2, %arg3]

Unfortunately I could not fix this yet and would appreciate some guidance.

Again, it is my first time using Phabricator and I appreciate any feedback.
Where is the best place to have this discussion?
Should I have waited to fix the failing test before submitting the change to Phabricator?
Should I have made it in two different patches?

This affine.if lowering has been just fixed by https://reviews.llvm.org/D98760

1 Like

Thank you for the reference. The lowering implementation was indeed more elegant than mine. I have rebased my branch to include these changes.

With regards to the implications of the AffineScope in the LinalgGenericOp class.
I spent some time today trying to narrow down why dims are getting replaced by symbols in the generated affine_map when the AffineScope is present, but I am having problems identifying the issue.

My investigations indicate the this is the call that generates the affine.apply and affine_map:

Precisely in this for loop:

At first glance, can you identify what is the problem?

Thank you in advance

Discussion the code aspect in the patch is fine, if the design discussion started here, it is likely best to keep the high-level aspects in one thread though.

Not necessarily, you did just fine! Folks frequently send Phabricator revision for early prototype or ideas, as an illustration to feed a design discussion, even with test failures.

In general, if you can, you likely should :wink:
Are there two independent features in the patch? Can they be tested independently? Then yes add one on top of each others (Phab supports “linking” revision as dependent from each other).

1 Like

A revision on this is finally out:
https://reviews.llvm.org/D112891

2 Likes