Should linalg.indexed_generic allow for affine operations on its body?

While working on some implementations with the linalg dialect, I needed conditionals based on the indices of the iteration. linalg.indexed_generic works great as long as I use the indices with operations from the scf or std dialects, but if I try to use them with affine.if I get an error.

Example to reproduce:

// True if in Upper Triangular region
#set0 = affine_set<(d0,d1) : (d0-d1>=0)>

#transpose_accesses = [
  affine_map<(H,W)->(H,W)>,
  affine_map<(H,W)->(W,H)>
]

#trans_trait = {
  indexing_maps = #transpose_accesses,
  iterator_types = ["parallel","parallel"]
}

// Transpose Matrix `out` inplace
func @transpose(%out: memref<?x?xf32>){

  linalg.indexed_generic #trans_trait
    outs(%out, %out : memref<?x?xf32>, memref<?x?xf32>) {
    ^bb0(%i: index, %j: index, %a: f32, %b:f32):

    // This does not work: the region's indices cannot be used
    // as inputs to the affine set. See the error below.
    %r1, %r2 = affine.if #set0(%i,%j) -> (f32,f32) {
         affine.yield %b,%a : f32, f32
    } else {
         affine.yield %a,%b : f32, f32
    }
    linalg.yield %r1, %r2 : f32, f32
  }
  return
}

Gives me the following error:

generic/transpose.mlir:123:16: error: 'affine.if' op operand cannot be used as a dimension id
    %r1, %r2 = affine.if #set0(%i,%j) -> (f32,f32) {
               ^
generic/transpose.mlir:123:16: note: see current operation: %0:2 = "affine.if"(%arg1, %arg2) ( {
  "affine.yield"(%arg4, %arg3) : (f32, f32) -> ()
},  {
  "affine.yield"(%arg3, %arg4) : (f32, f32) -> ()
}) {condition = affine_set<(d0, d1) : (d0 - d1 >= 0)>} : (index, index) -> (f32, f32)

In contrast, these work fine:

// Transpose Matrix `out` inplace using scf operations
func @transpose2(%out: memref<?x?xf32>){

  linalg.indexed_generic #trans_trait
    outs(%out, %out : memref<?x?xf32>, memref<?x?xf32>) {
    ^bb0(%i: index, %j: index, %a: f32, %b:f32):
      // Are the indices on the diagonal/upper triangle?
      %cond = cmpi ugt, %i, %j : index
      %r1, %r2 = scf.if %cond -> (f32,f32) {
           scf.yield %b,%a : f32, f32
      } else {
           scf.yield %a,%b : f32, f32
      }
      linalg.yield %r1, %r2 : f32, f32
  }
  return
}

// Transpose Matrix `out` inplace using std operations
func @transpose3(%out: memref<?x?xf32>){

  // Transpose inplace
  linalg.indexed_generic #trans_trait
    outs(%out, %out : memref<?x?xf32>, memref<?x?xf32>) {
    ^bb0(%i: index, %j: index, %a: f32, %b:f32):

    %1 = cmpi sge, %i, %j : index
    %r1 = select %1, %b, %a : f32
    %r2 = select %1, %a, %b : f32

    linalg.yield %r1, %r2 : f32, f32
  }
  return
}

Shouldn’t linalg.indexed_generic be marked with the AffineScope trait? If not, can you help me understand why there is a limitation?


Hello @agostini01,

I would say it depends on what you would like to achieve:

  • just increasing expressiveness to kick things down the line, so that later affine passes can pick some of this up, makes sense and I think is doable with symbols; see below.
  • having linalg abide by the affine mandate on dims is less clear.

In general, Linalg does not want to limit itself to affine constraints and is evolving towards supporting more general iterators and data types (see in particular @aartbik’s ongoing sparse work).

A more future-proof solution for affine to interplay more nicely with the rest of the system is proposed here: Steps towards generalizing vectorization in Affine - #9 by nicolasvasilache. There is no concrete progress on this AFAIK.

You could alternatively try embedding some AffineScope region (maybe just a local op that does nothing and has the trait) within the linalg op, and you’d be able to use affine.if on symbols.
If you wanted to manipulate dims better, then you’d probably want to lower linalg to affine.for, which is something that exists but is not actively used yet.
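A rough sketch of the first suggestion, assuming a hypothetical wrapper op `"my.scope"` (with terminator `"my.yield"`) that carries the AffineScope trait; no such op exists upstream, this only illustrates the idea:

```mlir
// Hypothetical: "my.scope" carries the AffineScope trait, so its region
// arguments %si/%sj are defined at the top level of an affine scope and
// are therefore valid symbols for the symbol-only set #tri.
#tri = affine_set<()[s0, s1] : (s0 - s1 >= 0)>

// Inside the linalg.indexed_generic body:
%r:2 = "my.scope"(%i, %j, %a, %b) ({
^bb0(%si: index, %sj: index, %sa: f32, %sb: f32):
  %v:2 = affine.if #tri()[%si, %sj] -> (f32, f32) {
    affine.yield %sb, %sa : f32, f32
  } else {
    affine.yield %sa, %sb : f32, f32
  }
  "my.yield"(%v#0, %v#1) : (f32, f32) -> ()
}) : (index, index, f32, f32) -> (f32, f32)
```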

Orthogonally, there is also ongoing work by @gysit to create a linalg.index-like op that could help unify linalg.generic and linalg.indexed_generic. We expect to know more about how this will play out within the next 1-4 weeks.

Thank you for the reply and links to the other discussion.

Similar to the author of the post, we would like to use as much as possible of the optimization infrastructure implemented in the affine dialect.

I expect that the analyses/optimizations of the affine dialect do not reason well about scf.if or std.select/cmpi (implementing affine.max/min). Is this correct?

That would be our use case.

It was a bit counterintuitive to have -convert-linalg-to-affine-loops fail because the body of the linalg.indexed_generic was using an affine.if op. Considering that linalg.indexed_generic ops already use affine_map<> to describe iterations/accesses and have an implemented lowering to affine, I was expecting it to work.

Uhmm… so I should not be using -convert-linalg-to-affine-loops?

Yes! This makes total sense. I am not following all the sparse discussions, but do you think that supporting affine.if/max/min would interact poorly with these new (sparse) representations? How would it be different with the suggested arith.affine_apply/min/max?

I will investigate this solution and see if it results in better optimizations down the line.

I look forward to learning more about this work.

It was not my use case, but I now realize that you may have additional values from outside of the linalg.generic* region taking part in the computation in the region’s body without being passed as ins/outs to the linalg.generic op. That being the case, I see that some of these values may not satisfy any of the conditions in Affine: Restrictions on Dimensions and Symbols.

Is this what is preventing linalg.indexed_generic from implementing the AffineScope trait?

Isn’t declaring an AffineScope something that can be done for basically any op holding a region when the op itself “does not want to limit itself to affine constraints”?
I actually always thought that we could consider any region an “AffineScope” by default and reverse the traits for affine operations.


It looks like the original post’s question is being misunderstood - there is a simple answer to it.

Absolutely, such region-holding ops should actually be marked with the AffineScope trait, unless the intent is to perform optimizations across the outside and the inside of such an op, which isn’t the case here.

This is completely unrelated to the original post’s question on the AffineScope trait! The AffineScope trait REMOVES constraints and does NOT add them, i.e., you would not be restricted from using affine ops in several cases inside such regions, simply because with that trait there are more things that can be symbols than before (not fewer). For example, one could do affine store-to-load forwarding inside such regions out of the box if the traits are marked properly. So the OP is right, and I think the answer for now is a simple yes.

This is exactly right @mehdi_amini

Not always - you may not want to always consider a region-holding op as an AffineScope. As a trivial example, an affine.for’s or an affine.if’s region is not marked AffineScope, because that would make optimizations “local” to it and miss out on optimizing larger pieces of IR (i.e., every for and if would wind up being optimized in isolation). But where one is clear on the boundaries at which to apply polyhedral optimization, adding AffineScope only “unconstrains” things and allows more affine optimizations: say on all function-like ops, scf.for, scf.if, and most region-holding ops.
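As a small sketch of what the trait “unconstrains”: `func` carries AffineScope, so any value defined at its top level (including its arguments) is usable as a symbol/dim by nested affine ops:

```mlir
// Sketch: `func` has the AffineScope trait, so values defined at its top
// level (%d below, and the argument %i) are valid affine symbols/dims.
func @scope_example(%m: memref<?x?xf32>, %i: index) {
  %c0 = constant 0 : index
  %d = dim %m, %c0 : memref<?x?xf32>  // top-level value: valid symbol
  affine.for %k = 0 to %d {           // ok: %d used as a symbolic bound
    %v = affine.load %m[%k, %i] : memref<?x?xf32>  // ok: %i is top-level
  }
  return
}
```

Without the trait on the enclosing op, the same uses of %d and %i would be rejected by the affine op verifiers.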

Fair enough, I looked deeper into the doc of the trait, and this makes sense. Is it accurate to say an AffineScope is roughly the inverse/complement of the traditional polyhedral SCoP (static control part, section 2.3.1)?

I am unclear why this needs to leak out of Affine though.

From what I see, you seem to prefer to define the unit of applicability of affine transformations in a subtractive fashion rather than an additive one (as traditionally used with polyhedral SCoPs). I imagine there are good reasons for going contrary to established terminology. I imagine there could be some advantage when mixing “static control parts” and non-constrained parts: do you have a motivating example?

More concretely, is there something that would be more difficult if we used an additive process?
affine.for, affine.if and other future ops would have a trait “X”. Consecutive nested regions with “X” would form bands of “X” that form the unit of affine applicability. I am calling this trait “X” and not “static control part” because I am assuming there are good reasons to not use standard polyhedral terminology (SCoP).

With the above, region-carrying ops in other dialects would not have to worry about declaring to the world that it is in fact ok for them to hold an affine op. AFAICT only FuncOp and shape.function_library have AffineScope. This seems to suggest every new op with a region in every dialect should add AffineScope, which does not sound desirable from a SW-design perspective (i.e., what prevents this from blowing up with more traits and more ops?).

The current design seems to break every notion of separation of concerns…

Let’s not define simpler things in terms of vastly more complex things! :slight_smile: AffineScope is just what its doc says: it defines a new “top-level” for MLIR symbols.

Well, I am trying not to redefine the concept here; I am asking how AffineScope relates to polyhedral SCoPs (if at all). I don’t think it’s too far-fetched for people versed in the polyhedral literature to see AffineScope and think of a polyhedral SCoP. They would be quite mistaken indeed, since these seem to be “inverse” concepts :slight_smile: .

I am unclear why this needs to leak out of Affine though.

I didn’t understand. If it makes sense for an op, it can use the trait. The trait is in turn used/looked at by affine/polyhedral utilities - consider other traits or operation interfaces as an analogy. Please see further below for an example.

Again, I can’t tell what you mean by “additive” and “subtractive” here.

I can only guess what “additive” is. In general, it’s convenient from a compiler dev experience standpoint if you can see in the IR what the units of optimization are and no in-memory inferences are being made.

But that’s how MLIR’s op traits work - AutomaticAllocationScope, IsolatedFromAbove, etc. AffineScope doesn’t say whether it’s okay to hold an affine op or not, nor does it restrict what you can hold. It creates a new scope for the purpose of defining symbols, for affine or other purposes. It allows the use of SSA values as valid symbols; as a result, you can have more affine ops instead of the unrestricted forms of those ops, which also exist by design since that’s what the affine ops would have to be lowered to.

To provide an analogy: if you use the IsolatedFromAbove trait, you can’t refer to SSA values from outside of the op. So that trait prevents inner ops from using operands defined outside, just like AffineScope allows the use of values defined at its top level as symbolic operands. In both cases, a violation of the trait leads to verification failures.
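A minimal sketch of the analogy, using a hypothetical `"test.isolated"` op (and terminator `"test.done"`) that carries IsolatedFromAbove:

```mlir
%x = constant 42 : index
"test.isolated"() ({
^bb0:
  // Referring to %x in here would be a verification error: the region of
  // an IsolatedFromAbove op may not use values defined outside of it.
  "test.done"() : () -> ()
}) : () -> ()
```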

Right, that’s what I meant by “reverse the trait for affine operations”.
I.e., we could say that all regions are AffineScope unless a trait is on the operation (and the trait would be set for affine.for, affine.if, etc.). This would have the merit of having affine ops work in all regions by default, I think.
(and I think I’m saying the same thing as Nicolas here, when he mentions “additive”/“subtractive”)

I now see what you (and perhaps Nicolas too) meant! Note that “reverse the trait for affine operations” really didn’t convey the same thing to me, because there is no trait currently attached to affine operations to reverse! Instead, what you meant was to use the reverse of the AffineScope trait (perhaps call it “ExtendsAffineScope”, for expanding or extending an existing affine scope) on affine operations, so that the remaining region-holding operations would by default have the AffineScope trait. This definitely makes sense to me.

The original thinking was that we would just go ahead and add the AffineScope trait to the few region-holding operations where it makes sense to hold affine dialect operations (sort of as needed). Because if something doesn’t depend on the affine dialect (say the TF or HLO region-holding ops), having that property is as good as a dead trait. But we can negate this trait for sure - the names I can think of are “ExtendsAffineScope” or “ExpandsAffineScope”, i.e., extending/continuing an existing affine scope that started “above” (in one of the parents) and continues into this op.

Yes, thanks for bringing this one up. Assuming a single region to simplify the discussion, I think what I find counterintuitive is that:

  • SSA values defined outside can be used within the region unless it has the IsolatedFromAbove trait.
  • SSA values defined outside cannot be used as symbols in affine.for/affine.if/affine.load/affine.store within the region unless it has the AffineScope trait.

… Picking this up after a few days, I can now make this more concise given @bondhugula’s last answer.

The additive / subtractive discussion relates to the determination of what is the unit of polyhedral / affine analyses and transformations:

  • SCoPs have additive semantics: roughly, (contiguous) unions of SCoPs result in bigger SCoPs and more global optimization opportunities.
  • the current AffineScope trait intuitively feels “subtractive”: “The polyhedral scope defined by an operation with this trait includes all operations in its region, excluding operations that are nested inside of other operations that themselves have this trait.”

In the current implementation, what could/should be an analysis on a certain number of affine dialect ops with traits is instead specified as a verified IR constraint.

Where I see opportunities to relax traditional polyhedral limitations (e.g. put a * in the dependence analysis), the current implementation enforces more stringent structural constraints: the IR is plainly invalid.

Not going to dig deeper into this for now. It’s great that there seems to be agreement to invert the trait; I think it will be a first step towards making affine composable with the rest (see the laundry list here, mostly related to canonicalizations) and going beyond traditional polyhedral.

Thanks!

Just to clarify the trait part - inverting the trait by itself doesn’t bring any more representational power or composability than what’s there; it only changes the default behavior in the absence of a trait, as we agree. Right now, if you add AffineScope to the ops where you see affine dialect ops being used in conjunction, you will get the same result. (linalg.* region ops are the only ones upstream I’ve come across - I have yet to see scf.for and scf.if being mixed with affine, but those are the other two candidates.) I also see no harm at this point in all region-holding ops except affine.if, affine.for, and affine.execute_region having the reverse trait (ExtendsAffineScope). So +1 to changing the default.

For the original post on this thread, adding the AffineScope trait to linalg.indexed_generic is indeed the desired fix for now, and I don’t think anything more than a single-line patch is needed to make that composition work.


Thank you for defending my point of view. I am new to polyhedral theory and was unsure whether this request would be valid.

As suggested, I added the AffineScope trait to class GenericOpBase here:

https://github.com/llvm/llvm-project/blob/7479a2e00bc41f399942e5106fbdf9b4b0c11506/mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td#L508
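For reference, the change amounts to roughly a one-line addition to the op’s trait list in ODS. This is only a sketch: the actual trait list and class signature of GenericOpBase in LinalgStructuredOps.td differ and are abbreviated here.

```tablegen
// LinalgStructuredOps.td (sketch, abbreviated):
class GenericOpBase<string mnemonic> : LinalgStructuredBase_Op<mnemonic, [
    AffineScope,  // added: the op's region starts a new affine symbol scope
    SingleBlockImplicitTerminator<"YieldOp">]> {
  // ... rest of the op definition unchanged ...
}
```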

And this allowed me to lower linalg.indexed_generic operations into the affine dialect.

Before:

#transpose_accesses = [
  affine_map<(H,W)->(H,W)>,
  affine_map<(H,W)->(W,H)>
]

#trans_trait = {
  indexing_maps = #transpose_accesses,
  iterator_types = ["parallel","parallel"]
}

// Transpose Matrix `out` inplace
#set0 = affine_set<(d0,d1) : (d0-d1>=0)>
func @transpose(%out: memref<?x?xf32>){

  linalg.indexed_generic #trans_trait
    outs(%out, %out : memref<?x?xf32>, memref<?x?xf32>) {
    ^bb0(%i: index, %j: index, %a: f32, %b:f32):

    %r1, %r2 = affine.if #set0(%i,%j) -> (f32,f32) {
         affine.yield %b,%a : f32, f32
    } else {
         affine.yield %a,%b : f32, f32
    }

    linalg.yield %r1, %r2 : f32, f32
  }
  return
}

After:

  #set = affine_set<(d0, d1) : (d0 - d1 >= 0)>

  func @transpose(%arg0: memref<?x?xf32>) {
    %c0 = constant 0 : index
    %c1 = constant 1 : index
    %0 = dim %arg0, %c0 : memref<?x?xf32>
    %1 = dim %arg0, %c1 : memref<?x?xf32>
    affine.for %arg1 = 0 to %0 {
      affine.for %arg2 = 0 to %1 {
        %2 = affine.load %arg0[%arg1, %arg2] : memref<?x?xf32>
        %3 = affine.load %arg0[%arg2, %arg1] : memref<?x?xf32>
        %4:2 = affine.if #set(%arg1, %arg2) -> (f32, f32) {
          affine.yield %3, %2 : f32, f32
        } else {
          affine.yield %2, %3 : f32, f32
        }
        affine.store %4#0, %arg0[%arg1, %arg2] : memref<?x?xf32>
        affine.store %4#1, %arg0[%arg2, %arg1] : memref<?x?xf32>
      }
    }
    return
  }

However, I am facing issues with -lower-affine

// mlir-opt tmp2.mlir -lower-affine -o tmp3.mlir
tmp2.mlir:15:16: error: failed to legalize operation 'affine.if' marked as erased
        %4:2 = affine.if #set(%arg1, %arg2) -> (f32, f32) {
               ^
tmp2.mlir:15:16: note: see current operation: %9:2 = "affine.if"(%arg1, %arg2) ( {
},  {
}) {condition = affine_set<(d0, d1) : (d0 - d1 >= 0)>} : (index, index) -> (f32, f32)
tmp2.mlir:20:9: note: found live user of result #0: store %9#0, %arg0[%arg1, %arg2] : memref<?x?xf32>
        affine.store %4#0, %arg0[%arg1, %arg2] : memref<?x?xf32>

Looking into solutions, I found this thread that suggested replacing affine.yield with loads/stores as a workaround; however, I don’t have this flexibility due to the constraints of the linalg.indexed_generic op.

Is this a bug in the -lower-affine pass and how it handles affine.if ops whose affine.yield produces results? Or is it a restriction?

I could not find any tests exemplifying affine.if/affine.yield being used together in the same way as scf.if/scf.yield are.

Being creative, the above could be implemented as follows, but it is not as intuitive to write.

// Transpose Matrix `out` inplace
#set0 = affine_set<(d0,d1) : (d0-d1>=0)>
func @transpose4(%out: memref<?x?xf32>){

  // Transpose inplace - with AffineScope trait added in the compiler
  linalg.indexed_generic #trans_trait
    outs(%out, %out : memref<?x?xf32>, memref<?x?xf32>) {
    ^bb0(%i: index, %j: index, %a: f32, %b:f32):
    %tmp_r1 = alloc() : memref<1xf32>
    %tmp_r2 = alloc() : memref<1xf32>
    affine.if #set0(%i,%j) {
        affine.store %b, %tmp_r1[0] : memref<1xf32>
        affine.store %a, %tmp_r2[0] : memref<1xf32>
    } else {
        affine.store %a, %tmp_r1[0] : memref<1xf32>
        affine.store %b, %tmp_r2[0] : memref<1xf32>
    }

    %r1 = affine.load %tmp_r1[0] :memref<1xf32>
    %r2 = affine.load %tmp_r2[0] :memref<1xf32>

    linalg.yield %r1, %r2 : f32, f32
  }
  return
}

Well, I misunderstood the true meaning of AffineScope myself, sorry about that :slight_smile:
For the short-term, I think it is fine to have the trait to unblock you, the trait inversion is preferred as soon as it becomes available.

It may be that the lowering of affine.if does not yet support yields. I don’t remember implementing it, and I don’t think @ftynse did either?