[RFC] Introducing memref aliasing attributes

What: introduce a set of aliasing metadata to the memref dialect, similar to the LLVM ones (see the LLVM Language Reference Manual).

Motivation: LLVM provides a set of aliasing metadata for fine-grained control over which read/write operations can or cannot alias, which is important for optimizations. Unfortunately, there is currently no way to generate it when lowering from the memref dialect.

We propose to introduce similar constructs to the memref dialect.

Example:

```
#alias_scope1 = #memref.alias_scope<id = distinct[0]<>, description = "scope">
#alias_scope2 = #memref.alias_scope<id = distinct[1]<>>

func.func @memref_alias_scope(%arg1 : memref<?xf32>, %arg2 : memref<?xf32>, %arg3: index) -> f32 {
  %0 = memref.alias_domain_scope "The Domain" -> f32 {
    %val = memref.load %arg1[%arg3] { alias = #memref.aliasing<alias_scopes=[#alias_scope1], noalias=[#alias_scope2]> } : memref<?xf32>
    memref.store %val, %arg2[%arg3] { alias = #memref.aliasing<alias_scopes=[#alias_scope2], noalias=[#alias_scope1]> } : memref<?xf32>
    memref.alias_domain_scope.return %val: f32
  }
  return %0 : f32
}
```

Any memref load/store op can have an AliasingAttr attached, consisting of alias_scopes and noalias lists. The semantics are (from the LLVM documentation):

When evaluating an aliasing query, if for some domain, the set of scopes with that domain in one instruction’s alias.scope list is a subset of (or equal to) the set of scopes for that domain in another instruction’s noalias list, then the two memory accesses are assumed not to alias.
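The subset rule quoted above can be sketched as a small Python model. This is a minimal sketch under stated assumptions, not the implementation in the PR: `Scope`, `Access`, and `may_alias` are hypothetical names, each scope carries its domain as a plain string, and only domains that actually appear in an access's alias-scope list are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    name: str
    domain: str  # hypothetical: the domain is modeled as a plain string


@dataclass(frozen=True)
class Access:
    alias_scopes: frozenset = frozenset()
    noalias: frozenset = frozenset()


def _proves_noalias(a, b):
    """True if, for some domain in a's scope list, a's scopes in that domain
    are a subset of b's noalias scopes in that domain."""
    for domain in {s.domain for s in a.alias_scopes}:
        a_scopes = {s for s in a.alias_scopes if s.domain == domain}
        b_noalias = {s for s in b.noalias if s.domain == domain}
        if a_scopes <= b_noalias:
            return True
    return False


def may_alias(a, b):
    # The rule is applied in both directions; with no proof, stay conservative.
    return not (_proves_noalias(a, b) or _proves_noalias(b, a))


# Mirrors the load/store pair from the example in the first post.
scope1 = Scope("scope1", "The Domain")
scope2 = Scope("scope2", "The Domain")
load = Access(frozenset({scope1}), frozenset({scope2}))
store = Access(frozenset({scope2}), frozenset({scope1}))
plain = Access()  # no attribute attached
```

With these definitions, `may_alias(load, store)` is false, while any query involving `plain` stays conservative, which matches the claim that dropping the attribute is always safe.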

The notable difference is that the aliasing domain is modeled using a region op instead of an attribute. Modeling the domain using attributes (as is currently done in the LLVM dialect) has issues with inlining: you have to create a new domain and update all attributes each time you inline a function, and this doesn’t compose well with the MLIR inlining infrastructure. Associating the domain with a region op solves this naturally, as the domain op is cloned during inlining. Using regions for domains also limits the API somewhat: unlike in LLVM, you can only have a single domain attached to an op.

The change is backward compatible as aliasing attributes are optional and omitting them corresponds with current behavior (load/store op may alias with any other load/store op). Dropping alias attributes during transformation is safe and can only result in suboptimal codegen but not in miscompilations. We are planning to update existing transformations incrementally to support new attributes.

While mostly modeled on LLVM, we believe the concept is generic enough to be introduced into a mid-level dialect like memref. Lowering to LLVM is straightforward, and if another backend doesn’t support such constructs (e.g. SPIR-V doesn’t have such fine-grained control), it’s always safe to ignore or drop the metadata during lowering.

The initial PR: [mlir][memref] Alias attributes for memref dialect (by Hardcode84, Pull Request #154991, llvm/llvm-project)


+1 to aliasing information on memrefs.

I’m assuming multiple concurrent scopes can be modelled as nested scopes:

```
%0 = memref.alias_domain_scope "A Domain" -> f32 {
  %1 = memref.alias_domain_scope "B Domain" -> f32 {
    %A = memref.load %arg1[%arg2] { alias = #memref.aliasing<alias_scopes=[#alias_scope1], noalias=[#alias_scope2, #alias_scope3]> } : memref<?xf32>
    %B = memref.load %arg3[%arg4] { alias = #memref.aliasing<alias_scopes=[#alias_scope2], noalias=[#alias_scope1, #alias_scope3]> } : memref<?xf32>
    %C = some.op %A, %B : f32
    memref.store %C, %arg5[%arg6] { alias = #memref.aliasing<alias_scopes=[#alias_scope3], noalias=[#alias_scope1, #alias_scope2]> } : memref<?xf32>
    memref.alias_domain_scope.return %C : f32
  }
  memref.alias_domain_scope.return %1 : f32
}
return %0 : f32
```

This neatly solves the inlining problem, but it creates a fusion problem. Just metadata, like LLVM IR, is “loose” in the regions and can be reordered, loop-fused, hoisted, etc. Now, I need to keep track of all my scopes and nest them correctly every time I do code rearrangement.

Another question is: how do I pass this info across function boundaries? Say I want to create two versions of a function, one that never aliases (and is super fast) and one that may alias. To replace the call in the caller, I’d have to recursively scan the code up the call chain to detect aliasing. If I can propagate that on the type (or as a parameter attribute), I would only need to scan the immediate call sites.

Finally, how do we generate this from tensor? This has to be an opinionated bufferization that assumes knowledge of what a tensor really is and where it lies in memory. It’s no different than it is today, but we’re starting to give guarantees to the compiler, which needs stronger semantics on either tensor or bufferization logic.

The easy solution is to just say: “bufferization is what it is, use at your own peril” and we do “whatever works” there, but that’s not a good upstream solution. Another way would be to talk about tensor scope, or to assume every tensor has its own scope by construction. All very opinionated “solutions”, and that would invariably change how we think about these types.

So, in the end, how we solve this particular problem would need us to think about the overall semantics of all surrounding parts.


Two general questions:

  • How does having persistent scope in IR help if the metadata is still propagated through attributes which can be freely dropped?
  • Since we have richer type system in MLIR, could the alias metadata be a part of resource i.e., memref type itself?

> I’m assuming multiple concurrent scopes can be modelled as nested scopes:

Nested scopes are allowed and actually required for inlining to be composable, but when querying alias properties only the innermost domain scope is taken into account.
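One possible reading of “only the innermost domain scope is taken into account” can be sketched in Python. This is an illustrative model, not the PR's lowering: `DomainRegion`, `Access`, and `resolve` are hypothetical names, and the assumption is that every domain region op gets a fresh domain id and each access's scopes are qualified by the id of its innermost enclosing region.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Access:
    name: str
    alias_scopes: set
    noalias: set


@dataclass
class DomainRegion:
    # Body items are Access objects or nested DomainRegion objects.
    body: list = field(default_factory=list)


def resolve(region, counter=None, out=None):
    """Map each access name to (domain_id, qualified_scopes, qualified_noalias),
    qualifying every scope with the innermost enclosing region's domain id."""
    counter = counter if counter is not None else itertools.count()
    out = out if out is not None else {}
    domain = next(counter)  # fresh domain per region op (assumption)
    for item in region.body:
        if isinstance(item, DomainRegion):
            resolve(item, counter, out)
        else:
            out[item.name] = (
                domain,
                {(domain, s) for s in item.alias_scopes},
                {(domain, s) for s in item.noalias},
            )
    return out


caller = Access("caller_load", {"s1"}, {"s2"})
callee = Access("callee_store", {"s2"}, {"s1"})

# Both accesses directly in one region: the scopes share a domain.
same = resolve(DomainRegion([caller, callee]))
# Callee access sits in a nested region, as it would after inlining.
nested = resolve(DomainRegion([caller, DomainRegion([callee])]))
```

In `same`, the load's scopes are a subset of the store's noalias list, so the pair is known not to alias. In `nested`, the two accesses end up in different domains, the subset test fails, and the query falls back to “may alias”, so an inlined body cannot accidentally interact with the caller's scopes.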

> in the regions and can be reordered, loop-fused, hoisted, etc.

There is a potential problem with hoisting load/store ops out of domain regions, but it’s not unique to this proposal: memref.alloca_scope will also break if we hoist an alloca out of it. I guess we need some sort of hoisting check interface, something along the lines of RegionOpInterface::canBeHoisted(Operation* op) (and also the opposite check, for when we move ops into the region).

> Another question is: how do I pass this info across function boundaries? Say I want to create two versions of a function, one that never aliases (and is super fast) and one that may alias. To replace the call in the caller, I’d have to recursively scan the code up the call chain to detect aliasing. If I can propagate that on the type (or as a parameter attribute), I would only need to scan the immediate call sites.

Not sure I follow, can you provide a code example? In general, I don’t think this interface is safe to use across function boundaries, for the same reasons inlining is not safe without updating the domains.

> Finally, how do we generate this from tensor? This has to be an opinionated bufferization

In our current pipeline we are not going through tensors and are planning to apply these attributes directly at the memref level. Bufferization won’t set any aliasing attributes for now either, which is the most conservative option (a missing attribute means anything may alias with anything else), so it will still work. We may explore propagating it from the tensor level later, but it’s out of scope for this proposal.

> How does having persistent scope in IR help if the metadata is still propagated through attributes which can be freely dropped?

Attributes can still be freely dropped, and that’s safe. The reason to model the domain as a region is to avoid generating invalid code during inlining.

> Since we have richer type system in MLIR, could the alias metadata be a part of resource i.e., memref type itself?

Modelling it as types is probably possible, but I’m not sure we would gain much from it:

```
func @foo(%arg1 : memref<?xf32>, %arg2: index, %arg3: index) {
    memref.alias_domain_scope -> f32 {
        %val = memref.load %arg1[%arg2] { alias = #memref.aliasing<alias_scopes=[#alias_scope1], noalias=[#alias_scope2]> } : memref<?xf32>
        memref.store %val, %arg1[%arg3] { alias = #memref.aliasing<alias_scopes=[#alias_scope2], noalias=[#alias_scope1]> } : memref<?xf32>
    }
}
```

If you want to translate this code to a purely type-based approach, you will have to add memref casts for the load and the store, and you still have the question of how to model the domain: if you make the domain part of the memref type, you will need to update all memref types during inlining.


Can we use an op interface instead of fragile discardable attributes?
Op can then decide to use the attribute (inherent hopefully) or some other annotations. Someone could design it with information retrieved from the type possibly.
Actually hopefully the design would account for that: I would like to see this not tied to the memref dialect but be more generically usable. The pointer dialect seems like a better place for this to live (it can still be used by the memref dialect of course).

How have you thought about the extensibility aspect of it?


It’s a proper attribute, and I’m adding the op interface as well (in the PR: [mlir][memref] Alias attributes for memref dialect, Pull Request #154991, llvm/llvm-project), but we will still need to update all the memref transforms to propagate this attribute.

I’d like to propose an entirely different approach here that should be sufficient for most purposes.

Instead of trying to replicate alias_scope/…, we introduce the operation (name bikesheddable)

```
%r1, ... = memref.distinct_objects %x0, %x1, ... none_are(%y0, %y1, ...) : (memref<...>, memref<...>, ...) -> (type(%r1), ...), %memref<...>, {
^bb0(%arg0 : memref<...>, %arg1: memref<...>, ...):
  ...
  memref.yield ...
}
```

The semantics of this operation are that it executes the region, binding %x0, %x1, ... to %arg0, %arg1, ..., and it encodes the assumption that, within that region, the %argN all refer to distinct, non-overlapping regions of memory and don’t alias each other. It’s always safe to inline this operation (just replace the arguments with their init values), but that’s a pessimization since you lose the alias info.

The none_are arguments are not passed into the region, and are a signal that none of the %xN alias any of the objects referred to by any of the %yN. For example, if you have

```
@x = ...
func.func @foo(%arg0: memref<...>, %arg1: memref<...>, ...) {
  memref.distinct_objects(%arg0, %arg1) none_are(@x) {
    ...
  }
}
```

this means that you know that neither %arg0 nor %arg1 will alias the memory of @x.

You can then traverse use-def chains on the arguments to collect memory operations into alias sets, and you could lower this to an alias_scope for the region plus alias domains for each argument when going to LLVM (all you’d really need is some bookkeeping for which memref op became which load, store, or intrinsic).

This also lowers nicely to noalias on function arguments - if the input is a function argument, you can paint the target’s noalias/restrict on the relevant pointer.

Having the region defined like this also eliminates hoisting issues - you can’t hoist the user of a region argument out of a region, and gets rid of any problems with inlining (different copies of the region make different objects distinct). You might need manual work to identify when it’s safe to move something into the region, but that’s already the case with other instances where fusion is of interest.

My current thought is that, while this costs a bit more implementation work to lower, it’s a somewhat better behaved design


Thanks @krzysz00 ! I like this approach a lot more than the previous one.

@krzysz00 do you have any thoughts on how the example above with a single memref would be represented?

```
func @foo(%arg1 : memref<?xf32>, %arg2: index, %arg3: index) {
    memref.alias_domain_scope -> f32 {
        %val = memref.load %arg1[%arg2] { alias = #memref.aliasing<alias_scopes=[#alias_scope1], noalias=[#alias_scope2]> } : memref<?xf32>
        memref.store %val, %arg1[%arg3] { alias = #memref.aliasing<alias_scopes=[#alias_scope2], noalias=[#alias_scope1]> } : memref<?xf32>
    }
}
```

I suppose you could make subviews, then mark those distinct?

This looks like a direct mapping of the “noalias” annotation on function arguments?
It’s however limited in that you can’t define relationships such as: a pair of arguments arg1 and arg2 don’t alias each other but may alias arg3 and arg4 (a pair that likewise doesn’t alias each other).

Wouldn’t you just capture the same object twice, materializing it as two different SSA value in the region?


How would you handle sub-views with the distinct approach? A sub-view of A is always distinct from a sub-view of B if A and B are distinct. But two sub-views from A can also be distinct. Just marking things as distinct doesn’t allow you to reason about where they are. You’d have to propagate the property throughout the entire program and keep it up to date upon transformations.

Alternative is the approach from memory safe schemes. Memrefs are “regions” and sub-views are “sub-regions” that inherit the same “properties”. But since memrefs can behave like pointers (especially on function arguments), we could have an additional property that the arguments must be distinct. So, if you call such function with a “may alias”, that’s a compiler error.

Then the function is free to optimize memory access, as it’s guaranteed that internally, they never alias. This can be flexible with multi-versioning, too.
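The multi-versioning idea can be illustrated with a runtime overlap check that picks between a “may alias” and a “never aliases” version. This is only a sketch in Python over a flat list standing in for memory; `ranges_overlap`, `add_one`, and the two variants are hypothetical names, not anything proposed in this thread.

```python
def ranges_overlap(start_a, size_a, start_b, size_b):
    """True when the half-open ranges [start, start + size) share an element."""
    return start_a < start_b + size_b and start_b < start_a + size_a


def add_one_may_alias(mem, src, dst, n):
    # Conservative version: strictly sequential, so a store can feed a later load.
    for i in range(n):
        mem[dst + i] = mem[src + i] + 1


def add_one_noalias(mem, src, dst, n):
    # Fast version: reads everything first, then writes. Only equivalent to the
    # sequential version when the source and destination do not overlap.
    mem[dst:dst + n] = [x + 1 for x in mem[src:src + n]]


def add_one(mem, src, dst, n):
    # Dispatch on a runtime check at the call boundary.
    if ranges_overlap(src, n, dst, n):
        add_one_may_alias(mem, src, dst, n)
    else:
        add_one_noalias(mem, src, dst, n)
```

The two versions genuinely differ on overlapping input: with `mem = [1, 1, 1, 1, 1]`, `src = 0`, `dst = 1`, `n = 3`, the sequential version produces `[1, 2, 3, 4, 1]`, which is why the fast version is only legal once distinctness is known.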

Agreed that there are limitations to a region that lets you put noalias on things as a concept. The case where you have a load and store from the same memref where you know (but can’t easily prove) the indices are distinct is a good example of where my proposal falls short and the alias scope stuff makes sense.

As far as subviews, my initial theory of them was that you can put the memref.subview operations outside of the distinct_objects region and then mark each subview as distinct/non-overlapping by using said region. One potential problem is that you often want to rewrite away subviews (such as by folding them into loads/stores). However, that’s solvable with the rewrite

```
%s1 = memref.subview %a [...1]
%s2 = memref.subview %a [...2]
memref.distinct_objects (%s1, %s2) {
%arg0, %arg1:
  ...(%arg0)
  ...(%arg1)
}
```

to

```
memref.distinct_objects (%a, %a) {
%arg0, %arg1:
  %s1 = memref.subview %arg0[...1]
  %s2 = memref.subview %arg1[...2]
  ...(%s1)
  ...(%s2)
}
```

And as for distinguishing between subviews of A and some other memref B … the alias analysis doesn’t stop at the boundary of the distinct_objects region. If you’re trying to test if %argX and %y alias, and the distinct_objects op doesn’t indicate the relationship between those two values, you then look at whether the %x that %argX came from might alias %y.


However, as for the alias scope proposal - which I’m not fundamentally opposed to, I’m just trying to explore the design space - I think it’s a property of that alias_scope region that if you have

```
opPre
%x = memref.alias_domain_scope ... {
  %y = opIn
  yield %y
}
%z = opPost
```

can always become

```
memref.alias_domain_scope ... {
  opPre
  %y = opIn
  %z = opPost(s/%x/%y)
  yield %z
}
```

and I think it can also swallow up control flow.

On top of that, if we allow this region to define multiple alias scopes instead of doing it with nesting (like the original proposal should), there’s an argument that the canonical form of this thing is to grow until it encompasses the body of the function it lives in, absorbing other alias scope-defining regions along the way. Does that seem like a bad idea?

(Heck, if we allow the inliner to handle attributes on the function being inlined, we could just completely get rid of the region and make the alias domain definition a function attribute)

The main benefit of the region I see so far is to defer domain resolution step to a separate rewrite. Such that it can be independent of the inlining itself.

Maintaining these separate IR regions would interfere with aforementioned transformations such as hoisting which could highly benefit from a robust alias analysis.
Also, I’m a bit skeptical about how well this metadata region approach scales with more solutions like that. Multiple deeply nested, and potentially unrelated, regions?

Could ultimately all alias domain regions be consumed/combined and the alias information preserved in a less invasive way?

Indeed, memory regions don’t always nest perfectly, and then you have runtime offsets, which make it much harder to reason about.

This doesn’t work when the nest is for unrelated regions like my example. You need to know about both A and B regions to tell they don’t alias.

This doesn’t work when using dynamic offsets. Two dynamic subviews of the same region may alias unless you can reason about the offset calculation.

Right, multiple is simpler, but you still need to handle overlapping regions. Or just bundle them all and pay the small cost of making sure which are the active scopes for each op. This goes back to the function attribute, which then is the trivial scope.

Well, no, distinct_object works exactly for relaxing the assumption that subviews with dynamic offsets may alias.

Let’s consider operations O1 and O2 on subviews S1 and S2, both of which are subviews of A (with dynamic offsets).

If we pass those two subviews to a distinct_objects block and put O1 and O2 in the block, the aliasing queries will notice that the memref arguments to O1 and O2 point to distinct arguments of a distinct_objects region, and therefore conclude that they don’t alias. If it were possible that S1 and S2 could alias … the programmer shouldn’t have plumbed them through a distinct_objects region like that.

Similarly, if you sink the subviews into the distinct_objects region, you’ll be doing alias analysis and notice you have two subviews that don’t share a base memref - because they’re subviewing distinct block arguments. Then, when checking if those two base objects might alias … you’ll find again that they’re distinct arguments to a distinct_objects region, which means that they definitionally don’t alias.
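The query described in the last few paragraphs can be sketched as a walk over base objects. This is a hedged Python model, not an MLIR API: `Alloc`, `FuncArg`, `Subview`, `RegionArg`, `base_chain`, and `may_alias` are hypothetical names, a distinct_objects op is identified by a plain integer, and `none_are` is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alloc:
    name: str


@dataclass(frozen=True)
class FuncArg:
    name: str


@dataclass(frozen=True)
class Subview:
    source: object


@dataclass(frozen=True)
class RegionArg:
    op: int       # which (hypothetical) distinct_objects op owns this argument
    index: int    # position in that op's argument list
    init: object  # the outer value bound to this argument


def base_chain(value):
    """Strip subviews, then keep looking through region arguments to the
    values they were initialized from. Returns every base seen, innermost first."""
    chain = []
    while True:
        while isinstance(value, Subview):
            value = value.source
        chain.append(value)
        if not isinstance(value, RegionArg):
            return chain
        value = value.init


def may_alias(a, b):
    chain_a, chain_b = base_chain(a), base_chain(b)
    # Two different arguments of the same distinct_objects op never alias.
    for x in chain_a:
        for y in chain_b:
            if (isinstance(x, RegionArg) and isinstance(y, RegionArg)
                    and x.op == y.op and x.index != y.index):
                return False
    # Otherwise the analysis continues past the region boundary.
    root_a, root_b = chain_a[-1], chain_b[-1]
    if isinstance(root_a, Alloc) and isinstance(root_b, Alloc) and root_a != root_b:
        return False
    return True  # conservative default


a = FuncArg("a")
arg0 = RegionArg(op=0, index=0, init=a)
arg1 = RegionArg(op=0, index=1, init=a)
```

Here `may_alias(Subview(arg0), Subview(arg1))` is false even though both arguments come from `%a`, whereas two subviews taken directly from `a` still may alias.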

Overlapping regions in what sense?

(But yeah, I think the function attribute - if we can find a way to duplicate it during inlining / merge it with its parent / … - which might involve an alias_domain op that’d then rewrite into altering its parent function)

So question:

What is the behavior of casting memref to/from index/integer types with alias scopes?

Take two memrefs that may alias and two subviews of those memrefs. Your distinct flag can be applied to the subviews (if correct) and that solves the problem. But I still need to know when to apply, which could be a runtime decision.

While it’s easier to have run time checks with distinct than using regions, you “still need to handle them” (my original meaning) by inserting them throughout the code.

The notation doesn’t help abstract away that complexity, but to my view, it’s still more powerful than nested regions.

I think this would be a much simpler first step, yes. Inside the function you already know disjoint regions by using alloc and alloca, and may alias based on subviews, so the big problem is merging those through function inlining.

Yeah, subview distinct is probably not quite the right move, and I don’t think it’s part of any of the really live proposals. (Though if I wanted it to be a thing, I’d go for “no other memref in scope shall be used to access the memory referred to by this subview”)

Yes, distinct_object op solves both the inlining and hoisting problems (I would prefer the free standing op instead of region in this case)

```
%2, %3 = memref.distinct_objects %0, %1
memref.load %2
memref.store %3
```

There are some more complicated cases which are harder to express:

```
%1 = ...
memref.load %1
scf.for ... (%2 = ...) {
    memref.load %2 // %2 is a loop-carried var: how to express that those loads do not alias?
}
```

none_are(@x) in the krzysz00 proposal is supposed to handle it but it feels ad hoc.

There is also the problem that it can block some optimizations (such as merging subview ops into loads/stores, mentioned before).

There is another problem that we will now need to go through use-def chains (and potentially have a full-blown dataflow analysis) to actually translate them to LLVM attributes.

Also, it is not entirely clear where these memref.distinct_objects ops should be inserted. From the user’s perspective, it may be more convenient to insert them where the memrefs are created:

```
%1 = memref.alloc
%2 = memref.view %1 offset(0)
%3 = memref.view %1 offset(1024)
%4 = memref.view %1 offset(2048)
%5, %6, %7 = memref.distinct_objects %2, %3, %4
// use %5, %6, %7 in the rest of the program
```

But in this case we will definitely need a dataflow analysis to propagate this info to the actual loads/stores.

Regarding taking a subview of distinct memrefs, this is more a question of the semantics we want to specify. The most useful, IMO, would be: “No matter what operations (pointer-calculation-related, memref.subview, memref.view, extract_strided_metadata + reinterpret_cast, to_int/from_int) are done on two distinct memrefs, they will stay distinct.”

I guess the working pipeline may look something like this:

  1. memref.distinct_objects are inserted by user in potentially arbitrary places
  2. Use dataflow analysis to find all potential sources (memref.alloc(a), memref.distinct_objects, memref.global, function arguments) for each load/store op.
  3. Use the collected sources to generate temporary aliasing attributes on memref load/store ops, similar to the LLVM ones (or use the LLVM attributes directly). From this point on it is no longer safe to inline until the memref dialect is converted to LLVM.
  4. Translate memref ops and attributes to LLVM
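Steps 2 and 3 above can be sketched in Python as a worklist over definitions followed by attribute generation. This is a rough model under stated assumptions, not a design from the thread: `Distinct`, `Other`, `collect_sources`, and `aliasing_attr` are hypothetical names, values are plain strings, and only accesses whose sources all come from one distinct_objects op get an attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Distinct:
    op: int      # which (hypothetical) memref.distinct_objects op
    index: int   # which result of that op


@dataclass(frozen=True)
class Other:
    name: str    # alloc, global, or function argument


def collect_sources(value, defs, roots):
    """Step 2: find every source a value may be derived from.
    `roots` maps value names to sources; `defs` maps a derived value (view,
    subview, loop-carried argument) to the values it comes from. A value that
    is in neither mapping raises KeyError rather than being silently ignored."""
    seen, work, found = set(), [value], set()
    while work:
        v = work.pop()
        if v in seen:
            continue  # also terminates loop-carried cycles
        seen.add(v)
        if v in roots:
            found.add(roots[v])
        else:
            work.extend(defs[v])
    return found


def aliasing_attr(sources, results_per_op):
    """Step 3: return (alias_scopes, noalias), or None to stay conservative."""
    if not sources or not all(isinstance(s, Distinct) for s in sources):
        return None
    ops = {s.op for s in sources}
    if len(ops) != 1:
        return None
    (op,) = ops
    all_scopes = {Distinct(op, i) for i in range(results_per_op[op])}
    return frozenset(sources), frozenset(all_scopes - sources)


# %5, %6, %7 are the results of one distinct_objects op, as in the example above.
roots = {"%5": Distinct(0, 0), "%6": Distinct(0, 1), "%7": Distinct(0, 2),
         "%other": Other("alloc")}
defs = {"%iter": ["%5", "%next"], "%next": ["%iter"],  # loop-carried value
        "%mixed": ["%5", "%other"]}

load_attr = aliasing_attr(collect_sources("%iter", defs, roots), {0: 3})
store_attr = aliasing_attr(collect_sources("%7", defs, roots), {0: 3})
mixed_attr = aliasing_attr(collect_sources("%mixed", defs, roots), {0: 3})
```

With this encoding the LLVM subset rule reports “no alias” exactly when two accesses have disjoint source sets, and an access with any non-distinct source (`%mixed`) simply gets no attribute.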

This approach is quite complicated, but it looks more generic and can probably work.