Generic Alias Analysis

Hi all. We are currently working on a generic buffer-assignment (BA) pass that automatically inserts dealloc and copy operations at the right places and will optimize all used buffers in the future. The intended BA phase moves allocs/deallocs (or removes allocs/deallocs) based on liveness information and alias-analysis insights. However, finding and tracking aliases isn’t trivial on MLIR programs. Consider the following code snippet:

module {
  func ....(%arg0: memref<tensor<f32>>, %arg1: memref<tensor<f32>>) -> ... {
    %0 = "my_amazing_swap" (%arg0, %arg1) :
        (memref<tensor<f32>>, memref<tensor<f32>>) -> memref<tensor<f32>>
    br ^bbXX(%0 : memref<tensor<f32>>)

The dialect-specific “my_amazing_swap” operation has the semantics that it swaps the two given values and returns one of them. Without having any knowledge about this operation we have to treat it as “opaque” and either ignore potential aliases that can be introduced by this operation or have to assume the worst case: all passed pointer values can be returned in this case. This will lead to a huge number of potential aliases and limit optimization potential significantly.

To overcome these limitations, we propose an AliasAnalysis class with several functions to give us more information about potential aliases. It’s the dialect expert who is in charge of instantiating the BA phase in a proper way. In other words: a dialect expert has to pass domain-specific information to the alias analysis to improve the optimization potential (similar to “addIllegalOp” that tells the rewriting system that a certain operation will not be supported). Without having precise information about such cases we always have to assume that all passed values can be potential aliases.

What do you think?

This part isn’t clear to me. I’d expect an alias analysis to define some interfaces to be implemented by the ops?

Yes, this reads like it is asking for an operation interface. Is there anyone else working on alias analysis and a corresponding interface? Otherwise, a good first step would be to think about what such interface should look like/what is required for the current use case.

In the context of HLO, we currently assume no aliasing of inputs and outputs for all operations.

1 Like

@herhut Exactly. @mehdi_amini Creating an interface generally sounds like a good idea. However, I think we should spend some time discussing how such an interface could look like to cover all (potential/current) use cases.

This is interesting in the context of a system like MLIR.

It is true that in a module with only HLO operations, each HLO operation can be modeled as pure and not side effecting.

However, if we want to combine LHLO and HLO in the same module then HLO cannot be (at least naively) modeled as pure since an LHLO operation could be writing to a buffer that the HLO operation is reading under the hood.

In particular, a pure operation typically means that it can be reordered with any other operation. But HLO operations cannot be reordered with arbitrary LHLO operations.

1 Like

This is currently modeled using tensor_load and tensor_store operations that load a tensor value and store a tensor value from/into a memref, respectively. So if you have an HLO operation that consumes the result of an LHLO from a memref, you get a tensor_load which has sideeffects and hence cannot be reordered with other LHLO operations on the same memref (or an alias thereof). The tensor_load has copying semantics, so it takes a snapshot of the buffer at that point in time. When lowering to memrefs, the loads hence need to become copy operations.

We do not transform the loads into copies, yet, as we do not reuse buffers at the moment. As soon as we start to introduce reuse before we have fully lowered to LHLO, we will need to turn tensor_load into a copy. Likewise, a tensor_store becomes a copy unless the target buffer has not other conflicting use and hence the copy can be elided.

Thanks, I think the tensor_load & tensor_store based scheme works.

@herhut Exactly. Once we reuse buffers, we may need to convert tensor_load to a copy operation unless it becomes redundant elsewhere.