Alias Analysis Framework in MLIR

Hi all,

I’d like to discuss the start of a proper alias analysis framework in MLIR. The goal is to have a single general alias analysis “AliasAnalysis” that can be used by any analysis/transformation to answer alias queries. Below I delve into a few of the high level details on the structure of the framework. The main intention of this post is to get some consensus on building out this infra, and not to get everything right in the first try. A lot of things are left somewhat opaque or overly conservative so that we can sketch out those specific pieces after we have something suitable as a base.

High Level Structure

At a high level, this analysis functions very similarly to how the AAManager is structured in LLVM. More specifically, the AliasAnalysis class in MLIR is merely an aggregate of different alias analysis implementations, presented to the user with a single API. It does not actually contain much(if any) logic related to computing aliasing behavior. This allows for supporting a wide variety of different implementations for alias analysis, which is especially important for MLIR given the wide scope of abstractions defined by dialects. Each of these implementations override a set of hooks that tell if two given memory locations alias, and in what way. In general, these hooks provide an AliasResult that specifies if the locations have Partial, Must, May, or No alias behavior. The definitions of each result roughly corresponds to that of literature, and matching that of LLVM. To keep things simple, only one hook is currently necessary for alias implementations to override. A rough example of putting this together is shown below:

/// An example alias analysis implementation.
/// Note: We use concept based polymorphism for alias analysis implementations, meaning that 
/// they don’t use direct inheritance for the hooks.
struct MyAliasAnalysis { 
  /// Given two values, return their aliasing behavior.
  AliasResult alias(Value lhs, Value rhs);

/// This class represents the main alias analysis interface in MLIR. It
/// functions as an aggregate of various different alias analysis
/// implementations. This aggregation allows for utilizing the strengths of
/// different alias analysis implementations that either target or have access
/// to different aliasing information. This is especially important for MLIR
/// given the scope of different types of memory models and aliasing behaviors.
/// For users of this analysis that want to perform aliasing queries, see the
/// `Alias Queries` section below for the available methods. For users of this
/// analysis that want to add a new alias analysis implementation to the
/// aggregate, see the `Alias Implementations` section below.
class AliasAnalysis {
  AliasAnalysis(Operation *op);

  // Alias Implementations

  /// Add a new alias analysis implementation `AnalysisT` to this analysis
  /// aggregate. This allows for users to access this implementation when
  /// performing alias queries. Implementations added here must provide the
  /// following:
  ///   * AnalysisT(AnalysisT &&)
  ///   * AliasResult alias(Value lhs, Value rhs)
  ///     - This method returns an `AliasResult` that corresponds to the
  ///       aliasing behavior between `lhs` and `rhs`.
  template <typename AnalysisT>
  void addAnalysisImplementation(AnalysisT &&analysis);

  // Alias Queries

  /// Given two values, return their aliasing behavior.
  AliasResult alias(Value lhs, Value rhs);

Alias Analysis Implementations

As noted above, the framework supports the ability for users to provide their own alias analysis implementations. I see implementations being provided via two main sources: dialects and pass pipelines. For dialects, we can provide an interface to allow for dialects to hook into. This would, e.g., allow for dialects to implement their own TBAA-esque alias analysis without the need to develop something that covers every possible dialect. For pass pipelines, the use case is the same as LLVM in that it allows for pass pipelines to configure which alias analyses they support or their relative priority. It also enables researchers to test their own implementations against those used within MLIR for accuracy/efficiency/etc. I haven’t sketched out the exact API for the dialect interface/pass pipeline usage because while I feel that these are very important use cases for the framework to support, I don’t have a driving use case at present(mainly because we don’t have an alias analysis framework at all right now) and I would prefer to design these pieces with proper motivation.

I’ve sent out that adds the initial framework and starts building out a local alias analysis(essentially BasicAA in LLVM parlance) using some of the interfaces already available in MLIR.

What is a Memory Location in MLIR?

The sections above mention the concept of a “Memory Location”, which generally corresponds to “everything I know about a specific piece of memory”. This is an incredibly useful concept because it allows for performing a more exact alias analysis, because if you know the size of the memory access, for example, you can more easily detect when two memory accesses don’t overlap. If you ask what this means in MLIR, the only real answer I can give at the moment is “I don’t know”. MLIR has many different abstractions that have different amounts of information available, and I don’t think a “one size fits all” approach works in general here. My general inclination is that we will likely end up with different “types’’ of memory locations for different dialect types. For example, would the memory location we have for LLVM be the same for MemRef? Circt types? TensorFlow resources? Given the open endedness here, this is the type of thing that I’d like to build out after we have a framework that exists.

Help Me

MLIR is missing a lot of important pieces of infra surrounding general alias analysis, dependence analysis, etc. The purpose of this RFC is to start building those pieces out and to seek out those who want to collaborate on building it. There are many interesting problems in this area of the compiler, e.g. things like memory dependence(MemorySSA in MLIR??), that are hard problems to solve even when your infra isn’t extensible. If anyone is interested, I’d love to chat more about these problems and how we should go about solving them.


Big +1 from me. I wish I had the background to offer more than encouragement – I feel as though the lack of these pieces of basic compiler infra does create a drag on the project (both in the category of things not being attempted and sub-optimal designs that could be simplified if they had existed).

I’ll be really interested in participating, and particularly interested in how memory locations and aliasing information is represented and exploited in mid-level and high-level dialects. There are already several passes waiting to make use of this information so that they become sound in general.

+1 We are about to write a some new alias analysis and buffer placement optimization in our dialects. I’d be interested in using the new framework.

Yup, big +1 from me as you already know. There are a few places around view/subview/linalg that will greatly benefit from a better retargetable design. Isn’t that a reasonable first motivating example? Lots of things to discuss given that we have higher-level structured types than LLVM and are likely going to go significantly deeper in 2021.

Given LLVM’s Must/May/No is based on pointers and their starting address + size, I am unclear what Partial is. I can foresee a bunch of other cases such as A == B, A fully encompasses B, A and B are complementary in C etc.