Guidance for what Patterns can/should be upstreamed

In [mlir][tensor] add tensor insert/extract op folders (llvm/llvm-project#142458) (and its follow-up [mlir][tensor] move tensor insert canonicalization to pattern, llvm/llvm-project#142671), my colleague @asraa added some relatively simple tensor patterns upstream, intended to fold some uses of tensor insert/extract.
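For context, the patterns in question are of roughly this shape - a minimal sketch, not the PR's exact code:

```cpp
// Minimal sketch (not the PR's exact code) of the kind of pattern at
// issue: fold tensor.extract of a tensor.insert that wrote the very
// element being read (same index SSA values).
#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

struct ExtractOfInsertFold : OpRewritePattern<tensor::ExtractOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(tensor::ExtractOp extractOp,
                                PatternRewriter &rewriter) const override {
    auto insertOp = extractOp.getTensor().getDefiningOp<tensor::InsertOp>();
    // Conservative: only fold when every index is the same SSA value.
    if (!insertOp ||
        !llvm::equal(insertOp.getIndices(), extractOp.getIndices()))
      return failure();
    rewriter.replaceOp(extractOp, insertOp.getScalar());
    return success();
  }
};
```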

A comment from @mehdi_amini suggested dissatisfaction with the practice of exposing upstream patterns (or populateXPatterns static functions) and leaving downstream users to figure out how to combine them consistently.
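For readers less familiar with that idiom, it looks roughly like this (hypothetical names throughout):

```cpp
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

// Upstream declares a "populate" entry point in a header...
void populateTensorInsertExtractFoldPatterns(RewritePatternSet &patterns);
void populateMyOtherPatterns(RewritePatternSet &patterns); // also hypothetical

// ...and each downstream separately decides what to combine it with:
LogicalResult applyMyDownstreamSimplifications(Operation *root) {
  RewritePatternSet patterns(root->getContext());
  populateTensorInsertExtractFoldPatterns(patterns);
  populateMyOtherPatterns(patterns);
  return applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```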

I’d like to ask the community for some guidance here:

  1. Why is having lots of patterns upstream bad? My biggest issue with it is actually the lack of a discovery mechanism, and otherwise I like having lots of generally useful patterns available upstream that I can assemble into passes out-of-tree to meet my needs.
  2. Is there any established guidance or folklore that I can rely on to determine if a pattern is suitable to be upstreamed?
  3. Is there an implicit expectation that new upstream patterns are provided via a pass (or an op folder, or canonicalizer)?
  4. Is there some future state of MLIR (organizationally, featurewise, or in auxiliary tooling) that would make it acceptable to have lots of upstream patterns? In other words, is the preference to not have lots of upstream patterns an acute pain or a systemic design preference?

+1 on this one - not only is what you say accurate (exposing patterns lets people mix and match according to their downstream needs), the indiscriminate use of anonymous namespaces for the patterns themselves actually contravenes other existing MLIR features.

This is not discoverable, not documentable, hardly testable. Of course patterns can be unit-tested in isolation, but in the grand scheme of things they are meant to be combined with each other, and as such the unit of testing/documentation/… and, more importantly, the unit of design around which to form a mental model is a larger granularity (aka, most of the time, “a pass”).
Having a zoo of APIs that are disconnected from each other looks completely impractical for anyone downstream except the person who added the patterns. This would make MLIR a dump of code IMO instead of a collection of components that are well thought out to work together.

There is none, because as mentioned above, a pattern in isolation isn’t the right unit to think about IMO. We should think in terms of code transformation in a larger sense: what are we trying to achieve at various stages of transformation of a program, and expose collections of components that are modular enough to achieve what is needed in concrete use-cases.

Historically, passes have been the only way to test transformations (it does not have to be a folder/canonicalizer: we have transformations outside of this; it also does not have to be a “pattern” in the first place!!).
Beyond passes, we also have the transform dialect now, but IMO it is meant to provide more control over how to apply a sequence of transformations on a specifically matched piece of IR. It is not meant to just get down to the granularity of applying one pattern at a time for the sole purpose of testing. That is, it breaks the pass granularity, which is historically limited to a module/function (it’s not exact, but this approximation works well enough here), to be able to apply transformations more selectively (for example, chaining transformations on a given loop nest).

There is nothing limiting about the number of patterns; however, I strongly believe that for most of them (that is: there can always be exceptions), we need to have these grouped into larger components that are self-consistent and documented.

So ultimately, the thread title asks about “what can/should be upstreamed” when the real issue at hand here is rather “how” should it be done.

2 Likes

Bolding mine.

I challenge this claim - I don’t know where it originates from, I don’t know if it is some kind of implicit assumption, but it is not documented, and much rationale documentation to the contrary exists:

Declaration and Validation [Parsimony and Traceability]

a compiler infrastructure is only as good as the transformations it supports. Common transformations should be implementable as rewrite rules expressed declaratively, in a machine analyzable format to reason about properties of the rewrites such as complexity and completion. Rewriting systems have been studied extensively for their soundness and efficiency, and applied to numerous compilation problems, from type systems to instruction selection. Since we aim for unprecedented extensibility and incremental lowering capabilities, this opens numerous avenues for modeling program transformations as rewrite systems.

How is MLIR relevant?

MLIR […] purpose is to provide mechanics for describing
and transforming programs and computations in a flexible way. It provides common
compiler infrastructure for things like constant folding, dead code elimination,
graph rewriting, and others - which are independent of the representational
choices picked by a given dialect (e.g. its concurrency semantics). It was built
with a specific focus on compile time and memory efficiency, accurate
propagation of source location information (important for reporting high quality
errors and warnings) and is designed for testability.

Unified Graph Rewriting Infrastructure

This is still a work in progress, but we have sightlines towards a general rewriting infrastructure for transforming DAG tiles into other DAG tiles, using a declarative pattern format.
DAG to DAG rewriting is a generalized solution for many common compiler
optimizations, lowerings, and other rewrites and having an IR enables us to
invest in building a single high-quality implementation.

Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
written against them, and new tools can be built that inspect and manipulate the
declarative patterns in interesting ways - e.g. applying theorem provers to
them. It will be exciting to see this ecosystem develop as the infrastructure
matures.

Generic DAG Rewriter Infrastructure Rationale

There are many different types of compiler transformations, but this document
focuses on a particularly important class of transformation that comes up
repeatedly at scale, and is important for the goals of MLIR: matching one DAG of
operations, and replacing with another. This is an integral part of many
compilers […], as well as a useful abstraction to implement
optimization algorithms for IR at multiple levels.

Bolding mine.

So a “textual interpretation” of the founding documents inarguably indicates that the value proposition of MLIR is an infrastructure for using/reusing/composing/testing/manipulating patterns. Now one might argue that infrastructure and repertoire are not equivalent/synonymous, but undoubtedly, while the value and utility of MLIR grow in direct proportion to the investment in reusable infrastructure, they grow, just like in any other open source project, in super proportion to the reusable implementations (widely applicable dialects and transformations thereof, i.e., patterns).

This is not discoverable

This is just the status quo and not a fundamental/platonic fact. Consider that passes sans a pass registration system are no more “discoverable”. As you mention, the transform dialect exists and again presents a proof by contradiction to your claim: [Transform Dialect - MLIR] already serves as partial documentation for a subset of patterns exactly because it effectively “registers” those patterns (albeit under a non-C++-native invocation/instantiation mechanism). Thus, were we to prioritize discoverability of patterns (which I believe we should!), all it would take is a pattern registration system. At minimum this registry would collect doc strings/summaries, enabling pattern authors to document and pattern users to consult. At maximum, it could enable composing patterns into larger transformations (i.e., passes) arbitrarily.
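To make the “at minimum” part concrete, here is a rough sketch of what I mean (entirely hypothetical API; nothing like this exists upstream today):

```cpp
// Hypothetical sketch only; no such API exists upstream today.
#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/Support/raw_ostream.h"
#include <functional>
#include <string>

using namespace mlir;

struct PatternGroupInfo {
  std::string name;    // e.g. "tensor-insert-extract-folds"
  std::string summary; // doc string a listing tool could surface
  std::function<void(RewritePatternSet &)> populate;
};

class PatternRegistry {
public:
  void registerGroup(PatternGroupInfo info) {
    std::string key = info.name;
    groups.try_emplace(key, std::move(info));
  }

  // "At minimum": enumerate names and summaries for documentation.
  void listGroups(llvm::raw_ostream &os) const {
    for (const auto &entry : groups)
      os << entry.getKey() << ": " << entry.getValue().summary << "\n";
  }

  // "At maximum": compose registered groups into larger pattern sets.
  LogicalResult populateByName(llvm::StringRef name,
                               RewritePatternSet &patterns) const {
    auto it = groups.find(name);
    if (it == groups.end())
      return failure();
    it->getValue().populate(patterns);
    return success();
  }

private:
  llvm::StringMap<PatternGroupInfo> groups;
};
```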

I could get on board with focusing on passes, though the place where it gets confusing for me is that there are both upstream patterns that aren’t included in any pass (maybe for lack of a coherent grouping?) and upstream passes that mainly seem to group together rewrite patterns for no other reason than that putting them in canonicalize would bloat that pass and make people depend on it in implicit ways for correctness.

I’m thinking of passes like affine-simplify-structures, fold-tensor-subset-ops, scf-for-loop-range-folding, fold-memref-alias-ops.

Is it reasonable to continue to make passes like <dialect>-simplify-<concept> or <dialect>-fold-<some name for a related set of ops>? That doesn’t smell like a good design in the way you described, because its mental model would just be “simplify via whatever patterns happen to be included.” The alternative seems closer to wrapping individual patterns in a pass, which doesn’t strike me as much better than exposing individual patterns.

If that is unreasonable, then it seems a bit restrictive (at least relative to the guidance to not put all generally-useful simplification patterns into canonicalize) to say patterns can only live upstream when they are part of a larger coherent pass.

1 Like

Unfortunately, a lot of the principles behind the design and development of the project around these kinds of things are tribal knowledge of the people who spent hours debating them. Some of it is in chatrooms (hard to search to find the needle in the haystack), some of it on Discourse (less of a haystack, but not always easy to pinpoint the right discussion), some of it in code review (another haystack), some in MLIR open meetings (recordings are online, but not searchable of course, maybe with an LLM?), and finally a lot of it in face-to-face meetings and whiteboard discussions when a majority of the MLIR team was sitting in the same open-space for the first few years of the project. While we tried to write rationale documents on important aspects, not everything can be documented to this level of detail in practice.
So it’s not great, but it’s the unfortunate reality. And I don’t think I have ever worked on a project (OSS or proprietary) where that isn’t the case (MLIR is far from the worst in my experience!).
A large part of the questions on this forum are actually calling on the expertise of people involved with the project for a long time (I tried to answer Clarifying the semantics of LTO post-link optimization recently for example; my knowledge of that project is somewhat outdated, but it’s another example where documentation is lacking and tribal knowledge shared on the forum is the only way to figure some things out). Maybe an LLM processing all the history of the code, and all the code review done from the beginning, could help? (Even then it would miss some design docs lost online somewhere, or never shared publicly, as well as live discussions and whiteboarding sessions.)

I don’t follow your reasoning here: canonicalization has some strong considerations which have nothing to do with “bloating” that pass (this blog post is often cited as a good intro): it’s rather that not every transformation is eligible to be a canonicalization in the first place, fundamentally.

Absolutely, that may smell: in general such patterns are borderline for canonicalization, and when adding them in a pass we haven’t done a good enough job (often during code review) of pushing for a deeper reflection on the naming.
Just like the 5 whys, it sometimes just requires digging a bit to get to the essence of what we’re trying to achieve and converging on a better description/grouping that provides something more accurate than “simplify” (which is just a vague term in isolation).

Note that folding constant tensors has been split out of canonicalization before because of adverse effects: greedy constant folding in the absence of heuristics is appropriate for “register-like” data (we consider integers to be “free” to materialize), but that’s just not suitable for larger objects (a tensor constant can be hundreds of KB or MB), and we need some more global analysis and a cost model to perform constant folding there.

I don’t think that we have seen a systemic need for <dialect>-fold-<some name for a related set of ops> passes so far? I should review the 3 or 4 passes we have with “fold” in the name to double-check why they exist and whether their documentation accurately describes it.

Side note: we’ll soon submit a proposal for a side project under the LLVM umbrella that can help answer some of these questions:

More info at:

In a nutshell, we believe having multiple/many/all? patterns, transforms, and passes in MLIR is a good thing, and they can be widely tested and discovered via this external project, which helps users identify not only what patterns exist but also how they compose with each other, without setting hard dependencies like in the LLVM pass pipeline.

[Edited to add: This does not preclude having tests and discoverability inside the MLIR repo. Thus, a side note.]

We also hope this project will become the starting point of future MLIR-based projects, be they tensor compilers or something else (though we’ll focus on tensor compilers first).

1 Like

I don’t understand the point of this response - you didn’t respond to a single quoted piece of text.

Further, the fact that you didn’t respond to a single piece of quoted text implies

  1. The documents that I quoted, which appear both in the original paper and the current docs/Rationale directory, are not demonstrative of the guiding principles/values/goals of MLIR - here, specifically, with respect to patterns vs. passes. If so, then both references should be updated and everyone should be informed.
  2. That any such clandestine/ephemeral/tribal discussions and decisions, which current contributors weren’t privileged/lucky/randomly privy to participate in, should be authoritative in perpetuity.

These are the only conclusions I can draw from the fact that you didn’t respond to a single piece of quoted text. Can you please clarify whether this is the correct understanding of your response?

Absolutely not.

There are many conclusions that can be drawn beyond what you’re doing here.

For example, I just considered that you’ve been taking citations out of context, and trying to infer conclusions about a specific topic here which these sentences were not meant to describe.
Or that you’re quoting entire paragraphs whose relevance here I can’t see. I looked at the first three links you posted and can’t figure out a good way to attach any of this to my points.

For example, your second link is:

This is the general description of MLIR, so how is a description of the infrastructure an actual rebuttal of my whole position about the granularity and grouping of patterns?

Same for this:

I never wrote this; this is your own made-up inference here. Note also that the counter view is not something I would adhere to, though: that any previous discussion/decision/… about design principles or similar concerns has no bearing and can just be freely ignored by anyone joining the project tomorrow. I believe that the effort put into this over the years is what yields a good and consistent system, and evolving the system outside of these principles should be carefully considered.

Then why do they exist as a component of documentation as opposed to history?

Again, I’m kind of at a loss for words for how to respond: my original response bolded the parts of the text that were germane to the discussion here - I have no inkling why you didn’t just quote my original response, where the emphasis was crystal clear, and instead screenshotted those passages anew. I’m inclined to copy-paste the text with bolding again here, but it’s literally ~two page lengths up.

That’s what an inference is: a conclusion reached (“made up”) on the basis of evidence and reasoning. The evidence and reasoning were clearly stated: neither in your first response to my citations nor in this response are you addressing what I’m actually pointing out: that every documented description of MLIR describes rewrite patterns (and flexibility thereof) as the primary object, and not passes (which is/was your claim).

That’s fine, because no one is proposing any such thing. In fact there’s not actually much being proposed here at all, except maybe a further discussion on patterns vs. passes and exposure practices. The only concrete “proposal” I see currently (wrt the linked PRs) is two incremental, well-reasoned changes. I believe it is you who is choosing to infer some false dichotomy/slippery slope that does not exist.

Sorry I wasn’t clear: I meant your conclusion and inferences are incorrect.

I’m just not reading any of this in here (note that I authored or coauthored most of these texts); you’ll have to be much more specific. I saw your emphasis; IMO they don’t say anything about the current topic and none of them negates my stated position above.

Okay I’ll copy-paste from 2 page lengths up; please explain to me how one should interpret this from the paper:

Declaration and Validation [Parsimony and Traceability]

Common transformations should be implementable as rewrite rules… Since we aim for unprecedented extensibility and incremental lowering capabilities, this opens numerous avenues for modeling program transformations as rewrite systems.

I read it very plainly as “MLIR is a rewriting system” and “MLIR should be unprecedentedly extensible”. Again, the inference I draw is that patterns should be reusable. I understand it’s not literally in the text, but I addressed this as well: it is hard to imagine an OSS project that intends to make the infrastructure reusable but not the produce of that infrastructure.

More here that addresses goals around testing and reuse of patterns:

Unified Graph Rewriting Infrastructure

Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
written against them, and new tools can be built that inspect and manipulate the
declarative patterns in interesting ways - e.g. applying theorem provers to
them. It will be exciting to see this ecosystem develop as the infrastructure
matures.

Very specifically, in the current state, “new tools [cannot] be built that inspect” any patterns (neither C++ nor DRR, since the .td files are not shipped with distributions).

Your claims about discovery/documentation of patterns being a blocker were also addressed (and you didn’t respond there either), but let’s first work on establishing that the primary text describes what I believe it describes.

Yes, none of this contradicts my points though.
What I read is that MLIR as an infrastructure aims to provide a rewriting system: this is why DRR and PDLL were developed.

Common transformations should be implementable as rewrite rules expressed declaratively, in a machine-analyzable format to reason about properties of the rewrites such as complexity and
completion.

Saying that we aim to write some transformations as rewrite patterns does not mean that each pattern must have a public API exposed outside of the enclosing transformation that is implemented by a group of patterns. There is nothing here about the granularity of any of this.

Similarly, when we say that the system should be extensible: this is why we have interfaces and other pluggable points. This is why canonicalization patterns can be injected per dialect, and mixing dialects together still canonicalizes without overly coupling things. Etc.

You’re bolding “Declarative pattern rules” here, when this sentence is contrasting them to C++ code implementing the same patterns. How does this make a point in this discussion?

This is absolutely not what is written: it’s not any patterns, it is specifically written “manipulate the declarative patterns”. The “declarative” part is key here: the whole point of this section was about DRR/PDL, to explain that tooling can be built on such systems in a way that you cannot achieve on C++ blobs. Basically, higher-level understanding and analysis of the rewrites.
Again: nothing relevant for the discussion in this thread.

I’m gonna stop iterating here, this derailed this thread enough.

Again, I’m not sure what the point of selectively quoting my citations is; I quoted two complete sentences that draw a connection between declarative patterns and their desired properties. I demonstrated this connection by bolding several parts of both sentences - here are all the bolded parts again:

can have checkers written against them

manipulate the declarative patterns

It will be exciting to see this ecosystem develop as the infrastructure matures.

Okay, let’s suppose this is true - please tell me how I can build a tool downstream that can manipulate, in any form or fashion, upstream declarative patterns.

I’m gonna stop iterating here, this derailed this thread enough.

I mean, that’s fine with me as well, but I (and others, evidently) would like some clear consensus on when/how/why patterns (DRR and C++) and populators can be exposed. So how do you propose we resolve this issue conclusively?

Conceptually, is this problem any different for passes? Passes are not used in isolation, either. They are meant to be combined in a pass pipeline.

I share your concern, but see the same problem for passes. The main difference being that there is (in practice) a lower bar for adding patterns.

This was the initial design when I added transform.apply_patterns. I was trying to add some kind of “pattern registry” to the MLIR context. I think we never discussed this idea in-depth and we went for the simpler approach of adding separate “pattern collection” ops. I think it would be worth exploring the pattern registry approach further.

The main benefits that I see:

  • No need for dialect-name.transform.my_patterns ops. Essentially, the transform.apply_patterns op would no longer need a region.
  • A bunch of test passes can be deleted in favor of a single test-patterns="pattern1,pattern2" pass (see the sketch below).
  • Auto-generation of pattern documentation from TableGen.
  • Leaner and more consistent header files: we won’t need populate...Patterns functions anymore.

Downsides:

  • Patterns are just one way of sharing code / functionality. Sometimes, it is better to write a manual traversal. Having many patterns (and a pattern registry) could strengthen the notion that patterns are the preferred way to write passes.
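To illustrate the second bullet, such a pass could be as small as the following (a hypothetical sketch; it assumes a PatternRegistry along the lines sketched earlier in the thread, reachable through a made-up getPatternRegistry accessor):

```cpp
#include "mlir/Pass/Pass.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

// Hypothetical: however the context-attached registry (the earlier
// PatternRegistry sketch) ends up being exposed.
PatternRegistry &getPatternRegistry(MLIRContext *ctx);

struct TestPatternsPass : PassWrapper<TestPatternsPass, OperationPass<>> {
  StringRef getArgument() const final { return "test-patterns"; }

  ListOption<std::string> patternNames{
      *this, "patterns",
      llvm::cl::desc("Names of registered pattern groups to apply")};

  void runOnOperation() override {
    RewritePatternSet patterns(&getContext());
    // Pull named pattern groups out of the registry instead of having
    // one dedicated test pass per pattern set.
    for (const std::string &name : patternNames)
      if (failed(getPatternRegistry(&getContext())
                     .populateByName(name, patterns)))
        return signalPassFailure();
    if (failed(applyPatternsAndFoldGreedily(getOperation(),
                                            std::move(patterns))))
      signalPassFailure();
  }
};
```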
4 Likes

I count (+4) + (-1) = +3 :slight_smile:, amongst other things like fixing the bug I linked above and enabling downstream users to slice and dice according to their needs. And of course the effects of the negative that you mention could be addressed with adequate documentation and discussion (in the present tense…).

Right, it’s not different in the absolute: one could try to add exactly one pass for each of the existing patterns, and the concern would be the same. But actually I would think that my point here would be even more visible if that happened?
Think about the reverse situation as well: you could also implement almost any pass as a single pattern applied on the same root operation as that pass, but I suspect you wouldn’t be comfortable reviewing such a PR, would you?

So it’s all about granularity in the end: patterns are meant to be as small a unit as possible, while passes are meant to be coarse-grain. That granularity has non-trivial consequences for what I described earlier in this thread.

I think much of this has been discussed, but indeed the answer is “it depends”. Having patterns that form some part of a coherent pass/transform makes sense. There is (of course) a maintenance cost to patterns, and so patterns that are more generally usable are more sensible to include. E.g., I could have a backend where a constant 0 cannot be expressed, so I always lower a constant 0 to some arbitrary constant minus itself, but that pattern is just a cost upstream as it benefits only me. Renato pointed out an effort toward a larger “here is how some opinionated pipelines work” where we can start to clean up and refine.
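Sketched out (hypothetical code, and deliberately not something worth upstreaming), that pattern would be:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Rewrite `arith.constant 0 : iN` as `c - c` for an arbitrary nonzero
// constant c, for a backend that cannot express zero directly. Note it
// only makes sense during lowering: arith.subi's folder would undo it
// in a greedy fold loop.
struct ExpandZeroConstant : OpRewritePattern<arith::ConstantOp> {
  using OpRewritePattern::OpRewritePattern;

  LogicalResult matchAndRewrite(arith::ConstantOp op,
                                PatternRewriter &rewriter) const override {
    auto intAttr = dyn_cast<IntegerAttr>(op.getValue());
    if (!intAttr || !intAttr.getValue().isZero())
      return failure();
    // Any nonzero constant works; 1 is as arbitrary as any.
    auto one = rewriter.create<arith::ConstantOp>(
        op.getLoc(), op.getType(), rewriter.getIntegerAttr(op.getType(), 1));
    rewriter.replaceOpWithNewOp<arith::SubIOp>(op, one, one);
    return success();
  }
};
```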

I would consider the generality (either breadth of application or pervasiveness of the problem) when considering patterns. And here, to Mehdi’s point, where one has a pass that is part of a codegen flow/shows a particular layer and importance, it is much easier to appreciate and discover (I agree with Matthias and Maks that we can & should improve discoverability, if for no other reason than that it already avoids duplication). Perhaps we should even expand it so that patterns can have descriptions too…

Some of these may also be an artifact of our folding being bad (or rather, it is not meant for non-trivial/expensive generic folding) and so folks separated it out so that it could be optionally & explicitly run.

While true, I think Schrödinger patterns are a much easier state to land in than Schrödinger passes. We have very low visibility into what others are doing or using, or into impact - OSS projects and their usage pipelines being an exception and something that one could point at :slight_smile: but in general. So it can get to a state where there is dead code hampering work. I do think it should be based on usability/“goodness” of the patterns rather than some blanket rule (and efforts like project lighthouse are intended to make quantifying goodness/use more transparent).

MLIR was written from day 1 to be an open community project; before the first line of code landed this was a goal. Mehdi is describing, as he wrote, an unfortunate reality: not everything is explicitly documented or discoverable (even if it were written down). Some things are long discussions that end up as a simple programming-style recommendation, while the origin was a sequence of lessons. There is a balance: relitigating every past decision, or ignoring them all, is neither good nor useful. I believe we do have as a goal to document/improve documentation/discoverability when a question surfaces multiple times, the goal being to inform, enable discussion, and sometimes, yes, re-evaluation.

1 Like

There are a lot of good points this thread touches upon. Let me first give my perspective on the original post, and then try to unbundle some of the discussion above.


It’s not a question of having lots of patterns, but that of having them visible separately. This is somewhat similar to why we have dialects and not just a flat, extensible set of ops. Arguably, the transform dialect pushes beyond that with what I could call “sub-dialects”, so there may be a need for further structuring beyond a flat collection of dialects.

Maybe we need something similar for patterns. The approach currently taken by transform.apply_patterns is basically using the op registration mechanism to do pattern group registration. I have a downstream project that has its own patterns exposed as individual apply_patterns “sub-ops” that are enabled/disabled by a tuner, for example. I didn’t want to create yet another registration mechanism…

A flat collection is not desirable because the API surface of the overall system increases, which increases the overall complexity of MLIR as a system. My quick-test for adding “public” APIs is whether I want to also go through the effort of adding a C API and Python bindings for it.


Separately from that, I have repeatedly made the point that the notion of “canonicalization” is ill-suited for MLIR. I would rather speak of multiple “normal forms”, e.g., the most concise or the most constant-propagated code.


There seems to be a common communication idiosyncrasy in MLIR that we should in any case document and try to learn from and avoid. In this particular case, I don’t think @mehdi_amini is opposed to having the specific pattern upstream, or even exposed. What he is likely asking for is a generalization to a class of patterns that likely exhibit similar properties, even if that particular PR doesn’t add more than one pattern. In this particular case, it could be a set of patterns that would be increasing the number of tensor-typed constants.

The opposite extreme of this would be excessive generalization, which MLIR sometimes suffers from, so it’s a difficult, iterative exercise. In this example again, a clear list of other patterns that would fit into the proposed category would help, even better if accompanied by a commitment to eventually add them.


With the above point in mind, I’d rather we not go deep into the discussion of English semantics (MLIR semantics, on the other hand, are dearly needed!) and what it means for something to be a rewriting system. Documentation, unfortunately, tends to become outdated. We can, and should, revise it. In doing so, both a historic perspective as to why the document is written in a certain way, and new information since then, are valuable.

8 Likes