[RFC] arithmetic vs llvm dialect

I haven’t fleshed out a concrete proposal yet, but wanted to prefetch opinions here.

Historically, our “standard dialect” and the LLVM dialect were entirely distinct: the LLVM dialect operated on LLVM-specific types. Over time this has evolved, and the LLVM dialect now reuses the builtin MLIR types where possible. The arithmetic dialect was extracted from the “standard dialect” and gained some new features, like “fast math” flags.
The arith dialect still diverges from the LLVM dialect in that it handles, for example, element-wise operations on tensors (though this was long considered undesirable, and was a removal candidate for a while), as well as the index type. On the other hand, it is somewhat under-defined from a semantics point of view.
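To make the divergence concrete, here is a hypothetical snippet (value names are made up): arith today accepts both the index type and element-wise tensor operands, neither of which the llvm dialect models directly:

```mlir
// arith accepts the index type and, element-wise, tensor operands;
// the llvm dialect supports neither of these directly.
%i = arith.addi %a, %b : index
%t = arith.addf %x, %y : tensor<8xf32>
```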

So I’m wondering: does it still pull its weight? Is there a possible unification between the LLVM arithmetic operations and the arith dialect? Some possible ways forward:

  • delete the arithmetic dialect; folks can use the llvm operations instead: they are well defined and provide everything we need. What’s the downside? One open question is how to handle the index type (one option would be to allow it in the llvm dialect but require a legalization before translation).
  • prune the arithmetic operations from the LLVM dialect, and make the arith dialect the actual subset that translates directly to LLVM IR. That would position it like nvvm, for example: a dialect that is part of the “LLVM API surface”, designed to mix with the llvm dialect and translate directly to LLVM IR. It would still be usable independently.

@KFAF also presented the idea of a “base2” dialect that could generalize all integer computation, but that seems complementary to what I’m describing here.

I have no strong opinion about the fate of the arith dialect; as you say, it has several known-problematic aspects in its design. I think deleting it outright could be too disruptive for the MLIR community (there are many users of it AFAIK), but phasing in a replacement and then removing it over time would be great IMO.

I’d be very concerned about your second bullet: removing things from the LLVM dialect. The LLVM dialect is an interface dialect to the rest of the LLVM ecosystem, and its design is constrained by that outer world. It isn’t a general-purpose collection of math operations, and shouldn’t (IMO) be used that way. If we want such a thing, it should live in a “base2”, “math”, “arith”, “comb”, or whatever dialect; these are conceptually different things.

-Chris

It does seem that there could be duplication here… Personally, if one of them were to go away, I’d probably keep the arith dialect rather than the llvm arithmetic ops. The advantage is that this maintains an ‘arms-length’ relationship with llvm, enabling any mismatches to be handled in lowering. (In effect, this would move the handling of any such mismatches into the arith → llvm lowering.)

Maybe some arguments towards keeping the current structure:

  1. It does not require the best MLIR representation to match the llvm representation; they can evolve separately.
  2. It perhaps provides a cleaner interface to non-llvm backends.
  3. It allows MLIR more control over how arithmetic is lowered into LLVM (e.g. leveraging intrinsics, or not).

Just to clarify: it definitely makes sense to enable other non-llvm code generators :-), but the LLVM dialect is a border dialect, not an abstraction over code generators. Such a dialect could exist, of course (this is what arith sort of tries to be), but it wouldn’t negate the need for a very strong LLVM border dialect. There was a good recent talk at EuroLLVM describing the need to keep these clearly distinct.

-Chris

+1 - I have no specific love for or quarrel with the arith dialect, but like many things from the first draft, I expect it is somewhat baked in at this point. I’d be open to a replacement, especially considering the work on the index dialect, which is a nice rework of that portion and does not carry over legacy features that seemed cool at the time (like mapping over tensors). I’d also be supportive of simply removing those legacy features and continuing to evolve arith in-situ.

Yeah, agreed. This seems like a lowering flow. We also have ArithToSPIRV in tree; for that usage, SPIRV is similar to the llvm dialect in being an exit to the outer world.

Happy to help marshal improvements to the situation if we see them. It’d be nice to see some of the longstanding TODOs taken on at some point.

Just a quick note: we are currently using arith to represent arithmetic vector operations, including multi-dimensional ones that are not supported in LLVM.


To recall some of the discussions we had about this:

  • The index dialect addresses the semantics issues for arithmetic on the index type, especially with respect to folding under an unknown index bit width. Personally, I believe that !index is properly handled now (although I would like the index dialect to either default to, or opt into, an “overflow is UB” clause).

  • The arith dialect is an important front-end dialect for users that don’t want to commit to elaborate semantics. Removing it would not only break a whole bunch of lowering paths, it would also force them to become more complicated. I remember @ftynse saying this, and that we should probably keep a weakly-defined arithmetic interface around.

  • The llvm dialect has strong semantics, but it does not have folding. With the addition of poison to MLIR, this problem becomes more apparent. It’s sometimes not convenient or even sufficient to leave all aspects of expression rewriting to LLVM.
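A small sketch of what is meant here (hypothetical IR): arith ops carry folders that canonicalization can use, whereas llvm dialect ops deliberately leave such rewrites to LLVM itself:

```mlir
// Under -canonicalize, the arith.addi below folds away entirely
// (x + 0 => x); an equivalent llvm.add would be left untouched,
// deferring that simplification to LLVM proper.
func.func @fold_example(%x: i32) -> i32 {
  %c0 = arith.constant 0 : i32
  %r = arith.addi %x, %c0 : i32
  return %r : i32
}
```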

  • The base2 dialect aims to provide strong semantics for all numeral types of the form z * 2^E, and the ability to convert between them. I also think that this is a special case which is orthogonal to plain integer arithmetic; it was just easier to implement by first exploding arith into bit and cyclic (and index).

  • Even if bit (bitwise operations w/o interpretation) and cyclic (cyclic group arithmetic) were to be split from arith, removing all integer operations from it, what happens to built-in floats? I don’t think there is a scenario where we can clear out arith entirely.

I would welcome arith becoming a more agnostic, but weakly defined interface. This would preserve the status quo for code generators that don’t use LLVM, or want to replace some parts of that lowering.

I am strongly in favor of removing or sidelining signless integers in that interface, however. They aren’t numbers, and just because usual targets use cyclic group arithmetic I don’t think a front-end should be required to give up perfectly good signedness information at this level.

I understand the value in this, but the knife has been looming over that feature in particular for a long time. I also think it should be kept, which is why my prototypes always supported it. To give it some extra focus, here are some arguments I have heard:

  • What is a product of two tensors even?

    It’s true that “outer product” and “inner product” are already terms with concrete meaning for tensors, with the latter also requiring additional disambiguation. However, the Hadamard product is also a well-defined term. I don’t see an issue with element-wise operations here, especially w.r.t. preserving folding. I think the fact that these are not tensor operations is enough of a distinction.
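For illustration (a hypothetical snippet), the element-wise form under discussion is simply an arith op with tensor operands, distinct from any contraction:

```mlir
// A Hadamard-style (element-wise) product: not an inner or outer
// product, just the scalar op mapped over matching tensor shapes.
%h = arith.mulf %a, %b : tensor<4x4xf32>
```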

  • You’ll lower element-wise to linalg.generic anyways, so what’s the point?

    As the quote above shows, they are handy for providing new lowering paths that want to handle these cases separately. Additionally, lowering to linalg.generic eagerly breaks perfectly usable folding we can get right now. However, linalg with fusion and such is likely a better starting point for vectorization.

  • It adds confusion with real vector operations.

    Although admittedly having two ways of expressing a “real” vector operation is weird, I think true vector operations can be considered back-end enough to justify a higher-level API. Otherwise, the same “LLVM already has it” argument applies to both of them.

Exactly. The LLVM and SPIRV dialects are lowering dialects that connect the semantics of MLIR dialects with their lowering back-ends. They are not arithmetic/math representations, nor are they required to agree with each other on implementation semantics.

Remember that arith is used not only for scalars, but vectors and inside linalg.generic regions. If we mandate LLVM operations inside a linalg.generic, then we’ll have trouble lowering to SPIRV (and vice-versa).

While we could create a whole new dialect with operations on vectors, the inside of a generic still needs scalar ops. Moreover, things like offset calculation after tiling and fusing (for the extract_slice) need scalar index computation, which, if lowered to LLVM too soon, may not match the representation expected by SPIRV or other lowering back-ends.

In the end, I think we’ll always need a scalar/vector arithmetic dialect. I’d strongly recommend refactoring the current one rather than deleting it, only to realize we can’t do without it and bring it back from the dead later.

Agreed. But they need to be “complete” and “public” in some sense. Both LLVM and SPIRV have enough targets upstream that anyone can use them with the upstream code. I would not add a dialect to the upstream LLVM tree that only lowers to a downstream back-end.

If we think of them as “back-ends”, then they should have similar requirements to today’s LLVM back-end inclusion policy.

Historically, this has been the major argument for, first, creating the LLVM dialect with most operations essentially duplicating the standard dialect and, later, maintaining the two representations. I don’t think there has been much divergence, given that the conversion from Arith to LLVM remains one-to-one for the majority of the ops when operating on scalars - https://github.com/llvm/llvm-project/blob/main/mlir/lib/Conversion/ArithToLLVM/ArithToLLVM.cpp#L33-L104.
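To make the one-to-one claim concrete, here is a hypothetical before/after sketch of what the Arith-to-LLVM conversion does for scalar ops:

```mlir
// Before conversion:
%r = arith.addi %a, %b : i32
%f = arith.mulf %x, %y : f32
// After conversion: a direct op-for-op mapping on scalars.
%r2 = llvm.add %a, %b : i32
%f2 = llvm.fmul %x, %y : f32
```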

Another aspect that I feel is important, but practically not really enforced, is that the translation from the LLVM dialect (and friends like NVVM) remains simple. Ideally, the LLVM dialect corresponds to LLVM IR and the translation is trivial. Extending the expressiveness of the LLVM dialect will make the translation less simple; in particular, it would have to deal with the type conversion and op expansion for nD vectors that is currently handled by the arith-to-llvm conversion. I would rather focus our efforts on the one conversion infrastructure we already have than build a second infrastructure for the LLVM IR translation.

We could have an “extended LLVM” dialect and a “translatable LLVM” dialect with things like nD vector expansion happening as a sort of intra-dialect conversion, but it really starts looking like bringing back the standard dialect but calling it “extended LLVM”. Or, if limited to arithmetic operations, just having Arith without dialect separation.

In general, I would suggest looking at the costs we will have in the long run under the various outcomes. Currently, there is a cost of (partial) duplication between the Arith and LLVM dialects. If we remove one of the two, what additional costs will we have to pay? Lack of clear scoping for the LLVM dialect since it no longer exactly matches LLVM IR? Having LLVM IR changes propagated to most MLIR users with no opportunity to “fix” the design, or having the LLVM dialect diverge from LLVM IR?

Intuitively, I tend to agree with the sentiment upthread that we can clean up the arith dialect (get rid of automatic extension to tensors), potentially share the fastmath flags, make UB handling more consistent, etc., but without completely removing it. I feel like I’d rather pay the cost of duplication, which I understand rather well, than the unknown costs resulting from the unification.

I think you are confusing vector and tensor types. There seems to be a strong consensus on disallowing Arith operations on tensors, but keeping it on vectors. There is hardware that operates on 2d vectors, for example, and it is poorly supported by LLVM IR.


My answer indeed does not distinguish between the two, because that was the assumption I was operating under: vectors are to be supported.

However, I do argue that there is value in having elementwise tensor operations for the sake of folding. I don’t know if we want to extend linalg in a way where it folds what you’d get out of a convert-elementwise-to-linalg.

I’m -1 on adding any more automatic tensor folding with the current dialects and mechanisms (and would prefer that what is there be removed/made optional). It’s fine to have reference passes that do some of this as needed for folks who don’t need to index on “best” and can live with “ok”.

The constant folding infra in MLIR is really only suitable for cost-neutral folding of scalars as a local decision. True folding of large things like tensors requires a non-local heuristic/cost model, and it needs to be highly efficient (in both memory and compute). The current mechanics satisfy none of those constraints, making them unsuitable for the real use cases we deal with daily. The cost of getting this wrong was always high, but in the current era of billions/trillions of parameters it is devastating.

Some of the built-in folding patterns in linalg, for example, account for 99%-plus of total compilation time, and I have seen them (when enabled) cause compilation times in excess of 6 hours while exhausting all but the largest machines’ memory. Replacing them with a global transform that is more selective and precise in its mechanism can reduce that to seconds, optimize dags that are not cost-model-neutral, and have a much lower memory footprint.

In our downstream, we “fold” dags of tensor expressions by collecting them all at once into a module, recursively invoking the (CPU) compiler, and then evaluating them all with a level of parallelism that corresponds to the memory budget we have. And still, it is the most costly thing we do at compile time for some of these models.

Sorry for the mini rant :) I’ve just been stuck in this salt mine for a while and need ad hoc large-tensor folding in MLIR to go away :)

Thanks for that! I figured that this was the major problem, but I’ve never heard it said anywhere. I don’t do ML workloads, usually.

I mostly agree, although I’d chalk it up to how attributes are handled, which I consider a fundamental flaw; I’ve said this before about interpreters. I think attributes should rule all compile-time constants, but the ones we have really aren’t good enough for that, memory- and performance-wise. Manipulating weights this way is completely unnecessary suicide.

But, from a non-ML perspective, I do want this most of the time, because my tensors are small.

Folding of large constants could also be demoted to (fancy) pattern rewriting, although I don’t think pattern benefits can vary at runtime. In any case, I’d be happy with it existing as a pass, except that there is probably no reasonable architecture for such a pass that would accommodate everyone’s needs.

In general, removing folding and canonicalization behaviors will make shared analyses more difficult. At least, that’s what I believe. I don’t think it matters in the tensor case, though; I don’t know of any upstream analyses making use of this.

Strong +1. From the perspective of lowering to SPIR-V, I would be really concerned if we tied the semantics and their evolution to LLVM. I had a discussion about this issue last year: [RFC] Define precise arith semantics, and I thought we agreed on this.

Since then, the arith opset has already diverged from llvm a bit. We have some ops that are not present in llvm, like addui_extended, mului_extended, and mulsi_extended.
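For reference, these extended ops return an extra result, so there is no single llvm dialect arithmetic op to map them to (a hypothetical snippet):

```mlir
// Unsigned add with an explicit overflow bit.
%sum, %ovf = arith.addui_extended %a, %b : i32, i1
// Signed multiply producing the low and high halves of the result.
%low, %high = arith.mulsi_extended %a, %b : i32
```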

As for the second alternative, i.e., pruning the llvm dialect, I don’t have a strong opinion or stake in that.

Is that a bug or a feature ;)? I’ve counted 49 ops, which seems like a manageable amount of work if we wanted to make the dialect better defined. The poison semantics implementation by @Hardcode84 has just enabled us to specify cases that previously fell outside the spec but were not outright UB. Perhaps this is a very good time to go over the opset and iterate on the semantics?

Do you have a link? I must have missed this and don’t see it on the list of ODM recordings on youtube.

That was at the MLIR hackathon. It was recorded, but I’ve never seen the recordings published anywhere :( There’s also a paper on it, for which there is also a recorded presentation… but god knows where. There are links to the repos in the paper.

However, with the newly upstreamed poison semantics, and the codebase predating operation properties anyway, the code is quite out of date. Even before that paper, I had to focus on delivering an artifact for project goals, so I never got around to fixing that. Will happen soon™ though.

I went through many iterations, so I have more implementations and ideas around, if you think any of that would help.

Edit: Oh, sorry. If you’d like to hear what I did with bit and cyclic, and why, I’d gladly put together a short presentation we could use in an ODM discussion. I don’t believe the base2 part is relevant, only the “arith consequences”, but I can send slides to anyone interested for sure.