[Standard] Add `atan2` to the standard dialect

Like atan, we would like to add atan2 to the standard dialect. Being able to express this in std would allow us to lower from lmhlo and materialize our approximation in the std dialect. We do not plan to add a standard lowering for atan2; rather, we want to allow everyone to apply their own approximating transformation or to lower to intrinsics where available.

We will likely want to add more trigonometric operations in the near future. The philosophy would be the same for all of these.

How are these operations being defined? Are the numerics and transformations specified well?

I’m quite concerned about the op explosion in the standard dialect: this is already causing us issues for mobile deployment of MLIR, where we want only a few ops from standard but we bring in ~5MB for it.

Is there a broad category of “maths” / “arithmetic” ops which could be split in their own dialect?


@clattner, I am not sure if I understand the question correctly.

An approximation for atan can be found, e.g., here, and on that basis atan2 can be derived as shown here. These are just approximations, though, and there are surely many different ways of realising trigonometric functions.
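For concreteness, here is a minimal Python sketch (not MLIR) of how atan2 can be derived from atan via quadrant corrections, in the spirit of the derivation referenced above; `math.atan` stands in for whatever approximation a backend would actually materialize:

```python
import math

def atan2_from_atan(y, x):
    """Derive atan2(y, x) from atan plus quadrant corrections.

    math.atan is a stand-in here; a real lowering would substitute
    its own polynomial or rational approximation of atan.
    """
    if x > 0:
        # Right half-plane: atan(y/x) is already in (-pi/2, pi/2).
        return math.atan(y / x)
    if x < 0:
        # Left half-plane: shift the result by +/- pi.
        return math.atan(y / x) + (math.pi if y >= 0 else -math.pi)
    # x == 0: result is +/- pi/2, or 0 at the origin by convention.
    if y > 0:
        return math.pi / 2
    if y < 0:
        return -math.pi / 2
    return 0.0
```

The point is only that once atan is representable, atan2 needs nothing beyond comparisons, division, and addition, all of which already exist in std.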

The aim here is not to give a standard definition or lowering, but rather a representation in the standard dialect that can be transformed as needed. If desired, we could add a set of default approximations in a std-to-std pass.
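To illustrate what such a default approximation might look like, here is a hedged Python sketch of an atan approximation of the kind such a pass could emit: range reduction via the identity atan(x) = pi/2 - atan(1/x), followed by a quadratic fit on [-1, 1]. The coefficients are illustrative (max error roughly 0.0015 rad), not a tuned minimax fit, and nothing here is prescribed by the proposal:

```python
import math

def atan_approx(x):
    """Sketch of a default atan approximation a std-to-std pass might emit.

    Range-reduces |x| > 1 with atan(x) = pi/2 - atan(1/x), then applies a
    quadratic approximation on [-1, 1]. Coefficients are illustrative only.
    """
    if abs(x) > 1.0:
        return math.copysign(math.pi / 2, x) - atan_approx(1.0 / x)
    a = abs(x)
    # Odd quadratic fit: exact at x = 0 and x = +/- 1, ~0.0015 rad max error.
    return math.pi / 4 * x + x * (1.0 - a) * (0.2447 + 0.0663 * a)
```

A pass would inline this arithmetic as std ops (mul, add, div, select); targets with native intrinsics would simply skip the pass.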

CL: https://reviews.llvm.org/D88168

+1 to making a “math” dialect and moving a bunch of the std ops there.


Is it possible to split the Standard dialect ops into different libraries while still having them in the standard dialect? E.g., libStdMathOps, libStdCoreOps, libStdMemOps, … and similarly the td/header files as well. This way you could always elide the dialect prefix for these ops, and one also gets the reduction in build times and library size. I understand no other dialect splits itself this way, but the std dialect also has the most ops among those in tree.

I thought those that aren’t used don’t get linked in. Are the MLIR libraries linked dynamically and thus copied over?

I don’t understand either how this would work, or what the advantage is over splitting it into different dialects.

The unit of granularity in MLIR is very centered around dialects right now. You get all of a dialect as a unit: its ops, the canonicalization patterns, the interfaces, the printer/parsers, etc.

But you can depend on a dialect’s Ops without depending on its Analyses/Transforms/Passes. I didn’t understand what you meant by “You get all of a dialect”. E.g., you can depend on MLIRAffineOps without depending on MLIRAffineTransforms, likewise LmhloDialect vs LmhloPasses. Similarly, users may want to register the dialect and be aware of the ops without needing anything from the transform passes.

I’m not sure either whether this could work, but I see several advantages. You can elide the dialect prefix, FWIW, and you avoid getting into cyclic dependencies while implementing canonicalizations/rewrites (all of these ops work on the same type system / standard types and thus might use each other), and thus avoid being forced to move things from *Ops into *Analyses/*Transforms. E.g.: what if you needed add/mul for some canonicalization of an AllocOp, or another op, to compute some sizes? Would you move add/mul into the math dialect? That would create a cyclic dependency, because the math dialect would have to depend on what’s left in std (or whatever one refactors it into). If you keep add/mul where they are, then you’d have logical partitioning issues.

I don’t consider MLIRAffineTransforms to be “part of the dialect”.
I’m referring to the dialect as in the C++ class and all of its dependencies: this is already too big.

+1 for splitting the std dialect into more cohesive units.

Until then, std would still be the only reasonable place to add trigonometric ops. These ops are super simple, so I can’t imagine they impact the binary size noticeably.

Longer term, we could also think about supporting dialect extensions, so that we could still have a single dialect (like standard, gpu, SPIRV) but with sets of operations that can be optionally enabled. This would enable sharing a dialect prefix.

For now, +1 to adding these to standard, as the ops are fairly small and the lowering of them (which I suspect is the majority of size added) is optionally linked (as they lower to approximations or library functions).


+1 to splitting the std dialect. I don’t think it’s any more special than other in-tree dialects. Math and memory ops are good candidates to be separate dialects. Potentially with multiple levels of math, but it’s hard to draw the line. addi sounds like a really simple “core” math operation and atan2 sounds much more specific, but not as much as std::cyl_bessel_jf from C++. Is shift_right_unsigned “core math” enough to remain in “std”?

At the same time, discussing the splitting here partially derails the discussion about adding atan2. If there are other trigonometric ops in std, I don’t see a reason not to add this one while we keep discussing the potential splitting.

Personally, I am not really fond of the prefix-elision special case, especially if it is used as a motivation to keep otherwise disparate ops together. I recall @clattner’s argument that nobody wants to read “std.load” instead of “load” from when the dialect splitting was discussed more than a year ago. Today, with “affine.load”, “llvm.load”, and likely a handful of out-of-tree dialects with custom “load”s, I’d actually prefer to see an explicit std. prefix. This sounds like a matter of the dialect mix one is most often exposed to. If the majority of ops are in std, skipping the prefix sounds like an improvement. If there is a deep mix of linalg, scf, affine, std and gpu, not having a prefix on some operations actually breaks the reading flow. If textual IR verbosity is an issue, we can find a mechanism to address that in pretty-printing for all dialects.

-side note-
Maybe a solution is to have an “int” dialect and a “float” dialect (neither is a reserved name), so we get reasonably readable int.add and float.add, and similar multi-level naming schemes like int.unsigned.div and int.unsigned.shift.right. Memory operations, which are actually memref operations, can just as well use a mem prefix if we can’t make them use memref. What remains in std after this is predominantly control-flow-related operations: atomics, branches, calls, DMA, and utilities like constant. Constants and calls sound “core enough” to me to remain in “std”, into which we can also pull ModuleOp and FuncOp, which are currently placed in a half-hidden “builtin” dialect.
-end of side note-

I have strong concerns in general with the trend of adding things and making it someone else’s problem to clean them up later. In particular, this is a long-lasting problem, and usually preventing it from getting worse is the only forcing function to make this kind of cleanup happen.

So IMO: the way to balance not blocking forward progress against not accumulating technical debt is to get a strong commitment from the folks who need to take on the tech debt to enable their progress that they will also drive the long-term cleanup.


FWIW, I’m not necessarily a big fan of splitting out the libraries. I see some advantages but also some disadvantages: it makes most pieces of IR more verbose, and it increases the number of dependencies that need to be managed, for which there is already a large amount of not-quite-boilerplate necessary to get things to work (between the build system(s), dialect registration, dependent dialects, namespaces, etc.).

@mehdi_amini If a truly minimal set of operations is necessary, perhaps you need a specialized deployment dialect that contains only those operations that you want for deployment?

Steve


I am a big -1 on such splitting based on types like int/float. We already have the type printed right on the op, in both the generic and custom forms, and moreover as a suffix in the op name as well. If it’s ... = addi ... : i32, and that doesn’t say “int”, what else would?! The int./float. prefix only adds redundancy here. I think a more serious downside is that you would be separating ops that should logically be together (due to their similar imperative behavior) into separate dialects.

I would be fascinated to see the numbers about this (such as binary size before/after your patch). I don’t have any intuition for how much binary size an op takes!

I did a pass through std, and here is my rough breakdown, with some vague recommendations off the top of my head (please critique!). I think if we can get consensus on the “math” dialect at least then there would be no objections to adding atan2 there?

Control flow (recommendation: stay in std):

  • br, cond_br, return
  • call, call_indirect

Int arithmetic (recommendation: stay in std):

  • addi, cmpi, subi, muli,
  • divi_signed, remi_signed, divi_unsigned, remi_unsigned
  • and, or, xor,
  • shift_left, shift_right_signed, shift_right_unsigned,
  • sexti, zexti, trunci,
  • index_cast,

Basic float arithmetic (~ portable basic ISA instrs) (recommendation: stay in std):

  • addf, cmpf, subf, mulf
  • divf, remf
  • absf, copysign, negf,
  • ceilf, floorf,
  • fpext, fptrunc,
  • fptosi, fptoui, sitofp, uitofp,

Float math special (recommendation: move to “math” dialect):

  • cos, sin, atan, tanh,
  • exp, exp2, log, log10, log2,
  • rsqrt, sqrt,

Complex (recommendation: move to “complex” dialect):

  • addcf, subcf
  • create_complex, im, re

Memref ops (recommendation: move to “memref” dialect):

  • operations that operate on memref data
    • load, store, prefetch
    • atomic_rmw, generic_atomic_rmw, atomic_yield
  • operations that operate on memref descriptors
    • alloc, alloca, dealloc
    • view, subview, assume_alignment, memref_cast

Tensor ops (recommendation: move to TCP when that lands upstream; for now keep in std):

  • tensor_cast
  • extract_element
  • tensor_from_elements
  • dynamic_tensor_from_elements, yield

Tensor/memref bridging ops (recommendation: move to “memref” dialect):

  • tensor_load, tensor_store
    • Note: tensor_store not suitable for cast materialization with conversion framework. Probably needs to be redesigned.

Ambiguous categorization (recommendation: case-by-case basis):

  • constant
    • Note: can be used for multiple different types (int, float, vector, tensor).
  • select
    • Note: can be used for any type (canonicalization creates them from arbitrarily-typed bbargs).
  • dim, rank
    • Note: work on tensors and memrefs
  • splat
    • Note: works on vectors and tensors

System (recommendation: move to “system” dialect; consider adding a generic “print” there too):

  • assert
    • Note: can abort the program. It is the only op in std that has an “external effect”.

I think the more important question, if we split, is the axis: what are the taxonomy and the rules, rather than the specific instances? E.g., log2 is no more “math” than divf, so if we have a math category, then excluding some math ops but not others seems odd (whereas trig, algebraic, transcendental, or some such categories would be different). Why have some math ops in std while others are in math? Why not have math.int and math.float? (Not saying we should; that seems bad IMHO, but it is a taxonomy which is easy to follow.)

I think the proposal of having no “standard” at all makes sense, as “standard” creates an expectation (someone also asked “is the builtin dialect more standard than standard?”, which I think highlights the confusion). That doesn’t mean it has to be finely split (boilerplate is definitely a valid concern); rather, each dialect should have defining criteria for what is and isn’t in it, as much as practically possible. Now, I can’t even categorize my own emails to this extent (labels for the win, but most don’t end up under one label), but we could get something good enough.

Why ambiguous? If you have addi, then without constant you can’t increment by 1. So what would we save if we moved it elsewhere? (It seems more “std” than divf :slight_smile:.) Or are you proposing constanti, constantf, constantt, and so splitting on the type?

So a shape dialect, then, as these are related to shape queries.

I think this also shows the problem: what are we trying to achieve with the breakdown of std? More logically consistent components? (What if we could break std into “sub”-dialects but then be able to load the std dialect and get all of them, to avoid boilerplate?) Removal of a seemingly privileged dialect? (Let’s rename it to “Stuff”.) Or a reduction in size? (In which case, are we attempting to create some dependency diagram of use and divide based on frequent co-occurrence? Or just expose a general way to load parts of a dialect?)

I think with a goal set and a rationale for the why, the what could become easier.
