Authors: { @ftynse, @mehdi_amini, @_sean_silva, @River707 } (order not important)
Context
MLIR has a Standard dialect which dates back to the project inception. Originally, MLIR had two kinds of functions (first-class IR concepts back then): LLVM-style CFG functions with branching control flow and polyhedral-inspired ML functions with explicit affine loops and conditionals. A set of “core” operations were available in both kinds of functions. After the two kinds of functions were unified, the hitherto “core” operations became what is currently known as the Standard dialect, and the affine constructs became the backbone of the Affine dialect. Since dialects had always been intended as a modularity mechanism, this reorganization allowed us to push for a leaner “core” IR with fewer built-in concepts [Section 2 paragraph 1 in the MLIR paper].
The Standard dialect persisted in the MLIR code base and grew significantly over time. New ops are proposed for inclusion on a regular basis [1, 2, 3, 4].
Yet, concerns about maintaining a single, monolithic Standard dialect reappear in many discussions proposing new ops. This leads to a Kafkaesque situation where a contributor, often relatively new to the community, who proposes a simple new operation is quasi-systematically asked by reviewers to consider the entirety of the Standard dialect, with the goal of either splitting it up to accommodate the op or proving that the dialect still makes sense as a unit, depending on the reviewer’s own inclinations.
`libMLIRStandardOps.a` is around 2.8 MB, which may be unacceptable on, e.g., embedded platforms, especially if only a couple of ops are actually required. A recent RFC asked for moving `ReturnOp` to the (tinier) built-in dialect for this reason, and the PDL/Interp dialect duplicated it to avoid a dependency.
Proposal
We propose to split the Standard dialect into multiple individual components by progressively factoring out well-scoped groups of operations into new dialects. Each new dialect will be the subject of a separate RFC, which will follow the guidelines for new dialects and, in particular, define the goal of the dialect and the criteria for including existing and new operations. Therefore, we are looking for a consensus on the idea and process of the splitting. While we do provide an example of splitting below, this example is not final and only serves as an illustration. We will not discuss the scope of individual dialects in this proposal.
Our goal is to replace the Standard dialect completely. This will help eliminate implicit expectations of better support and privileged status of one dialect, as well as prevent the associated feature creep in the hitherto standard ops. This will reduce the pressure for other dialects to target or otherwise support hitherto standard ops, which creates a risk of having duplicate work on operations and conversions. This risk can be partially mitigated by ensuring new dialects have few overlapping concerns and provide better documentation on the overall upstream dialect ecosystem. We believe that the modularization benefits in terms of code size and general support effort required (today, virtually every contributor is a stakeholder in Standard, but they may not have a stake in all individual components) outweigh the risks.
We propose to identify prospective dialects by finding groups of operations that are frequently used together (for example, “simple” floating-point arithmetic operations such as `add` and `sub`), as well as abstractions common to a group of operations (for example, the `tensor` type or CFG-related control flow).
Discussion
Arguments for splitting
- `lib/StandardOps/Ops.cpp` is the single largest file in the code base (without the ODS-generated parts!), followed by `StandardToLLVM.cpp`, which needs to handle most of the standard ops.
- Lack of contextual connection between operations: the dialect contains standard integer/FP arithmetic, complex arithmetic, trigonometric functions, memref/view construction and casts, tensor construction and casts, DMA, etc. It is not clear that anybody needs all of those together. This has actively led to the duplication of operations in several downstream projects that have opted to redefine simple operations (e.g. `return`/`cond_br`/`br`) because the cost of including Standard is so high.
- This dialect does not correspond to the guidelines on components (no clear objective), yet it is the most likely source of inspiration for other dialects.
- The dialect simultaneously has privileged status, conferred by the notion of “standard”, and experimental-level quality, because it ended up being the default choice for ops that don’t belong anywhere else.
- The lack of clear scope of the Standard dialect leads to confusion among users and developers. For example, the issues discussed in this thread could have been avoided had the tensor “component” of the Standard dialect been separate.
Arguments against splitting
- Having many smaller dialects makes it harder to navigate the ecosystem.
  - This can be solved by technical means that do not rely on having a huge monolithic library, e.g., search in op documentation.
  - This is something to address and improve regardless, and splitting the Standard dialect can be a forcing function here. Otherwise, the pain points of manipulating multiple dialects will still exist with `scf`, for example. It is a claim of MLIR that dialects can mix and match seamlessly.
  - Ultimately, the size of dialects is a trade-off. Having a few huge dialects will lead to the same navigation problem within a dialect as one could have across dialects.
- The contextual connection between ops is that they all operate on standard types.
  - This does not hold as a general guideline for including an op in the dialect unless dialect == type system. There exist ops that operate on standard types, e.g. in TensorFlow, that don’t belong to the Standard dialect. Some Standard ops can operate on non-standard types, e.g. `std.constant` with an `opaque` value and a tensor-of-custom-type. The `tuple` type is standard, yet we explicitly decided not to have standard ops operate on values of this type.
- Concerns related to increasing build system complexity.
  - These are justified and can be partially addressed by maintaining a clean code tree structure and build system compartmentalization.
Miscellaneous concerns
- The privileged status of the Standard dialect allows one to omit the `std.` prefix in the custom syntax. This will not be the case if there are multiple dialects.
  - MLIR favors a mix of dialects [Section 6.2 in the MLIR paper]. In this context, not prefixing some operations with their dialect stands out by breaking the common pattern. In practice, handwritten IR that heavily relies on multiple dialects often uses an explicit `std` prefix, e.g. [1].
  - The verbosity is not problematic as long as dialect names are short (and pronounceable).
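To illustrate the asymmetry, here is a small handwritten snippet (hypothetical and simplified) mixing the Affine and Standard dialects; the unprefixed standard ops visibly break the `dialect.op` pattern that every other op follows:

```mlir
// Illustrative example: `affine` ops carry their dialect prefix,
// while the Standard ops (`mulf`, `addf`, `return`) may omit theirs.
func @axpy(%a: f32, %x: memref<16xf32>, %y: memref<16xf32>) {
  affine.for %i = 0 to 16 {
    %0 = affine.load %x[%i] : memref<16xf32>
    %1 = affine.load %y[%i] : memref<16xf32>
    %2 = mulf %a, %0 : f32   // unprefixed std.mulf
    %3 = addf %2, %1 : f32   // unprefixed std.addf
    affine.store %3, %y[%i] : memref<16xf32>
  }
  return                     // unprefixed std.return
}
```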
- The Standard dialect serves as the central point of the “hourglass”-shaped lowering graph: higher-level dialects funnel into Standard, and lower-level dialects fan out from it. With many dialects, it becomes harder to configure the lowerings.
  - This is partly the legacy of Standard being a generalization of LLVM IR and of LLVM IR being the main lowering target. Neither of these is true anymore.
  - In fact, the Standard dialect being lowerable into other representations is a misconception, given the growing number of Standard ops that need to be expanded within Standard before being suitable for further lowering.
  - Better documentation and a cohesive story about how upstream dialects fit together largely mitigate this concern, and this is something we should do anyway.
One Possible Splitting
As an example and to support our point about mostly independent groups of operations, we propose one possible splitting of the operations currently in Standard into multiple dialects.
We propose to split the Standard dialect into multiple, smaller dialects according to the groups of semantically connected operations, listed below, that are likely to be used together.
One of the grouping principles is the common data abstraction (a type, or set of types) that the operations operate upon. In particular, we separate out the `complex`, `memref` and `tensor` dialects with the associated operations. Note that the corresponding types remain builtin, i.e. registered in the always-available built-in dialect. This separation reinforces two tendencies that have naturally appeared in the ecosystem: (1) the `vector` dialect contains most of the operations on vectors and was able to grow fast and gather adoption; (2) the naming scheme for ops in the Standard dialect increasingly tends to include the abstraction name in the op name: `memref_reshape`, `tensor_from_elements`, or even dual ops with type names as a disambiguation mechanism: `subview`/`subtensor`, `load`/`tensor_load`. For the latter, having dedicated dialects would help reduce ambiguity and verbosity, e.g. `memref.reshape`, `tensor.from_elements`, `memref.subview`, `tensor.subview`. Specifically, the splitting can look as follows.
- Control flow - `cf` dialect - `br`, `cond_br`, `return`, `call`, `call_indirect`; also move `func` in here.
- Integer arithmetic - `int` dialect - `addi`, `cmpi`, `subi`, `muli`, `divi`/`remi` signed and unsigned, `and`, `or`, `xor`, `shift_*`, `sexti`, `zexti`, `trunci`, `index_cast`; also remove the trailing `i`.
- Basic float arithmetic - `float` dialect - `addf`, `cmpf`, `subf`, `mulf`, `divf`, `remf`, `absf`, `copysign`, `negf`, `ceilf`, `floorf`, `fpext`, `fptrunc`, `fptos/ui`, `u/sitofp`; also remove the trailing `f`.
- Trigonometric/special math - `math` dialect - `ceildivi`, `floordivi` (also add the `mod` equivalent), `cos`, `sin`, `atan`, `tanh`, `exp`, `exp2`, `log`, `log10`, `log2`, `rsqrt`, `sqrt`. These can have expansions into basic integer and float arithmetic as a conversion.
- Complex numbers - `complex` dialect - `addcf`, `subcf`, `create_complex`, `im`, `re`.
- MemRef operations - `memref` dialect - `load`, `store`, `prefetch`, `atomic_rmw`, `generic_atomic_rmw`, `atomic_yield`, `global_memref`.
- Tensor operations - `tensor` dialect - `tensor_cast`, `extract_element`, `tensor_from_elements`, `subtensor`/`subtensor_insert`, `dynamic_tensor_from_elements`, `tensor_load`, `tensor_store`.
- Needs further discussion - `yield`, `constant`, `select`, `dim`, `rank`, `splat` - we may choose to duplicate some things and add a common interface/trait, have a “utility” dialect, etc.
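To make the example splitting concrete, here is a small illustrative snippet using today’s Standard ops, followed by the same snippet rewritten with the hypothetical dialect and op names from the list above (all of the new names are non-normative placeholders for this discussion):

```mlir
// Today: all of these live in the Standard dialect (unprefixed).
%v = load %m[%i] : memref<8xf32>
%s = addf %v, %v : f32
store %s, %m[%i] : memref<8xf32>
%t = tensor_from_elements %s : tensor<1xf32>

// After the example splitting (hypothetical names): each op carries
// its dialect prefix, and the abstraction name moves out of the op name.
%v = memref.load %m[%i] : memref<8xf32>
%s = float.add %v, %v : f32
memref.store %s, %m[%i] : memref<8xf32>
%t = tensor.from_elements %s : tensor<1xf32>
```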
We are looking for consensus on the idea and process of splitting, NOT on the specific example proposed above.