[RFC] `index` dialect

The index Dialect

Overview

This RFC proposes the addition of an index dialect to MLIR.

The index dialect contains operations for manipulating index values. The index type models target-specific values of pointer width, like intptr_t. The operations in this dialect operate exclusively on scalar values of index type. The dialect and its operations treat the index type as signless; signed and unsigned versions are provided for those operations where the distinction is meaningful. In particular, the operations and transformations are careful to respect the target-independence of the index type, for example when folding.

This dialect contains a subset of the operations in the arith dialect, including the binary arithmetic and comparison operations but excluding the bitwise operations, with a few notable differences:

  1. casts and castu are used to convert between the index type and built-in integer types (see the sketch after this list)
  2. index.bool.constant is used to materialize i1 constants that result from folding index.cmp
  3. @stellaraccident also proposed that the dialect include an index.sizeof, which returns the size of the index type for the current target
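
A rough sketch of how the casts and sizeof could be written (hypothetical assembly syntax; the exact form is up for discussion in this RFC):

func.func @widen(%i: i32) {
  %s = index.casts %i : i32 to index   // treat %i as signed when widening to the index width
  %u = index.castu %i : i32 to index   // treat %i as unsigned when widening to the index width
  %w = index.sizeof                    // target-dependent bit width of index, as an index value
  return
}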

Most importantly, operations are folded only when the results would be the same on 32-bit and 64-bit targets. In short, an operation f(a, b) is folded only when

trunc(f(a, b)) = f(trunc(a), trunc(b))

Some ops, like add and mul, satisfy this property for all values of a and b and can therefore always be folded. Other ops are checked on a case-by-case basis. When materializing target-specific code, constants just need to be truncated as appropriate.
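
As a concrete illustration of why divu cannot be folded unconditionally (hypothetical syntax), consider constants whose 64-bit quotient differs from the quotient of their 32-bit truncations:

%a = index.constant 4294967300   // 2^32 + 4
%b = index.constant 2
%q = index.divu %a, %b
// Folding with 64-bit arithmetic would give 2147483650, but on a 32-bit
// target %a truncates to 4, so the real result is 2. Because the two
// disagree, this divu is not folded.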

Motivation

The standard way of manipulating index values in MLIR is with the arith dialect. This is deficient for a variety of reasons:

  1. The index type has key differences from the fixed-width integer and floating-point types that are not handled by the arith dialect. The most important of these is that index is folded with 64-bit arithmetic, which can result in miscompiles on 32-bit targets for ops like divu.
  2. The arith dialect brings in a bunch of potentially unnecessary dependencies, like operations on multidimensional vectors and tensors.
  3. index_cast always treats index as signed when extending (trivial to remedy, but I thought I’d mention it anyway)

It’s not clear that issue 1 should be solved by special-casing each op folder on whether the operand types are index. Given issue 2 (why do I need so many dependencies just to add index values?), the best solution felt like introducing a new dialect with well-defined goals and semantics.

Full Op List

  • add
  • sub
  • mul
  • divs
  • divu
  • ceildivs
  • ceildivu
  • floordivs (floordivu = divu)
  • rems
  • remu
  • maxs
  • maxu
  • casts
  • castu
  • cmp with eq, ne, slt, sle, sgt, sge, ult, ule, ugt, uge
  • constant
  • bool.constant
  • sizeof

The ceildivs/u and floordivs ops were included for historical reasons (affine), but they are much heavier than the other ops in the dialect, and @clattner is in favour of not including them.

Any other ideas? :slight_smile:

Why index.bool.constant?

The dialect needs to materialize i1 constants for folding index.cmp. Given that one of the main goals of the dialect is to provide a low-dependency dialect for manipulating index types, using arith.constant was not an option. Constant ops all have the same semantics, more or less, so from the perspective of op folding, it doesn’t matter whether an i1 constant comes from arith.constant or index.bool.constant.
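
For illustration (hypothetical syntax), a comparison between equal constants would fold to an i1 constant produced within the dialect itself:

// Before folding:
%a = index.constant 10
%b = index.constant 10
%c = index.cmp eq(%a, %b)
// After folding, %c is materialized as:
%c = index.bool.constant true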

MLIR Dependencies

Just MLIRIR and some interfaces.

Not Part of the RFC

What specifically to do with the arith dialect. The inclusion of this dialect does not need to prescribe a particular fate for the arith dialect (like removing index types from its operations), but that can be discussed as part of the RFC.

Current Status

The dialect exists with lowerings to LLVM.

Regarding ceildiv/floordiv: supposing they’re not included in the new dialect, what’s the proposal for handling the cases where they’re currently generated? That is, if we move the analogous ops out of the arith dialect and into the affine dialect, would the affine dialect need/want separate ops for the index vs. integer variants?

I’m sympathetic to @clattner’s desire to move ceildiv/floordiv out of the arith/index dialects. But they are mathematical operations that arise quite often; and because they can be tricky to get right, ironically the fact that they’re expensive is all the more reason to make sure they’re readily available (even if shunted off to some “expensive_arith” dialect or what have you).

We do have the math dialect, which contains specific “subroutine”-type math ops.

I’d forgotten about math :slight_smile: That looks like an excellent home for the ceildiv/floordiv uses I had in mind.

Just to call it out (but not necessarily to endorse it), a viable alternative does seem to be to “fix” the arith dialect:

  1. Special-case the arith folders on whether the type is fixed-width.
  2. Remove the tensor/vector support from the arith ops (you say “unnecessary dependencies, like…”: are there others?).
  3. Fix index_cast.

As has been discussed multiple times for #2, I think the inclusion of tensor as a legal type for arith is a bug. I think the same reasoning may apply to vector, but I have not done the analysis.

Even if we agreed to “fix” arith in this way, I remember it was a split decision back in the std dialect splitting days as to whether we wanted a dedicated index dialect, and we’ve had other cases crop up since. I would buy that we just decided this wrong then and should correct it. The unspecified-length nature of index comes up in a lot of cases, and keeping it its own thing can help isolate that without a lot of sketchy predicates and such (I think).

I think it would be good to enunciate a recommended fate for the index ops in the arith dialect (i.e., that we should remove that support, but not prescribe when/how this is done). People will ask “future us” about this a year from now, so we might as well write it down.

+1. While this RFC should avoid getting bogged down with discussions of the when/how, I do think we should decide on an “official recommendation” for subsequent RFCs to follow up on. At the very least, having such a recommendation in mind should help to focus the discussion about whether adding the index dialect is the right move and about what exactly the dialect should look like.

+1. I was trying to word it in a way that suggests this RFC isn’t meant to decide what to do, but that what could be done is within the scope of discussion.

This is the main reason why it makes sense to have a separate dialect. The differences between index and the fixed-width integer types could be considered “manageable enough” that handling them can be munged into the arith dialect, but I don’t think this is a good direction.

I’m in favour of doing this.

Are these ops only useful for fixed-width types? It would not make sense to make math work on index types.

+1

+1, I agree that we as a project should have a plan here. We shouldn’t just “take a dialect” and assume someone else decides what is right overall for the project later. We should be more deliberate about that.

I also don’t have a well-formed opinion on vector types in arith. The harm is so much lower than the harm of tensors that I wouldn’t advocate for removing them. The only challenging question that comes up is what to do with comparisons: should a compare of two vectors return a bool, a vector of bools, or a vector of integer values that are the sext of the bool value?

In any case, I’d recommend deferring that to be another discussion.

This is actually the core observation that (I believe) argues for the index dialect to be split out. It is materially different in nature from the rest of the arith dialect in a bunch of ways, including that it doesn’t allow you to use getIntOrFloatBitWidth() to reason about the behavior of conversions etc. Splitting the “unknown width” stuff out to its own world seems more principled and allows putting nicer accessors on arith. It also clarifies the behavior of a number of corner cases like arith.constant.

There is also another layering bonus in that SCF depends on “index” but not on arith, which is nice given that arith is a very large dependency.

-Chris

I tend to agree with this interpretation. Just wanted to make sure we called out the alternative (and others might have a different read).

This comment is actually very timely! The vector masking proposal that we shared recently includes a new Vector Predication dialect that will provide Arith-like operations with native support for vector masking and dynamic vector lengths. This is in line with the Vector Predication proposal that landed in LLVM with the same motivation. I think the introduction of the Vector Predication dialect would be a good time to remove the vector support from Arith. Other than masking and dynamic vector lengths, which are enough reasons by themselves, I wouldn’t say there is a strong reason to support the split for vectors, IMO. I think LLVM IR has been doing relatively well on that front but I might be missing something.

Having said that, I’m curious that nobody is worried about the level of duplication that this will bring. We would end up with multiple copies of almost identical operations in different dialects. This duplication would have a cost at quite a few levels: maintenance, compiler build time, compile time (dialect conversion) and code duplication (rewrite patterns). Where do you think the limit is? Should we follow the same approach for other vectorizable operations in, let’s say, the Math dialect?

Having vector.add (multidimensional vectors), tensor.add, index.add, and arith.add (scalar + 1D vectors) is not what I would consider to be “duplication”. These are seemingly the “same” operation but they operate on types with very different semantics, and consequently the codegen for these ops can vary wildly.

While I agree with tensor being special and this RFC is convincing me about index, I’m not yet convinced on vector. Let’s discuss more.

(In general, I think we should use this RFC as a forcing function to arrive at an opinion about path forward for these things since in their current conception, they are commingled)

I don’t have a direct need for this at the moment [1], but would it make sense to have a dedicated op for indexing into buffers (think gep/LEA)?

%addr = index.address %base, %a x stride

That should be easier to look through than a plain add + mul, and it would avoid issues with overflow.
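
For comparison, without a dedicated op the same address computation would be spelled as a separate mul and add (hypothetical example using the proposed op names):

%off  = index.mul %a, %stride
%addr = index.add %base, %off
// An analysis has to re-match this mul + add pattern (and reason about
// overflow) to recover the indexing structure that a dedicated op would
// make explicit.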


  1. And it’s always easier to ask than to do the work…

There are potentially other “indexing” operations that could be interesting here (the op @cbate added comes to mind). Good question whether those are a target of this dialect too.

Agreed. Here is the thread. We were somewhat hunting for an index dialect when figuring out where to land them: [RFC] [Tensor] Extracting slices from `tensor.collapse_shape` - #8 by nicolasvasilache

Sorry, I’m not sure I follow why scalars and 1-D vectors should be part of arith and n-D vectors should be part of vector. Could you please elaborate? What about 0-D vectors?

I don’t think that an operation with consistent and well-defined semantics needing different lowerings or codegen transformations for different types is a strong enough reason to introduce independent operations per type. The semantics of an operation shouldn’t be defined by its lowering but by what that operation represents at that specific level of abstraction. Based on that, at this point, I do not see any fundamental semantic differences among scalars, 1-D and n-D vectors when it comes to arithmetic operations. As I said, the introduction of masks and dynamic vector lengths on arithmetic operations will change that in the future.

I’m totally supportive of reducing dependencies between dialects, but I think we should also take into account the cost of introducing a new operation. My concern about duplication is not only about the operation itself but about what comes with each new operation. I assume we want all the folding/canonicalization bells and whistles to be working for all the arith.*, index.*, vector.* arithmetic operations. What do we plan to do about that? I would be worried if the answer to this question is to introduce an interface for every kind of arithmetic operation.

Just my two cents :slight_smile:

Strong +1 for the dialect and motivation; this is a really nasty footgun. Thanks for improving this!

There was also this prior discussion: [RFC] Arith dialect versions of affine.apply/min/max, which aimed at de-conflating affine considerations (i.e., ensuring that a value evolves following an affine progression) from the ability to merely represent linear expressions on arbitrary index values in a compact and composable form.

However, I also see:

This suggests the index dialect does not aim at having such higher-level ops that are also subject to target-specific values of pointer width. We would need another dialect on top of that to represent such concepts; is this a correct assumption?

This was also my question, with a slightly different focus: do we want the index dialect to be “only simple operations on unknown-width integer types”, or do we want it to be “the operations necessary to do address/size computations”? IMO, the latter would be a better justification for the dialect to exist and not be seen as yet another copy of the arithmetic ops (in addition to arith, llvm, and some others), but at the cost of greater complexity of the dialect itself (should we go as far as including datalayout here?).

SCF op canonicalizations depend on several other arith dialect ops, like arith.select, arith.xori, and arith.andi; none of these are in @Mogball’s list. If removing the SCF → arith dependency is a key motivation, then the list will have to be complete.

+1 “Make everything as simple as possible, but not simpler.”

Can you elaborate on the transformations part?
I see folds, but what about other transforms and canonicalizations?
E.g., are we going to move towards supporting a subset of the existing affine canonicalizations on lower-level IR, or something more powerful?

Can you elaborate on the intended use cases for the dialect (beyond “don’t grossly miscompile”)? In particular, how is it meant to compose with other dialects?

One topic I am particularly interested in (related to @bondhugula’s point above):

  • is this expected to be the one true dialect that, in the fullness of time, provides all loop-like constructs with all the necessary operations on index quantities? (In which case reviewing existing use cases and evolving/deprecating them will be necessary.)
  • if not, how should one think about extensions such as the ones we previously identified (linked above)?