The index type is used to represent dimension sizes (memref.dim and tensor.dim) as well as pointers and strides (memref.extract_aligned_pointer_as_index and other metadata ops).
Lowering the index type (to LLVM or something else) then seems to enforce that all index values are treated equally.
This would then make the following problematic:
One could want to lower index values that represent pointers to different bit widths depending on the memref address space.
Same as above for memref.dim or values of type index that are involved in indexing expressions: one may want to optimize indexing calculations by using i32 if they know the constraint can be enforced. This could be done for all buffers or based on buffer address space.
A couple of projects I have seen (that I don’t work on, so apologies if I got this wrong) use index purely for dimension sizes or similar and don’t use it for pointers, allowing them to lower index to 32 bits even when the target uses 64-bit pointers:
Triton seems to force the index bitwidth to 32 in order to use 32-bit integers for offsets and shapes, even though one of their supported targets is nvptx64.
Similarly, the IREE VM dialect seems to allow index to be lowered to 32 or 64 bits, since index is only used for dimension sizes and similar.
In both cases, they have a different type for pointers. If they started to use ops like memref.extract_aligned_pointer_as_index, then that might break their workflow.
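A minimal sketch of why that would break, assuming a toy model where lowering index to N bits simply keeps the low N bits of the value. The helper `lower_index` and the example address are hypothetical and not taken from Triton or IREE:

```python
def lower_index(value: int, index_bitwidth: int) -> int:
    """Model storing an index value in an integer of the given width (wraps)."""
    return value & ((1 << index_bitwidth) - 1)


# A dimension size comfortably fits in 32 bits.
dim = 4096
# A made-up 64-bit address, standing in for an extracted aligned pointer.
addr = 0x0000_7F00_1234_5678

dim_ok = lower_index(dim, 32) == dim
addr_ok = lower_index(addr, 32) == addr
print(dim_ok, addr_ok)  # True False
```

The dimension survives a 32-bit index, but the address loses its upper bits, which is the failure mode a pointer-carrying index would introduce.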
So my question is:
Should IndexType become parameterized, like IntegerType? If you could attach an optional attribute (e.g. index<some_attr>), then you could represent more specific classes of unknown-bitwidth integers that all have the same bitwidth.
One can lower the type differently within a given context (by having a pass that converts explicitly during lowering); I think the type semantics allow that. I think of it as a dim or stride (e.g., used for indexing into a shaped type). So in the lowering one can even change the lower-level type depending on the sizes of the shaped types being indexed into, although none of the lowerings you mentioned does that. Usage as a pointer seems a bit different: even in systems with small tensors one can have large memory spaces, so range analysis alone couldn’t determine a size; one would need target information too.
So I see the type more as a high-level indication of an index computation; if one knew the size, one would just use the lower-level type with an exact integer width. I’d guess there may be some gaps with respect to which ops allow this (accepting index only vs. i32), and is the other need to be able to see what is index-computation related?
Dialect conversion is set up with a type converter, and an invariant is that for a given type it’ll always provide the same answer. There is no “context” here.
We can work around the problem above when we want a different size between “device” and “host”: we use dialect conversion configured differently on the GPU kernel and on the host side, for example. However, this isn’t enough: inside a kernel we may want to use i64 indexing for global memory accesses and i32 for shared memory. We can also imagine more annotations on the type (memref or similar) that switch the indexing size requirements per buffer.
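To make the limitation concrete, here is a toy model (not the MLIR API; `TypeConverter` here is a hypothetical stand-in) in which a converter is a pure function of the type, so two converters can disagree with each other but one converter cannot give two answers for index:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeConverter:
    """Toy model: maps each source type name to exactly one target type name."""
    index_bitwidth: int

    def convert(self, type_name: str) -> str:
        if type_name == "index":
            return f"i{self.index_bitwidth}"
        return type_name  # every other type is left untouched


host = TypeConverter(index_bitwidth=64)
kernel = TypeConverter(index_bitwidth=32)

# Different regions can be converted with differently configured converters.
print(host.convert("index"), kernel.convert("index"))

# Inside the kernel, an index used for global memory and one used for shared
# memory are the same type, so they necessarily lower to the same width.
global_idx = kernel.convert("index")
shared_idx = kernel.convert("index")
```

Getting i64 for global memory and i32 for shared memory within the same kernel would need extra information on the type itself, since the converter sees nothing else.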
The infra does not help with this right now. I take the idea of index<some_attr> as a way to attach “tags” to given index uses, acting as “hints” about the “class” each belongs to. It is up to the type conversion to make use of them.
It’s not clear to me how intrusive this approach would be and how it would compose (or badly break composition) when putting things together. Without some sketches of an end-to-end flow with mixed-precision indexing, it’ll be hard to figure out, I think.
Right, to clarify, I meant index<attr> would be something like index<"space1">, not index<i64> – you don’t know the bitwidth, but you want to enforce that all index<"space1"> really are the same type but may be different from index<"space2"> or just index.
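A rough sketch of that idea, again as a toy model rather than real MLIR: `Index` stands in for the hypothetical index<"tag"> type, and `TaggedConverter` is an invented converter that picks a width per tag. The type itself never names a bitwidth; only the lowering does:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Index:
    """Hypothetical index<"tag">; tag=None models the plain index type."""
    tag: Optional[str] = None


class TaggedConverter:
    """Invented converter: one width per tag, with a default for the rest."""

    def __init__(self, default_bitwidth: int, per_tag: Dict[str, int]):
        self.default_bitwidth = default_bitwidth
        self.per_tag = dict(per_tag)

    def convert(self, ty: Index) -> str:
        width = self.per_tag.get(ty.tag, self.default_bitwidth)
        return f"i{width}"


conv = TaggedConverter(default_bitwidth=64, per_tag={"space1": 32})
print(conv.convert(Index("space1")))  # i32
print(conv.convert(Index("space2")))  # i64
print(conv.convert(Index()))          # i64
```

Because `Index("space1")` and `Index("space2")` are distinct types, the "same type, same answer" invariant of the converter still holds.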
I can give it a try over the next couple of months, but in the meantime, I’m wondering if we should just take another look at the memref.extract_aligned_pointer_as_index op. It recently started to be used in the new buffer deallocation passes. It looks like the only other use in upstream dialects/passes is in SparseTensorConversion. It introduces a notion of index <-> pointer that otherwise wouldn’t be present. The documentation says:
This operation is useful for lowering to lower-level dialects while still
avoiding the need to define a pointer type in higher-level dialects
such as the memref dialect.
I’m wondering if downstream users actually do arithmetic on the result or if index was just used to mean “unknown bit width integer”. In either case, we could define an intptr type. This would allow existing conversions like memref|arith-to-llvm to lower index to an integer with width smaller than pointer bitwidth. Seems like a much more straightforward change than adding a parameter to index.
We should think carefully about what index is supposed to represent. It was originally a type usable in affine forms, but ended up being repurposed for many other things. If we think that index is the equivalent of intptr_t or generally a type for address computations, it would make sense to add an “address space” attribute to it IMO.
I think there’s good evidence covered above that index should not be equivalent to intptr_t. It’s unnecessarily constraining. What was already discussed above concerns optimizations where one might want index arithmetic to lower to a size smaller than the pointer bitwidth. But I think there are some similar discussions in the greater LLVM community that also lend evidence:
From what I understand, there could be targets that have 128-bit fat pointers specified in the data layout as the native pointer bitwidth, but that may require only 64-bit indexing.
In C parlance, we probably don’t want to require that sizeof(intptr_t)==sizeof(size_t).
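As an illustration of the pointer-width vs. index-width split, here is a small parser for a single pointer entry of an LLVM data layout string. It assumes the `p[n]:<size>:<abi>[:<pref>][:<idx>]` form as I understand it from the LLVM LangRef, where the index size defaults to the pointer size; the function name and the example entries are made up for this sketch:

```python
def parse_pointer_spec(spec: str) -> dict:
    """Parse one data layout pointer entry: p[n]:<size>:<abi>[:<pref>][:<idx>]."""
    head, *fields = spec.split(":")
    if not head.startswith("p") or len(fields) < 2:
        raise ValueError(f"not a pointer spec: {spec!r}")
    address_space = int(head[1:]) if len(head) > 1 else 0
    nums = [int(f) for f in fields]
    size, abi = nums[0], nums[1]
    pref = nums[2] if len(nums) > 2 else abi    # preferred alignment defaults to ABI
    index = nums[3] if len(nums) > 3 else size  # index size defaults to pointer size
    return {"address_space": address_space, "pointer_bits": size,
            "abi_align": abi, "pref_align": pref, "index_bits": index}


# Hypothetical fat-pointer entry: 128-bit pointers, 64-bit indexing.
fat = parse_pointer_spec("p200:128:128:128:64")
# Conventional entry: index width equals pointer width.
flat = parse_pointer_spec("p:64:64")
print(fat["pointer_bits"], fat["index_bits"])    # 128 64
print(flat["pointer_bits"], flat["index_bits"])  # 64 64
```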
Maybe this op should be memref.extract_aligned_pointer_as_i64 and be lowered to a zext when the pointer size is <64?
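A sketch of what that zero extension means at the bit level, with `zext` and `sext` as hypothetical helpers written for this example (they are not MLIR or LLVM APIs). It also shows why the idea only covers pointers of at most 64 bits:

```python
def zext(value: int, from_bits: int, to_bits: int = 64) -> int:
    """Zero-extend: keep the low from_bits bits, fill the rest with zeros."""
    if from_bits > to_bits:
        raise ValueError("pointer is wider than the target integer")
    return value & ((1 << from_bits) - 1)


def sext(value: int, from_bits: int, to_bits: int = 64) -> int:
    """Sign-extend, shown only for contrast with zext."""
    if from_bits > to_bits:
        raise ValueError("pointer is wider than the target integer")
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    signed = (value ^ sign) - sign  # reinterpret as a signed from_bits integer
    return signed & ((1 << to_bits) - 1)


addr32 = 0xFFFF0000  # made-up 32-bit address with the top bit set
print(hex(zext(addr32, 32)))  # 0xffff0000
print(hex(sext(addr32, 32)))  # 0xffffffffffff0000
```

Zero extension preserves the address value, whereas sign extension would not for addresses with the top bit set; a 128-bit fat pointer fits neither way.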
The above seems to suggest that we should have an intptr type (perhaps with an address space qualifier). That avoids overloading index, which is used everywhere else besides memref.extract_aligned_pointer_as_index to mean an index or dimension size.
Then we could change LLVMTypeConverter to have indexBitwidth and pointerBitwidth. Fine to derive both from DataLayout, which is the status quo. But overriding indexBitwidth with a size smaller than pointerBitwidth shouldn’t break the ability to lower out of index (although the caller must ensure legality). Right now it is broken if you use memref.extract_aligned_pointer_as_index.
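A toy sketch of the two-knob idea, not the actual LLVMTypeConverter: `make_converter` is a hypothetical factory, and "intptr" is the proposed type, which does not exist upstream. By default both widths coincide, as in the status quo, and narrowing the index width leaves pointer-sized integers alone:

```python
from typing import Callable, Optional


def make_converter(pointer_bitwidth: int,
                   index_bitwidth: Optional[int] = None) -> Callable[[str], str]:
    """Build a toy converter with separate index and pointer widths."""
    if index_bitwidth is None:
        index_bitwidth = pointer_bitwidth  # status quo: both derived together
    table = {
        "index": f"i{index_bitwidth}",     # dimension sizes, offsets, strides
        "intptr": f"i{pointer_bitwidth}",  # hypothetical pointer-sized integer
    }
    return lambda type_name: table.get(type_name, type_name)


default = make_converter(pointer_bitwidth=64)
narrow = make_converter(pointer_bitwidth=64, index_bitwidth=32)

print(default("index"), default("intptr"))  # i64 i64
print(narrow("index"), narrow("intptr"))    # i32 i64
```

In this model the legality of the narrower index width is still the caller's responsibility; the sketch only shows that the pointer path no longer depends on it.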
Then adding address spaces of intptr or classes of index could come later.
Maybe, but I was proposing an “immediate solution”: it solves the memref.extract_aligned_pointer_as_index issue (for pointers of <=64 bits) without introducing a new type (which would require some deep changes to the Arith dialect, I think? Unless we introduce a new dialect dedicated to pointer arithmetic?)
I think a dialect for addresses in general would be useful. LLVM & SPIR-V pointers are too low level and they have many restrictions; for example, the address space in LLVM needs to be an integer, hence these pointers cannot be used in conjunction with the gpu dialect and gpu address space attributes.