Why do we allow arith.index_cast on memref / tensor?

Surfacing some discussion from Discord: we narrowed things down to arith.index_cast.
The doc says:

// Index cast can convert between memrefs of signless integers and indices too.
def IndexCastTypeConstraint : TypeConstraint<Or<[
        SignlessIntegerLike.predicate,
        MemRefOf<[AnySignlessInteger, Index]>.predicate]>,
    "signless-integer-like or memref of signless-integer">;

def Arith_IndexCastOp
  : Arith_CastOp<"index_cast", IndexCastTypeConstraint, IndexCastTypeConstraint,
                 [DeclareOpInterfaceMethods<InferIntRangeInterface>]> {
  let summary = "cast between index and integer types";
  let description = [{
    Casts between scalar or vector integers and corresponding 'index' scalar or
    vectors. Index is an integer of platform-specific bit width. If casting to
    a wider integer, the value is sign-extended. If casting to a narrower
    integer, the value is truncated.
  }];

  let hasFolder = 1;
  let hasCanonicalizer = 1;
}
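For reference, the documented (non-memref) use is just an elementwise cast between integer and index types. A minimal sketch of the scalar and vector forms as the op exists upstream today:

```mlir
// Scalar: i32 -> index (extended or truncated depending on the
// platform-specific bit width of index).
%i = arith.index_cast %a : i32 to index

// Vector: elementwise; each lane is extended/truncated independently,
// so no data layout question arises for the value itself.
%v = arith.index_cast %b : vector<4xi32> to vector<4xindex>
```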

I believe the memref (and/or vector) case, combined with

If casting to a wider integer, the value is sign-extended.
If casting to a narrower integer, the value is truncated.

will create nasty data layout bugs.

I didn’t do the spelunking to see why/what/when this was introduced, and the blame points at a 1+ year-old move when splitting std, so I would rather ask here: is anyone relying on this for something load-bearing?

@Mogball @krzysz00 who touched this last, maybe?

I don’t know why this is allowed, but index_cast can work on vectors without data layout problems: you just have to extend/truncate every individual element.

Yes, the issue is if anyone ever tries to put that in a memref.

Seems like a misinterpretation of the insufficiently tight verifier by the huge dialect-splitting change D110797 [MLIR] Replace std ops with arith dialect ops. The original verifier accepted any ShapedType, which could accidentally cover memref, but it should not have.

The verifier relaxation was introduced in D81611 Updated IndexCastOp to support tensor&lt;index&gt; cast. Given the description, it was only intended to work for tensors, but for some reason added support for all shaped types. We should tighten it back to tensor as intended. One could make a case for supporting vectors as well.

Vectors are fine, it’s not a “reinterpret cast” of a vector register, but an operation that produces a new value of a different vector type. Same as vector.broadcast, for example.

Correct me if I’m wrong, but I feel like bufferization will still cause an issue for this op. If tensors are supported, but memrefs aren’t, then the current arith bufferization is wrong because it will just convert tensor to memref.
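To make the trap concrete, here is a hypothetical sketch of what a naive 1:1 bufferization of the tensor form would produce (op spellings as upstream; the resulting memref op is exactly the form being debated):

```mlir
// Before bufferization: allowed if the verifier permits tensors.
%t = arith.index_cast %in : tensor<4xi32> to tensor<4xindex>

// After a naive type conversion of tensor -> memref: an op whose source
// and result element bit widths may differ -- the layout hazard above.
%m = arith.index_cast %buf : memref<4xi32> to memref<4xindex>
```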

I believe there was some use of index_cast on memref somewhere that caused me to loosen the requirement to allow memref types. But I can’t remember where that was. I agree that this is a bad trap and we should tighten the requirement.

@mogball, does this intersect your work on the index dialect, or do you consider this to be “out of scope”?

memref<...xT> where the bitsize of T can change is a huge footgun.
OTOH maybe you guys don’t care about memref at all and prefer to think about ptr<T> which would have quite different tradeoffs.

Any signal you can share on the topic?

The bufferization for this one would definitely have to make a copy if the data layout can’t be proven to be 1-1.

I’d expect %y = arith.index_cast %x : memref<...xT> to memref<...xU> to be exactly

for %index in indices(%x) {
  %load = memref.load %x[%index] : memref<...xT>
  %cast = arith.index_cast %load : T to U
  memref.store %cast, %y[%index] : memref<...xU>
}
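Spelled out as real IR for a 1-D case, that expansion might look like the following sketch (the scf.for loop, the allocation, and the function framing are my own illustration, not an existing lowering):

```mlir
func.func @index_cast_copy(%x: memref<?xi32>) -> memref<?xindex> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %dim = memref.dim %x, %c0 : memref<?xi32>
  // Something has to allocate the result buffer.
  %y = memref.alloc(%dim) : memref<?xindex>
  scf.for %i = %c0 to %dim step %c1 {
    %load = memref.load %x[%i] : memref<?xi32>
    %cast = arith.index_cast %load : i32 to index
    memref.store %cast, %y[%i] : memref<?xindex>
  }
  return %y : memref<?xindex>
}
```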

You could optimize this down to reinterpret_cast if we know for sure that sizeof(T) == sizeof(U), but we don’t know that in general for index since its sizeof() isn’t known until someone specifies one in a lowering.

But I wouldn’t be surprised if arith.index_cast on memrefs always had the “make a copy” semantics and I’d expect a sufficiently reasonable backend to optimize out those copies if I was really worried about them.

Something has to allocate %y.

Looking deeper into D81611 Updated IndexCastOp to support tensor&lt;index&gt; cast, I see that we do a “cast compatibility check” on the elemental type during verification. This could be an opportunity to involve a deeper DataLayout verification and tighten the op once and for all, plus proper docs.

We have no access to DataLayoutAnalysis in verification. I proposed that in Connecting op verifiers to analyses, but there was pushback originally, and nobody implemented this after there was more consensus.

Isn’t that what bufferization is for?