[RFC][arith] Should we support scalar <--> vector `arith.bitcast`s?

Hi folks,

I’m wondering if arith.bitcast should support non-elementwise casts, e.g., i64 to vector<2xi32>, vector<2xi32> to i64, or even vector<2xi32> to vector<1xi64>, i.e., allowing bitcasts between any types with matching statically known (total) bitwidths.

Currently, arith.bitcast is an elementwise op that supports scalar-to-scalar, vector-to-vector, and tensor-to-tensor casts, as long as the shapes match and the underlying element types have the same bitwidths. This does not allow me to perform i2N to vector<2xiN> and vector<2xiN> to i2N casts in the wide integer emulation pass without using the bitcast ops from the LLVM or SPIR-V dialects.
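For concreteness, here is a minimal sketch of what the verifier accepts today versus the non-elementwise form I am asking about (the scalar ↔ vector casts are left as comments since they are rejected by the current verifier):

```mlir
// Accepted today: elementwise bitcasts with matching shapes and element bitwidths.
func.func @current(%s: i32, %v: vector<2xi32>) -> (f32, vector<2xf32>) {
  %0 = arith.bitcast %s : i32 to f32                     // scalar-to-scalar
  %1 = arith.bitcast %v : vector<2xi32> to vector<2xf32> // vector-to-vector
  return %0, %1 : f32, vector<2xf32>
}

// Proposed (hypothetical, does not verify today): total-bitwidth-preserving casts.
//   %2 = arith.bitcast %w : i64 to vector<2xi32>
//   %3 = arith.bitcast %2 : vector<2xi32> to i64
```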

Are there any reasons why we would not want to allow such casts with arith.bitcast? If dual elementwise vs. non-elementwise semantics would be confusing, I think we could either introduce something like arith.elementwise_cast or punt them to the vector and tensor dialects. The vector dialect already supports bitcasting.

Having thought about this a little bit more, another option could be to keep arith.bitcast elementwise and instead extend vector.bitcast to accept scalars. Currently, vector.bitcast requires matching bitwidths and ranks, but perhaps we could relax this a bit and allow 0-D and 1-D vector-scalar bitcasts as well?
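A rough sketch of the difference, with the relaxed forms left as comments since they do not verify today:

```mlir
func.func @today(%v: vector<2xi32>) -> vector<1xi64> {
  // Accepted today: same rank, same total bitwidth, only the last dim changes.
  %0 = vector.bitcast %v : vector<2xi32> to vector<1xi64>
  return %0 : vector<1xi64>
}

// Hypothetical relaxation (rejected today): let one side be a scalar
// or a 0-D vector of matching total bitwidth.
//   %1 = vector.bitcast %s : i64 to vector<2xi32>
//   %2 = vector.bitcast %1 : vector<2xi32> to i64
//   %3 = vector.bitcast %s : i64 to vector<i64>
```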

How would this work for nD vectors? There is no guarantee that they are bit-contiguous AFAIK.

Another thing to consider: on some HW, f32 and vector<f32> can be executed on completely different parts of the HW, and moving between them is much more intrusive than a simple bitcast.
Allowing bitcast to mix scalar and vector may be undesirable in that context. I think @_sean_silva made that point a while back?

I think arith.bitcast is meant to be a “reinterpret cast”. Crossing vector/scalar often requires an instruction to move from one register file to another. Sometimes it is actually impossible (I’ve worked on hardware where vector → scalar was not allowed without tediously going through memory).

I would think that you’d either have to bitcast to a 0-D/1-D vector first and then follow up with a vector.bitcast/vector.shape_cast, or allow it as well.
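To illustrate the two-step route for an n-D result: step 2 below uses the existing vector.shape_cast, while step 1 is the hypothetical scalar → 1-D bitcast discussed above.

```mlir
// Step 1 (hypothetical, not supported today): reinterpret the scalar as a flat 1-D vector.
//   %flat = vector.bitcast %s : i128 to vector<4xi32>

// Step 2 (exists today): reshape the bit-contiguous 1-D vector to n-D.
func.func @to_2d(%flat: vector<4xi32>) -> vector<2x2xi32> {
  %nd = vector.shape_cast %flat : vector<4xi32> to vector<2x2xi32>
  return %nd : vector<2x2xi32>
}
```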

Thanks @_sean_silva and @nicolasvasilache. This is surprising to me since vector.bitcast already allows changing the type and size of the last dim, while llvm.bitcast and SPIR-V’s OpBitcast allow scalar → vector. What would make the scalar → vector case more complicated than vector<iKN> to vector<KxiN>? Crossing the scalar / vector register boundary?
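For reference, a minimal sketch of the two existing forms I am comparing (the llvm.bitcast line reflects my reading of the LLVM dialect verifier, so treat it as approximate):

```mlir
// vector.bitcast already changes the element type and size of the last dim.
func.func @last_dim(%v: vector<1xi64>) -> vector<2xi32> {
  %0 = vector.bitcast %v : vector<1xi64> to vector<2xi32>
  return %0 : vector<2xi32>
}

// llvm.bitcast accepts scalar -> vector when the total bitwidths match.
llvm.func @llvm_form(%s: i64) -> vector<2xi32> {
  %0 = llvm.bitcast %s : i64 to vector<2xi32>
  llvm.return %0 : vector<2xi32>
}
```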

I would rather add an operation to the vector dialect or overload vector.bitcast to do this conversion. But +1 on adding this since it’s something that’s been missing.

On many architectures the thing that is constant is the number of lanes (processing elements) rather than the overall bit width of the operand, so e.g. bitcast vector<16xi16> → vector<8xi32> would not even make sense on such architectures. But e.g. zext vector<16xi16> → vector<16xi32> would.
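Spelled out in IR for concreteness, using vector.bitcast and arith.extui for the two forms (a sketch, not a claim about how any particular target handles them):

```mlir
func.func @lanes(%v: vector<16xi16>) -> (vector<8xi32>, vector<16xi32>) {
  // Reinterprets the bits: 16 x i16 becomes 8 x i32, so the lane count changes.
  %a = vector.bitcast %v : vector<16xi16> to vector<8xi32>
  // Widens each lane: the lane count (16) is preserved.
  %b = arith.extui %v : vector<16xi16> to vector<16xi32>
  return %a, %b : vector<8xi32>, vector<16xi32>
}
```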

Just providing the hardware perspective here, not saying the op has to be redesigned.

I think people usually think of bitcast as “free”, but often these register-file-crossing instructions can be quite expensive (on some archs scalar->vector is “free”, but not on all).

Overall, I think it depends on what we are trying to model. Probably being consistent with existing software abstractions that we lower to is the best path, but if we design our own, an understanding of the underlying hardware might be useful.

In this context, the existing arith.bitcast appears free while the existing vector.bitcast does not. Are there any reasons why we would not want to extend vector.bitcast to operate on scalars?

Just my 2c, but it seems that there should be a broadcast %scalar : scalarT -> vector<8xscalarT> operation (kind of like linalg.fill).
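A sketch of that shape, using the vector.broadcast spelling the vector dialect already has:

```mlir
func.func @splat(%s: f32) -> vector<8xf32> {
  // Splats the scalar into every lane (similar in spirit to linalg.fill).
  %0 = vector.broadcast %s : f32 to vector<8xf32>
  return %0 : vector<8xf32>
}
```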

From a low level perspective, I think it would make sense to delegate the scalar/vector boundary crossing responsibilities to only one or two instructions, such as vector.insertelement and vector.extractelement, and separate that concern from bitcasting or any other transformation. However, we already have other operations that are crossing the scalar/vector boundary, such as vector.broadcast. I think at the Vector level of abstraction, which is higher than LLVM-IR, it makes sense to couple the scalar/vector boundary crossing with other transformations in favor of having higher level operations that make pattern matching and other vector transformations easier.
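For reference, a minimal sketch of the insert/extract boundary crossing mentioned above (syntax from my reading of the vector dialect docs, so treat it as approximate):

```mlir
func.func @cross(%v: vector<4xf32>, %s: f32) -> (f32, vector<4xf32>) {
  %c0 = arith.constant 0 : index
  // vector -> scalar: read one lane out of the vector register.
  %e = vector.extractelement %v[%c0 : index] : vector<4xf32>
  // scalar -> vector: write the scalar into lane 0.
  %i = vector.insertelement %s, %v[%c0 : index] : vector<4xf32>
  return %e, %i : f32, vector<4xf32>
}
```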
